In this course you will learn how to create models for decision making. We start with cluster analysis, a data reduction technique that is very useful for market segmentation. Next, you will learn the basics of Monte Carlo simulation, which helps model the uncertainty inherent in many business decisions.

### Business Analytics for Decision Making Week 01 Quiz Answers

Q1. Which of the following is true of cluster analysis?

- It is a data analysis technique to discover trends in time-series data
- **It is a data mining tool that is used to create homogeneous groups**
- It is a data visualization tool in market research
- It is a model for customer behavior in the organic and natural products industry

Q2. Which of the following settings are appropriate applications of cluster analysis? (select all that apply)

- **A recommender system that seeks to predict the rating or preference that a user would give to an item (e.g., a movie, a book, or a restaurant).**
- **A delivery scheduling system that assigns delivery trucks to customers in the same general geographical area**
- **A cable company seeking to identify the number and type of TV packages to offer (e.g., Basic, Sports, Entertainment, or Premium).**
- An inventory management system for retail pharmacies that attempts to minimize both the probability of running out of stock and the inventory carrying cost.

Q3. Which of the following statements is true of principal component analysis (PCA) and cluster analysis?

- PCA and cluster analysis are incompatible techniques, only one of them can be applied to the same data
- PCA is a data reduction technique and cluster analysis is a dimensionality reduction technique
- **Cluster analysis is a data reduction technique and PCA is a dimensionality reduction technique**
- The main goal of cluster analysis is to identify redundant variables and the main goal of PCA is to create homogeneous groups of observations

Q4. Cluster analysis is considered an unsupervised learning technique because it operates on historical observations that are not labeled. That is, it is not known to which group historical observations belong and therefore it is not known how many groups there are.

- **True**
- False

Q5. If the Euclidean distance were to be represented in a right angle triangle, which of the following would be considered the distance between two objects of a cluster?

- **Hypotenuse**
- Small leg
- Long leg
- Average of the sum of both legs
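
The right-triangle picture in Q5 can be sketched in a few lines of Python. The two points below are hypothetical; the legs of the triangle are the coordinate differences and the Euclidean distance is the hypotenuse.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical objects: the legs of the right triangle are 3 and 4.
p = (0.0, 0.0)
q = (3.0, 4.0)
print(euclidean_distance(p, q))  # 5.0
```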

Q6. Which of the following is the definition of distance between two clusters in a complete linkage clustering?

- The average of distances between all pairs of objects, where each pair is made up of one object of each group
- **The distance between the most distant pair of objects, one from each group**
- The sum of squares of the distance between clusters
- The distance between the value of the shortest link between the clusters
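
To make the linkage definitions in Q6 concrete, here is a minimal sketch that computes complete, single, and average linkage between two small clusters. The cluster coordinates and function names are illustrative assumptions, not part of the course material.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """All distances between pairs made of one object from each cluster."""
    return [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]

def complete_linkage(cluster_a, cluster_b):
    # Distance between the most distant pair, one object from each cluster.
    return max(pairwise_distances(cluster_a, cluster_b))

def single_linkage(cluster_a, cluster_b):
    # Distance between the closest pair, one object from each cluster.
    return min(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    # Average of all pairwise distances between the two clusters.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters on a line, so the distances are easy to check by hand.
group_1 = [(0, 0), (1, 0)]
group_2 = [(4, 0), (6, 0)]
print(complete_linkage(group_1, group_2))  # 6.0
print(single_linkage(group_1, group_2))    # 3.0
print(average_linkage(group_1, group_2))   # 4.5
```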

Q7. Which of the following is true of hierarchical clustering?

- All clusters must have the same number of objects
- No single cluster can have all objects
- **Each step of the procedure consists of merging the two closest clusters**
- All clusters must have more than one object in them

Q8. Which of the following is true of clustering methods?

- The k-means method is an exact procedure that finds the optimal (i.e., the best)
- The best clustering approach when dealing with very large data sets is to solve the optimization problem using Excel’s Solver
- The k-means method and hierarchical clustering always arrive at the same solution, that is, they always produce the same set of clusters
- **Finding the best set of clusters is complicated because the number of ways of partitioning the observations into k groups is very large and this is why approximation methods such as k-means and hierarchical clustering are used.**
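
To get a feel for how quickly the number of possible partitions grows, the sketch below counts the ways to split n objects into k non-empty groups (the Stirling number of the second kind). The function name is an illustrative choice, and the counting rule is standard combinatorics rather than something taken from the course.

```python
def count_partitions(n, k):
    """Number of ways to split n distinct objects into k non-empty groups."""
    # table[i][j] = ways to partition i objects into j non-empty groups.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            # Object i either joins one of j existing groups or starts a new one.
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]

print(count_partitions(10, 3))  # 9330
print(count_partitions(49, 4))  # 49 observations into 4 groups: far too many to enumerate
```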

#### Quiz 2

Q1. Assignment Overview

In this assignment you will practice what we learned in video 5 of this module. In Part 1 of the assignment, which is optional, you will be provided with a set of demographic data on 49 of America’s largest cities and will have an opportunity to apply k-means clustering to form city groups for marketing purposes. In Part 2 of the assignment, you will be asked a series of questions that will prompt you to describe the demographic structure of the clusters and to identify cities in which to test a new product.

Assignment Prompt

A large consumer goods company wants to select 4 U.S. cities where to test a new product. The company wants each city to represent a particular market segment, as defined by their demographic structure. The company has collected demographic data on 49 of America’s largest cities (see the Cities Excel file below). The demographic data consist of six attributes: 1) percentage of African-American population (% Black), 2) percentage of Hispanic population (% Hispanic), 3) percentage of Asian-American population (% Asian), 4) median age, 5) unemployment rate, and 6) per capita income.

Cities (XLSX file)

Part 1 (optional): Practice using k-means clustering in XLMiner. If you have access to Analytic Solver Platform (which contains XLMiner) and you want to practice using it, proceed with this part of the assignment. Otherwise, skip to Part 2.

You’ll need to download and work with the Cities.xlsx file to complete this assignment. The file has 3 worksheets: Data, KMC_Output, and KMC_Clusters. You will be referred to different worksheets throughout the assignment.

From the Data worksheet, use the k-means clustering tool in XLMiner to identify 4 city groups that the company can treat as market segments. Use the following settings in the XLMiner Cluster Tool: check normalized input data, set # clusters to 4, set # iterations to 50, set random starts to 10, set seed to 951. Also check both output options: show data summary and show distances from each cluster center. XLMiner will produce two additional worksheets: KMC_Output1 and KMC_Clusters1.

Once you complete this part of the assignment, your output with normalized coordinates should look similar to the output presented in Part 2 below, but the numbers won’t be exactly the same.

Part 2: Analyze clusters. The next few questions will prompt you to describe the demographic structure of the clusters (groups of cities) found by the k-means clustering tool in XLMiner. As we discussed in this module, the Normalized Coordinates table (created by XLMiner in the KMC_Output worksheet) can be used to interpret each cluster. Recall that a normalized value of zero represents the average. Therefore, “above average” is indicated by a positive normalized value and “below average” is indicated by a negative normalized value. For this analysis, we will consider that a value of 1 or larger represents a significant positive deviation from the average and a value of -1 or smaller represents a significant negative deviation from the average. The Normalized Coordinates in the KMC_Output worksheet are:

Which cluster represents cities with no particular dominant minority group, with average age, employment rate, and income?

- Cluster 1
- Cluster 2
- **Cluster 3**
- Cluster 4
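
The reading rule for normalized coordinates (zero is average, 1 or more is significantly above, -1 or less is significantly below) can be sketched as follows. This assumes a z-score normalization with the sample standard deviation, which may differ from what XLMiner does internally, and the input values are hypothetical.

```python
from statistics import mean, stdev

def normalize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation
    return [(v - mu) / sigma for v in values]

def describe(z, threshold=1.0):
    """Label a normalized value using the +/-1 rule from the assignment."""
    if z >= threshold:
        return "significantly above average"
    if z <= -threshold:
        return "significantly below average"
    return "about average"

# Hypothetical attribute values for three cities.
z_scores = normalize([10.0, 20.0, 30.0])
for z in z_scores:
    print(round(z, 2), describe(z))
```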

Q2. Which cluster consists of Hispanic cities with high unemployment rate and low per capita income?

- **Cluster 1**
- Cluster 2
- Cluster 3
- Cluster 4

Q3. Which cluster includes cities with a large population of African-Americans?

- Cluster 1
- **Cluster 2**
- Cluster 3
- Cluster 4

Q4. The company would like to choose one city to represent each market in order to test the new product. As discussed in the module, a representative object for a cluster could be chosen as the one that is closest to the centroid. The worksheet KMC_Clusters generated by XLMiner contains a table with the distances from each city to the centroid of each cluster. To identify the city to represent each cluster, we just need to find the city with the minimum distance to each of the centroids. Which cities would you recommend to choose to represent each cluster?

- Cluster 1: Seattle, Cluster 2: Memphis, Cluster 3: Las Vegas, and Cluster 4: San Antonio
- **Cluster 1: San Francisco, Cluster 2: Philadelphia, Cluster 3: Toledo, and Cluster 4: Los Angeles**
- Cluster 1: San Francisco, Cluster 2: Philadelphia, Cluster 3: Omaha, and Cluster 4: Los Angeles
- Cluster 1: San Jose, Cluster 2: Detroit, Cluster 3: Las Vegas, and Cluster 4: El Paso
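
The "closest to the centroid" rule from Q4 amounts to a column-wise minimum over the distance table. The sketch below uses placeholder city names and made-up distances, not the values in the KMC_Clusters worksheet.

```python
# Hypothetical distance table: city -> distance to each cluster centroid.
distances = {
    "City A": [0.8, 2.1],
    "City B": [1.5, 0.4],
    "City C": [0.3, 1.9],
}

def representatives(distance_table):
    """For each cluster, return the city with the smallest distance to its centroid."""
    n_clusters = len(next(iter(distance_table.values())))
    return [
        min(distance_table, key=lambda city: distance_table[city][c])
        for c in range(n_clusters)
    ]

print(representatives(distances))  # ['City C', 'City B']
```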

#### Business Analytics for Decision Making Week 02 Quiz Answers

Q1. Which of the following best defines Monte Carlo simulation?

- It’s a tool for building statistical models that characterize relationships among a dependent variable and one or more independent variables.
- It’s a collection of techniques that seeks to group or segment a collection of objects into subsets.
- It’s the process of selecting values of decision variables that minimizes or maximizes some quantity of interest.
- **It’s the process of generating random values for uncertain inputs in a model and computing the output variables of interest.**
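
As a rough illustration of that definition, the sketch below draws random values for two uncertain inputs and computes an output for each trial. The profit model, its price and fixed cost, and the input ranges are all hypothetical and are not taken from the course spreadsheets.

```python
import random
from statistics import mean

def simulate_profits(n_trials=10_000, seed=42):
    """Run n_trials of a toy profit model with two uncertain inputs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    profits = []
    for _ in range(n_trials):
        demand = rng.randint(50, 150)      # uncertain input (hypothetical range)
        unit_cost = rng.uniform(4.0, 6.0)  # uncertain input (hypothetical range)
        # Output of interest: price of 10 per unit, fixed cost of 400.
        profits.append((10.0 - unit_cost) * demand - 400.0)
    return profits

profits = simulate_profits()
print("average profit:", round(mean(profits), 2))
print("estimated P(loss):", sum(p < 0 for p in profits) / len(profits))
```

The list of simulated profits is an empirical distribution of the output, so it can answer questions such as "how likely is a loss?" that a single what-if scenario cannot.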

Q2. If chance or uncertainty is present in a system then there is an element of ____ in the decision-making problem.

- danger
- security
- **risk**
- difficulty

Q3. Which of the following are weaknesses of manual what-if analysis? (select all that apply)

- **biased sample values of performance measures**
- **hard to do many what-if scenarios**
- **does not provide distribution information**

Q4. Which of the following is a parameter of the Poisson distribution?

- maximum value
- **mean**
- minimum value
- most likely value

Q5. In the Analytic Solver Platform, “Psi” functions are used to add uncertainty to a spreadsheet model.

- **true**
- false

Q6. Why would a manager be interested in analyzing risk?

- to determine a most likely outcome
- to determine a range of outcomes
- **to determine a distribution of outcomes**
- to determine a confidence interval on most likely outcomes

Q7. The PsiOutput function of the Analytic Solver Platform is used to collect simulation data to create an empirical distribution of an output variable.

- **true**
- false

Q8. Historical data is used in simulation to:

- **perform a worst-case analysis**
- optimize the outcomes
- estimate a probability distribution function for critical inputs to the model
- simplify the model

Q9. Distribution fitting is the process of gathering historical data.

- true
- **false**

Q10. Adding a correlation matrix to a simulation model is necessary when:

- the uncertain input variables in the model are independent
- the model is deterministic (i.e., it does not have any uncertain inputs)
- **two or more of the uncertain input variables in the model are not independent**
- an output variable is related to an uncertain input variable

Q11. Which of the following statements is false:

- correlation is a measure of the strength of the relationship between two variables
- **correlation values are always positive**
- the correlation between two variables can be positive or negative
- the correlation between two independent variables is zero

Q12. The Analytic Solver Platform ____ allows you to determine the influence that each uncertain input variable has on an output variable based on the correlation between the input and the output variable.

- trend chart
- overlay chart
- box-whisker chart
- **sensitivity chart**

Q13. The Analytic Solver Platform ____ allows you to superimpose the frequency distributions of selected output variables in order to compare them.

- trend chart
- **overlay chart**
- box-whisker chart
- sensitivity chart

Q14. The Flaw of Averages typically results when a single number, the average value, is used in a spreadsheet model to represent an uncertain future quantity.

- **true**
- false

Q15. The average value for an output cell in a deterministic spreadsheet model that uses average values for uncertain input cells is always the same as the average value for the same output cell obtained with a Monte Carlo simulation.

- true
- **false**
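
Q14 and Q15 describe the Flaw of Averages. A small sketch, under made-up assumptions, shows why plugging in the average input does not give the average output: here sales are capped by a hypothetical capacity of 100 while demand is uniform on the integers 0 to 200, so its average is also 100.

```python
import random
from statistics import mean

CAPACITY = 100  # hypothetical capacity limit

def units_sold(demand):
    """Sales cannot exceed capacity, which makes the model nonlinear in demand."""
    return min(demand, CAPACITY)

rng = random.Random(7)  # fixed seed so the run is repeatable
demands = [rng.randint(0, 200) for _ in range(10_000)]

deterministic = units_sold(100)                   # output at the average demand
simulated = mean(units_sold(d) for d in demands)  # average output over the trials

print("output at average input:", deterministic)
print("average simulated output:", round(simulated, 1))
```

High-demand trials are cut off at capacity while low-demand trials are not, so the simulated average falls well below the deterministic figure.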

#### Quiz 2

Q1. A technology company has $2 million to invest in new research and development projects. The following table summarizes the initial cost, probability of success, and revenue potential for each of the projects under consideration.

<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/sOk4GY8YEeaHGRLYaNA_Dw_b4b324e43e30233486554a237b613da9_C3_M2_Application_Assignment_Table.png?expiry=1641081600000&hmac=wf_UM_UuoclLkkGnp-v06Hk98AhTwXJOmL_XyghygN0>

Management has built the Monte Carlo simulation model in the Excel file *Project Selection* and would like to use it to compare various portfolio alternatives. The probability of making at least $1 million in total profit is the criterion that management wants to use. Based on this criterion, which of the eight project portfolios should the company fund?

Portfolio Selection (XLSX file)

(*Hint*: Enter 1 in the “Select?” column to indicate that a project is included in the portfolio. Turn on the Simulation Bulb in the Solver Action group of the Analytic Solver Platform. Run the simulation by clicking on the green “play” button in the Solver Options panel. Double-click on cell K14 to display the Frequency Chart of total profit and set the right marker to 1000.)

- Projects 1, 2, 3, 6, 7, and 8
- **Projects 1, 2, 3, 4, and 7**
- Projects 2, 4, 5, 6, and 8
- Projects 1, 3, 4, 5, 6, and 8

#### Business Analytics for Decision Making Week 03 Quiz Answers

#### Quiz 1

Q1. Which of the following statements are true? (select all that apply)

- **Optimization has been defined as the process of selecting the values of decision variables that minimize or maximize some quantity of interest.**
- **Optimization started in the area of operations management but it is now used in all areas of business.**
- **Optimization models are prescriptive because their outcome is a recommendation of what to do.**

Q2. In an optimization model, decision variables are:

- **The unknowns for which the optimization process will find the best values.**
- The functions to be maximized or minimized.
- The restrictions or limitations that are either related to technical and practical considerations or they are imposed by managerial policies.
- The parameter values provided by the analyst.

Q3. In a linear programming model both the objective function and the constraints are formulated as linear functions of the decision variables.

- **True**
- False

Q4. What is the goal in optimization of the transportation problem?

- Find the values of the decision variables that use all supplier capacities.
- **Find the decision variable values (i.e., the shipment quantities) that result in the best objective function (i.e., lowest total cost) and satisfy all constraints.**
- Find the values of the decision variables that satisfy all the demand constraints.
- None of these.

Q5. What does the Excel “=SUMPRODUCT(A1:A3,B1:B3)” function do?

- Sums each range and multiplies the sums. That is, (A1+A2+A3)\*(B1+B2+B3).
- Sums each pair of cells and multiplies each sum. That is, (A1+B1)\*(A2+B2)\*(A3+B3).
- Multiplies each range and sums the products. That is, (A1\*A2\*A3)+(B1\*B2\*B3).
- **Multiplies each pair of cells and sums the products. That is, (A1\*B1)+(A2\*B2)+(A3\*B3).**
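
For readers who think in code rather than spreadsheets, the behavior of SUMPRODUCT on two equal-length ranges can be mimicked in Python. The helper name and the sample values are illustrative.

```python
def sumproduct(xs, ys):
    """Multiply each pair of values and sum the products, like SUMPRODUCT on two ranges."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum(x * y for x, y in zip(xs, ys))

column_a = [1, 2, 3]  # stands in for A1:A3
column_b = [4, 5, 6]  # stands in for B1:B3
print(sumproduct(column_a, column_b))  # 32, i.e. 1*4 + 2*5 + 3*6
```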

Q6. What function is used to add the contents of cells A1, A2, and A3?

- =ADD(A1:A3).
- =TOTAL(A1:A3).
- **=SUM(A1:A3).**
- =PRODUCT(A1:A3).

Q7. Suppose that three decision variables are in cells A1, A2, and A3. To add nonnegativity constraints with the Analytic Solver Platform, you click on Constraints in the Optimization Model group, then choose Variable Type/Bound, click on “>=”, and fill out the dialogue as follows:

- **True**
- False

Q8. What is true about the ASP optimization model shown below?

- The model has 6 decision variables, three in cells A1 to A3 and three in cells C4 to C6.
- The model enforces the following constraint: C4+C5+C6 <= D4+D5+D6.
- **The model minimizes the value of C1 by changing the nonnegative values in cells A1 to A3.**

Q9. Which of the following statements are true about a Sensitivity Report?

- **It provides very useful information for pricing decisions, the value of resources, and the robustness of the optimal solution.**
- **It’s not able to provide answers to what-if questions that involve multiple changes in the model, such as simultaneously changing the coefficient of a decision variable and the right-hand side of a constraint.**
- **It provides information about decision variables (reduced costs) and constraints (shadow prices).**

Q10. If the shadow price for a resource constraint is 0, the allowable increase is 200 units, and 150 units of the resource are added, what happens to the objective function value?

- It increases by 150
- It increases by more than 0 but less than 150
- **No change**
- It increases but by an unknown amount
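
The arithmetic behind Q10 is that, within the allowable increase, the objective changes by the shadow price times the number of units added. A minimal sketch follows; the function name is an illustrative choice, and it returns `None` when the change falls outside the range where the shadow price is valid.

```python
def predicted_change(shadow_price, added_units, allowable_increase):
    """Objective change from adding units of a resource, if the shadow price still applies."""
    if 0 <= added_units <= allowable_increase:
        return shadow_price * added_units
    return None  # outside the allowable increase: the model has to be re-solved

# Values from Q10: shadow price 0, allowable increase 200, 150 units added.
print(predicted_change(0.0, 150, 200))  # 0.0
```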

Q11. Which of the following approaches provided by the Analytic Solver Platform can automatically run multiple optimizations while varying model parameters (e.g., the right-hand side of a constraint) within a prespecified range?

- Breakdown analysis
- **Parameter analysis**
- Uncertainty analysis
- Sensitivity analysis

Q12. A bar chart is an effective way of visualizing the use of a resource in an optimal solution, where colors represent how the resource is used and the height represents how much of the resource is used.

- **True**
- False

#### Quiz 2

Q1. A paper recycling company converts newspaper, mixed paper, white office paper, and cardboard into pulp for newsprint, packaging paper, and print stock quality paper. The following table summarizes the yield for each kind of pulp recovered from each ton of recycled material.

<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/CS06kI_PEeacAAocUZL6NQ_5828e96c3a3bad4b724163cdca9bf680_coursera.PNG?expiry=1641081600000&hmac=_-bJk1UWGm2Nf7az-FDC1WzeTVHc6ZBuZq3qFSn_4l0>

This table shows that, for instance, a ton of newspaper can produce either 0.85 tons of newsprint pulp or 0.80 tons of packaging pulp. The following table shows the processing costs per ton, the purchase cost, and the availability of the recycled material.

<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/MVmnEo_PEeacAAocUZL6NQ_87239e2b3b45835e73c00f6562b5e43c_coursera.PNG?expiry=1641081600000&hmac=E5Gc1p147xuzOsS5QjzbNsUgEzbp8jg4NEP2ZdCWJCE>

The recycling company wants to create an optimization model to determine how to produce 500 tons of newspaper pulp, 600 tons of packaging paper pulp, and 300 tons of print stock quality pulp at a minimum cost. The following questions are based on the Excel file Paper Recycling.

Paper Recycling (XLSX file)

The decision variables in the optimization model are:

- Used tons (F23:F26)
- **Processed tons (C23:E26)**
- Pulp production (C27:E27)
- Purchase and production costs (C30:C31)

Q2. The constraints in the optimization model are:

- **Pulp production >= Required pulp (C27:E27 >= C18:E18), Used tons <= Available tons (F23:F26 <= G14:G17), and Processed tons >= 0 (C23:E26 >= 0)**
- Pulp production <= Required pulp (C27:E27 <= C18:E18), Used tons >= Available tons (F23:F26 >= G14:G17), and Processed tons >= 0 (C23:E26 >= 0)
- Production cost >= Purchase cost (C31 >= C30)
- There are no constraints in the problem.

Q3. The objective function in the optimization model is:

- Maximize total cost (Max C32)
- Minimize Production cost (Min C31)
- **Minimize total cost (Min C32)**
- Maximize pulp production (Max SUM(C27:E27))

Q4. Solve the optimization model that results from your answers to questions 1, 2, and 3. What is the total cost for the optimal solution?

- $41,841.91
- **$44,067.74**
- $35,692.86
- None of the above

Q5. Generate the Sensitivity Report for the optimal solution and use it to determine how much the recycling company should be willing to pay for an additional ton of recycled newspaper. (Hint: To generate the report, go to the Analysis group of the Analytic Solver Platform tab and click on Reports -> Optimization -> Sensitivity. If the report is not there, make sure that the Standard LP Engine was chosen to solve the model.)

- **No more than $3.10**
- No more than $4.20
- No more than $28.99
- $0.00

#### Business Analytics for Decision Making Week 04 Quiz Answers

Q1. Which of the following is not a benefit of using binary variables?

- **Models are easy to solve (i.e., the solvers can find optimal solutions faster) because the variables can only be zero or one.**
- Binary variables are useful in selection problems.
- Binary variables can be used to model yes/no decisions.
- Binary variables can enforce logical conditions.

Q2. An optimization model has 5 binary decision variables. How many possible integer solutions are there to this problem?

- 5
- 10
- 25
- **32**
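
The count in Q2 follows from each of the 5 binary variables taking one of two values, giving 2^5 combinations. A quick enumeration confirms it:

```python
from itertools import product

# Every assignment of 0 or 1 to 5 binary decision variables.
solutions = list(product([0, 1], repeat=5))
print(len(solutions))  # 32
print(solutions[:3])   # [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 0, 1, 0)]
```

This doubling with every added variable is also why binary models are not automatically easy to solve.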

Q3. A company wants to select no more than 2 projects from a set of 4 possible projects. Which of the following constraints ensures that no more than 2 will be selected, assuming that the P variables are binary and represent whether a project is selected (value of 1) or not (value of 0)?

- P1+P2+P3+P4 = 2
- **P1+P2+P3+P4 ≤ 2**
- P1+P2+P3+P4 ≥ 2
- P1+P2+P3+P4 ≥ 0

Q4. A company must invest in project 1 in order to invest in project 2. P1 is a binary variable representing whether project 1 is chosen (value of 1) or not (value of 0). P2 has the same interpretation for project 2. Which of the following constraints ensures that if project 2 is chosen then project 1 must also be chosen?

- P1+P2 = 0
- P1+P2 = 1
- **P1-P2 ≥ 0**
- P1-P2 ≤ 0
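
One way to check a logical constraint like the one in Q4 is to enumerate all binary combinations and see which ones it allows. The helper name below is an illustrative choice.

```python
from itertools import product

def satisfies_prerequisite(p1, p2):
    """P1 - P2 >= 0: project 2 can only be chosen if project 1 is chosen."""
    return p1 - p2 >= 0

feasible = [
    (p1, p2)
    for p1, p2 in product([0, 1], repeat=2)
    if satisfies_prerequisite(p1, p2)
]
print(feasible)  # [(0, 0), (1, 0), (1, 1)]
```

The only combination ruled out is (0, 1), which is exactly the case of investing in project 2 without project 1.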

Q5. An optimization model for a production process must deal with the following situation. The model must decide whether or not to produce a product. If the decision is to produce the product, then the policy is that at least 100 units of this product must be produced. The following Excel cells are part of a spreadsheet model for this problem:

Cell B1 contains a binary decision variable, where 1 = produce and 0 = not produce. B4 is a decision variable indicating the amount to produce. Which of the following combination of an Excel function for B3 and a solver constraint enforces the production policy?

- **=B1\*B2 and B4 >= B3**
- =B1\*B3 and B3 >= B4
- =B1+B2 and B4 >= B3
- =B1\*B4 and B3 >= B2

Q6. Which of the following statements is not true about metaheuristic optimization?

- Metaheuristics provide great modeling flexibility.
- Metaheuristics can solve optimization models with nonlinear and/or non-smooth functions.
- The metaheuristic solver in the Analytic Solver Platform is called the Evolutionary Engine.
- **Metaheuristics are exact procedures that guarantee finding an optimal solution.**

Q7. In market basket analysis, the Lift Ratio tells us how much more likely it is for item Y to be purchased given that item X has been purchased.

- **True**
- False
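
As a sketch of the idea in Q7, one common definition of the lift ratio divides the confidence of the rule X → Y by the overall support of Y. The transactions below are made up for illustration, and the course software may compute lift with different conventions.

```python
from fractions import Fraction

def lift_ratio(transactions, x, y):
    """Confidence of X -> Y divided by the overall share of transactions containing Y."""
    with_x = [t for t in transactions if x in t]
    confidence = Fraction(sum(1 for t in with_x if y in t), len(with_x))
    support_y = Fraction(sum(1 for t in transactions if y in t), len(transactions))
    return confidence / support_y

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
]
print(lift_ratio(baskets, "bread", "milk"))  # 10/9
```

A lift above 1 means that buying X makes Y more likely than it is across all baskets; here milk appears in 2/3 of bread baskets versus 3/5 of all baskets.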

Q8. A chance constraint is a special type of constraint that is satisfied only in a fraction of the trials in a simulation.

- **True**
- False

Q9. An optimization model includes a chance constraint to satisfy demand of a particular product. The demand is uncertain and is modeled with an integer uniform distribution with parameter values of 0 and 4. That is, the probability that the demand is 0, 1, 2, 3, or 4 is exactly the same. A decision is made to order 2 units of the product from a supplier in order to satisfy the uncertain demand. What is the value at risk (VaR) for the demand constraint?

- 30%
- **40%**
- 50%
- 60%
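
The 40% figure in Q9 can be reproduced by counting the equally likely demand values that exceed the order quantity. This sketch reads the quiz's "value at risk" as the probability that the demand constraint is violated, which is an interpretation based on the marked answer rather than a formal definition.

```python
from fractions import Fraction

def violation_probability(order_qty, low, high):
    """Share of equally likely integer demands in [low, high] that exceed the order."""
    outcomes = range(low, high + 1)
    violations = sum(1 for demand in outcomes if demand > order_qty)
    return Fraction(violations, len(outcomes))

# Values from Q9: demand uniform on 0..4, order quantity 2.
print(violation_probability(2, 0, 4))  # 2/5
```

Demands of 3 and 4 cannot be met with 2 units, so 2 of the 5 outcomes violate the constraint.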

#### Quiz 2

Q1. A technology company has $2 million to invest in new research and development projects. The following table summarizes the initial cost, probability of success, and revenue potential for each of the projects under consideration.

<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/FBj_MI_UEeaHGRLYaNA_Dw_4d64648d724772ba648ce6da298b966e_coursera.PNG?expiry=1641081600000&hmac=P9geC1UgeW4EHpwwXPb9vKHPchvpNHiqAXBqmooMHJo>

Management has built the Monte Carlo simulation model in the Excel file Project Selection SO and would like to find the portfolio that maximizes the probability of making at least $1 million in profits. Questions 1, 2, and 3 guide you through the implementation of an optimization model. Add the optimization model as you answer these questions. (Hint: The three elements of the optimization model, decision variables, constraints, and the objective function, are of the “Normal” type. Also turn on the Simulation Bulb in the Solver Action group of the Analytic Solver Platform.)

Portfolio Selection SO (XLSX file)

The decision variables in the optimization model are:

- **Select? (H5:H12)**
- Success? (I5:I12)
- Revenue (J5:J12)
- Profit (K5:K12)

Q2. The constraints in the optimization model are:

- Revenue >= Profit (J5:J12 >= K5:K12)
- Select? >= Success? (H5:H12 >= I5:I12)
- **Total cost <= Available funds (H14 <= H15) and binary variables**
- Total profit >= Probability that the total profit is at least $1 million (K14 >= K15)

Q3. The objective function in the optimization model is:

- Minimize total cost (Min H14)
- **Maximize the probability that the total profit is at least $1 million (Max K15)**
- Maximize total profit (Max K14)
- Maximize available funds (Max H15)

Q4. Use the Evolutionary Engine to solve the optimization model that results from your answers to questions 1, 2, and 3. (Make sure that the number of trials is set to 10000.) Compare the solution that you found with the following solutions. Which of the following solutions is the best? (Hint: The Evolutionary Solver might not have found the best solution, so try all these solutions in your model before answering the question.)

- Projects 1, 2, 3, 6, 7, and 8
- Projects 2, 4, 5, 6, and 8
- **Projects 1, 2, 3, 5, 6, and 8**
- Projects 1, 2, 3, 4, and 7

**Review:**

Based on our knowledge, we urge you to enroll in this course so you can pick up new skills from specialists. It will be worthwhile, we trust.