### Week 1 Quiz Answers

#### Quiz 1: Quiz 1

Q1. Consider the data set given below

`x <- c(0.18, -1.54, 0.42, 0.95)`

And weights given by

`w <- c(2, 1, 3, 1)`

Give the value of $\mu$ that minimizes the weighted least squares equation $\sum_{i=1}^n w_i (x_i - \mu)^2$.

- 0.300
- 1.077
- **0.1471**
- 0.0025
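Setting the derivative of $\sum_i w_i (x_i - \mu)^2$ to zero gives the weighted mean $\hat\mu = \sum_i w_i x_i / \sum_i w_i$. A minimal base R sketch that computes it directly and cross-checks it with an intercept-only weighted `lm()` fit:

```r
x <- c(0.18, -1.54, 0.42, 0.95)
w <- c(2, 1, 3, 1)

# Closed form: the weighted mean minimizes sum(w * (x - mu)^2)
mu_hat <- sum(w * x) / sum(w)
mu_hat

# The same value from an intercept-only weighted least squares fit
mu_lm <- unname(coef(lm(x ~ 1, weights = w)))
mu_lm
```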

Q2. Consider the following data set

`x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)`

`y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)`

Fit the regression through the origin and get the slope treating y as the outcome and x as the regressor. (Hint: do not center the data, since we want regression through the origin, not through the means of the data.)

- **0.8263**
- -1.713
- -0.04462
- 0.59915
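Without an intercept, minimizing $\sum_i (y_i - \beta x_i)^2$ gives $\hat\beta = \sum_i x_i y_i / \sum_i x_i^2$. A sketch in base R; the `- 1` in the formula is what drops the intercept:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)

# Closed-form slope for regression through the origin (no centering)
beta_hat <- sum(x * y) / sum(x^2)
beta_hat

# Same fit with lm(); "- 1" removes the intercept term
beta_lm <- unname(coef(lm(y ~ x - 1)))
beta_lm
```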

Q3. Do `data(mtcars)` from the datasets package and fit the regression model with mpg as the outcome and weight as the predictor. Give the slope coefficient.

- -9.559
- 30.2851
- **-5.344**
- 0.5591
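A minimal sketch of the fit, with a cross-check against the identity slope $= \text{Cor}(Y, X)\, SD(Y)/SD(X)$:

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)
slope <- coef(fit)[["wt"]]
slope

# Equivalent closed form: Cor(mpg, wt) * SD(mpg) / SD(wt)
slope_formula <- with(mtcars, cor(mpg, wt) * sd(mpg) / sd(wt))
slope_formula
```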

Q4. Consider data with an outcome ($Y$) and a predictor ($X$). The standard deviation of the predictor is one half that of the outcome. The correlation between the two variables is 0.5. What would be the value of the slope coefficient for the regression model with $Y$ as the outcome and $X$ as the predictor?

- 4
- **1**
- 3
- 0.25
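The least squares slope is $\hat\beta_1 = \text{Cor}(Y, X)\, SD(Y)/SD(X)$, so here $0.5 \times 2 = 1$. A quick sketch that checks the identity on simulated data (the simulated values are illustrative only):

```r
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

# Slope from lm() versus the correlation-based formula
slope_lm <- unname(coef(lm(y ~ x))[2])
slope_formula <- cor(x, y) * sd(y) / sd(x)
all.equal(slope_lm, slope_formula)

# Quiz values: Cor = 0.5 and SD(Y) / SD(X) = 2
0.5 * 2
```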

Q5. Students were given two hard tests and scores were normalized to have empirical mean 0 and variance 1. The correlation between the scores on the two tests was 0.4. What would be the expected score on test 2 for a student who had a normalized score of 1.5 on test 1?

- 0.16
- 1.0
- **0.6**
- 0.4
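For normalized variables both means are 0 and $SD(Y)/SD(X) = 1$, so the fitted line is $\hat Y = \text{Cor}(Y, X)\, X$ and the prediction is $0.4 \times 1.5 = 0.6$ (regression to the mean). A sketch on simulated scores, which are illustrative only:

```r
set.seed(2)
test1 <- rnorm(200)
test2 <- 0.4 * test1 + rnorm(200)

# Normalize both scores to mean 0 and variance 1
z1 <- as.numeric(scale(test1))
z2 <- as.numeric(scale(test2))
fit <- lm(z2 ~ z1)
coef(fit)  # intercept is ~0; slope equals cor(z1, z2)

# Quiz: correlation 0.4, normalized test 1 score of 1.5
0.4 * 1.5
```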

Q6. Consider the data given by the following

`x <- c(8.58, 10.46, 9.01, 9.64, 8.86)`

What is the value of the first measurement if x were normalized (to have mean 0 and variance 1)?

- 8.86
- 8.58
- 9.31
- **-0.9719**
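Normalizing means subtracting the sample mean and dividing by the sample standard deviation. A base R sketch, cross-checked with `scale()`:

```r
x <- c(8.58, 10.46, 9.01, 9.64, 8.86)

# Subtract the mean and divide by the (n - 1) standard deviation
z <- (x - mean(x)) / sd(x)
z[1]

# scale() performs the same centering and scaling
as.numeric(scale(x))[1]
```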

Q7. Consider the following data set (used above as well). What is the intercept for fitting the model with x as the predictor and y as the outcome?

```
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
```

- 2.105
- **1.567**
- -1.713
- 1.252
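The intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$; note that -1.713 is the slope of this fit, not its intercept. A minimal sketch:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)
y <- c(1.39, 0.72, 1.55, 0.48, 1.19, -1.59, 1.23, -0.65, 1.49, 0.05)
fit <- lm(y ~ x)
coef(fit)

# Intercept from the closed form: mean(y) - slope * mean(x)
b1 <- coef(fit)[["x"]]
b0 <- mean(y) - b1 * mean(x)
b0
```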

Q8. You know that both the predictor and response have mean 0. What can be said about the intercept when you fit a linear regression?

- Nothing about the intercept can be said from the information given.
- It is undefined as you have to divide by zero.
- **It must be identically 0.**
- It must be exactly one.
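Since $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$, zero means force $\hat\beta_0 = 0$. A small sketch with simulated data (illustrative only) that is centered before fitting:

```r
set.seed(3)
x <- rnorm(40)
y <- 2 * x + rnorm(40)

# Center both variables so each has mean 0
xc <- x - mean(x)
yc <- y - mean(y)
fit <- lm(yc ~ xc)
coef(fit)[["(Intercept)"]]  # zero up to floating-point error
```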

Q9. Consider the data given by

`x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)`

What value minimizes the sum of the squared distances between these points and itself?

- 0.8
- 0.36
- **0.573**
- 0.44
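Minimizing $\sum_i (x_i - \mu)^2$ gives $\mu = \bar x$. A sketch that checks this numerically with base R's `optimize()`:

```r
x <- c(0.8, 0.47, 0.51, 0.73, 0.36, 0.58, 0.57, 0.85, 0.44, 0.42)

# Sum of squared distances between the data and a candidate center mu
sse <- function(mu) sum((x - mu)^2)
opt <- optimize(sse, interval = range(x))
opt$minimum
mean(x)
```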

Q10. Let the slope from fitting $Y$ as the outcome and $X$ as the predictor be denoted $\beta_1$. Let the slope from fitting $X$ as the outcome and $Y$ as the predictor be denoted $\gamma_1$. Suppose that you divide $\beta_1$ by $\gamma_1$; in other words, consider $\beta_1 / \gamma_1$. What is this ratio always equal to?

- Cor(Y, X)
- **Var(Y) / Var(X)**
- 2 SD(Y) / SD(X)
- 1
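Because $\beta_1 = \text{Cor}(Y, X)\, SD(Y)/SD(X)$ and $\gamma_1 = \text{Cor}(Y, X)\, SD(X)/SD(Y)$, the correlation cancels and $\beta_1/\gamma_1 = Var(Y)/Var(X)$. A check on simulated data (illustrative only):

```r
set.seed(4)
x <- rnorm(100)
y <- 3 * x + rnorm(100)

beta1  <- unname(coef(lm(y ~ x))[2])  # Y as outcome, X as predictor
gamma1 <- unname(coef(lm(x ~ y))[2])  # X as outcome, Y as predictor
all.equal(beta1 / gamma1, var(y) / var(x))
```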

### Week 2 Quiz Answers

#### Quiz 1: Quiz 2

Q1. Consider the following data with x as the predictor and y as the outcome.

```
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
```

Give a P-value for the two-sided hypothesis test of whether $\beta_1$ from a linear regression model is 0 or not.

- 0.025
- 2.325
- 0.391
- **0.05296**

Q2. Consider the previous problem, give the estimate of the residual standard deviation.

- 0.05296
- 0.4358
- 0.3552
- **0.223**
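A sketch covering Q1 and Q2 with `summary()` on the `lm` fit: the slope's P-value comes from its t statistic with $n - 2$ degrees of freedom, and the residual standard deviation is $\sqrt{SSE/(n - 2)}$.

```r
x <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
y <- c(0.67, 0.84, 0.6, 0.18, 0.85, 0.47, 1.1, 0.65, 0.36)
fit <- lm(y ~ x)

# Q1: two-sided P-value for H0: beta1 = 0
p_value <- summary(fit)$coefficients["x", "Pr(>|t|)"]
p_value

# Q2: residual standard deviation, sqrt(SSE / (n - 2))
sigma_hat <- summary(fit)$sigma
sigma_hat
sqrt(sum(resid(fit)^2) / (length(y) - 2))
```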

Q3. In the `mtcars` data set, fit a linear regression model of weight (predictor) on mpg (outcome). Get a 95% confidence interval for the expected mpg at the average weight. What is the lower endpoint?

- **18.991**
- -4.00
- 21.190
- -6.486
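A minimal sketch using `predict()` with `interval = "confidence"`, which gives the interval for the expected response (at the mean weight the fitted value equals the mean mpg):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Confidence interval for the mean response at the average weight
ci <- predict(fit, newdata = data.frame(wt = mean(mtcars$wt)),
              interval = "confidence")
ci
ci[1, "lwr"]
```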

Q4. Refer to the previous question. Read the help file for `mtcars`. What is the weight coefficient interpreted as?

- The estimated 1,000 lb change in weight per 1 mpg increase.
- **The estimated expected change in mpg per 1,000 lb increase in weight.**
- The estimated expected change in mpg per 1 lb increase in weight.
- It can’t be interpreted without further information

Q5. Consider again the `mtcars` data set and a linear regression model with mpg as predicted by weight (1,000 lbs). A new car is coming weighing 3000 pounds. Construct a 95% prediction interval for its mpg. What is the upper endpoint?

- 21.25
- -5.77
- 14.93
- **27.57**
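A prediction interval for a single new car adds the residual variance, so it is wider than the confidence interval for the mean. Since `wt` is in 1,000 lb units, 3,000 lb is `wt = 3`. A sketch:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# 95% prediction interval for one new car weighing 3,000 lb
pred <- predict(fit, newdata = data.frame(wt = 3), interval = "prediction")
pred
pred[1, "upr"]
```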

Q6. Consider again the `mtcars` data set and a linear regression model with mpg as predicted by weight (in 1,000 lbs). A “short” ton is defined as 2,000 lbs. Construct a 95% confidence interval for the expected change in mpg per 1 short ton increase in weight. Give the lower endpoint.

- **-12.973**
- -9.000
- -6.486
- 4.2026
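Dividing `wt` by 2 expresses weight in short tons, which doubles the slope and both ends of its interval. A sketch that does it both ways:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Rescale the predictor so one unit is 2,000 lb (one short ton)
fit_ton <- lm(mpg ~ I(wt / 2), data = mtcars)
ci_ton <- confint(fit_ton)[2, ]
ci_ton

# Equivalent: double the interval for the per-1,000 lb slope
2 * confint(fit)["wt", ]
```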

Q7. If my X from a linear regression is measured in centimeters and I convert it to meters what would happen to the slope coefficient?

- **It would get multiplied by 100.**
- It would get multiplied by 10
- It would get divided by 10
- It would get divided by 100
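If $X_m = X_{cm}/100$, then $\beta_1 X_{cm} = (100\,\beta_1) X_m$, so the slope is multiplied by 100. A check on simulated data (illustrative only):

```r
set.seed(5)
x_cm <- runif(30, 100, 200)
y <- 5 + 0.3 * x_cm + rnorm(30)

slope_cm <- coef(lm(y ~ x_cm))[["x_cm"]]
x_m <- x_cm / 100  # 100 cm = 1 m
slope_m <- coef(lm(y ~ x_m))[["x_m"]]
all.equal(slope_m, 100 * slope_cm)
```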

Q8. I have an outcome, $Y$, and a predictor, $X$, and fit a linear regression model with $Y = \beta_0 + \beta_1 X + \epsilon$. What would be the consequence to the subsequent slope and intercept if I were to refit the model with a new regressor, $X + c$, for some constant $c$?

- The new slope would be $c \hat\beta_1$
- The new intercept would be $\hat\beta_0 + c \hat\beta_1$
- **The new intercept would be $\hat\beta_0 - c \hat\beta_1$**
- The new slope would be $\hat\beta_1 + c$
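Rewriting the model as $Y = (\beta_0 - c\,\beta_1) + \beta_1 (X + c) + \epsilon$ shows that the slope is unchanged and the intercept becomes $\hat\beta_0 - c\,\hat\beta_1$. A sketch using `mtcars` with an arbitrary shift of 10 (named `shift` here to avoid masking `c()`):

```r
fit <- lm(mpg ~ wt, data = mtcars)

shift <- 10  # the constant c
fit_shift <- lm(mpg ~ I(wt + shift), data = mtcars)

b <- coef(fit)
b_shift <- unname(coef(fit_shift))
b_shift
c(b[[1]] - shift * b[[2]], b[[2]])  # same values
```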

Q9. Refer back to the `mtcars` data set with mpg as an outcome and weight (wt) as the predictor. About what is the ratio of the sum of the squared errors, $\sum_{i=1}^n (Y_i - \hat Y_i)^2$, when comparing a model with just an intercept (denominator) to the model with the intercept and slope (numerator)?

- 0.75
- 4.00
- 0.50
- **0.25**
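The ratio of the two residual sums of squares equals $1 - R^2$ for the model with the slope. A sketch:

```r
fit_int <- lm(mpg ~ 1, data = mtcars)   # intercept only
fit_wt  <- lm(mpg ~ wt, data = mtcars)  # intercept and slope

ratio <- sum(resid(fit_wt)^2) / sum(resid(fit_int)^2)
ratio
1 - summary(fit_wt)$r.squared  # the same quantity
```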

Q10. Do the residuals always have to sum to 0 in linear regression?

- If an intercept is included, the residuals most likely won’t sum to zero.
- **If an intercept is included, then they will sum to 0.**
- The residuals must always sum to zero.
- The residuals never sum to zero.
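With an intercept, the normal equation for the intercept forces $\sum_i e_i = 0$; regression through the origin has no such constraint. A sketch with `mtcars`:

```r
x <- mtcars$wt
y <- mtcars$mpg

sum(resid(lm(y ~ x)))      # zero up to floating-point error
sum(resid(lm(y ~ x - 1)))  # through the origin: not zero here
```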

### Week 3 Quiz Answers

#### Quiz 1: Quiz 3

Q1. Consider the `mtcars` data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a confounder. Give the adjusted estimate for the expected change in mpg comparing 8 cylinders to 4.

- **-6.071**
- -3.206
- -4.256
- 33.991
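A minimal sketch: with `factor(cyl)`, 4 cylinders is the reference level, so the `factor(cyl)8` coefficient is the weight-adjusted 8 vs. 4 comparison.

```r
fit <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

# 8 vs. 4 cylinders, holding weight fixed
eff_8_vs_4 <- coef(fit)[["factor(cyl)8"]]
eff_8_vs_4
```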

Q2. Consider the `mtcars` data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight as a possible confounding variable. Compare the effect of 8 versus 4 cylinders on mpg for the adjusted and unadjusted by weight models. Here, adjusted means including the weight variable as a term in the regression model and unadjusted means the model without weight included. What can be said about the effect comparing 8 and 4 cylinders after looking at models with and without weight included?

- **Holding weight constant, cylinder appears to have less of an impact on mpg than if weight is disregarded.**
- Within a given weight, 8 cylinder vehicles have an expected 12 mpg drop in fuel efficiency.
- Holding weight constant, cylinder appears to have more of an impact on mpg than if weight is disregarded.
- Including or excluding weight does not appear to change anything regarding the estimated impact of number of cylinders on mpg.
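A sketch comparing the 8 vs. 4 coefficient with and without `wt` in the model; the unadjusted difference is larger in magnitude than the adjusted one.

```r
fit_unadj <- lm(mpg ~ factor(cyl), data = mtcars)
fit_adj   <- lm(mpg ~ factor(cyl) + wt, data = mtcars)

eff <- c(unadjusted = coef(fit_unadj)[["factor(cyl)8"]],
         adjusted   = coef(fit_adj)[["factor(cyl)8"]])
eff
```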

Q3. Consider the `mtcars` data set. Fit a model with mpg as the outcome that considers number of cylinders as a factor variable and weight as a confounder. Now fit a second model with mpg as the outcome that considers the interaction between number of cylinders (as a factor variable) and weight. Give the P-value for the likelihood ratio test comparing the two models and suggest a model using 0.05 as a type I error rate significance benchmark.

- **The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms may not be necessary.**
- The P-value is small (less than 0.05). Thus it is surely true that there is no interaction term in the true model.
- The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is not necessary.
- The P-value is small (less than 0.05). So, according to our criterion, we reject, which suggests that the interaction term is necessary
- The P-value is larger than 0.05. So, according to our criterion, we would fail to reject, which suggests that the interaction terms are necessary.
- The P-value is small (less than 0.05). Thus it is surely true that there is an interaction term in the true model.
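A sketch using `anova()` on the two nested `lm` fits. For linear models `anova()` reports a partial F test; it is assumed here to be the comparison the quiz refers to as the likelihood ratio test.

```r
fit_main <- lm(mpg ~ factor(cyl) + wt, data = mtcars)
fit_inter <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

cmp <- anova(fit_main, fit_inter)
cmp
cmp[2, "Pr(>F)"]  # compare against 0.05
```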

Q4. Consider the `mtcars` data set. Fit a model with mpg as the outcome that includes number of cylinders as a factor variable and weight included in the model as

`lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)`

How is the wt coefficient interpreted?

- The estimated expected change in MPG per half ton increase in weight.
- The estimated expected change in MPG per half ton increase in weight for the average number of cylinders.
- The estimated expected change in MPG per half ton increase in weight for a specific number of cylinders (4, 6, 8).
- The estimated expected change in MPG per one ton increase in weight.
- **The estimated expected change in MPG per one ton increase in weight for a specific number of cylinders (4, 6, 8).**
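Since `wt` is in 1,000 lb units, `wt * 0.5` is in units of 2,000 lb (one short ton), so its coefficient is twice the ordinary `wt` coefficient. A sketch:

```r
fit_half <- lm(mpg ~ I(wt * 0.5) + factor(cyl), data = mtcars)
fit_wt   <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Second coefficient is the I(wt * 0.5) term (per short ton)
half_coef <- unname(coef(fit_half)[2])
half_coef
2 * coef(fit_wt)[["wt"]]  # same value
```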

Q5. Consider the following data set

```
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
```

Give the hat diagonal for the most influential point.

- **0.9946**
- 0.2287
- 0.2025
- 0.2804
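A sketch with `hatvalues()`. For simple regression $h_{ii} = 1/n + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$, so the outlying point at x = 11.72 dominates:

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

h <- hatvalues(fit)
round(h, 4)
which.max(h)  # the point at x = 11.72
```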

Q6. Consider the following data set

```
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
```

Give the slope dfbeta for the point with the highest hat value.

- -.00134
- **-134**
- -0.378
- 0.673
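A sketch: the listed answer (-134) matches the standardized `dfbetas()` value for the slope at the highest-leverage point, while `dfbeta()` returns the unscaled change in the slope when that point is dropped.

```r
x <- c(0.586, 0.166, -0.042, -0.614, 11.72)
y <- c(0.549, -0.026, -0.127, -0.751, 1.344)
fit <- lm(y ~ x)

i <- which.max(hatvalues(fit))
dfbetas(fit)[i, "x"]  # standardized change in the slope
dfbeta(fit)[i, "x"]   # unscaled change in the slope
```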

Q7. Consider a regression relationship between Y and X with and without adjustment for a third variable Z. Which of the following is true about comparing the regression coefficient between Y and X with and without adjustment for Z?

- The coefficient can’t change sign after adjustment, except for slight numerical pathological cases.
- Adjusting for another variable can only attenuate the coefficient toward zero. It can’t materially change sign.
- For the coefficient to change sign, there must be a significant interaction term.
- **It is possible for the coefficient to reverse sign after adjustment. For example, it can be strongly significant and positive before adjustment and strongly significant and negative after adjustment.**
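A simulated example (illustrative only) in which X and Z are strongly correlated and Z has a large positive effect, so the X coefficient is positive without Z and negative with it:

```r
set.seed(6)
n <- 200
z <- rnorm(n)
x <- z + rnorm(n, sd = 0.3)           # X is strongly tied to Z
y <- -x + 3 * z + rnorm(n, sd = 0.3)  # true X effect is negative

b_unadj <- coef(lm(y ~ x))[["x"]]
b_adj   <- coef(lm(y ~ x + z))[["x"]]
c(unadjusted = b_unadj, adjusted = b_adj)
```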

#### Quiz 2: (OPTIONAL) Data analysis practice with immediate feedback (NEW! 10/18/2017)

Q1. You are being asked to participate in a research experiment with the purpose of better understanding how people analyze data. If you complete this quiz, you are giving your consent to participate in the study. This quiz involves a short data analysis that gives you a chance to practice the regression concepts you have learned so far.

We anticipate that this will take about 15 minutes to complete. You will be receiving feedback on your work immediately after submission. For this reason, we ask that you do not post on the forums about this quiz to maintain the integrity of this experiment.

Thank you for helping us learn more about data science! -Brian, Roger, Jeff

Your assignment is to study how income varies across different categories of college majors. You will be using data from a study of recent college graduates. Make sure to use good practices that you have learned so far in this course and previous courses in the specialization.

If you wish to proceed with the analysis, click “Yes”. Otherwise, click “No” and exit the quiz.

- Yes
- No

Q2. Your assignment is to study how income varies across college major categories. Specifically answer: “Is there an association between college major category and income?”

To get started, start a new R/RStudio session with a clean workspace. To do this in R, you can use the q() function to quit, then reopen R. The easiest way to do this in RStudio is to quit RStudio entirely and reopen it. After you have started a new session, run the following commands. This will load a data.frame called college for you to work with.

```
install.packages("devtools")
devtools::install_github("jhudsl/collegeIncome")
library(collegeIncome)
data(college)
```

Next download and install the matahari R package with the following commands:

```
devtools::install_github("jhudsl/matahari")
library(matahari)
```

This package allows a record of your analysis (your R command history) to be documented. You will be uploading a file containing this record to GitHub and submitting the link as part of this quiz.

Before you start the analysis for this assignment, enter the following command to begin the documentation of your analysis:

`dance_start(value = FALSE, contents = FALSE)`

You can then proceed with the rest of your analysis in R as usual. When you have finished your analysis, use the following command to save the record of your analysis on your desktop:

`dance_save("~/Desktop/college_major_analysis.rds")`

Please upload this college_major_analysis.rds file to a public GitHub repository. In question 4 of this quiz, you will share the link to this file.

A codebook for the dataset is given below:

- rank: Rank by median earnings
- major_code: Major code
- major: Major description
- major_category: Category of major
- total: Total number of people with major
- sample_size: Sample size of full-time, year-round individuals used for income/earnings estimates: p25th, median, p75th
- p25th: 25th percentile of earnings
- median: Median earnings of full-time, year-round workers
- p75th: 75th percentile of earnings
- perc_men: % men with major (out of total)
- perc_women: % women with major (out of total)
- perc_employed: % employed (out of total)
- perc_employed_fulltime: % employed 35 hours or more (out of employed)
- perc_employed_parttime: % employed less than 35 hours (out of employed)
- perc_employed_fulltime_yearround: % employed at least 50 weeks and at least 35 hours (out of employed and full-time)
- perc_unemployed: % unemployed (out of employed)
- perc_college_jobs: % with job requiring a college degree (out of employed)
- perc_non_college_jobs: % with job not requiring a college degree (out of employed)
- perc_low_wage_jobs: % in low-wage service jobs (out of total)

Question: Based on your analysis, would you conclude that there is a significant association between college major category and income?

- Yes
- No

Q3. Please type a few sentences describing your results.

Q4. Please upload the file generated by matahari (college_major_analysis.rds) to a public GitHub repository and paste the link to that file here.

### Week 4 Quiz Answers

#### Quiz 1: Quiz 4

Q1. Consider the space shuttle data `?shuttle` in the `MASS` library. Consider modeling the use of the autolander as the outcome (variable name `use`). Fit a logistic regression model with autolander use (labeled as “auto”, coded 1) versus not (0) as predicted by wind sign (variable `wind`). Give the estimated odds ratio for autolander use comparing head winds, labeled as “head” in the variable `wind` (numerator), to tail winds (denominator).

- -0.031
- 0.031
- 1.327
- **0.969**
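A sketch with `glm()`: `wind` has "head" as its reference level, so the fitted `windtail` coefficient is the log odds ratio for tail vs. head, and the head-to-tail odds ratio is `exp(-windtail)`.

```r
library(MASS)  # provides the shuttle data set

# 1 = autolander used, 0 = not used
shuttle2 <- transform(shuttle, auto = as.numeric(use == "auto"))
fit <- glm(auto ~ wind, family = binomial, data = shuttle2)

or_head_tail <- exp(-coef(fit)[["windtail"]])
or_head_tail
```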

Q2. Consider the previous problem. Give the estimated odds ratio for autolander use comparing head winds (numerator) to tail winds (denominator) adjusting for wind strength from the variable `magn`.

- 1.00
- 0.684
- **0.969**
- 1.485
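A sketch that adds `magn` as an adjustment variable to the wind-only logistic model; the head-to-tail odds ratio is again `exp(-windtail)`.

```r
library(MASS)  # provides the shuttle data set

shuttle2 <- transform(shuttle, auto = as.numeric(use == "auto"))
fit_adj <- glm(auto ~ wind + magn, family = binomial, data = shuttle2)

or_adj <- exp(-coef(fit_adj)[["windtail"]])
or_adj
```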

Q3. If you fit a logistic regression model to a binary variable, for example use of the autolander, then fit a logistic regression model for one minus the outcome (not using the autolander) what happens to the coefficients?

- The coefficients get inverted (one over their previous value).
- The coefficients change in a non-linear fashion.
- **The coefficients reverse their signs.**
- The intercept changes sign, but the other coefficients don’t.
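Flipping the outcome turns $\log\{p/(1-p)\}$ into $\log\{(1-p)/p\}$, its negative, so every coefficient, including the intercept, changes sign. A sketch on the shuttle data:

```r
library(MASS)  # provides the shuttle data set

shuttle2 <- transform(shuttle, auto = as.numeric(use == "auto"))
fit_auto   <- glm(auto ~ wind + magn, family = binomial, data = shuttle2)
fit_noauto <- glm(I(1 - auto) ~ wind + magn, family = binomial, data = shuttle2)

cbind(auto = coef(fit_auto), not_auto = coef(fit_noauto))
```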

Q4. Consider the insect spray data `InsectSprays`. Fit a Poisson model using spray as a factor level. Report the estimated relative rate comparing spray A (numerator) to spray B (denominator).

- **0.9457**
- 0.136
- 0.321
- -0.056
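A sketch: with spray A as the reference level, the A-to-B relative rate is `exp(-sprayB)`, which for a one-factor Poisson model equals the ratio of the group means.

```r
fit <- glm(count ~ spray, family = poisson, data = InsectSprays)

# sprayB is log(rate_B / rate_A), so negate it for A over B
rr_a_b <- exp(-coef(fit)[["sprayB"]])
rr_a_b

# Same as the ratio of sample means
rr_means <- with(InsectSprays, mean(count[spray == "A"]) / mean(count[spray == "B"]))
rr_means
```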

Q5. Consider a Poisson glm with an offset, $t$. So, for example, a model of the form `glm(count ~ x + offset(t), family = poisson)` where `x` is a factor variable comparing a treatment (1) to a control (0) and `t` is the natural log of a monitoring time. What is the impact on the coefficient for `x` if we fit the model `glm(count ~ x + offset(t2), family = poisson)` where `t2 <- log(10) + t`? In other words, what happens to the coefficients if we change the units of the offset variable? (Note: adding log(10) on the log scale is multiplying by 10 on the original scale.)

- The coefficient estimate is multiplied by 10.
- The coefficient is subtracted by log(10).
- The coefficient estimate is divided by 10.
- **The coefficient estimate is unchanged.**
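Writing the second model as $\log \mu = (\beta_0 - \log 10) + \beta_1 x + t_2$ shows that only the intercept absorbs the change of units. A sketch on simulated counts (the data-generating values are arbitrary):

```r
set.seed(7)
n <- 100
x <- factor(rep(c(0, 1), each = n / 2))  # control (0) vs. treatment (1)
t <- log(runif(n, min = 1, max = 10))    # log monitoring time
count <- rpois(n, lambda = exp(0.5 + 0.3 * (x == "1") + t))

fit1 <- glm(count ~ x + offset(t), family = poisson)
t2 <- log(10) + t                        # monitoring time multiplied by 10
fit2 <- glm(count ~ x + offset(t2), family = poisson)

rbind(offset_t = coef(fit1), offset_t2 = coef(fit2))
```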

Q6. Consider the data

```
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)
```

Using a knot point at 0, fit a linear model that looks like a hockey stick with two lines meeting at x=0. Include an intercept term, x and the knot point term. What is the estimated slope of the line after 0?

- -0.183
- -1.024
- **1.013**
- 2.037
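A sketch: the knot term $(x)_+ = x \cdot 1\{x > 0\}$ lets the slope change at 0, so the slope after 0 is the `x` coefficient plus the knot coefficient.

```r
x <- -5:5
y <- c(5.12, 3.93, 2.67, 1.87, 0.52, 0.08, 0.93, 2.05, 2.54, 3.87, 4.97)

knot <- (x > 0) * x  # hinge term: 0 for x <= 0, x for x > 0
fit <- lm(y ~ x + knot)
coef(fit)

# Slope after 0 is the x slope plus the knot term's slope
slope_after <- sum(coef(fit)[c("x", "knot")])
slope_after
```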