### Week 1: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 1 Quiz

Q1. How many EPL games from this season were played in 2018?

- 380
- 190
- 171
- 209

Q2. Which team scored the highest number of goals while playing at home in the first half of the season?

- Watford
- Liverpool
- Manchester City
- Stoke

Q3. Which team conceded the highest number of goals while playing away in the first half of the season?

- West Ham
- Stoke
- Liverpool
- Watford

Q4. Which of the following teams had the smallest difference between their win percentage and Pythagorean expectation in the first half of the season?

- Manchester United
- Manchester City
- Arsenal
- Liverpool

Q5. Which of the following teams had the smallest difference between their win percentage and Pythagorean expectation in the first half of the season?

- Stoke
- Brighton
- Bournemouth
- Leicester
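Q4 and Q5 compare each team's actual win percentage with its Pythagorean expectation, which predicts a win fraction from goals scored (GF) and conceded (GA) as $GF^k / (GF^k + GA^k)$. The following is a minimal Python sketch. It assumes the commonly used exponent $k = 2$ (the quiz does not state the exponent), and the goal totals are hypothetical rather than taken from the course data. The function names are illustrative only.

```python
def pythagorean_expectation(goals_for: float, goals_against: float, exponent: float = 2.0) -> float:
    """Expected win fraction: GF^k / (GF^k + GA^k), with k = exponent (assumed 2)."""
    if goals_for == 0 and goals_against == 0:
        raise ValueError("undefined when no goals were scored or conceded")
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def pythag_gap(win_pct: float, goals_for: float, goals_against: float) -> float:
    """Absolute difference between actual win percentage and Pythagorean expectation."""
    return abs(win_pct - pythagorean_expectation(goals_for, goals_against))


# Hypothetical half-season totals, not the course data.
print(pythagorean_expectation(40, 20))     # 0.8
print(round(pythag_gap(0.75, 40, 20), 2))  # 0.05
```

For a given set of teams, the "smallest difference" in Q4 and Q5 is the team with the lowest `pythag_gap`.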

Q6. Which of the following teams had the highest value for away wins (awinvalue) in the first half of the season?

- West Ham
- West Brom
- Crystal Palace
- Stoke

Q7. Which team had the largest gap between home points won (hwinvalue) and away points won (awinvalue) in the second half of the season?

- West Ham
- Brighton
- Arsenal
- Watford
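Q6 and Q7 both aggregate results separately for home and away games. The following sketch shows one way to compute the per-team home-minus-away gap. It assumes each result is scored 1 for a win, 0.5 for a draw, and 0 for a loss, which is an assumption about how hwinvalue and awinvalue are built. The match tuples and helper names are hypothetical.

```python
from collections import defaultdict


def win_value(goals_for: int, goals_against: int) -> float:
    # Assumed scoring: 1 for a win, 0.5 for a draw, 0 for a loss.
    if goals_for > goals_against:
        return 1.0
    if goals_for == goals_against:
        return 0.5
    return 0.0


def home_away_gap(matches):
    """matches: iterable of (home_team, away_team, home_goals, away_goals).

    Returns {team: home win value total - away win value total}.
    """
    home = defaultdict(float)
    away = defaultdict(float)
    for home_team, away_team, home_goals, away_goals in matches:
        home[home_team] += win_value(home_goals, away_goals)
        away[away_team] += win_value(away_goals, home_goals)
    teams = set(home) | set(away)
    return {team: home[team] - away[team] for team in teams}


# Hypothetical fixtures, not the course data.
matches = [("A", "B", 2, 0), ("B", "A", 1, 1), ("A", "C", 0, 0), ("C", "A", 3, 1)]
gaps = home_away_gap(matches)
print(sorted(gaps.items()))  # [('A', 1.0), ('B', 0.5), ('C', 0.5)]
```

If "largest gap" is read as a magnitude, rank teams by `abs(gaps[team])`.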

Q8. What was the correlation between win percentage and the Pythagorean expectation in the first half of the season?

- 1.000
- 0.956
- 0.968
- 0.796

Q9. What was the correlation between win percentage in the first half of the season and the second half of the season?

- 0.968
- 0.796
- 1.000
- 0.757

Q10. What was the correlation between win percentage in the second half of the season and the Pythagorean expectation in the first half of the season?

- 0.757
- 0.746
- 0.796
- 1.000
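Q8 to Q10 each ask for a correlation between two team-level series, such as first-half win percentage and first-half Pythagorean expectation. Pandas `Series.corr` uses the Pearson coefficient by default. The following dependency-free sketch of that coefficient uses made-up inputs.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up values, one entry per team.
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```

A coefficient of exactly 1.000 arises only when the two series are perfectly linearly related, for example when a variable is correlated with itself.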

### Week 2: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 2 – Quiz 1

Q1. What are the number of observations and the number of variables in the NHL_Game dataframe after performing the first 7 steps?

- 18506; 13
- 18,946; 13
- 18,946; 24
- 18506; 24

Q2. What is the time range of the NHL_Game dataframe after performing step 8?

- 2010-10-07 to 2018-06-08
- 2015-03-08 to 2018-06-08
- 2010-10-07 to 2015-03-08
- 2010-10-07 to 2018-06-14

Q3. After performing step 9 above, what are the values of the “gid” variable of the fifth, tenth, and fifteenth observations by date in ascending order in the prepared NHL_Game dataframe?

- 2725, 2720, 2716
- 5662, 5668, 5683
- 5666, 5662, 5668
- 2730, 2725, 2720
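The numbered preparation steps are not reproduced here, so the following is only a generic sketch of the two operations these questions rely on. Q2 requires reading off the earliest and latest dates, and Q3 requires picking rows at given positions after an ascending sort by date. Tuples of `(date, gid)` stand in for dataframe rows, and the data and helper names are hypothetical.

```python
from datetime import date, timedelta


def date_range(games):
    """Return (earliest, latest) game date; games is a list of (game_date, gid)."""
    dates = [game_date for game_date, _gid in games]
    return min(dates), max(dates)


def gids_at_positions(games, positions=(5, 10, 15)):
    """Return gids at 1-based positions after sorting by date in ascending order."""
    # sorted() is stable, so games on the same date keep their input order.
    ordered = sorted(games, key=lambda g: g[0])
    return [ordered[p - 1][1] for p in positions]


# Hypothetical data: 20 games on consecutive days, listed newest first.
games = [(date(2010, 10, 7) + timedelta(days=i), 100 + i) for i in range(20)][::-1]
print(date_range(games))         # (datetime.date(2010, 10, 7), datetime.date(2010, 10, 26))
print(gids_at_positions(games))  # [104, 109, 114]
```

Several games can share a date, so the tie-breaking order can change which gid ends up at a given position. Pandas `DataFrame.sort_values` accepts `kind="stable"` to preserve the existing order of ties.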

### Week 3: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 3 – Quiz 1

Q1. Which of the four players had the greatest number of shots in the season?

- Kevin Durant
- Russell Westbrook
- Dwight Howard
- DeAndre Jordan

Q2. Comparing the plots of Russell Westbrook and Kevin Durant, which of the following statements is best supported by the images?

- Russell Westbrook has a higher success rate than Kevin Durant
- Kevin Durant takes more three-point shots than Russell Westbrook
- Kevin Durant has a higher success rate than Russell Westbrook
- Russell Westbrook shoots more than Kevin Durant

Q3. Comparing the plots of DeAndre Jordan and Dwight Howard, which of the following statements is best supported by the images?

- DeAndre Jordan takes more shots at a distance from the basket than Dwight Howard
- Dwight Howard is a better shooter than DeAndre Jordan
- Dwight Howard takes more shots at a distance from the basket than DeAndre Jordan
- DeAndre Jordan is a better shooter than Dwight Howard

Q4. Comparing the plots of Brook Lopez and Robin Lopez, which of the following statements is best supported by the images?

- The Lopez twins never attempt 3 point shots
- Both of the Lopez twins attempt 3 point shots
- Robin Lopez attempts 3-point shots but Brook Lopez doesn’t
- Brook Lopez attempts 3-point shots but Robin Lopez doesn’t

Q5. Based on these plots, which of the six players seems most likely to be found shooting from the left-hand side of the basket rather than the right-hand side?

- Dwight Howard
- Russell Westbrook
- Kevin Durant
- Brook Lopez

### Week 4: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 4 – Quiz 1

Q1. Which of the following about the NFL_Game dataframe is incorrect?

- The dataframe covers games from 1966 to 2019 (1966-1967 season to 2019-2020 season).
- The dataframe has 24,314 observations and 28 variables.
- NFL teams earned a higher average score at home than away.
- The highest score achieved by an away team was 72.

Q2. Which of the following statements regarding the correlation coefficients calculated in step 3 is incorrect?

- The correlation coefficient between “score” and “weather_temperature” is -0.03. This means that teams earned slightly lower scores when the temperature got higher.
- The correlation coefficient between “win” and “home” is 0.15. This means that teams won more at home games than away games.
- The correlation coefficient between “score_diff” and “home” is 0.17. This means that the difference between a team’s own score and its opponent’s score tends to be higher for the home team than for the away team.
- The correlation coefficient between “score” and “weather_wind_mph” is -0.079. This means there is a weak negative relationship between a team’s final score and wind speed.

### Week 5: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 5 Quiz

Q1. What was the value of the sum of all salaries in 2014?

- $82 million
- $84 million
- $65 million
- $74 million
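Q1 is a grouped sum of salaries by year, and the regressions that follow use relsal. The following sketch assumes that relsal is a team's salary divided by the mean team salary in the same year. The quiz does not define relsal, so treat that definition as an assumption. Rows are hypothetical `(year, team, salary)` tuples.

```python
from collections import defaultdict


def salary_totals(rows):
    """Total salary per year; rows are (year, team, salary)."""
    totals = defaultdict(float)
    for year, _team, salary in rows:
        totals[year] += salary
    return dict(totals)


def relative_salaries(rows):
    """Assumed definition: team salary / mean team salary in the same year."""
    by_year = defaultdict(list)
    for year, team, salary in rows:
        by_year[year].append((team, salary))
    rel = {}
    for year, entries in by_year.items():
        mean_salary = sum(s for _team, s in entries) / len(entries)
        for team, salary in entries:
            rel[(year, team)] = salary / mean_salary
    return rel


# Hypothetical salaries (in millions), not the course data.
rows = [(2014, "A", 10.0), (2014, "B", 30.0), (2015, "A", 20.0), (2015, "B", 20.0)]
print(salary_totals(rows))                 # {2014: 40.0, 2015: 40.0}
print(relative_salaries(rows)[(2014, "A")])  # 0.5
```

Under this definition, a relsal above 1 means a team spent more than the average team that year.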

Q2. In the first regression, what is the coefficient on relsal?

- 0.403
- 0.509
- 0.4752
- 0.2050

Q3. Based on the first regression, which of the following statements is the most accurate?

- We cannot say if relsal significantly affects win percentage
- The coefficient of relsal is insignificant
- The coefficient of relsal is not significantly different from zero at the 5% level (based on its p-value)
- The regression implies that increasing salaries causes higher win percentage
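Q3 turns on whether the relsal coefficient can be distinguished from zero. Assuming the first regression has relsal as its only regressor, this can be checked by hand by estimating the slope and its standard error and then comparing the t-statistic with about 1.96. That value is a large-sample approximation to the two-sided 5% critical value. Small samples call for the t distribution, and regression packages such as statsmodels report the p-value directly. The data below are made up.

```python
from math import sqrt


def simple_ols(xs, ys):
    """Fit y = a + b*x by least squares; return (a, b, se_b, t_b)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length data with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope estimated).
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = sqrt(rss / (n - 2) / sxx)
    if se_b == 0:
        raise ValueError("perfect fit: t-statistic is undefined")
    return a, b, se_b, b / se_b


# Made-up data: relative salary (x) and win percentage (y).
relsal = [0.6, 0.8, 1.0, 1.2, 1.4]
wpc = [0.45, 0.40, 0.55, 0.50, 0.52]
a, b, se_b, t_b = simple_ols(relsal, wpc)
print(f"slope={b:.3f} se={se_b:.3f} t={t_b:.2f}")
print("significant at ~5%" if abs(t_b) > 1.96 else "not significant at ~5%")
```

A positive slope with a small t-statistic is not evidence of an effect. Even a significant slope would show only association, not that raising salaries causes wins.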

Q4. In the second regression, which of the following statements is true?

- Lagged win percentage is statistically significant at the 5% level but relsal is not
- Neither relsal nor lagged win percentage are statistically significant at the 5% level
- Both relsal and lagged win percentage are statistically significant at the 5% level
- Relsal is statistically significant at the 5% level but lagged win percentage is not

Q5. Based on the second regression, which of the following statements is the most accurate?

- The regression suggests that relsal is more important than lagged wpc in explaining performance
- The regression is an improvement on the first regression
- The addition of lagged win percentage has not significantly improved on the explanatory power of the regression compared to the first regression
- Relsal cannot explain win percentage

Q6. The third regression includes fixed effects. Not every team in a regression can have a fixed effect; there must be one “reference” team, relative to which each fixed effect is defined. Thus each fixed effect listed should be understood as the performance of the team relative to the reference team. In this case, what is the reference team?

- Deccan Chargers
- Kochi Tuskers Kerala
- Sunrisers Hyderabad
- Chennai Super Kings
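The reference team in Q6 follows from how the team variable is turned into dummy columns. Each team except the omitted one gets a column, and the omitted team's effect is absorbed into the intercept. The following sketch shows this treatment (dummy) coding. Dropping the first level in sorted order mirrors the default of patsy-style `C(...)` formula terms, but whether the course regression used that default is an assumption. Team labels are hypothetical.

```python
def treatment_dummies(teams, reference=None):
    """Treatment (dummy) coding: one 0/1 column per team except the reference team.

    By default the reference is the first team in sorted order.
    """
    levels = sorted(set(teams))
    ref = levels[0] if reference is None else reference
    if ref not in levels:
        raise ValueError(f"unknown reference team: {ref}")
    columns = [lvl for lvl in levels if lvl != ref]
    rows = [[1 if team == col else 0 for col in columns] for team in teams]
    return ref, columns, rows


ref, columns, rows = treatment_dummies(["Team C", "Team A", "Team B", "Team C"])
print(ref)      # Team A
print(columns)  # ['Team B', 'Team C']
print(rows)     # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

Each fitted coefficient on a dummy column is that team's effect relative to the reference. This is why the signs of the fixed effects in Q7 say something about the reference team.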

Q7. Ignoring Rising Pune Supergiants (for which there is only one observation), all of the fixed effects are negative. What does this tell us about the reference team?

- All else equal, it is the worst team in the league.
- The team has the highest salaries in the league
- The team has the lowest salaries in the league
- All else equal, it is the best team in the league.

Q8. Looking at the coefficient and standard error for the Mumbai Indians fixed effect, which would you say best describes the statistical inference that can be drawn?

- Mumbai is statistically the second best team in the IPL
- Mumbai is statistically better than the reference team
- Mumbai is statistically worse than the reference team
- Mumbai is statistically no better or worse than the reference team

Q9. Based on the third regression, which of the following statements is the most accurate?

- The negative coefficient on lagged win percentage suggests that higher win percentage last year leads to lower win percentage this year
- The negative coefficient on relsal suggests that higher salary spending leads to lower win percentage
- The increased R² means that we should have more confidence in the coefficient values in this regression than in the previous two regressions
- The addition of fixed effects has not significantly improved on the explanatory power of relsal or lagged win percentage compared to the first two regressions

Q10. In the three regressions considered here, relsal is not statistically significant in any of them. Which of the following would NOT be a good explanation for this?

- Salaries are not measured accurately and so the true value of relsal is not being tested
- The market for cricketers is not efficient
- Players play for the love of the game and not for money
- Because there is an effective salary cap, salaries do not in fact vary enough to make a difference

### Week 6: Foundations of Sports Analytics: Data, Representation, and Models in Sports Quiz Answers

#### Quiz 1: Week 6 – Quiz 1

Q1. Which of the following statements about the 2014-2015 season NBA data is incorrect?

- The shotlog dataframe covers information on 128,069 shots.
- There is information on 120 games in the shotlog dataframe.
- There is information on 281 NBA players.
- There are missing values in the “shot_clock” variable.

Q2. Which of the following statements regarding the prediction errors calculated in step 4 is incorrect?

- There are 113,726 observations with real value in the prediction error for the previous period.
- The standard deviation for the current period prediction error is 0.49.
- The maximum for both prediction errors is 0.69.
- The average current period prediction error is -1.099
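The first option in Q2 contrasts the number of non-missing previous-period prediction errors with the total number of shots. The following sketch makes two assumptions, neither of which is stated in the quiz:

- A shot's prediction error is its outcome (1 made, 0 missed) minus the shooter's overall success rate.
- The previous-period error belongs to the same player's preceding shot in the same game.

Under these assumptions, each player's first shot in a game has no previous error (`None` here, NaN in a dataframe). The shot tuples and the function name are hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

Shot = Tuple[str, str, int]  # (player, game_id, made), in chronological order


def prediction_errors(shots: List[Shot]) -> List[Tuple[float, Optional[float]]]:
    """Return (current error, previous-shot error or None) for each shot."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for player, _game, result in shots:
        made[player] += result
        attempts[player] += 1
    success_rate = {p: made[p] / attempts[p] for p in attempts}

    last_error = {}  # (player, game) -> error on that player's previous shot in the game
    out = []
    for player, game, result in shots:
        error = result - success_rate[player]
        out.append((error, last_error.get((player, game))))
        last_error[(player, game)] = error
    return out


shots = [("A", "g1", 1), ("A", "g1", 0), ("B", "g1", 1), ("A", "g2", 1)]
errors = prediction_errors(shots)
print(sum(prev is not None for _err, prev in errors))  # 1
```

Because each player-game sequence loses its first shot, the previous-period error has fewer real values than there are shots.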


