Coursera was launched in 2012 by Daphne Koller and Andrew Ng with the goal of giving life-changing learning experiences to students all around the world. In the modern day, Coursera is a worldwide online learning platform that provides anybody, anywhere with access to online courses and degrees from top institutions and corporations.

### Data Analysis with R Week 01 Quiz Answers

#### Graded Quiz Answers

Q1. What is the purpose of the Data Asset eXchange?

- **Provides data that you can explore to conduct data analysis.**
- Provides data that you can use for a small fee.
- Helps you exchange data with others.
- Provides data that is only useful for learning purposes.

Q2. In the Airline Performance dataset from the Data Asset eXchange, which of the following variables is a target for predicting on-time arrivals?

- CarrierDelay
- Distance
- SecurityDelay
- **ArrDelay**

Q3. What is the purpose of the pipe (%>%) operator?

- Assigns a value to a variable.
- Assigns a value to a global variable.
- Combines two functions into a single operation.
- **Combines multiple functions into a single operation.**
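To see how the pipe chains several function calls, here is a minimal sketch. It assumes the dplyr package is installed and uses the built-in `mtcars` dataset, which is chosen only for illustration and is not part of the quiz.

```r
library(dplyr)  # dplyr re-exports the %>% operator from magrittr

# Without the pipe: nested calls have to be read from the inside out
nested <- summarize(filter(mtcars, cyl == 4), mean_mpg = mean(mpg))

# With the pipe: each result becomes the first argument of the next call
piped <- mtcars %>%
  filter(cyl == 4) %>%
  summarize(mean_mpg = mean(mpg))

# Both forms should give the same one-row summary
all.equal(nested, piped)
```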

Q4. Which function can you use to read a text file that uses the “%” character as a delimiter?

- **read_delim()**
- read_tsv()
- read_csv()
- read_any()
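A short sketch of reading a "%"-delimited file, assuming the readr package (version 2.0 or later for `show_col_types`) is installed. The file contents and column names are made up for the example.

```r
library(readr)

# Write a small sample file that uses "%" as the field separator
path <- tempfile(fileext = ".txt")
writeLines(c("name%score", "Ana%90", "Ben%85"), path)

# read_delim() accepts an arbitrary single-character delimiter via `delim`
scores <- read_delim(path, delim = "%", show_col_types = FALSE)
print(scores)
```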

Q5. What is the main similarity between the **summarize()** and **group_by()** functions?

- Both return a statistical summary of the data.
- **Both group data by the specified variables.**
- Both compute summary statistics.
- There is no similarity between the **summarize()** and **group_by()** functions.

### Data Analysis with R Week 02 Quiz Answers

#### Graded Quiz Answers

Q1. You want to access the “Date” column of a data frame called sales_data so you can perform an operation on it. What is the correct way to refer to this column?

- sales_data%Date
- **sales_data$Date**
- sales_data.Date
- sales_data#Date
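In base R, the `$` operator extracts a column from a data frame by name. The sketch below builds a small `sales_data` frame with invented values, purely for illustration.

```r
# Hypothetical data; only the column name "Date" comes from the question
sales_data <- data.frame(
  Date = as.Date(c("2023-01-05", "2023-02-10")),
  Amount = c(120, 80)
)

# $ returns the column as a vector
sales_data$Date

# The extracted column can then be used in any operation
sale_years <- format(sales_data$Date, "%Y")
sale_years
```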

Q2. Which function replaces missing values in a dataset?

- drop_na()
- **replace_na()**
- is.na()
- drop_columns()
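The difference between these functions is easier to see side by side. This is a minimal sketch assuming the tidyr package is installed; the data frame and the fill values are invented for the example.

```r
library(tidyr)

df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# is.na() only flags missing values; it does not change the data
missing_flags <- is.na(df$x)

# drop_na() removes every row that contains a missing value
dropped <- drop_na(df)

# replace_na() fills missing values with a replacement given per column
filled <- replace_na(df, list(x = 0, y = "unknown"))
filled
```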

Q3. You have a variable called “Status” that contains a status code in the format “error_type-severity_level”, for example “10-07”, and you want to reformat the column so that the “error_type” and “severity_level” are in different columns. What is the correct function to do this?

- dataframe %>% mutate_if(Status, sep = "-", into = c("error_type", "severity_level"))
- **dataframe %>% separate(Status, sep = "-", into = c("error_type", "severity_level"))**
- dataframe %>% mutate_all(Status, sep = "-", into = c("error_type", "severity_level"))
- dataframe %>% sapply(Status, sep = "-", into = c("error_type", "severity_level"))
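Here is a small runnable version of the `separate()` call, assuming the tidyr package is installed. The `id` column and the second status code are invented for the example.

```r
library(tidyr)  # tidyr also re-exports the %>% operator

dataframe <- data.frame(
  id = 1:2,
  Status = c("10-07", "12-03"),
  stringsAsFactors = FALSE
)

# Split Status at the "-" into two new character columns
result <- dataframe %>%
  separate(Status, sep = "-", into = c("error_type", "severity_level"))
result
```

The new columns stay as character strings by default, so "07" keeps its leading zero.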

Q4. What are two benefits of data normalization?

- Helps you better understand data distribution.
- **Brings data into a common standard of expression that allows you to make meaningful comparisons.**
- **Minimizes the effects of outliers, which can influence the result more.**
- Enables a fair comparison between the different features, making sure they have the same impact.
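Two common ways to normalize a numeric variable are min-max scaling and z-score standardization. The sketch below uses base R only; the vector `x` is invented for the example.

```r
x <- c(2, 4, 6, 8, 100)

# Min-max scaling maps the values onto the range [0, 1]
min_max <- (x - min(x)) / (max(x) - min(x))

# Z-score standardization gives mean 0 and standard deviation 1
z_score <- (x - mean(x)) / sd(x)

round(min_max, 3)
round(z_score, 3)
```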

Q5. To visualize its distribution, binned data is often plotted in which of the following type of chart?

- Scatter plot
- **Histogram**
- Line chart
- Bar chart
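A minimal base R sketch of binning and then looking at the distribution. The delay values and the bin boundaries are assumptions made up for the example.

```r
# Hypothetical arrival delays in minutes
arr_delay <- c(-5, 0, 12, 30, 45, 61, 90, 120)

# cut() assigns each value to a labeled bin (intervals are right-closed by default)
bins <- cut(
  arr_delay,
  breaks = c(-Inf, 15, 60, Inf),
  labels = c("low", "medium", "high")
)
table(bins)

# A histogram shows how the underlying values are distributed
hist(arr_delay, main = "Distribution of arrival delays", xlab = "Delay (minutes)")
```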

Q6. Which of the following can you accomplish using the **spread()** function? Select two answers.

- Reformat a categorical variable so that its contents are in two or more columns.
- Convert categorical variables to dummy variables.
- **Convert categorical variables to dummy variables and assign the value of another variable to each category.**
- Size down three variables into one.
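The following sketch shows two ways `spread()` can turn a categorical column into one column per category. It assumes the tidyr package is installed, and the `flights` data frame is invented for the example. Note that tidyr now marks `spread()` as superseded by `pivot_wider()`, although it still works.

```r
library(tidyr)

flights <- data.frame(
  flight_id = 1:3,
  carrier = c("AA", "DL", "AA"),
  arr_delay = c(10, 25, 5),
  stringsAsFactors = FALSE
)

# Dummy variables: add a column of 1s, then spread it across the categories
flights$dummy <- 1
dummies <- spread(
  flights[, c("flight_id", "carrier", "dummy")],
  key = carrier, value = dummy, fill = 0
)
dummies

# Same idea, but each category column carries the value of another variable
delays <- spread(
  flights[, c("flight_id", "carrier", "arr_delay")],
  key = carrier, value = arr_delay, fill = 0
)
delays
```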

### Data Analysis with R Week 03 Quiz Answers

#### Graded Quiz Answers

Q1. Which of the following forms of exploratory data analysis generates short summaries about the sample and measures of the data?

- Correlation
- Pearson correlation
- Analysis of variance (ANOVA)
- **Descriptive statistics**

Q2. When conducting exploratory data analysis, which visualizations are particularly useful for plotting the target variable over multiple variables to get visual clues of the relationship between these variables and the target?

- Scatter plots
- Histograms
- **Heatmaps**
- Boxplots

Q3. Which of the following statements about the ANOVA F-test score are true? Select two answers.

- **A large F-test score implies a strong correlation between variable categories and the target variable.**
- A large F-test score implies a poor correlation between variable categories and the target variable.
- A small F-test score implies a strong correlation between variable categories and the target variable.
- **A small F-test score implies a poor correlation between variable categories and the target variable.**
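As a rough illustration of where the F-test score comes from, this sketch runs a one-way ANOVA in base R. The built-in `mtcars` dataset stands in for the course data, so the variable names here are assumptions.

```r
# One-way ANOVA: does mean mpg differ across cylinder categories?
anova_fit <- aov(mpg ~ factor(cyl), data = mtcars)
anova_table <- summary(anova_fit)[[1]]

# The first row holds the statistics for the grouping variable
f_score <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

f_score  # a large F means the group means differ a lot relative to within-group noise
p_value
```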

Q4. You can visualize the correlation between two variables by plotting them on a scatter plot and then doing which of the following?

- Nothing. The scatter plot alone can show the correlation completely.
- Add a correlation line.
- You should not use a scatter plot for visualizing the correlation between two variables.
- **Add a regression line.**
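One simple way to draw a scatter plot with a fitted regression line uses base R graphics. The sketch below uses the built-in `mtcars` dataset as a stand-in for the course data.

```r
# Scatter plot of the two variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Fit a simple linear regression and overlay it on the plot
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")

# The sign of the slope matches the direction of the correlation
coef(fit)
```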

Q5. When using the Pearson method to evaluate the correlation between two variables, how do you know you can have strong certainty in the result?

- The P value is greater than 0.1.
- The P value is less than 0.05.
- The P value is less than 0.1.
- **The P value is less than 0.001.**
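Base R's `cor.test()` reports both the Pearson coefficient and its p-value. This is a minimal sketch using the built-in `mtcars` dataset in place of the course data.

```r
result <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

result$estimate  # correlation coefficient, between -1 and 1
result$p.value   # smaller values mean more certainty that the correlation is real
```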

### Data Analysis with R Week 04 Quiz Answers

#### Graded Quiz Answers

Q1. In model development, you can develop more accurate models when you have which of the following?

- **Relevant data.**
- Larger quantities of data.
- Fewer independent variables.
- More dependent variables.

Q2. Assume you have a dataset called “new_dataset”, a predictor variable called X, and a target called Y, and you want to fit a simple linear regression model. Which command should you use?

- linear_model <- predict(Y ~ Z, data = new_dataset)
- linear_model <- lm(X ~ Y, data = new_dataset)
- **linear_model <- lm(Y ~ X, data = new_dataset)**
- linear_model <- predict(X ~ Y, data = new_dataset)
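In an R formula, the target goes on the left of `~` and the predictor on the right. The sketch below simulates a small `new_dataset` (the values are invented) and fits the model with `lm()`.

```r
set.seed(42)

# Simulated data: Y depends linearly on X, plus random noise
new_dataset <- data.frame(X = 1:20)
new_dataset$Y <- 3 + 2 * new_dataset$X + rnorm(20)

# Target ~ predictor
linear_model <- lm(Y ~ X, data = new_dataset)

# The estimated slope should land close to the true value of 2
coef(linear_model)
```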

Q3. When using the **predict()** function in R, what is the default confidence level?

- **95%**
- 100%
- 85%
- 90%
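For `lm` models, `predict()` takes a `level` argument that defaults to 0.95 when an interval is requested. A quick check, using the built-in `mtcars` dataset and two made-up weights:

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(2.5, 3.5))  # hypothetical new observations

# Interval with the default level
default_ci <- predict(fit, newdata = new_cars, interval = "confidence")

# Interval with the level spelled out
explicit_ci <- predict(fit, newdata = new_cars,
                       interval = "confidence", level = 0.95)

all.equal(default_ci, explicit_ci)
```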

Q4. Which plot type helps you validate assumptions about normality?

- **Q-Q plot**
- Residual plot
- Scale-location plot
- Regression plots
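A Q-Q plot of the residuals can be drawn with base R. This sketch uses the built-in `mtcars` dataset as an assumed example model.

```r
fit <- lm(mpg ~ wt, data = mtcars)
model_residuals <- resid(fit)

# Points close to the reference line suggest roughly normal residuals
qqnorm(model_residuals)
qqline(model_residuals, col = "red")

# plot(fit, which = 2) draws the same kind of diagnostic for an lm object
```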

Q5. A third order polynomial regression model is described as which of the following?

- Quadratic, meaning that the predictor variable in the model is squared.
- **Cubic, meaning that the predictor variable in the model is cubed.**
- Squared, meaning that the predictor variable in the model is squared.
- Simple linear regression.
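A third order model includes terms up to the cube of the predictor. Here is a minimal sketch on simulated data; the coefficients and noise level are invented for the example.

```r
set.seed(1)

x <- seq(-3, 3, length.out = 50)
y <- 1 + 2 * x - 0.5 * x^3 + rnorm(50, sd = 0.5)

# poly(x, 3, raw = TRUE) adds the x, x^2 and x^3 terms
cubic_model <- lm(y ~ poly(x, 3, raw = TRUE))

# Four coefficients: intercept, x, x^2, x^3
coef(cubic_model)
```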

Q6. How should you interpret an R-squared result of 0.89?

- The X variable causes the Y variable to positively change 89% of the time.
- **89% of the response variable variation is explained by a linear model.**
- There is a strong negative correlation between the variables.
- 89% of the response variable variation is explained by a polynomial model.

Q7. When comparing linear regression models, when will the mean squared error (MSE) be smaller?

- When using a simple linear regression (SLR) model.
- When using a polynomial regression model.
- When using a multiple linear regression (MLR) model.
- **This depends on your data. The model that fits the data better has the smaller MSE.**
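To compare models by MSE, one option is to fit each on training data and score it on held-out data. The sketch below is only an illustration: the data is simulated, and the 70/30 split and the `mse()` helper are assumptions of this example, not something taken from the course.

```r
set.seed(123)

# Simulated data with a truly linear relationship
x <- runif(100, 0, 10)
y <- 5 + 1.5 * x + rnorm(100)
train <- data.frame(x = x[1:70], y = y[1:70])
test <- data.frame(x = x[71:100], y = y[71:100])

slr <- lm(y ~ x, data = train)
cubic <- lm(y ~ poly(x, 3, raw = TRUE), data = train)

# Hypothetical helper: mean squared error of a model on a data set
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

test_mse <- c(slr = mse(slr, test), cubic = mse(cubic, test))
test_mse

# R-squared: share of the variation in y explained by the fitted model
r_squared <- summary(slr)$r.squared
r_squared
```

Which model ends up with the smaller test MSE depends on the data at hand, so it is worth rerunning the comparison on your own dataset.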

### Data Analysis with R Week 05 Quiz Answers

#### Graded Quiz Answers

Q1. Which situations are helped by using the cross-validation method to train your model? Select two answers.

- Working with models with small amounts of data.
- **Determining if a model can be generalized for a broader group.**
- Working with models with large amounts of data.
- Working with models that are underfit.

Q2. What is a strategy you can employ to address an underfit model?

- Reduce model complexity.
- Use regularization.
- **Increase model complexity.**
- Reduce the number of features in the training data.

Q3. What is the difference between Ridge and Lasso regression?

- Ridge regression penalizes the sum of the absolute values of the coefficients while Lasso regression penalizes the sum of squared coefficients.
- There is no major difference between Ridge and Lasso regression.
- **Lasso regression penalizes the sum of the absolute values of the coefficients while Ridge regression penalizes the sum of squared coefficients.**
- Lasso regression increases or decreases the value of Lambda to penalize complex models more or less.
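The two penalty terms can be computed by hand in base R. The coefficient vector and the lambda value below are invented for the example; in a real fit, the penalty is added to the model's loss during training.

```r
beta <- c(0.5, -2, 0, 3)  # hypothetical model coefficients
lambda <- 0.1             # hypothetical penalty strength

# Lasso (L1): lambda times the sum of absolute coefficient values
lasso_penalty <- lambda * sum(abs(beta))

# Ridge (L2): lambda times the sum of squared coefficients
ridge_penalty <- lambda * sum(beta^2)

c(lasso = lasso_penalty, ridge = ridge_penalty)
```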

Q4. Which tidymodels function do you use to create the grid for a grid search?

- tune()
- **grid_regular()**
- tune_grid()
- add_model()
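The functions in this question play different roles: `tune()` marks a parameter to be tuned, `grid_regular()` builds the grid of candidate values, and `tune_grid()` runs the search over that grid. Here is a minimal sketch of the grid step, assuming the dials package is installed (it is loaded as part of tidymodels). The choice of `penalty()` and of five levels is arbitrary.

```r
library(dials)  # attached automatically by library(tidymodels)

# Build a regular grid with 5 candidate values for the penalty parameter
lambda_grid <- grid_regular(penalty(), levels = 5)
lambda_grid
```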

**Review:**

Based on what we know of this course, we recommend enrolling so you can pick up new skills from specialists. We trust it will be worthwhile.