Coursera was launched in 2012 by Daphne Koller and Andrew Ng with the goal of giving life-changing learning experiences to students around the world. Today, Coursera is a global online learning platform that gives anyone, anywhere access to online courses and degrees from top universities and companies.
Week 1: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: SFrames
Q 1: Download the Wiki People SFrame. Then open a new Jupyter notebook, import TuriCreate, and read the SFrame data.
Answer: Click here
Q 2: How many rows are in the SFrame? (Do NOT use commas or periods.)
Answer: 59071
Q 3: Which name is in the last row?
- Conradign Netzer
- Cathy Caruth
- Fawaz Damrah
Q 4: Read the text column for Harpdog Brown. He was honored with:
- A Grammy award for his latest blues album.
- A gold harmonica to recognize his innovative playing style.
- A lifetime membership in the Hamilton Blues Society.
Q 5: Sort the SFrame according to the text column, in ascending order. What is the name entry in the first row?
- Zygfryd Szo
- Digby Morrell
- 007 James Bond
- 108 (artist)
- 8 Ball Aitken
Week 2: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: Regression
Q 1: Which figure represents an overfitted model?
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/czbfW1vMEeWVtgr31Ad8Fw_e76f287b4f43f46f9afd6a29ccae1ead_Reg1a.png?expiry=1658620800000&hmac=x156QPF_Btk-yNR45SL6zO9sIUjsN6f2nMLs2tBIhGM>
Q 2: True or false: The model that best minimizes training error is the one that will perform best for the task of prediction on new data.
- True
- False
Q 3: The following table illustrates the results of evaluating 4 models with different parameter choices on some data set. Which of the following models fits this data the best?
Model index | Parameters (intercept, slope) | Residual sum of squares (RSS) |
1 | (0,1.4) | 20.51 |
2 | (3.1,1.4) | 15.23 |
3 | (2.7, 1.9) | 13.67 |
4 | (0, 2.3) | 18.99 |
- Model 1
- Model 2
- Model 3
- Model 4
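Because all four models are evaluated on the same data set, the one with the smallest RSS fits that data best. A minimal sketch of the comparison, using the RSS values from the table; the names `rss` and `table_rss` are illustrative and do not come from the course notebook:

```python
def rss(intercept, slope, xs, ys):
    """Residual sum of squares of the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# RSS values copied from the table above, keyed by model index.
table_rss = {1: 20.51, 2: 15.23, 3: 13.67, 4: 18.99}

# The best fit on this data set is the model with the smallest RSS.
best_model = min(table_rss, key=table_rss.get)
print(best_model)
```

The `rss` helper shows how each table entry would be computed if the underlying data points were available; the quiz only gives the resulting totals.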
Q 4: Assume we fit the following quadratic function: f(x) = w0+w1*x+w2*(x^2) to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function (w0, w1, w2), which ones are estimated to be 0? (Note: you must select all parameters estimated as 0 to get the question correct.)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/8CcG-lvREeWzLwrzeFOkAw_5c39244e7608d47a3a43d6019c0df631_Reg4a.png?expiry=1658620800000&hmac=rX__IhYjQyRZNqr6-abWP3aLnbB2EsxqlnbGVuNvEbE>
- w0
- w1
- w2
- none of the above
Q 5: Assume we fit the following quadratic function: f(x) = w0+w1*x+w2*(x^2) to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function (w0, w1, w2), which ones are estimated to be 0? (Note: you must select all parameters estimated as 0 to get the question correct.)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/Em2X3FvSEeWMhg7baGhc3w_3187d5cb269bf4e998d6f92493793a88_Reg4b.png?expiry=1658620800000&hmac=d8-AWZlSVy2LL00Fel3bLIJxrGraECEA1wnf176E_bs>
- w0
- w1
- w2
- none of the above
Q 6: Assume we fit the following quadratic function: f(x) = w0+w1*x+w2*(x^2) to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function (w0, w1, w2), which ones are estimated to be 0? (Note: you must select all parameters estimated as 0 to get the question correct.)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/LD5UH1vSEeWhtQ48PjS6Pw_9ac59a77ea836dd248a38ebde9f2d11f_Reg4c.png?expiry=1658620800000&hmac=tqED_QOOZtkR1F5aQCVc3pgO0Xu2HVtDF9_NMbvR1u0>
- w0
- w1
- w2
- none of the above
Q 7: Assume we fit the following quadratic function: f(x) = w0+w1*x+w2*(x^2) to the dataset shown (blue circles). The fitted function is shown by the green curve in the picture below. Out of the 3 parameters of the fitted function (w0, w1, w2), which ones are estimated to be 0? (Note: you must select all parameters estimated as 0 to get the question correct.)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/RIlaI1vSEeWVtgr31Ad8Fw_b7f7b633af94820bc5992c6975d8dc4d_Reg4d.png?expiry=1658620800000&hmac=XBE_l2ZDn-T0jJQGlWbqmy6Hp4yVaI4vsp1nz2zLEn4>
- w0
- w1
- w2
- none of the above
Q 8: Which of the following plots would you not expect to see as a plot of training and test error curves?
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/yCCWHFvNEeWSuhJSxsy6bQ_33196504673be40e26a66fe9994b80f7_Reg5b.png?expiry=1658620800000&hmac=b0scdHXjrxJqdfyS4DxmTDVGOqfpnSuVk4xd9Trs4dQ>
Q 9: True or false: One always prefers to use a model with more features since it better captures the true underlying process.
- True
- False
Quiz 2: Predicting house prices
Q 1: Selection and summary statistics: We found the zip code with the highest average house price. What is the average house price of that zip code?
- $75,000
- $7,700,000
- $540,088
- $2,160,607
Q 2: Filtering data: What fraction of the houses have living space between 2000 sq.ft. and 4000 sq.ft.?
- Between 0.2 and 0.29
- Between 0.3 and 0.39
- Between 0.4 and 0.49
- Between 0.5 and 0.59
- Between 0.6 and 0.69
Q 3: Building a regression model with several more features: What is the difference in RMSE between the model trained with my_features and the one trained with advanced_features?
- the RMSE of the model with advanced_features lower by less than $25,000
- the RMSE of the model with advanced_features lower by between $25,001 and $35,000
- the RMSE of the model with advanced_features lower by between $35,001 and $45,000
- the RMSE of the model with advanced_features lower by between $45,001 and $55,000
- the RMSE of the model with advanced_features lower by more than $55,000
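The RMSE comparison in Q 3 can be sketched in plain Python. The prices and predictions below are made-up numbers chosen only to show the calculation; they are not taken from the house sales data and do not indicate the quiz answer.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between predicted and actual values."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(targets))

# Hypothetical sale prices and predictions from two hypothetical models.
actual = [300000, 450000, 520000]
pred_my_features = [350000, 400000, 600000]
pred_advanced_features = [310000, 440000, 540000]

# A positive difference means the advanced model has the lower RMSE.
difference = rmse(pred_my_features, actual) - rmse(pred_advanced_features, actual)
print(round(difference))
```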
Week 3: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: Classification
Q 1: The simple threshold classifier for sentiment analysis described in the video (check all that apply):
- Must have pre-defined positive and negative attributes
- Must either count attributes equally or pre-define weights on attributes
- Defines a possibly non-linear decision boundary
Q 2: For a linear classifier classifying between “positive” and “negative” sentiment in a review x, Score(x) = 0 implies (check all that apply):
- The review is very clearly “negative”
- We are uncertain whether the review is “positive” or “negative”
- We need to retrain our classifier because an error has occurred
Q 3: For which of the following datasets would a linear classifier perform perfectly?
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/D_IigVvQEeWVtgr31Ad8Fw_267aaadfe8ea97a30533a6712d23b0de_Class3b.png?expiry=1658620800000&hmac=OnudUFhXcrzCU-3gBveA322w4shtxUpBhScnIFSD1rE>
Q 4: True or false: High classification accuracy always indicates a good classifier.
- True
- False
Q 5: True or false: For a classifier classifying between 5 classes, there always exists a classifier with accuracy greater than 0.18.
- True
- False
Q 6: True or false: A false negative is always worse than a false positive.
- True
- False
Q 7: Which of the following statements are true? (Check all that apply)
- Test error tends to decrease with more training data until a point, and then does not change (i.e., curve flattens out)
- Test error always goes to 0 with an unboundedly large training dataset
- Test error is never a function of the amount of training data
Quiz 2: Analyzing product sentiment
Q 1: Out of the 11 words in selected_words, which one is most used in the reviews in the dataset?
- awesome
- love
- hate
- bad
- great
Q 2: Out of the 11 words in selected_words, which one is least used in the reviews in the dataset?
- wow
- amazing
- terrible
- awful
- love
Q 3: Out of the 11 words in selected_words, which one got the most positive weight in the selected_words_model?
(Tip: when printing the list of coefficients, make sure to use print_rows(rows=12) to print ALL coefficients.)
- amazing
- awesome
- love
- fantastic
- terrible
Question 4: Out of the 11 words in selected_words, which one got the most negative weight in the selected_words_model?
(Tip: when printing the list of coefficients, make sure to use print_rows(rows=12) to print ALL coefficients.)
- horrible
- terrible
- awful
- hate
- love
Q 5: Which of the following ranges contains the accuracy of the selected_words_model on the test_data?
- 0.811 to 0.841
- 0.841 to 0.871
- 0.871 to 0.901
- 0.901 to 0.931
Q 6: Which of the following ranges contains the accuracy of the sentiment_model in the IPython Notebook from lecture on the test_data?
- 0.811 to 0.841
- 0.841 to 0.871
- 0.871 to 0.901
- 0.901 to 0.931
Q 7: Which of the following ranges contains the accuracy of the majority class classifier, which simply predicts the majority class on the test_data?
- 0.811 to 0.843
- 0.843 to 0.871
- 0.871 to 0.901
- 0.901 to 0.931
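A majority class classifier ignores the review text and always predicts the most frequent label. A minimal sketch of how its accuracy could be computed, assuming the majority label is taken from the training labels; the label lists are made up and are not the product review data:

```python
from collections import Counter

def majority_class_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return correct / len(test_labels)

# Hypothetical sentiment labels: 1 = positive, 0 = negative.
train_labels = [1, 1, 1, 0, 1, 0, 1, 1]
test_labels = [1, 0, 1, 1, 1]

baseline = majority_class_accuracy(train_labels, test_labels)
print(baseline)
```

Any learned model is worth keeping only if its test accuracy clearly beats this baseline.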
Q 8: How do you compare the different learned models with the baseline approach where we are just predicting the majority class?
- They all performed about the same.
- The model learned using all words performed much better than the one using only the selected_words. And, the model learned using the selected_words performed much better than just predicting the majority class.
- The model learned using all words performed much better than the other two. The other two approaches performed about the same.
- Simply predicting the majority class performed much better than the other two models.
Q 9: Which of the following ranges contains the ‘predicted_sentiment’ for the most positive review for ‘Baby Trend Diaper Champ’, according to the sentiment_model from the IPython Notebook from lecture?
- Below 0.7
- 0.7 to 0.8
- 0.8 to 0.9
- 0.9 to 1.0
Q 10: Consider the most positive review for ‘Baby Trend Diaper Champ’ according to the sentiment_model from the IPython Notebook from lecture. Which of the following ranges contains the predicted_sentiment for this review, if we use the selected_words_model to analyze it?
- Below 0.7
- 0.7 to 0.8
- 0.8 to 0.9
- 0.9 to 1.0
Q 11: Why is the value of the predicted_sentiment for the most positive review found using the sentiment_model much more positive than the value predicted using the selected_words_model?
- The sentiment_model is just too positive about everything.
- The selected_words_model is just too negative about everything.
- This review was positive, but used too many of the negative words in selected_words.
- None of the selected_words appeared in the text of this review.
Week 4: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: Clustering and Similarity
Q 1: A country, called Simpleland, has a language with a small vocabulary of just “the”, “on”, “and”, “go”, “round”, “bus”, and “wheels”. For a word count vector with indices ordered as the words appear above, what is the word count vector for a document that simply says “the wheels on the bus go round and round.”
Please enter the vector of counts as follows: If the counts were [“the”=1, “on”=3, “and”=2, “go”=1, “round”=2, “bus”=1, “wheels”=1], enter 1321211.
Answer: 2111211
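The vocabulary has seven words, so the encoded answer has seven digits. The count vector can be checked with a short script; the tokenization rule (lowercase alphabetic runs, which drops the trailing period) is an assumption made for this sketch:

```python
import re
from collections import Counter

# Vocabulary in the order given in the question.
vocabulary = ["the", "on", "and", "go", "round", "bus", "wheels"]
document = "the wheels on the bus go round and round."

# Keep only alphabetic tokens so punctuation is ignored.
tokens = re.findall(r"[a-z]+", document.lower())
counts = Counter(tokens)

# One count per vocabulary word, in vocabulary order.
vector = [counts[word] for word in vocabulary]
encoded = "".join(str(count) for count in vector)
print(vector, encoded)  # [2, 1, 1, 1, 2, 1, 1] 2111211
```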
Question 2: In Simpleland, a reader is enjoying a document with a representation: [1 3 2 1 2 1 1]. Which of the following articles would you recommend to this reader next?
- [7 0 2 1 0 0 1]
- [1 7 0 0 2 0 1]
- [1 0 0 0 7 1 2]
- [0 2 0 0 7 1 1]
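One way to compare the candidate articles is to score each against the reader's document. The sketch below assumes cosine similarity as the measure; for these particular vectors a plain dot product gives the same ranking, since all four candidates have the same length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

reader = [1, 3, 2, 1, 2, 1, 1]
candidates = [
    [7, 0, 2, 1, 0, 0, 1],
    [1, 7, 0, 0, 2, 0, 1],
    [1, 0, 0, 0, 7, 1, 2],
    [0, 2, 0, 0, 7, 1, 1],
]

scores = [cosine_similarity(reader, c) for c in candidates]
# Zero-based index of the most similar candidate.
best_index = max(range(len(candidates)), key=scores.__getitem__)
print(candidates[best_index])  # [1, 7, 0, 0, 2, 0, 1]
```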
Question 3: A corpus in Simpleland has 99 articles. If you pick one article and perform 1-nearest neighbor search to find the closest article to this query article, how many times must you compute the similarity between two articles?
- 98
- 98*2 = 196
- 98/2 = 49
- (98)^2
- 99
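A brute-force 1-nearest-neighbor search compares the query with every other article exactly once. The sketch below counts those comparisons on a made-up corpus of 99 small vectors; the vectors and the dot-product similarity are placeholders, not course data.

```python
def dot(a, b):
    """Dot product, used here as a stand-in similarity measure."""
    return sum(x * y for x, y in zip(a, b))

def nearest_neighbor(query_index, articles, similarity):
    """Return (index of most similar article, number of comparisons made)."""
    comparisons = 0
    best_index, best_score = None, float("-inf")
    for i, article in enumerate(articles):
        if i == query_index:
            continue  # the query is not compared with itself
        comparisons += 1
        score = similarity(articles[query_index], article)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, comparisons

# Hypothetical corpus of 99 articles, each a 3-dimensional vector.
articles = [[i % 7, (i * 3) % 5, 1] for i in range(99)]
_, comparisons = nearest_neighbor(0, articles, dot)
print(comparisons)  # 98
```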
Question 4: For the TF-IDF representation, does the relative importance of words in a document depend on the base of the logarithm used? For example, take the words “bus” and “wheels” in a particular document. Is the ratio between the TF-IDF values for “bus” and “wheels” different when computed using log base 2 versus log base 10?
- Yes
- No
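Changing the base of the logarithm multiplies every IDF value by the same constant, so ratios between TF-IDF values within a document are unaffected. A small numeric check, assuming the common definition tf * log(N / df); the counts and document frequencies are made up for illustration:

```python
import math

def tf_idf(term_count, num_docs, docs_with_term, base):
    """TF-IDF with IDF = log_base(num_docs / docs_with_term)."""
    return term_count * math.log(num_docs / docs_with_term, base)

# Hypothetical corpus size and per-word statistics.
num_docs = 64
bus = {"count": 3, "docs": 4}
wheels = {"count": 2, "docs": 16}

def ratio(base):
    """Ratio of the TF-IDF of 'bus' to that of 'wheels' for a given log base."""
    return (tf_idf(bus["count"], num_docs, bus["docs"], base)
            / tf_idf(wheels["count"], num_docs, wheels["docs"], base))

ratio_base2 = ratio(2)
ratio_base10 = ratio(10)
print(math.isclose(ratio_base2, ratio_base10))  # True
```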
Question 5: Which of the following statements are true? (Check all that apply):
- Deciding whether an email is spam or not spam using the text of the email and some spam / not spam labels is a supervised learning problem.
- Dividing emails into two groups based on the text of each email is a supervised learning problem.
- If we are performing clustering, we typically assume we either do not have or do not use class labels in training the model.
Question 6: Which of the following pictures represents the best k-means solution? (Squares represent observations, plus signs are cluster centers, and colors indicate assignments of observations to cluster centers.)
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/AW6FxVvVEeWzLwrzeFOkAw_3e7caa843845e525f9275753265c0900_Clust5b.png?expiry=1658620800000&hmac=XDtBpsTCunhlQ9O9-DRPncW6PNGZ83Dd9PQFRx1O-Go>
Quiz 2: Retrieving Wikipedia articles
Q 1: Top word count words for Elton John
- (the, john, singer)
- (england, awards, musician)
- (the, in, and)
- (his, the, since)
- (rock, artists, best)
Question 2: Top TF-IDF words for Elton John
- (furnish,elton,billboard)
- (john,elton,fivedecade)
- (the,of,has)
- (awards,rock,john)
- (elton,john,singer)
Question 3: The cosine distance between the ‘Elton John’ and ‘Victoria Beckham’ articles (represented with TF-IDF) falls within which range?
- 0.1 to 0.29
- 0.3 to 0.49
- 0.5 to 0.69
- 0.7 to 0.89
- 0.9 to 1.0
Question 4: The cosine distance between the ‘Elton John’ and ‘Paul McCartney’ articles (represented with TF-IDF) falls within which range?
- 0.1 to 0.29
- 0.3 to 0.49
- 0.5 to 0.69
- 0.7 to 0.89
- 0.9 to 1
Question 5: Who is closer to ‘Elton John’, ‘Victoria Beckham’ or ‘Paul McCartney’?
- Victoria Beckham
- Paul McCartney
Question 6: Who is the nearest cosine-distance neighbor to ‘Elton John’ using raw word counts?
- Billy Joel
- Cliff Richard
- Roger Daltrey
- George Bush
Question 7: Who is the nearest cosine-distance neighbor to ‘Elton John’ using TF-IDF?
- Roger Daltrey
- Rod Stewart
- Tommy Haas
- Elvis Presley
Question 8: Who is the nearest cosine-distance neighbor to ‘Victoria Beckham’ using raw word counts?
- Stephen Dow Beckham
- Louis Molloy
- Adrienne Corri
- Mary Fitzgerald (artist)
Question 9: Who is the nearest cosine-distance neighbor to ‘Victoria Beckham’ using TF-IDF?
- Mel B
- Caroline Rush
- David Beckham
- Carrie Reichardt
Week 5: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: Recommender Systems
Q 1: Recommending items based on global popularity can (check all that apply):
- provide personalization
- capture context (e.g., time of day)
- none of the above
Question 2: Recommending items using a classification approach can (check all that apply):
- provide personalization
- capture context (e.g., time of day)
- none of the above
Question 3: Recommending items using a simple count based co-occurrence matrix can (check all that apply):
- provide personalization
- capture context (e.g., time of day)
- none of the above
Question 4: Recommending items using featurized matrix factorization can (check all that apply):
- provide personalization
- capture context (e.g., time of day)
- none of the above
Question 5: Normalizing co-occurrence matrices is used primarily to account for:
- people who purchased many items
- items purchased by many
- eliminating rare products
- none of the above
Question 6: A store has 3 customers and 3 products. Below are the learned feature vectors for each user and product. Based on this estimated model, which product would you recommend most highly to User #2?
User ID | Feature vector |
1 | (1.73, 0.01, 5.22) |
2 | (0.03, 4.41, 2.05) |
3 | (1.13, 0.89, 3.76) |
Product ID | Feature vector |
1 | (3.29, 3.44, 3.67) |
2 | (0.82, 9.71, 3.88) |
3 | (8.34, 1.72, 0.02) |
- Product #1
- Product #2
- Product #3
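In a matrix factorization model, the predicted score for a user and a product is the dot product of their feature vectors. The sketch below applies that to the vectors in the two tables above; the names `users`, `products`, and `predicted_rating` are illustrative.

```python
# Feature vectors copied from the tables above.
users = {
    1: (1.73, 0.01, 5.22),
    2: (0.03, 4.41, 2.05),
    3: (1.13, 0.89, 3.76),
}
products = {
    1: (3.29, 3.44, 3.67),
    2: (0.82, 9.71, 3.88),
    3: (8.34, 1.72, 0.02),
}

def predicted_rating(user_vec, product_vec):
    """Dot product of a user vector and a product vector."""
    return sum(u * p for u, p in zip(user_vec, product_vec))

# Score every product for User #2 and pick the highest.
scores = {pid: predicted_rating(users[2], vec) for pid, vec in products.items()}
best_product = max(scores, key=scores.get)
print(best_product, round(scores[best_product], 2))  # 2 50.8
```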
Question 7: For the liked and recommended items displayed below, calculate the recall and round to 2 decimal points. (As in the lesson, green squares indicate recommended items, magenta squares are liked items. Items not recommended are grayed out for clarity.) Note: enter your answer in American decimal format (e.g. enter 0.98, not 0,98)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/C0Ri1FvZEeWMhg7baGhc3w_290d82e965c33e663968151f43a71743_Rec8.png?expiry=1658620800000&hmac=ro8CVcehdzhMoZDhUaIZXJOqieK7dJ0XcGNb2DHCFzw>
Answer: 0.33
Question 8: For the liked and recommended items displayed below, calculate the precision and round to 2 decimal points. (As in the lesson, green squares indicate recommended items, magenta squares are liked items. Items not recommended are grayed out for clarity.) Note: enter your answer in American decimal format (e.g. enter 0.98, not 0,98)
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/QkZrJ1vZEeWZgBLZEKssZQ_f80562a68423c8ffe11565327abee8c8_Rec8.png?expiry=1658620800000&hmac=wdW97z3_apaxidVHhNYrLVtPmk6ryAf1fNgOSyvdLjw>
Answer: 0.25
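Recall is the fraction of liked items that were recommended, and precision is the fraction of recommended items that were liked. A minimal sketch with made-up item sets; these are not read from the figures above and do not reproduce the quiz answers:

```python
def precision_recall(liked, recommended):
    """Return (precision, recall) for sets of liked and recommended items."""
    hits = len(liked & recommended)  # items both liked and recommended
    precision = hits / len(recommended)
    recall = hits / len(liked)
    return precision, recall

# Hypothetical items: 4 liked, 5 recommended, 2 in common.
liked = {"A", "B", "C", "D"}
recommended = {"C", "D", "E", "F", "G"}

precision, recall = precision_recall(liked, recommended)
print(round(precision, 2), round(recall, 2))  # 0.4 0.5
```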
Question 9: Based on the precision-recall curves in the figure below, which recommender would you use?
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/JaMj1VvYEeWSuhJSxsy6bQ_648fbff528d436fc414fd485af5cb56d_Rec9.png?expiry=1658620800000&hmac=TdvA-JDmDM9SVzTbUD9UEMPc-crG42GgkFl6spDyve8>
- RecSys #1
- RecSys #2
- RecSys #3
Quiz 2: Recommending songs
Question 1: Which of the artists below have had the most unique users listening to their songs?
- Kanye West
- Foo Fighters
- Taylor Swift
- Lady GaGa
Question 2: Which of the artists below is the most popular artist, the one with highest total listen_count, in the data set?
- Taylor Swift
- Kings of Leon
- Coldplay
- Lady GaGa
Question 3: Which of the artists below is the least popular artist, the one with smallest total listen_count, in the data set?
- William Tabbert
- Velvet Underground & Nico
- Kanye West
- The Cool Kids
Week 6: Machine Learning Foundations: A Case Study Approach Quiz Answer
Quiz 1: Deep Learning
Question 1: Which of the following statements are true? (Check all that apply)
- Linear classifiers are never useful, because they cannot represent XOR.
- Linear classifiers are useful, because, with enough data, they can represent anything.
- Having good non-linear features can allow us to learn very accurate linear classifiers.
- none of the above
Question 2: A simple linear classifier can represent which of the following functions? (Check all that apply)
- x1 OR x2 OR NOT x3
- x1 AND x2 AND NOT x3
- x1 OR (x2 AND NOT x3)
- none of the above
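Whether a single linear classifier can represent a Boolean function can be checked by finding weights and a bias that reproduce its truth table. The weights below were picked by hand for this sketch (other choices work too), and the script verifies each one against all eight inputs:

```python
from itertools import product

def linear_classifier(weights, bias):
    """Return a classifier that predicts True when w . x + bias > 0."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias > 0

# (hand-picked linear classifier, target Boolean function) per expression.
candidates = {
    "x1 OR x2 OR NOT x3": (
        linear_classifier((1, 1, -1), 0.5),
        lambda x1, x2, x3: bool(x1 or x2 or not x3),
    ),
    "x1 AND x2 AND NOT x3": (
        linear_classifier((1, 1, -1), -1.5),
        lambda x1, x2, x3: bool(x1 and x2 and not x3),
    ),
    "x1 OR (x2 AND NOT x3)": (
        linear_classifier((2, 1, -1), -0.5),
        lambda x1, x2, x3: bool(x1 or (x2 and not x3)),
    ),
}

# A function is represented if the classifier matches it on every input.
results = {
    name: all(clf(x) == target(*x) for x in product((0, 1), repeat=3))
    for name, (clf, target) in candidates.items()
}
print(results)
```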
Question 3: Which of the following neural networks can represent the following function? Select all that apply.
(x1 AND x2) OR (NOT x1 AND NOT x2)
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/FG_Wy1vaEeWhtQ48PjS6Pw_d8ed3b37fc1e16f793f6a3c7fbb1531b_Deep3d.png?expiry=1658620800000&hmac=Y13fXXF0RyLZ9QsOvSEhdLZ25HwPcUk6Ek3VVhjTCMs>
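The function (x1 AND x2) OR (NOT x1 AND NOT x2) is the complement of XOR, so no single linear unit can represent it, but one hidden layer is enough. The sketch below uses hand-picked weights and step activations; it is one possible construction and is not necessarily the network shown in the figure.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xnor_network(x1, x2):
    """Two hidden units (AND, NOR) feeding an OR output unit."""
    h_and = step(x1 + x2 - 1.5)    # fires only for x1 = x2 = 1
    h_nor = step(-x1 - x2 + 0.5)   # fires only for x1 = x2 = 0
    return step(h_and + h_nor - 0.5)  # OR of the two hidden units

truth_table = {(x1, x2): xnor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(truth_table)  # {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```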
Question 4: Which of the following statements is true? (Check all that apply)
- Features in computer vision act like local detectors.
- Deep learning has had impact in computer vision, because it’s used to combine all the different hand-created features that already exist.
- By learning non-linear features, neural networks have allowed us to automatically learn detectors for computer vision.
- none of the above
Question 5: If you have lots of images of different types of plankton labeled with their species name, and lots of computational resources, what would you expect to perform better predictions:
- a deep neural network trained on this data.
- a simple classifier trained on this data, using deep features as input, which were trained using ImageNet data.
Question 6: If you have a few images of different types of plankton labeled with their species name, what would you expect to perform better predictions:
- a deep neural network trained on this data.
- a simple classifier trained on this data, using deep features as input, which were trained using ImageNet data.
Quiz 2: Deep features for image retrieval
Question 1: What’s the least common category in the training data?
- bird
- dog
- cat
- automobile
Question 2: Of the images below, which is the nearest ‘cat’ labeled image in the training data to the first image in the test data (image_test[0:1])?
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/xlEzz2DcEeW67AqL8VPUFQ_b58f25deeeb2bb4b4603fee6597ad3fd_cat_correct.png?expiry=1658620800000&hmac=Gn9tCJyaaZlS-Yj4IBx711HGqJQvdOTiJwrmA1cfM-I>
Question 3: Of the images below, which is the nearest ‘dog’ labeled image in the training data to the first image in the test data (image_test[0:1])?
Answer:
<image: https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/2KmNYGDcEeWSthLJWZH1gw_302e98a3196d8bf12bf7be8950ad77dd_dog_correct.png?expiry=1658620800000&hmac=MwbQ389JZJvXqH8bPWBjWmZJa-z7vdqxsEXShL2XYCI>
Question 4: For the first image in the test data, in what range is the mean distance between this image and its 5 nearest neighbors that were labeled ‘cat’ in the training data?
- 33 to 35
- 35 to 37
- 37 to 39
- 39 to 41
- Above 41
Question 5: For the first image in the test data, in what range is the mean distance between this image and its 5 nearest neighbors that were labeled ‘dog’ in the training data?
- 33 to 35
- 35 to 37
- 37 to 39
- 39 to 41
- Above 41
Question 6: On average, is the first image in the test data closer to its 5 nearest neighbors in the ‘cat’ data or in the ‘dog’ data?
- cat
- dog
Question 7: In what range is the accuracy of the 1-nearest neighbor classifier at classifying ‘dog’ images from the test set?
- 50 to 60
- 60 to 70
- 70 to 80
- 80 to 90
- 90 to 100
Review:
Based on our experience, we recommend enrolling in this course to learn new skills from experts. We believe it will be worth your while.