M121 The MongoDB Aggregation Framework ALL Chapters

The MongoDB Aggregation Framework Chapter 0 – Introduction and Aggregation Concepts

The Concept of Pipelines

Which of the following is true about pipelines and the Aggregation
Framework?
  • Pipelines must consist of at least two stages.
  • Stages cannot be configured to produce our desired output.
  • Documents flow through the pipeline, passing from one stage to the
    next
  • The Aggregation Framework provides us many stages to filter and
    transform our data

Aggregation Structure and Syntax

Which of the following statements is true?
  • Only one expression per stage can be used.
  • An aggregation pipeline is an array of stages.
  • Some expressions can only be used in certain stages.

The MongoDB Aggregation Framework Chapter 1 – Basic Aggregation – $match and $project

 

$match: Filtering documents

Which of the following is/are true of the $match stage?
  • $match can only filter documents on one field.
  • It uses the familiar MongoDB query language.
  • It should come very early in an aggregation pipeline.
  • $match can use both query operators and aggregation expressions.
 

Shaping documents with $project

Which of the following statements are true of the $project stage?
  • $project can only be used once within an Aggregation pipeline.
  • $project cannot be used to assign new values to existing fields.
  • Beyond simply removing and retaining fields, $project lets us add new fields.
  • Once we specify a field to retain or perform some computation in a $project stage, we must specify all fields we wish to retain. The only exception to this is the _id field.
 
 
 

Optional Lab – Expressions with $project

Let’s find how many movies in our movies collection are a “labor of love”, where the same person appears in cast, directors, and writers.
 
Note that you may have a dataset that has duplicate entries for some films. Don’t worry if you count them more than once; you should not try to find those duplicates.
 
To get a count after you have defined your pipeline, there are two simple methods.
 
// add the $count stage to the end of your pipeline
// you will learn about this stage shortly!
db.movies.aggregate([
  {$stage1},
  {$stage2},
  {…$stageN},
  { $count: "labors of love" }
])

// or use itcount()
db.movies.aggregate([
  {$stage1},
  {$stage2},
  {…$stageN},
]).itcount()
 
How many movies are “labors of love”?
 
  • 1259
  • 1595
  • 1263
  • 1597
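
One possible shape for such a pipeline is sketched below. It assumes that cast, directors, and writers are arrays of plain name strings; any cleanup the lab handout asks for on those fields is not included, so this is a starting point rather than the reference solution. The helper isLaborOfLove is hypothetical and only mirrors the per-document test in plain JavaScript.

```javascript
// Sketch of a "labor of love" pipeline for the mongo shell.
// Assumption: cast, directors and writers are arrays of plain name strings.
const laborsOfLovePipeline = [
  // keep documents where all three arrays have at least one element
  {
    $match: {
      cast: { $elemMatch: { $exists: true } },
      directors: { $elemMatch: { $exists: true } },
      writers: { $elemMatch: { $exists: true } }
    }
  },
  // names that occur in all three arrays
  {
    $project: {
      _id: 0,
      shared: { $setIntersection: ["$cast", "$directors", "$writers"] }
    }
  },
  // keep movies with at least one shared name, then count them
  { $match: { "shared.0": { $exists: true } } },
  { $count: "labors of love" }
];

// Plain JavaScript version of the same per-document check (hypothetical helper).
function isLaborOfLove(doc) {
  const directors = new Set(doc.directors || []);
  const writers = new Set(doc.writers || []);
  return (doc.cast || []).some((name) => directors.has(name) && writers.has(name));
}
```

In the shell this would run as db.movies.aggregate(laborsOfLovePipeline).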

The MongoDB Aggregation Framework Chapter 2 – Basic Aggregation – Utility Stages

Lab: Using Cursor-like Stages

MongoDB has another movie night scheduled. This time, we polled employees for their favorite actress or actor, and got these results:

favorites = [
  "Sandra Bullock",
  "Tom Hanks",
  "Julia Roberts",
  "Kevin Spacey",
  "George Clooney"]

For movies released in the USA with a tomatoes.viewer.rating greater than or equal to 3, calculate a new field called num_favs that represents how many favorites appear in the cast field of the movie.
 
Sort your results by num_favs, tomatoes.viewer.rating, and title, all in descending order.
 
What is the title of the 25th film in the aggregation result?
  • Recount
  • The Heat
  • Erin Brockovich
  • Wrestling Ernest Hemingway
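
A sketch of how this lab could be approached follows. The field name countries is an assumption (the lab text only says "released in the USA"), and the extra filter on cast is a design choice so that $setIntersection always receives a usable value. The helper countFavorites is hypothetical and mirrors the num_favs computation in plain JavaScript.

```javascript
const favorites = [
  "Sandra Bullock", "Tom Hanks", "Julia Roberts", "Kevin Spacey", "George Clooney"
];

const favoritesPipeline = [
  {
    $match: {
      "tomatoes.viewer.rating": { $gte: 3 },
      countries: "USA",          // assumed field name for the release country
      cast: { $in: favorites }   // only movies with at least one favorite in cast
    }
  },
  {
    $project: {
      _id: 0,
      title: 1,
      "tomatoes.viewer.rating": 1,
      // number of distinct favorites that appear in cast
      num_favs: { $size: { $setIntersection: ["$cast", favorites] } }
    }
  },
  { $sort: { num_favs: -1, "tomatoes.viewer.rating": -1, title: -1 } },
  { $skip: 24 },   // skip the first 24 results...
  { $limit: 1 }    // ...so the single remaining document is the 25th
];

// Plain JavaScript mirror of num_favs (hypothetical helper).
function countFavorites(cast) {
  return new Set(cast.filter((name) => favorites.includes(name))).size;
}
```

The $skip and $limit pair at the end is what turns the sorted result into "the 25th film".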
 
 

Lab: Setting Up the Vagrant Environment

Calculate an average rating for each movie in our collection where English is an available language, the minimum imdb.rating is at least 1, the minimum imdb.votes is at least 1, and it was released in 1990 or after. You’ll be required to rescale (or normalize) imdb.votes. The formula to rescale imdb.votes and calculate normalized_rating is included as a handout.
 
What film has the lowest normalized_rating?
  • DMZ
  • The Christmas Tree
  • Twilight
  • Avatar: The Last Airbender
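
The handout formula is not reproduced on this page, so the sketch below assumes a standard min-max rescaling of imdb.votes onto the range 1 to 10, averaged with imdb.rating. Check it against the handout before relying on it. The function names and the minVotes and maxVotes parameters are hypothetical; no dataset values are assumed.

```javascript
// Min-max rescaling of a vote count onto the range 1..10 (assumed formula).
function scaleVotes(votes, minVotes, maxVotes) {
  return 1 + 9 * ((votes - minVotes) / (maxVotes - minVotes));
}

// Average of the raw rating and the rescaled vote count.
function normalizedRating(rating, votes, minVotes, maxVotes) {
  return (rating + scaleVotes(votes, minVotes, maxVotes)) / 2;
}

// The same arithmetic written as an aggregation expression inside $addFields.
function normalizedRatingStage(minVotes, maxVotes) {
  return {
    $addFields: {
      normalized_rating: {
        $avg: [
          "$imdb.rating",
          {
            $add: [1, {
              $multiply: [9, {
                $divide: [
                  { $subtract: ["$imdb.votes", minVotes] },
                  { $subtract: [maxVotes, minVotes] }
                ]
              }]
            }]
          }
        ]
      }
    }
  };
}
```

A following { $sort: { normalized_rating: 1 } } and { $limit: 1 } would then surface the lowest value.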

 

 

The MongoDB Aggregation Framework Chapter 3 – Core Aggregation – Combining Information

The $lookup Stage
Which of the following statements is true about the $lookup stage?
  • The collection specified in from cannot be sharded
  • You can specify a collection in another database to from
  • $lookup matches between localField and foreignField with an equality
    match
  • Specifying an existing field name to as will overwrite the
    existing field
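
To make the localField, foreignField, and as options concrete, here is a small sketch: a $lookup stage document plus a simplified in-memory emulation of its equality match. The collection and field names are hypothetical, and the lookup function is an illustration of the semantics, not MongoDB's implementation.

```javascript
// Shape of a $lookup stage (names are hypothetical).
const lookupStage = {
  $lookup: {
    from: "air_airlines",     // a collection in the same database
    localField: "airlines",
    foreignField: "name",
    as: "airlines"            // reusing an existing field name overwrites it
  }
};

// Simplified emulation: equality match between localField and foreignField.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => {
    const wanted = [].concat(doc[localField]);  // scalar or array of values
    const matches = foreignDocs.filter((f) => wanted.includes(f[foreignField]));
    return { ...doc, [as]: matches };           // `as` replaces any field of that name
  });
}

const alliances = [{ name: "Alliance X", airlines: ["Air One", "Air Two"] }];
const airlines = [
  { name: "Air One", country: "A" },
  { name: "Air Three", country: "B" }
];

const joined = lookup(alliances, airlines, lookupStage.$lookup);
console.log(JSON.stringify(joined));
```

After the join, the original array of airline names is gone: the airlines field now holds the matched documents.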
 
 

$graphLookup Introduction

Which of the following statements apply to the $graphLookup operator? Check
all that apply.
  • Provides MongoDB with graph or graph-like capabilities
  • $graphLookup provides MongoDB a transitive closure
    implementation
  • $graphLookup depends on the $lookup operator and cannot be used
    without $lookup
  • $lookup and $graphLookup stages require the exact same fields in
    their specification.
  • $graphLookup is a new stage of the aggregation pipeline introduced in
    MongoDB 3.2
 

$graphLookup: Simple Lookup

Which of the following statements is/are correct? Check all that
apply.
  • connectToField will be used on recursive find operations
  • as determines a collection where $graphLookup will store the stage
    results
  • startWith indicates the index that should be used to execute the
    recursive match
  • connectFromField value will be used to match connectToField in a
    recursive match
 
 

$graphLookup: maxDepth and depthField

Which of the following statements are incorrect? Check all that
apply
  • maxDepth only takes $long values
  • maxDepth allows you to specify the number of recursive lookups
  • depthField determines a field, in the result document, which
    specifies the number of recursive lookups needed to reach that
    document
  • depthField determines a field which contains the number of
    documents matched by the recursive lookup
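
The options discussed in the last two questions can be seen working together in the sketch below: a simplified, in-memory emulation of the recursive match that $graphLookup performs. The employees data, the reports_to field, and the graphLookup function are all hypothetical; the real stage runs on the server and supports more options.

```javascript
const employees = [
  { _id: 1, name: "A", reports_to: null },
  { _id: 2, name: "B", reports_to: 1 },
  { _id: 3, name: "C", reports_to: 2 },
  { _id: 4, name: "D", reports_to: 2 }
];

// Simplified emulation of the recursive lookup.
function graphLookup(from, { startWith, connectFromField, connectToField,
                             maxDepth = Infinity, depthField }) {
  const results = [];
  const seen = new Set();       // each document is returned at most once
  let frontier = [startWith];   // values to match against connectToField
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const doc of from) {
      if (frontier.includes(doc[connectToField]) && !seen.has(doc._id)) {
        seen.add(doc._id);
        results.push(depthField ? { ...doc, [depthField]: depth } : { ...doc });
        next.push(doc[connectFromField]);  // feeds the next recursive match
      }
    }
    frontier = next;
  }
  return results;
}

// Everyone below employee 1, with the number of hops recorded in "level".
const all = graphLookup(employees, {
  startWith: 1, connectFromField: "_id", connectToField: "reports_to", depthField: "level"
});
// maxDepth: 0 stops after the initial lookup (direct reports only).
const direct = graphLookup(employees, {
  startWith: 1, connectFromField: "_id", connectToField: "reports_to", maxDepth: 0
});
```

Here depthField records how many recursive lookups were needed to reach each document, and maxDepth limits how many are attempted.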
 
 

$graphLookup: General Considerations

Consider the following statement:
 
"$graphLookup is required to be the last element of the
pipeline."
 
Which of the following is true about the statement?
  • This is correct because of the recursive nature of $graphLookup we
    want to save resources for last.
  • This is incorrect. $graphLookup needs to be the first element of the
    pipeline, regardless of other stages needed to perform the desired
    query.
  • This is incorrect. $graphLookup can be used in any position of the
    pipeline and acts in the same way as a regular $lookup.
  • This is correct because $graphLookup pipes out the results of
    recursive search into a collection, similar to $out stage.

The MongoDB Aggregation Framework Chapter 4 – Core Aggregation – Multidimensional Grouping

 
 

Facets: Single Facet Query

Which of the following aggregation pipelines are single facet
queries?
[
  {"$match": { "$text": {"$search": "network"}}},
  {"$sortByCount": "$offices.city"},
]

[
  {"$unwind": "$offices"},
  {"$project": { "_id": "$name", "hq": "$offices.city"}},
  {"$sortByCount": "$hq"},
  {"$sort": {"_id":-1}},
  {"$limit": 100}
]

[
  {"$match": { "$text": {"$search": "network"}}},
  {"$unwind": "$offices"},
  {"$sort": {"_id":-1}}
]
 

Facets: Manual Buckets

Assuming that field1 is composed of double values, ranging between 0
and Infinity, and field2 is of type string, which of the following
stages are correct?
  • {'$bucket': { 'groupBy': '$field1', 'boundaries': [ 0.4, Infinity ]}}
  • {'$bucket': { 'groupBy': '$field1', 'boundaries': [ "a", 3, 5.5 ]}}
  • {'$bucket': { 'groupBy': '$field2', 'boundaries': [ "a", "asdas",
    "z" ], 'default': 'Others'}}
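
The rules at play here can be approximated in plain JavaScript. As I understand the $bucket documentation, boundaries need at least two values, all of the same type, in ascending order, and any groupBy value outside the range from the first boundary (inclusive) to the last (exclusive) needs a default bucket. The two helpers below are hypothetical and only approximate those checks.

```javascript
// Approximate check: at least two boundaries, same type, strictly ascending.
function boundariesAreValid(boundaries) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) return false;
  const sameType = boundaries.every((b) => typeof b === typeof boundaries[0]);
  const ascending = boundaries.every((b, i) => i === 0 || boundaries[i - 1] < b);
  return sameType && ascending;
}

// True when a value falls outside [first, last) and so needs a `default` bucket.
function needsDefault(value, boundaries) {
  return value < boundaries[0] || value >= boundaries[boundaries.length - 1];
}

console.log(boundariesAreValid([0.4, Infinity]));     // numbers only, ascending
console.log(boundariesAreValid(["a", 3, 5.5]));       // mixed string and number
console.log(boundariesAreValid(["a", "asdas", "z"])); // strings only, ascending
console.log(needsDefault(0.2, [0.4, Infinity]));      // below the first boundary
```

Note that a stage can have well-formed boundaries and still fail at run time if some document falls outside them and no default is given.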
 
 

Facets: Auto Buckets

Auto Bucketing will …
  • adhere bucket boundaries to a numerical series set by the
    granularity option.
  • randomly distribute documents across arbitrarily defined bucket
    boundaries.
  • given a number of buckets, try to distribute documents evenly
    across buckets.
  • count only documents that contain the groupBy field defined in the
    documents.
 
 

Facets: Multiple Facets

Which of the following statement(s) apply to the $facet stage?
  • The $facet stage allows several sub-pipelines to be executed to
    produce multiple facets.
  • The $facet stage allows the application to generate several different
    facets with one single database request.
  • The output of the individual $facet sub-pipelines can be shared using
    the expression $$FACET.$.
  • We can only use facets stages ($sortByCount, $bucket and
    $bucketAuto) as sub-pipelines of $facet stage.
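
For reference, a $facet stage is a document whose keys name the facets and whose values are independent sub-pipelines over the same input documents. The sketch below shows one and adds a hypothetical helper, disallowedStages, that flags stages which, to my knowledge of the $facet documentation, may not appear inside a sub-pipeline. The field number_of_employees is an assumed name.

```javascript
const facetStage = {
  $facet: {
    // every sub-pipeline receives the same input documents
    byCity: [{ $unwind: "$offices" }, { $sortByCount: "$offices.city" }],
    byEmployees: [{ $bucketAuto: { groupBy: "$number_of_employees", buckets: 5 } }],
    sample: [{ $project: { _id: 0, name: 1 } }, { $limit: 3 }]
  }
};

// Stages that are not accepted inside a $facet sub-pipeline (per the docs as I recall them).
const NOT_ALLOWED_IN_FACET = [
  "$facet", "$out", "$merge", "$geoNear", "$indexStats", "$collStats", "$planCacheStats"
];

// Hypothetical helper: report "facetName: $stage" for each disallowed stage found.
function disallowedStages(stage) {
  const found = [];
  for (const [facetName, subPipeline] of Object.entries(stage.$facet)) {
    for (const sub of subPipeline) {
      const op = Object.keys(sub)[0];
      if (NOT_ALLOWED_IN_FACET.includes(op)) found.push(`${facetName}: ${op}`);
    }
  }
  return found;
}

console.log(disallowedStages(facetStage));
```

Ordinary stages such as $project and $limit are fine inside a facet; the sub-pipelines are not restricted to $sortByCount, $bucket, and $bucketAuto.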

The MongoDB Aggregation Framework Chapter 5 – Miscellaneous Aggregation

 

The $out Stage
Which of the following statements is true regarding the $out stage?
  • $out will overwrite an existing collection if specified.
  • $out removes all indexes when it overwrites a collection.
  • If a pipeline with $out errors, you must delete the collection specified to the $out stage.
  • Using $out within many sub-pipelines of a $facet stage is a quick way to generate many differently shaped collections.
 
 

Views

Which of the following statements are true regarding MongoDB Views?
  • A view cannot be created that contains both horizontal and vertical slices.
  • Inserting data into a view is slow because MongoDB must perform the pipeline in reverse.
  • Views should be used cautiously because the documents they contain can grow incredibly large.
  • View performance can be increased by creating the appropriate indexes on the source collection.

 

 

The MongoDB Aggregation Framework Chapter 6 – Aggregation Performance and Pipeline Quiz Answer

 

Aggregation Performance

With regard to aggregation performance, which of the following are true?
  • You can increase index usage by moving $match stages to the end of your pipeline
  • When $limit and $sort are close together a very performant top-k sort can be performed
  • Passing allowDiskUsage to your aggregation queries will seriously increase their performance
  • Transforming data in a pipeline stage prevents us from using indexes in the stages that follow

Aggregation Pipeline on a Sharded Cluster

What operators will cause a merge stage on the primary shard for a database?
  • $lookup
  • $out
  • $group

Pipeline Optimization – Part 2

Which of the following statements is/are true?
  • The query in a $match stage can be entirely covered by an index
  • The Aggregation Framework will automatically reorder stages in certain conditions
  • The Aggregation Framework can automatically project fields if the shape of the final document is only dependent upon those fields in the input document.
  • Causing a merge in a sharded deployment will cause all subsequent pipeline stages to be performed in the same location as the merge

 

 

The MongoDB Aggregation Framework Final Exam Quiz Answer

Final Exam Quiz

 
Question 1)
Consider the following aggregation pipelines:
 
Pipeline 1
db.coll.aggregate([
  {"$match": {"field_a": {"$gt": 1983}}},
  {"$project": { "field_a": "$field_a.1", "field_b": 1, "field_c": 1  }},
  {"$replaceRoot":{"newRoot": {"_id": "$field_c", "field_b": "$field_b"}}},
  {"$out": "coll2"},
  {"$match": {"_id.field_f": {"$gt": 1}}},
  {"$replaceRoot":{"newRoot": {"_id": "$field_b", "field_c": "$_id"}}}
])
 
Pipeline 2
db.coll.aggregate([
  {"$match": {"field_a": {"$gt": 111}}},
  {"$geoNear": {
    "near": { "type": "Point", "coordinates": [ -73.99279 , 40.719296 ] },
    "distanceField": "distance"}},
  {"$project": { "distance": "$distance", "name": 1, "_id": 0 }}
])
 
Pipeline 3
db.coll.aggregate([
  {
    "$facet": {
      "averageCount": [
        {"$unwind": "$array_field"},
        {"$group": {"_id": "$array_field", "count": {"$sum": 1}}}
      ],
      "categorized": [{"$sortByCount": "$arrayField"}]
    },
  },
  {
    "$facet": {
      "new_shape": [{"$project": {"range": "$categorized._id"}}],
      "stats": [{"$match": {"range": 1}}, {"$indexStats": {}}]
    }
  }
])
 
Which of the following statements are correct?
  • Pipeline 3 executes correctly
  • Pipeline 2 fails because we cannot project distance field
  • Pipeline 3 fails since you can only have one $facet stage per
    pipeline
  • Pipeline 1 is incorrect because you can only have one $replaceRoot
    stage in your pipeline
  • Pipeline 1 fails since $out is required to be the last stage of the
    pipeline
  • Pipeline 2 is incorrect because $geoNear needs to be the first stage
    of our pipeline
  • Pipeline 3 fails because $indexStats must be the first stage in a
    pipeline and may not be used within a $facet
 
 
Question 2)
Consider the following collection:
 
db.collection.find()
{
  "a": [1, 34, 13]
}
The following pipelines are executed on top of this collection, using a
mixed set of different expressions across the different stages:
 
Pipeline 1
db.collection.aggregate([
  {"$match": { "a" : {"$sum": 1}  }},
  {"$project": { "_id" : {"$addToSet": "$a"}  }},
  {"$group": { "_id" : "", "max_a": {"$max": "$_id"}  }}
])
 
Pipeline 2
db.collection.aggregate([
    {"$project": { "a_divided" : {"$divide": ["$a", 1]} }}
])
 
Pipeline 3
db.collection.aggregate([
    {"$project": {"a": {"$max": "$a"}}},
    {"$group": {"_id": "$$ROOT._id", "all_as": {"$sum": "$a"}}}
])

Given these pipelines, which of the following statements are
correct?
  • Pipeline 3 is correct and will execute with no error
  • Pipeline 2 fails because the $divide operator only supports numeric
    types
  • Pipeline 1 will fail because $max cannot operate on the _id field
  • Pipeline 2 is incorrect since $divide cannot operate over field
    expressions
  • Pipeline 1 is incorrect because you cannot use an accumulator
    expression in a $match stage.
 
 
Question 3)
Consider the following collection documents:
db.people.find()
{ "_id" : 0, "name" : "Bernice Pope", "age" : 69, "date" : ISODate("2017-10-04T18:35:44.011Z") }
{ "_id" : 1, "name" : "Eric Malone", "age" : 57, "date" : ISODate("2017-10-04T18:35:44.014Z") }
{ "_id" : 2, "name" : "Blanche Miller", "age" : 35, "date" : ISODate("2017-10-04T18:35:44.015Z") }
{ "_id" : 3, "name" : "Sue Perez", "age" : 64, "date" : ISODate("2017-10-04T18:35:44.016Z") }
{ "_id" : 4, "name" : "Ryan White", "age" : 39, "date" : ISODate("2017-10-04T18:35:44.019Z") }
{ "_id" : 5, "name" : "Grace Payne", "age" : 56, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 6, "name" : "Jessie Yates", "age" : 53, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 7, "name" : "Herbert Mason", "age" : 37, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 8, "name" : "Jesse Jordan", "age" : 47, "date" : ISODate("2017-10-04T18:35:44.020Z") }
{ "_id" : 9, "name" : "Hulda Fuller", "age" : 25, "date" : ISODate("2017-10-04T18:35:44.020Z") }
And the aggregation pipeline execution result:
db.people.aggregate(pipeline)
{ "_id" : 8, "names" : [ "Sue Perez" ], "word" : "P" }
{ "_id" : 9, "names" : [ "Ryan White" ], "word" : "W" }
{ "_id" : 10, "names" : [ "Eric Malone", "Grace Payne" ], "word" : "MP" }
{ "_id" : 11, "names" : [ "Bernice Pope", "Jessie Yates", "Jesse Jordan", "Hulda Fuller" ], "word" : "PYJF" }
{ "_id" : 12, "names" : [ "Herbert Mason" ], "word" : "M" }
{ "_id" : 13, "names" : [ "Blanche Miller" ], "word" : "M" }
Which of the following pipelines generates the output result?
var pipeline = [{
    "$project": {
      "surname": { "$arrayElemAt": [ {"$split": [ "$name", " " ] }, 1]},
      "name_size": {  "$add" : [{"$strLenCP": "$name"}, -1]},
      "name":1
    }
  },
  {
    "$group": {
      "_id": "$name_size",
      "word": { "$addToSet": {"$substr": [{"$toUpper":"$name"}, 3, 2]} },
      "names": {"$push": "$surname"}
    }
  },
  {
    "$sort": {"_id": -1}
  }
]
[X]
var pipeline = [{
    "$project": {
      "surname_capital": { "$substr": [{"$arrayElemAt": [ {"$split": [ "$name", " " ] }, 1]}, 0, 1 ] },
      "name_size": {  "$add" : [{"$strLenCP": "$name"}, -1]},
      "name": 1
    }
  },
  {
    "$group": {
      "_id": "$name_size",
      "word": { "$push": "$surname_capital" },
      "names": {"$push": "$name"}
    }
  },
  {
    "$project": {
      "word": {
        "$reduce": {
          "input": "$word",
          "initialValue": "",
          "in": { "$concat": ["$$value", "$$this"] }
        }
      },
      "names": 1
    }
  },
  {
    "$sort": { "_id": 1}
  }
]
var pipeline = [{
    "$sort": { "date": 1 }
  },
  {
    "$group": {
      "_id": { "$size": { "$split": ["$name", " "]} },
      "names": {"$push": "$name"}
    }
  },
  {
    "$project": {
      "word": {
        "$zip": {
          "inputs": ["$names"],
          "useLongestLength": false,
        }
      },
      "names": 1
    }
  }]
Question 4)
$facet is an aggregation stage that allows for sub-pipelines to be
executed.
var pipeline = [
  {
    $match: { a: { $type: "int" } }
  },
  {
    $project: {
      _id: 0,
      a_times_b: { $multiply: ["$a", "$b"] }
    }
  },
  {
    $facet: {
      facet_1: [{ $sortByCount: "a_times_b" }],
      facet_2: [{ $project: { abs_facet1: { $abs: "$facet_1._id" } } }],
      facet_3: [
        {
          $facet: {
            facet_3_1: [{ $bucketAuto: { groupBy: "$_id", buckets: 2 } }]
          }
        }
      ]
    }
  }
]
In the above pipeline, which uses $facet, some incorrect stages
and/or expressions are being used.
 
Which of the following statements point out errors in the
pipeline?
  • a $multiply expression takes a document as input, not an array.
  • can not nest a $facet stage as a sub-pipeline.
  • $sortByCount cannot be used within $facet stage.
  • facet_2 uses the output of a parallel sub-pipeline, facet_1, to
    compute an expression
  • a $type expression does not take a string as its value; only the
    BSON numeric values can be specified to identify the types.
Question 5)
Consider a company producing solar panels and looking for the next
markets they want to target in the USA. We have a collection with all
the major cities (more than 100,000 inhabitants) from all over the
world, with the recorded number of sunny days for some of the last
years.
 
A sample document looks like the following:
 
 
db.cities.findOne()
{
"_id": 10,
"city": "San Diego",
"region": "CA",
"country": "USA",
"sunnydays": [220, 232, 205, 211, 242, 270]
}
 
 
The collection also has these indexes:
 
db.cities.getIndexes()
[
{
  "v": 2,
  "key": {
    "_id": 1
  },
  "name": "_id_",
  "ns": "test.cities"
},
{
  "v": 2,
  "key": {
    "city": 1
  },
  "name": "city_1",
  "ns": "test.cities"
},
{
  "v": 2,
  "key": {
    "country": 1
  },
  "name": "country_1",
  "ns": "test.cities"
}
]
