The MongoDB Aggregation Framework Chapter 0 – Introduction and Aggregation Concepts

The Concept of Pipelines

Which of the following is true about pipelines and the Aggregation
Framework?

Pipelines must consist of at least two stages.
Stages cannot be configured to produce our desired output.
Documents flow through the pipeline, passing from one stage to the
next
The Aggregation Framework provides us many stages to filter and
transform our data

Aggregation Structure and Syntax

Which of the following statements is true?

Only one expression per stage can be used.
An aggregation pipeline is an array of stages.
Some expressions can only be used in certain stages.

The MongoDB Aggregation Framework Chapter 1 – Basic Aggregation – $match and $project

$match: Filtering documents

Which of the following is/are true of the $match stage?

$match can only filter documents on one field.
It uses the familiar MongoDB query language.
It should come very early in an aggregation pipeline.
$match can use both query operators and aggregation expressions.

Shaping documents with $project

Which of the following statements are true of the $project stage?

$project can only be used once within an Aggregation pipeline.
$project cannot be used to assign new values to existing fields.
Beyond simply removing and retaining fields, $project lets us add new fields.
Once we specify a field to retain or perform some computation in a $project stage, we must specify all fields we wish to retain. The only exception to this is the _id field.

Optional Lab – Expressions with $project

Let’s find how many movies in our movies collection are a “labor of love”, where the same person appears in cast, directors, and writers

Note that you may have a dataset that has duplicate entries for some films. Don’t worry if you count them few times, meaning you should not try to find those duplicates.

To get a count after you have defined your pipeline, there are two simple methods.

// add the $count stage to the end of your pipeline

// you will learn about this stage shortly!

db.movies.aggregate([

{$stage1},

{$stage2},

…$stageN,

{ $count: “labors of love” }

])

// or use itcount()

db.movies.aggregate([

{$stage1},

{$stage2},

{…$stageN},

]).itcount()

How many movies are “labors of love”?

1259
1595
1263
1597

The MongoDB Aggregation Framework Chapter 2 – Basic Aggregation – Utility Stages

Lab: SUsing Cursor-like Stages

MongoDB has another movie night scheduled. This time, we polled employees for their favorite actress or actor, and got these results

favorites = [

“Sandra Bullock”,

“Tom Hanks”,

“Julia Roberts”,

“Kevin Spacey”,

“George Clooney”]

For movies released in the USA with a tomatoes.viewer.rating greater than or equal to 3, calculate a new field called num_favs that represets how many favorites appear in the cast field of the movie.

Sort your results by num_favs, tomatoes.viewer.rating, and title, all in descending order.

What is the title of the 25th film in the aggregation result?

Recount
The Heat
Erin Brockovich
Wrestling Ernest Hemingway

Lab: Setting Up the Vagrant Environment

Calculate an average rating for each movie in our collection where English is an available language, the minimum imdb.rating is at least 1, the minimum imdb.votes is at least 1, and it was released in 1990 or after. You’ll be required to rescale (or normalize) imdb.votes. The formula to rescale imdb.votes and calculate normalized_rating is included as a handout.

What film has the lowest normalized_rating?

DMZ
The Christmas Tree
Twilight
Avatar: The Last Airbender

The MongoDB Aggregation Framework Chapter 3 – Core Aggregation – Combining Information

The $lookup Stage

Which of the following statements is true about the $lookup stage?

The collection specified in from cannot be sharded
You can specify a collection in another database to from
$lookup matches between localField and foreignField with an equality
match
Specifying an existing field name to as will overwrite the the
existing field

$graphLookup Introduction

Which of the following statements apply to $graphLookup operator? check
all that apply

Provides MongoDB with graph or graph-like capabilities
$graphLookup provides MongoDB a transitive closure
implementation
$graphLookup depends on $lookup operator. Cannot be used without
$lookup
$lookup and $graphLookup stages require the exact same fields in
their specification.
$graphLookup is a new stage of the aggregation pipeline introduced in
MongoDB 3.2

$graphLookup: Simple Lookup

Which of the following statements is/are correct? Check all that
apply.

connectToField will be used on recursive find operations
as determines a collection where $graphLookup will store the stage
results
startWith indicates the index that should be use to execute the
recursive match
connectFromField value will be use to match connectToField in a
recursive match

$graphLookup: maxDepth and depthField

Which of the following statements are incorrect? Check all that
apply

maxDepth only takes $long values

maxDepth allows to specify the number of recursive lookups

depthField determines a field, in the result document, which
specifies the number of recursive lookup needed to reach that
document

depthField determines a field, which contains the value number of
documents matched by the recursive lookup

$graphLookup: General Considerations

Consider the following statement:

“$graphLookup“ is required to be the last element on the
pipeline.

Which of the following is true about the statement?

This is correct because of the recursive nature of $graphLookup we
want to save resources for last.

This is incorrect. graphLookup needs to be the first element of the
pipeline, regardless of other stages needed to perform the desired
query.

This is incorrect. $graphLookup can be used in any position of the
pipeline and acts in the same way as a regular $lookup.

This is correct because $graphLookup pipes out the results of
recursive search into a collection, similar to $out stage.

The MongoDB Aggregation Framework | Chapter 4 – Core Aggregation – Multidimensional Grouping

Facets: Single Facet Query

Which of the following aggregation pipelines are single facet
queries?

[

{“$match”: { “$text”: {“$search”: “network”}}},

{“$sortByCount”: “$offices.city”},

]

[

{“$unwind”: “$offices”},

{“$project”: { “_id”: “$name”, “hq”: “$offices.city”}},

{“$sortByCount”: “$hq”},

{“$sort”: {“_id”:-1}},

{“$limit”: 100}

]

[

{“$match”: { “$text”: {“$search”: “network”}}},

{“$unwind”: “$offices”},

{“$sort”: {“_id”:-1}}

]

Facets: Manual Buckets

Assuming that field1 is composed of double values, ranging between 0
and Infinity, and field2 is of type string, which of the following
stages are correct?

{‘$bucket’: { ‘groupBy’: ‘$field1’, ‘boundaries’: [ 0.4, Infinity
]}}
{‘$bucket’: { ‘groupBy’: ‘$field1’, ‘boundaries’: [ “a”, 3, 5.5
]}}
{‘$bucket’: { ‘groupBy’: ‘$field2’, ‘boundaries’: [ “a”, “asdas”,
“z” ], ‘default’: ‘Others’}}

Facets: Auto Buckets

Auto Bucketing will …

adhere bucket boundaries to a numerical series set by the
granularity option.
randomly distributed documents accross arbitrarily defined bucket
boundaries.
given a number of buckets, try to distribute documents evenly
accross buckets.
count only documents that contain the groupBy field defined in the
documents.

Facets: Multiple Facets

Which of the following statement(s) apply to the $facet stage?

The $facet stage allows several sub-pipelines to be executed to
produce multiple facets.

The $facet stage allows the application to generate several different
facets with one single database request.

The output of the individual facetsub −
pipelinescanbesharedusingtheexpression $FACET.$.

We can only use facets stages ($sortByCount, $bucket and
$bucketAuto) as sub-pipelines of $facet stage.

The MongoDB Aggregation Framework | Chapter 5 – Miscellaneous Aggregation

The $out Stage

Which of the following statements is true regarding the $out stage?

$out will overwrite an existing collection if specified.
$out removes all indexes when it overwrites a collection.
If a pipeline with $out errors, you must delete the collection specified to the $out stage.
Using $out within many sub-piplines of a $facet stage is a quick way to generate many differently shaped collections.

Views

Which of the following statements are true regarding MongoDB Views?

A view cannot be created that contains both horizontal and vertical slices.
Inserting data into a view is slow because MongoDB must perform the pipeline in reverse.
Views should be used cautiously because the documents they contain can grow incredibly large.
View performance can be increased by creating the appropriate indexes on the source collection.

The MongoDB Aggregation Framework | Chapter 6 – Aggregation Performance and Pipeline Quiz Answer

Aggregation Performance

With regards to aggregation performance, which of the following are true?

You can increase index usage by moving $match stages to the end of your pipeline

When $limit and $sort are close together a very performant top-k sort can be performed

Passing allowDiskUsage to your aggregation queries will seriously increase their performance

Transforming data in a pipeline stage prevents us from using indexes in the stages that follow

Aggregation Pipeline on a Sharded Cluster

What operators will cause a merge stage on the primary shard for a database?

$lookup

$out

$group

Pipeline Optimization – Part 2

Which of the following statements is/are true?

The query in a $match stage can be entirely covered by an index

The Aggregation Framework will automatically reorder stages in certain conditions

The Aggregation Framework can automatically project fields if the shape of the final document is only dependent upon those fields in the input document.
Causing a merge in a sharded deployment will cause all subsequent pipeline stages to be performed in the same location as the merge

The MongoDB Aggregation Framework Final Exam Quiz Answer

Final Exam Quiz

Question 1)

Consider the following aggregation pipelines:

Pipeline 1

db.coll.aggregate([

{“$match”: {“field_a”: {“$gt”: 1983}}},

{“$project”: { “field_a”: “$field_a.1”, “field_b”: 1, “field_c”:
1 }},

{“$replaceRoot”:{“newRoot”: {“_id”: “$field_c”, “field_b”:
“$field_b”}}},

{“$out”: “coll2”},

{“$match”: {“_id.field_f”: {“$gt”: 1}}},

{“$replaceRoot”:{“newRoot”: {“_id”: “$field_b”, “field_c”:
“$_id”}}}

])

Pipeline 2

db.coll.aggregate([

{“$match”: {“field_a”: {“$gt”: 111}}},

{“$geoNear”: {

“near”: { “type”: “Point”, “coordinates”: [ -73.99279 ,
40.719296 ] },

“distanceField”: “distance”}},

{“$project”: { “distance”: “$distance”, “name”: 1, “_id”: 0
}}

])

Pipeline 3

db.coll.aggregate([

{

“$facet”: {

“averageCount”: [

{“$unwind”: “$array_field”},

{“$group”: {“_id”: “$array_field”, “count”:
{“$sum”: 1}}}

“categorized”: [{“$sortByCount”:
“$arrayField”}]

{

“$facet”: {

“new_shape”: [{“$project”: {“range”:
“$categorized._id”}}],

“stats”: [{“$match”: {“range”: 1}}, {“$indexStats”:
{}}]

}

])

Which of the following statements are correct?

Pipeline 3 executes correctly

Pipeline 2 fails because we cannot project distance field

Pipeline 3 fails since you can only have one $facet stage per
pipeline

Pipeline 1 is incorrect because you can only have one $replaceRoot
stage in your pipeline

Pipeline 1 fails since $out is required to be the last stage of the
pipeline

Pipeline 2 is incorrect because $geoNear needs to be the first stage
of our pipeline

Pipeline 3 fails because $indexStats must be the first stage in a
pipeline and may not be used within a $facet

Question 2)

Consider the following collection:

db.collection.find()

{

“a”: [1, 34, 13]

}

The following pipelines are executed on top of this collection, using a
mixed set of different expression accross the different stages:

Pipeline 1

db.collection.aggregate([

{“$match”: { “a” : {“$sum”: 1} }},

{“$project”: { “_id” : {“$addToSet”: “$a”} }},

{“$group”: { “_id” : “”, “max_a”: {“$max”: “$_id”} }}

])

Pipeline 2

db.collection.aggregate([

{“$project”: { “a_divided” : {“$divide”: [“$a”, 1]}
}}

])

Pipeline 3

db.collection.aggregate([

{“$project”: {“a”: {“$max”: “$a”}}},

{“$group”: {“_id”: “$$ROOT._id”, “all_as”: {“$sum”:
“$a”}}}

])

Given these pipelines, which of the following statements are
correct?

Pipeline 3 is correct and will execute with no error
Pipeline 2 fails because the $divide operator only supports numeric
types
Pipeline 1 will fail because $max can not operator on _id field
Pipeline 2 is incorrect since $divide cannot operate over field
expressions
Pipeline 1 is incorrect because you cannot use an accumulator
expression in a $match stage.

Question 3)

Consider the following collection documents:

db.people.find()

{ “_id” : 0, “name” : “Bernice Pope”, “age” : 69, “date” :
ISODate(“2017-10-04T18:35:44.011Z”) }

{ “_id” : 1, “name” : “Eric Malone”, “age” : 57, “date” :
ISODate(“2017-10-04T18:35:44.014Z”) }

{ “_id” : 2, “name” : “Blanche Miller”, “age” : 35, “date” :
ISODate(“2017-10-04T18:35:44.015Z”) }

{ “_id” : 3, “name” : “Sue Perez”, “age” : 64, “date” :
ISODate(“2017-10-04T18:35:44.016Z”) }

{ “_id” : 4, “name” : “Ryan White”, “age” : 39, “date” :
ISODate(“2017-10-04T18:35:44.019Z”) }

{ “_id” : 5, “name” : “Grace Payne”, “age” : 56, “date” :
ISODate(“2017-10-04T18:35:44.020Z”) }

{ “_id” : 6, “name” : “Jessie Yates”, “age” : 53, “date” :
ISODate(“2017-10-04T18:35:44.020Z”) }

{ “_id” : 7, “name” : “Herbert Mason”, “age” : 37, “date” :
ISODate(“2017-10-04T18:35:44.020Z”) }

{ “_id” : 8, “name” : “Jesse Jordan”, “age” : 47, “date” :
ISODate(“2017-10-04T18:35:44.020Z”) }

{ “_id” : 9, “name” : “Hulda Fuller”, “age” : 25, “date” :
ISODate(“2017-10-04T18:35:44.020Z”) }

And the aggregation pipeline execution result:

db.people.aggregate(pipeline)

{ “_id” : 8, “names” : [ “Sue Perez” ], “word” : “P” }

{ “_id” : 9, “names” : [ “Ryan White” ], “word” : “W” }

{ “_id” : 10, “names” : [ “Eric Malone”, “Grace Payne” ], “word” : “MP”
}

{ “_id” : 11, “names” : [ “Bernice Pope”, “Jessie Yates”, “Jesse
Jordan”, “Hulda Fuller” ], “word” : “PYJF” }

{ “_id” : 12, “names” : [ “Herbert Mason” ], “word” : “M” }

{ “_id” : 13, “names” : [ “Blanche Miller” ], “word” : “M” }

Which of the following pipelines generates the output result?

var pipeline = [{

“$project”: {

“surname”: { “$arrayElemAt”: [ {“$split”: [
“$name”, ” ” ] }, 1]},

“name_size”: { “$add” : [{“$strLenCP”:
“$name”}, -1]},

“name”:1

}

{

“$group”: {

“_id”: “$name_size”,

“word”: { “$addToSet”: {“$substr”:
[{“$toUpper”:”$name”}, 3, 2]} },

“names”: {“$push”: “$surname”}

}

{

“$sort”: {“_id”: -1}

}

]

“`

[X]

var pipeline = [{

“$project”: {

“surname_capital”: { “$substr”: [{“$arrayElemAt”:
[ {“$split”: [ “$name”, ” ” ] }, 1]}, 0, 1 ] },

“name_size”: { “$add” : [{“$strLenCP”:
“$name”}, -1]},

“name”: 1

}

{

“$group”: {

“_id”: “$name_size”,

“word”: { “$push”: “$surname_capital” },

“names”: {“$push”: “$name”}

}

{

“$project”: {

“word”: {

“$reduce”: {

“input”: “$word”,

“initialValue”: “”,

“in”: { “$concat”: [“$$value”,
“$$this”] }

}

“names”: 1

}

{

“$sort”: { “_id”: 1}

}

]

“`

var pipeline = [{

“$sort”: { “date”: 1 }

{

“$group”: {

“_id”: { “$size”: { “$split”: [“$name”, ” “]}
},

“names”: {“$push”: “$name”}

}

{

“$project”: {

“word”: {

“$zip”: {

“inputs”: [“$names”],

“useLongestLength”: false,

}

“names”: 1

}

}]

“`

Question 4)

$facet is an aggregation stage that allows for sub-pipelines to be
executed.

var pipeline = [

{

$match: { a: { $type: “int” } }

{

$project: {

_id: 0,

a_times_b: { $multiply: [“$a”, “$b”] }

}

{

$facet: {

facet_1: [{ $sortByCount: “a_times_b” }],

facet_2: [{ $project: { abs_facet1: { $abs:
“$facet_1._id” } } }],

facet_3: [

{

$facet: {

facet_3_1: [{ $bucketAuto:
{ groupBy: “$_id”, buckets: 2 } }]

}

]

}

]

In the above pipeline, which uses $facet, there are some incorrect
stages or/and expressions being used.

Which of the following statements point out errors in the
pipeline?

a $multiply expression takes a document as input, not an array.
can not nest a $facet stage as a sub-pipeline.
$sortByCount cannot be used within $facet stage.
facet_2 uses the output of a parallel sub-pipeline, facet_1, to
compute an expression
a $type expression does not take a string as its value; only the
BSON numeric values can be specified to identify the types.

Question 5)

Consider a company producing solar panels and looking for the next
markets they want to target in the USA. We have a collection with all
the major cities (more than 100,000 inhabitants) from all over the
World with recorded number of sunny days for some of the last
years.

A sample document looks like the following:

db.cities.findOne()

{

“_id”: 10,

“city”: “San Diego”,

“region”: “CA”,

“country”: “USA”,

“sunnydays”: [220, 232, 205, 211, 242, 270]

}

The collection also has these indexes:

db.cities.getIndexes()

[

{

“v”: 2,

“key”: {

“_id”: 1

“name”: “_id_”,

“ns”: “test.cities”

{

“v”: 2,

“key”: {

“city”: 1

“name”: “city_1”,

“ns”: “test.cities”

{

“v”: 2,

“key”: {

“country”: 1

“name”: “country_1”,

“ns”: “test.cities”

}

]

M121 The MongoDB Aggregation Framework ALL Chapters

The MongoDB Aggregation Framework Chapter 0 – Introduction and Aggregation Concepts

The Concept of Pipelines

Aggregation Structure and Syntax

The MongoDB Aggregation Framework Chapter 1 – Basic Aggregation – $match and $project

$match: Filtering documents

Shaping documents with $project

Optional Lab – Expressions with $project

The MongoDB Aggregation Framework Chapter 2 – Basic Aggregation – Utility Stages

Lab: SUsing Cursor-like Stages

Lab: Setting Up the Vagrant Environment

The MongoDB Aggregation Framework Chapter 3 – Core Aggregation – Combining Information

$graphLookup Introduction

$graphLookup: Simple Lookup

$graphLookup: maxDepth and depthField

$graphLookup: General Considerations

The MongoDB Aggregation Framework | Chapter 4 – Core Aggregation – Multidimensional Grouping

Facets: Single Facet Query

Facets: Manual Buckets

Facets: Auto Buckets

Facets: Multiple Facets

The MongoDB Aggregation Framework | Chapter 5 – Miscellaneous Aggregation

Views

The MongoDB Aggregation Framework | Chapter 6 – Aggregation Performance and Pipeline Quiz Answer

Aggregation Pipeline on a Sharded Cluster

Pipeline Optimization – Part 2

The MongoDB Aggregation Framework Final Exam Quiz Answer

Statistical Inference All Weeks Quiz Answer

Coursera R Programming ALL Weeks Quiz & Answer

Rust (Programming Language) LinkedIn Skill Assessment Quiz Answer

Robotics: Aerial Robotics Coursera Quiz Answers – 100% Correct Answers

Project Planning: Putting It All Together Week 4 Quiz Answer

Reproducible Research All Weeks Quiz Answer

Regression Models Coursera Quiz Answers ALL WEEKS

Coursera: Introduction to Marketing All weeks Quiz Answer

Private Equity and Venture Capital Coursera All Quiz Answer