The MongoDB Aggregation Framework Chapter 0 – Introduction and Aggregation Concepts
The Concept of Pipelines
Framework?
- Pipelines must consist of at least two stages.
- Stages cannot be configured to produce our desired output.
- Documents flow through the pipeline, passing from one stage to the next
- The Aggregation Framework provides us many stages to filter and transform our data
Aggregation Structure and Syntax
- Only one expression per stage can be used.
- An aggregation pipeline is an array of stages.
- Some expressions can only be used in certain stages.
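The idea behind these options, that a pipeline is an array of stages and documents pass from one stage to the next, can be sketched with a toy in-memory model. This is not MongoDB itself: the `run_stage` and `aggregate` helpers and the sample `movies` data are hypothetical, and only three stages with very simple semantics are modelled.

```python
# Toy in-memory model of an aggregation pipeline (an illustration, not MongoDB).
# A pipeline is a list of stages; each stage is a one-key dict naming the stage.

def run_stage(stage, docs):
    """Apply a single stage to a list of documents and return the new list."""
    (name, spec), = stage.items()  # a stage document has exactly one key
    if name == "$match":
        # Only plain equality conditions are modelled in this sketch.
        return [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
    if name == "$sort":
        # Sort by the last key first; sorted() is stable, so the first key wins.
        for field, direction in reversed(list(spec.items())):
            docs = sorted(docs, key=lambda d: d[field], reverse=(direction == -1))
        return docs
    if name == "$limit":
        return docs[:spec]
    raise ValueError(f"stage {name} is not modelled in this sketch")


def aggregate(docs, pipeline):
    """Pass the documents through every stage, in order."""
    for stage in pipeline:
        docs = run_stage(stage, docs)
    return docs


movies = [
    {"title": "A", "year": 2001, "rating": 7.1},
    {"title": "B", "year": 2001, "rating": 8.3},
    {"title": "C", "year": 1999, "rating": 9.0},
]

pipeline = [
    {"$match": {"year": 2001}},   # filter early
    {"$sort": {"rating": -1}},    # then order what is left
    {"$limit": 1},                # then keep the top document
]

result = aggregate(movies, pipeline)
print(result)  # [{'title': 'B', 'year': 2001, 'rating': 8.3}]
```

In this model a pipeline with a single stage, or with none at all, still runs; each stage only sees the output of the stage before it.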
The MongoDB Aggregation Framework Chapter 1 – Basic Aggregation – $match and $project
$match: Filtering documents
- $match can only filter documents on one field.
- It uses the familiar MongoDB query language.
- It should come very early in an aggregation pipeline.
- $match can use both query operators and aggregation expressions.
Shaping documents with $project
- $project can only be used once within an Aggregation pipeline.
- $project cannot be used to assign new values to existing fields.
- Beyond simply removing and retaining fields, $project lets us add new fields.
- Once we specify a field to retain or perform some computation in a $project stage, we must specify all fields we wish to retain. The only exception to this is the _id field.
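As a rough sketch of the $project behaviour these options describe (fields must be listed to be retained, `_id` is kept unless excluded, and new or recomputed fields can be assigned), the hypothetical `project` helper below models an inclusion-style projection on one document. Plain Python callables stand in for aggregation expressions, which is an assumption of this sketch and not how MongoDB expresses them.

```python
def project(doc, spec):
    """Toy model of an inclusion-style $project applied to one document."""
    out = {}
    # _id is retained by default; it is dropped only when the spec sets it to 0.
    if spec.get("_id", 1) != 0 and "_id" in doc:
        out["_id"] = doc["_id"]
    for field, rule in spec.items():
        if field == "_id":
            continue
        if callable(rule):
            # Stand-in for an expression: compute a new (or replacement) value.
            out[field] = rule(doc)
        elif rule == 1 and field in doc:
            # Plain inclusion: copy the field through unchanged.
            out[field] = doc[field]
    return out


movie = {"_id": 7, "title": "Life Is Beautiful", "year": 1997, "rating": 8.6}

# Retain title and add a computed field; year and rating are not listed, so they drop out.
shaped = project(movie, {"title": 1, "title_words": lambda d: len(d["title"].split())})
print(shaped)  # {'_id': 7, 'title': 'Life Is Beautiful', 'title_words': 3}

# Exclude _id explicitly and assign a new value to an existing field.
no_id = project(movie, {"_id": 0, "title": lambda d: d["title"].upper()})
print(no_id)  # {'title': 'LIFE IS BEAUTIFUL'}
```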
Optional Lab – Expressions with $project
- 1259
- 1595
- 1263
- 1597
The MongoDB Aggregation Framework Chapter 2 – Basic Aggregation – Utility Stages
Lab: Using Cursor-like Stages
- Recount
- The Heat
- Erin Brockovich
- Wrestling Ernest Hemingway
Lab: Setting Up the Vagrant Environment
- DMZ
- The Christmas Tree
- Twilight
- Avatar: The Last Airbender
The MongoDB Aggregation Framework Chapter 3 – Core Aggregation – Combining Information
- The collection specified in from cannot be sharded
- You can specify a collection in another database to from
- $lookup matches between localField and foreignField with an equality match
- Specifying an existing field name to as will overwrite the existing field
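The equality match between localField and foreignField, and the way the field named in as receives the matches, can be sketched in plain Python. The `lookup` helper and the `orders` and `customers` data are hypothetical, and the sketch ignores details such as array-valued local fields.

```python
def lookup(docs, foreign, local_field, foreign_field, as_field):
    """Toy $lookup: attach every foreign document whose foreign_field equals local_field."""
    joined = []
    for doc in docs:
        matches = [f for f in foreign if f.get(foreign_field) == doc.get(local_field)]
        result = dict(doc)           # copy so the input documents stay untouched
        result[as_field] = matches   # an existing field with this name is overwritten
        joined.append(result)
    return joined


orders = [
    {"_id": 1, "customer_id": "c1", "customer": "stale value"},
    {"_id": 2, "customer_id": "c9"},
]
customers = [
    {"_id": "c1", "name": "Ada"},
    {"_id": "c2", "name": "Grace"},
]

joined = lookup(orders, customers, "customer_id", "_id", "customer")
for doc in joined:
    print(doc)
```

The first order ends up with a one-element `customer` array that replaces its old string value; the second order has no match and gets an empty array.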
$graphLookup Introduction
all that apply
- Provides MongoDB with graph or graph-like capabilities
- $graphLookup provides MongoDB a transitive closure implementation
- $graphLookup depends on $lookup operator. Cannot be used without $lookup
- $lookup and $graphLookup stages require the exact same fields in their specification.
- $graphLookup is a new stage of the aggregation pipeline introduced in MongoDB 3.2
$graphLookup: Simple Lookup
apply.
- connectToField will be used on recursive find operations
- as determines a collection where $graphLookup will store the stage results
- startWith indicates the index that should be used to execute the recursive match
- connectFromField value will be used to match connectToField in a recursive match
$graphLookup: maxDepth and depthField
apply
- maxDepth only takes $long values
- maxDepth allows us to specify the number of recursive lookups
- depthField determines a field, in the result document, which specifies the number of recursive lookups needed to reach that document
- depthField determines a field, which contains the number of documents matched by the recursive lookup
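To make the roles of connectFromField, connectToField, maxDepth and depthField more concrete, here is a minimal breadth-first sketch of a recursive lookup over an in-memory list. The `graph_lookup` helper, its parameter names and the `employees` data are hypothetical; the sketch assumes a single scalar start value and is not MongoDB's implementation.

```python
def graph_lookup(collection, start_with, connect_from, connect_to,
                 max_depth=None, depth_field=None):
    """Toy recursive lookup: follow connect_from -> connect_to links level by level."""
    found = {}                 # index in collection -> result document
    frontier = {start_with}    # values to match against connect_to at this depth
    depth = 0
    while frontier and (max_depth is None or depth <= max_depth):
        next_frontier = set()
        for i, doc in enumerate(collection):
            if i in found or doc.get(connect_to) not in frontier:
                continue       # already reached, or not linked to this level
            result = dict(doc)
            if depth_field is not None:
                result[depth_field] = depth   # lookups needed to reach this document
            found[i] = result
            value = doc.get(connect_from)
            if value is not None:
                next_frontier.add(value)
        frontier = next_frontier
        depth += 1
    return list(found.values())


employees = [
    {"name": "Eliot"},
    {"name": "Ron", "reports_to": "Eliot"},
    {"name": "Andrew", "reports_to": "Eliot"},
    {"name": "Elyse", "reports_to": "Ron"},
    {"name": "Shannon", "reports_to": "Elyse"},
]

# Everyone below Eliot, with the depth at which each person was reached.
everyone = graph_lookup(employees, "Eliot", "name", "reports_to", depth_field="level")
# max_depth=0 performs only the initial lookup: direct reports.
direct = graph_lookup(employees, "Eliot", "name", "reports_to", max_depth=0)

print([(e["name"], e["level"]) for e in everyone])
print([e["name"] for e in direct])
```

Documents that were already reached are skipped, so a cycle in the data does not make the loop run forever.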
$graphLookup: General Considerations
pipeline.
- This is correct. Because of the recursive nature of $graphLookup, we
want to save resources for last.
- This is incorrect. $graphLookup needs to be the first element of the
pipeline, regardless of other stages needed to perform the desired
query.
- This is incorrect. $graphLookup can be used in any position of the
pipeline and acts in the same way as a regular $lookup.
- This is correct because $graphLookup pipes out the results of
recursive search into a collection, similar to $out stage.
The MongoDB Aggregation Framework | Chapter 4 – Core Aggregation – Multidimensional Grouping
Facets: Single Facet Query
queries?
Facets: Manual Buckets
and Infinity, and field2 is of type string, which of the following
stages are correct?
- {"$bucket": {"groupBy": "$field1", "boundaries": [0.4, Infinity]}}
- {"$bucket": {"groupBy": "$field1", "boundaries": ["a", 3, 5.5]}}
- {"$bucket": {"groupBy": "$field2", "boundaries": ["a", "asdas", "z"], "default": "Others"}}
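The rules that decide between stages like these can be sketched in Python: boundaries need at least two values, must share a type and be in ascending order, and a value that falls outside every bucket needs a default. The `check_boundaries` and `bucket_for` helpers are hypothetical, `math.inf` stands in for Infinity, and treating equal neighbouring boundaries as an error is an assumption of this sketch.

```python
import math
from numbers import Number

_NO_DEFAULT = object()  # sentinel meaning "no default bucket was given"


def check_boundaries(boundaries):
    """Return None when the boundaries look valid, otherwise a short reason."""
    if len(boundaries) < 2:
        return "at least two boundaries are required"
    all_numbers = all(isinstance(b, Number) and not isinstance(b, bool) for b in boundaries)
    all_strings = all(isinstance(b, str) for b in boundaries)
    if not (all_numbers or all_strings):
        return "boundaries must all be of the same type"
    # Assumption of this sketch: equal neighbours count as "not ascending".
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        return "boundaries must be in ascending order"
    return None


def bucket_for(value, boundaries, default=_NO_DEFAULT):
    """Return the bucket id (the inclusive lower bound) for value, or the default."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower <= value < upper:   # lower bound inclusive, upper bound exclusive
            return lower
    if default is _NO_DEFAULT:
        raise ValueError(f"{value!r} is outside all buckets and no default was given")
    return default


print(check_boundaries([0.4, math.inf]))      # None: valid as a boundary list
print(check_boundaries(["a", 3, 5.5]))        # mixed types are rejected
print(check_boundaries(["a", "asdas", "z"]))  # None: valid as a boundary list

print(bucket_for("banana", ["a", "asdas", "z"], default="Others"))  # asdas
print(bucket_for("zebra", ["a", "asdas", "z"], default="Others"))   # Others
```

A boundary list can be valid on its own and still fail at run time: in this model `bucket_for(0.2, [0.4, math.inf])` raises, because 0.2 is below the first boundary and no default was supplied.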
Facets: Auto Buckets
- adhere bucket boundaries to a numerical series set by the granularity option.
- randomly distribute documents across arbitrarily defined bucket boundaries.
- given a number of buckets, try to distribute documents evenly across buckets.
- count only documents that contain the groupBy field defined in the documents.
Facets: Multiple Facets
- The $facet stage allows several sub-pipelines to be executed to
produce multiple facets.
- The $facet stage allows the application to generate several different
facets with one single database request.
- The output of the individual facet sub-pipelines can be shared using
the expression $FACET.$.
- We can only use facet stages ($sortByCount, $bucket and
$bucketAuto) as sub-pipelines of $facet stage.
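A small sketch may help picture how $facet runs several sub-pipelines over the same input and returns them together in one result. The `facet` and `sort_by_count` helpers and the `movies` data are hypothetical, and each sub-pipeline is modelled as an ordinary Python function from a list of documents to a list of documents.

```python
from collections import Counter


def sort_by_count(docs, field):
    """Toy $sortByCount: group by field and order the groups by descending count."""
    counts = Counter(d[field] for d in docs)
    return [{"_id": value, "count": n} for value, n in counts.most_common()]


def facet(docs, sub_pipelines):
    """Toy $facet: run every sub-pipeline on the same input, return one document."""
    # Each sub-pipeline gets its own copy of the input and cannot see the others.
    return [{name: run(list(docs)) for name, run in sub_pipelines.items()}]


movies = [
    {"title": "A", "rated": "PG"},
    {"title": "B", "rated": "R"},
    {"title": "C", "rated": "PG"},
    {"title": "D", "rated": "G"},
]

result = facet(movies, {
    "by_rating": lambda docs: sort_by_count(docs, "rated"),
    "total": lambda docs: [{"count": len(docs)}],
})

print(result[0]["by_rating"][0])  # {'_id': 'PG', 'count': 2}
print(result[0]["total"])         # [{'count': 4}]
```

Both facets come back from a single call, as one output document with one array per facet. In this model a sub-pipeline can be any function, not only a bucketing-style stage, and no sub-pipeline receives another's output.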
The MongoDB Aggregation Framework | Chapter 5 – Miscellaneous Aggregation
- $out will overwrite an existing collection if specified.
- $out removes all indexes when it overwrites a collection.
- If a pipeline with $out errors, you must delete the collection specified to the $out stage.
- Using $out within many sub-pipelines of a $facet stage is a quick way to generate many differently shaped collections.
Views
- A view cannot be created that contains both horizontal and vertical slices.
- Inserting data into a view is slow because MongoDB must perform the pipeline in reverse.
- Views should be used cautiously because the documents they contain can grow incredibly large.
- View performance can be increased by creating the appropriate indexes on the source collection.
The MongoDB Aggregation Framework | Chapter 6 – Aggregation Performance and Pipeline Quiz Answer
Aggregation Performance
- You can increase index usage by moving $match stages to the end of your pipeline
- When $limit and $sort are close together a very performant top-k sort can be performed
- Passing allowDiskUsage to your aggregation queries will seriously increase their performance
- Transforming data in a pipeline stage prevents us from using indexes in the stages that follow
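The top-k idea mentioned in these options can be illustrated outside MongoDB: when a sort is immediately followed by a limit of k, only the best k documents ever need to be kept, so a full sort of the input is unnecessary. The sketch below uses Python's `heapq.nsmallest` to show the general technique; the `top_k` helper and the sample data are hypothetical, and nothing here describes MongoDB's internal implementation.

```python
import heapq
import random


def top_k(docs, field, k):
    """Return the k documents with the smallest value of field, in ascending order."""
    # nsmallest tracks only k candidates at a time instead of sorting everything.
    return heapq.nsmallest(k, docs, key=lambda d: d[field])


random.seed(0)  # fixed seed so the sample data is reproducible
docs = [{"_id": i, "score": random.random()} for i in range(1000)]

k = 5
fast = top_k(docs, "score", k)
full = sorted(docs, key=lambda d: d["score"])[:k]  # full sort followed by a limit

print(fast == full)  # True: same result, without sorting all 1000 documents
```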
Aggregation Pipeline on a Sharded Cluster
- $lookup
- $out
- $group
Pipeline Optimization – Part 2
- The query in a $match stage can be entirely covered by an index
- The Aggregation Framework will automatically reorder stages in certain conditions
- The Aggregation Framework can automatically project fields if the shape of the final document is only dependent upon those fields in the input document.
- Causing a merge in a sharded deployment will cause all subsequent pipeline stages to be performed in the same location as the merge
The MongoDB Aggregation Framework Final Exam Quiz Answer
Final Exam Quiz
1 }},
“$field_b”}}},
“$_id”}}}
40.719296 ] },
}}
{“$sum”: 1}}}
“$arrayField”}]
“$categorized._id”}}],
{}}]
- Pipeline 3 executes correctly
- Pipeline 2 fails because we cannot project distance field
- Pipeline 3 fails since you can only have one $facet stage per
pipeline
- Pipeline 1 is incorrect because you can only have one $replaceRoot
stage in your pipeline
- Pipeline 1 fails since $out is required to be the last stage of the
pipeline
- Pipeline 2 is incorrect because $geoNear needs to be the first stage
of our pipeline
- Pipeline 3 fails because $indexStats must be the first stage in a
pipeline and may not be used within a $facet
mixed set of different expressions across the different stages:
}}
“$a”}}}
correct?
- Pipeline 3 is correct and will execute with no error
- Pipeline 2 fails because the $divide operator only supports numeric types
- Pipeline 1 will fail because $max cannot operate on the _id field
- Pipeline 2 is incorrect since $divide cannot operate over field expressions
- Pipeline 1 is incorrect because you cannot use an accumulator expression in a $match stage.
ISODate(“2017-10-04T18:35:44.011Z”) }
ISODate(“2017-10-04T18:35:44.014Z”) }
ISODate(“2017-10-04T18:35:44.015Z”) }
ISODate(“2017-10-04T18:35:44.016Z”) }
ISODate(“2017-10-04T18:35:44.019Z”) }
ISODate(“2017-10-04T18:35:44.020Z”) }
ISODate(“2017-10-04T18:35:44.020Z”) }
ISODate(“2017-10-04T18:35:44.020Z”) }
ISODate(“2017-10-04T18:35:44.020Z”) }
ISODate(“2017-10-04T18:35:44.020Z”) }
}
Jordan”, “Hulda Fuller” ], “word” : “PYJF” }
“$name”, ” ” ] }, 1]},
“$name”}, -1]},
[{“$toUpper”:”$name”}, 3, 2]} },
[ {“$split”: [ “$name”, ” ” ] }, 1]}, 0, 1 ] },
“$name”}, -1]},
“$$this”] }
},
executed.
“$facet_1._id” } } }],
{ groupBy: “$_id”, buckets: 2 } }]
stages or/and expressions being used.
pipeline?
- a $multiply expression takes a document as input, not an array.
- cannot nest a $facet stage as a sub-pipeline.
- $sortByCount cannot be used within $facet stage.
- facet_2 uses the output of a parallel sub-pipeline, facet_1, to compute an expression
- a $type expression does not take a string as its value; only the BSON numeric values can be specified to identify the types.
markets they want to target in the USA. We have a collection with all
the major cities (more than 100,000 inhabitants) from all over the
World with recorded number of sunny days for some of the last
years.