M201 MongoDB Performance

김경민·2022년 6월 11일
1
post-thumbnail

Chapter 1: Introduction

Introduction to MongoDB Performance

Which of the following statements is/are true?
-> Indexes will be a major topic covered in this course.

Chapter 1: Introduction
Hardware Considerations & Configurations Part 1

Chapter 1: Introduction
Hardware Considerations & Configurations Part 2

Regarding the performance implications, which of the following statements are correct?

  • Disk RAID architecture can impact the performance of your MongoDB deployment.
  • CPU availability impacts the performance of MongoDB, especially if using the WiredTiger storage engine.

Chapter 1: Introduction
Lab 1.1: Install Course Tools and Datasets

mongodb+srv://<username>:<password>@cluster0-jxeqq.mongodb.net/test

https://www.mongodb.com/docs/database-tools/?_ga=2.176235042.688064568.1654342381-173819729.1653806116

.\mongoimport.exe --drop -c people --uri mongodb+srv://formin:password@cluster0.tpsxlft.mongodb.net/test people.json
mongodb+srv://<username>:<password>@cluster0-jxeqq.mongodb.net/test
$ # Example for my Atlas cluster, you will need to change the
$ # user, password and Atlas cluster name
$ mongoimport --drop -c people --uri 
mongodb+srv://theusername:thepassword@m201-nhtgi.mongodb.net/m201 people.json
$ mongoimport --drop -c restaurants --uri 
mongodb+srv://theusername:thepassword@m201-nhtgi.mongodb.net/m201 restaurants.json  
mongodb://localhost:27017
$ # Example for my local cluster running on port 27017
$ mongoimport --drop -c people --uri mongodb://localhost:27017/m201 people.json
$ mongoimport --drop -c restaurants --uri 
mongodb://localhost:27017/m201 restaurants.json
> use m201
> db.people.count({ "email" : {"$exists": 1} })

Chapter 2: MongoDB Indexes

Introduction to Indexes

https://www.mongodb.com/docs/manual/indexes/?jmp=university

Which of the following statements regarding indexes are true?

  • Indexes are used to increase the speed of our queries.
  • The _id field is automatically indexed on all collections.
  • Indexes reduce the number of documents MongoDB needs to examine to satisfy a query.
  • Indexes can decrease write, update, and delete performance.

Chapter 2: MongoDB Indexes
How Data is Stored on Disk

https://www.mongodb.com/docs/manual/faq/storage/?jmp=university

Chapter 2: MongoDB Indexes
Single Field Indexes Part 2

https://www.mongodb.com/docs/manual/core/index-single/?jmp=university

Which of the following queries can use an index on the zip field?

- db.addresses.find( { zip : 55555 } )

Chapter 2: MongoDB Indexes
Understanding Explain

If you observe the following lines in the output of explain('executionStats'), what can you deduce?

"executionStats" : {
  "executionSuccess" : true,
  "nReturned" : 23217,
  "executionTimeMillis" : 91,
  "totalKeysExamined" : 23217,
  "totalDocsExamined" : 23217,
  "executionStages" : {
    "stage" : "SORT",
    "nReturned" : 23217,
    "executionTimeMillisEstimate" : 26,
    "works" : 46437,
    "advanced" : 23217,
    "needTime" : 23219,
    "needYield" : 0,
    "saveState" : 363,
    "restoreState" : 363,
    "isEOF" : 1,
    "sortPattern" : {
        "stars" : 1
    },
    "memUsage" : 32522511,
    "memLimit" : 33554432,
  • The index selected for the query was not useful for the sort part of the query
  • An in-memory sort was performed
  • The system came very close to throwing an exception instead of returning results

Chapter 2: MongoDB Indexes
Understanding Explain for Sharded Clusters

With the output of an explain command, what can you deduce?

  • The index used by the chosen plan
  • If a sort was performed by walking the index or done in memory
  • All the different stages the query needs to go through with details about the time it takes, the number of documents processed and returned to the next stage in the pipeline

Chapter 2: MongoDB Indexes
Sorting with Indexes

https://www.mongodb.com/docs/manual/tutorial/sort-results-with-indexes/?jmp=university

Given the following schema for the products collection:

{
"_id": ObjectId,
"product_name": String,
"product_id": String
}

And the following index on the products collection:

{ product_id: 1 }

Which of the following queries will use the given index to perform the sorting of the returned documents?

db.products.find({ product_id: '57d7a1' }).sort({ product_id: -1 })
db.products.find({ product_name: 'Soap' }).sort({ product_id: 1 })
db.products.find({}).sort({ product_id: 1 })
db.products.find({}).sort({ product_id: -1 })

Chapter 2: MongoDB Indexes
Querying on Compound Indexes Part 1

Chapter 2: MongoDB Indexes
Querying on Compound Indexes Part 2

https://www.mongodb.com/docs/manual/core/index-compound/?jmp=university

https://www.mongodb.com/docs/manual/tutorial/create-indexes-to-support-queries/?jmp=university

Chapter 2: MongoDB Indexes
When you can sort with Indexes

When you can sort with Indexes

  • Index prefixes can be used in query predicates to increase index utilization.
  • Index prefixes can be used in sort predicates to prevent in-memory sorts.
  • We can invert the keys of an index in our sort predicate to utilize an index by walking it backwards.

Chapter 2: MongoDB Indexes
Multikey Indexes

https://www.mongodb.com/docs/manual/core/index-multikey/?jmp=university

Given the following index:

{ name: 1, emails: 1 }

When the following document is inserted, how many index entries will be created?

{
"name": "Beatrice McBride",
"age": 26,
"emails": [
  "puovvid@wamaw.kp",
  "todujufo@zoehed.mh",
  "fakmir@cebfirvot.pm"
]
}

-> 3

Chapter 2: MongoDB Indexes
Partial Indexes

https://www.mongodb.com/docs/manual/core/index-partial/?jmp=university

Which of the following is true regarding partial indexes?

  • Partial indexes represent a superset of the functionality of sparse indexes.
  • Partial indexes can be used to reduce the number of keys in an index.
  • Partial indexes support compound indexes.

Chapter 2: MongoDB Indexes
Text Indexes

Which other type of index is mostly closely related to text indexes?

  • Multi-key indexes

Chapter 2: MongoDB Indexes
Collations

https://www.mongodb.com/docs/manual/reference/collation/?jmp=university

Which of the following statements are true regarding collations on indexes?

  • Collations allow the creation of case insensitive indexes
  • We can define specific collations in an index

Chapter 2: MongoDB Indexes
Wildcard Index Type: Part 1

Chapter 2: MongoDB Indexes
Wildcard Index Type: Part 2

Using the wildcardProjection flag with Wildcard Indexes, we can:

  • exclude a set of fields from the Wildcard Index.
  • include a set of fields in the Wildcard Index.

Chapter 2: MongoDB Indexes
Wildcard Index Use Cases

Which of the following are good reasons to use a Wildcard Index?

  • The query pattern on documents of a collection is unpredictable.
  • An application consistently queries against document fields that use the Attribute Pattern.

Chapter 2: MongoDB Indexes
Lab 2.1: Using Indexes to Sort

In this lab you're going to determine which queries are able to successfully use a given index for both filtering and sorting.

Given the following index:

{ "first_name": 1, "address.state": -1, "address.city": -1, "ssn": 1 }

Which of the following queries are able to use it for both filtering and sorting?

- db.people.find({ "first_name": "Jessica" }).sort({ "address.state": 1, "address.city": 1 })
- db.people.find({ "first_name": "Jessica", "address.state": { $lt: "S"} }).sort({ "address.state": 1 })
- db.people.find({ "address.state": "South Dakota", "first_name": "Jessica" }).sort({ "address.city": -1 })

Chapter 2: MongoDB Indexes
Lab 2.2: Optimizing Compound Indexes

In this lab you're going to examine several example queries and determine which compound index will best service them.

> db.people.find({
"address.state": "Nebraska",
"last_name": /^G/,
"job": "Police officer"
})
> db.people.find({
"job": /^P/,
"first_name": /^C/,
"address.state": "Indiana"
}).sort({ "last_name": 1 })
> db.people.find({
"address.state": "Connecticut",
"birthday": {
"$gte": ISODate("2010-01-01T00:00:00.000Z"),
"$lt": ISODate("2011-01-01T00:00:00.000Z")
}
})

If you had to build one index on the people collection, which of the following indexes would best service all 3 queries?

-> { "address.state": 1, "last_name": 1, "job": 1 }

Chapter 3: Index Operations

Building Indexes

https://www.mongodb.com/docs/manual/core/index-creation/?jmp=university

Which of the following are true of index build operations?

  • MongoDB now only has one index buid type available

Chapter 3: Index Operations
Query Plans

https://www.mongodb.com/docs/manual/core/query-plans/?jmp=university

Which of the following is/are true concerning query plans?

  • Query plans are cached so that plans do not need to be generated and compared against each other every time a query is executed.

Chapter 3: Index Operations
Forcing Indexes with Hint

What is the method that forces MongoDB to use a particular index?

  • hint()

Chapter 3: Index Operations
Resource Allocation for Indexes Part 1

Chapter 3: Index Operations
Resource Allocation for Indexes Part 2

Chapter 3: Index Operations
Resource Allocation for Indexes Part 3

https://en.wikipedia.org/wiki/B-tree

Which of the following statements apply to index resource allocation?

  • For the fastest processing, we should ensure that our indexes fit entirely in RAM
  • Indexes are not required to be entirely placed in RAM, however performance will be affected by constant disk access to retrieve index information.

Chapter 3: Index Operations
Basic Benchmarking Part 1

Chapter 3: Index Operations
Basic Benchmarking Part 2

What type of strategy and tools should we be using to performance benchmark a MongoDB installation?

  • Publicly available tools, including correct database variations

In this lab you're going to determine which index was used to satisfy a query given its explain output.

> var exp = db.restaurants.explain("executionStats")

> exp.find({ "address.state": "NY", stars: { $gt: 3, $lt: 4 } }).sort({ name: 1 }).hint(REDACTED)

Which resulted in the following output:

{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "m201.restaurants",
"indexFilterSet": false,
"parsedQuery": "REDACTED",
"winningPlan": {
"stage": "SORT",
"sortPattern": {
"name": 1
},
"inputStage": {
"stage": "SORT_KEY_GENERATOR",
"inputStage": {
"stage": "FETCH",
"inputStage": {
"stage": "IXSCAN",
"keyPattern": "REDACTED",
"indexName": "REDACTED",
"isMultiKey": false,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 1,
"direction": "forward",
"indexBounds": "REDACTED"
}
}
}
},
"rejectedPlans": [ ]
},
"executionStats": {
"executionSuccess": true,
"nReturned": 3335,
"executionTimeMillis": 20,
"totalKeysExamined": 3335,
"totalDocsExamined": 3335,
"executionStages": "REDACTED"
},
"serverInfo": "REDACTED",
"ok": 1
}

Given the redacted explain output above, select the index that was passed to hint.

- { "address.state": 1, "stars": 1, "name": 1 }

Chapter 4: CRUD Optimization

Optimizing your CRUD Operations

https://www.mongodb.com/docs/manual/tutorial/create-indexes-to-support-queries/?jmp=university

https://www.mongodb.com/docs/manual/tutorial/sort-results-with-indexes/?jmp=university

https://www.mongodb.com/docs/manual/tutorial/create-queries-that-ensure-selectivity/?jmp=university

When building indexes to service your queries, which of the following is the general rule of thumb you should keep when ordering your index keys?

Note, use the following definitions to for this question:

  • equality: indexed fields on which our queries will do equality matching
  • range: indexed fields on which our queries will have a range condition
  • sort: indexed fields on which our queries will sort on

-> equality, sort, range

Chapter 4: CRUD Optimization
Covered Queries

https://www.mongodb.com/docs/manual/core/query-optimization/?jmp=university

Given the following indexes:

{ _id: 1 }
{ name: 1, dob: 1 }
{ hair: 1, name: 1 }

Which of the following queries could be covered by one of the given indexes?

- db.example.find( { name : { $in : [ "Bart", "Homer" ] } }, {_id : 0, dob : 1, name : 1} )

Chapter 4: CRUD Optimization
Regex Performance

Given the following index:

> db.products.createIndex({ productName: 1 })

And the following query:

> db.products.find({ productName: /^Craftsman/ })

-> The query will do an index scan.

Chapter 4: CRUD Optimization
Aggregation Performance

https://www.mongodb.com/docs/manual/core/query-plans/

With regards to aggregation performance, which of the following are true?

  • When $limit and $sort are close together a very performant top-k sort can be performed
  • Transforming data in a pipeline stage prevents us from using indexes in the stages that follow

Chapter 4: CRUD Optimization
Lab 4.1: Equality, Sort, Range

In this lab you're going to use the equality, sort, range rule to determine which index best supports a given query.

Given the following query:

db.accounts.find( { accountBalance : { $gte : NumberDecimal(100000.00) }, city: "New York" } )

.sort( { lastName: 1, firstName: 1 } )

Which of the following indexes best supports this query with regards to the equality, sort, range rule.

-> { city: 1, lastName: 1, firstName: 1, accountBalance: 1 }

Chapter 4: CRUD Optimization
Lab 4.2: Aggregation Performance

For this lab, you're going to create an index so that the following aggregation query can be executed successfully.

We assume you have imported the restaurants dataset in your cluster. If not, it is attached to this lesson, so you can import it following the instructions in the Install Course Tools and Datasets lesson in Chapter 1.

Before you work on your solution, delete all indexes for the restaurants dataset:

db.restaurants.dropIndexes()

Then, run the following query and you will receive an error.

db.restaurants.aggregate([
{ $match: { stars: { $gt: 2 } } },
{ $sort: { stars: 1 } },
{ $group: { _id: "$cuisine", count: { $sum: 1 } } }
])
{
"ok": 0,
"errmsg": "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
"code": 16819,
"codeName": "Location16819"
}

The task of this lab is to identify why this error is occuring and build an index to resolve the issue.

Keep in mind that there might be several indexes that resolve this error, but we're looking for an index with the smallest number of fields that resolves the error and services this command.

In the text box below, submit the index that resolves the issue.

For example, if you ran:

db.restaurants.createIndex( { foobar: 1 } )

to create the index that fixes the error, then you would enter the following document which lists the fields indexed into the text box:

{ foobar: 1 }

-> { stars: 1 }

Chapter 5: Performance on Clusters

Performance Considerations in Distributed Systems Part 1

Chapter 5: Performance on Clusters
Performance Considerations in Distributed Systems Part 2

https://www.mongodb.com/docs/manual/core/distributed-queries/?jmp=university

https://www.mongodb.com/docs/manual/core/sharded-cluster-query-router/#how-mongos-handles-query-modifiers

From a performance standpoint, when working with a distributed database it's important to consider...

  • Latency
  • Routed Queries

Chapter 5: Performance on Clusters
Increasing Write Performance with Sharding Part 1

Chapter 5: Performance on Clusters
Increasing Write Performance with Sharding Part 2

https://www.mongodb.com/docs/manual/core/distributed-queries/

https://www.mongodb.com/docs/manual/core/bulk-write-operations/?jmp=university

Which of the following is/are true?

  • Picking a good shard key is one of the most important parts of sharding.

Chapter 5: Performance on Clusters
Reading from Secondaries

https://www.mongodb.com/docs/manual/core/read-preference/?jmp=university

When should you ever read from secondaries?

  • To provide reads with lower latency.
  • When doing ad-hoc queries and analytic jobs.

Chapter 5: Performance on Clusters
Replica Sets with Differing Indexes Part 1

Chapter 5: Performance on Clusters
Replica Sets with Differing Indexes Part 2

Chapter 5: Performance on Clusters
Replica Sets with Differing Indexes Part 3

Which of the following conditions apply when creating indexes on secondaries?

  • A secondary should never be allowed to become primary

Chapter 5: Performance on Clusters
Aggregation Pipeline on a Sharded Cluster

https://www.mongodb.com/docs/manual/core/aggregation-pipeline-sharded-collections/?jmp=university

What operators will cause a merge stage on the primary shard for a database?

  • $out
  • $lookup

Final Exam

Question 1

Which of these statements is/are true?

  • Creating an ascending index on a monotonically increasing value creates index keys on the right-hand side of the index tree.

Final Exam
Question 2

Which of the following statements is/are true?

  • It's important to ensure that your shard key has high cardinality.
  • Indexes can decrease insert throughput.
  • It's important to ensure that secondaries with indexes that differ from the primary not be eligible to become primary.
  • Indexes can be walked backwards by inverting their keys in a sort predicate.
  • Partial indexes can be used to reduce the size requirements of the indexes.

Final Exam
Question 3

Which of the following statements is/are true?

  • Collations can be used to create case insensitive indexes.
  • By default, all MongoDB user-created collections have an _id index.
  • It's common practice to co-locate your mongos on the same machine as your application to reduce latency.

Final Exam
Question 4

Which of the following statements is/are true?

  • Indexes can solve the problem of slow queries.
  • Indexes are fast to search because they're ordered such that you can find target values with few comparisons.

Final Exam
Question 5

Which of the following statements is/are true?

  • Compound indexes can service queries that filter on a prefix of the index keys.
  • Query plans are removed from the plan cache on index creation, destruction, or server restart.
  • If no indexes can be used then a collection scan will be necessary.

Final Exam
Question 6

Which of the following statements is/are true?

  • The ideal ratio between nReturned and totalKeysExamined is 1.
  • You can use the --wiredTigerDirectoryForIndexes option to place your indexes on a different disk than your data.
  • An index doesn't become multikey until a document is inserted that has an array value.

Final Exam
Question 7

Given the following indexes:

{ categories: 1, price: 1 }
{ in_stock: 1, price: 1, name: 1 }

The following documents:

{ price: 2.99, name: "Soap", in_stock: true, categories: ['Beauty', 'Personal Care'] }

{ price: 7.99, name: "Knife", in_stock: false, categories: ['Outdoors'] }

And the following queries:

db.products.find({ in_stock: true, price: { $gt: 1, $lt: 5 } }).sort({ name: 1 })

db.products.find({ in_stock: true })

db.products.find({ categories: 'Beauty' }).sort({ price: 1 })

Which of the following is/are true?

  • Index #1 would provide a sort to query #3.
  • Index #2 can be used by both query #1 and #2.

0개의 댓글