Photo by Denny Müller on Unsplash

Understand Indexing in MongoDB

Over Simplified Explanation

Arnav Zek
3 min readAug 2, 2020

--

Indexing is just a way of taking one or more fields of a table and restructuring it in a binary tree and loading it on the RAM. So that table can be queried with speed.

Overview

The idea of indexing in MongoDB is similar to the index of any book, They increase the speed of finding a page. The index in MongoDB increases the speed of finding documents

How do indexes work?

First, let’s understand how you declare the index in MongoDB

//for creatingcollectionName.createIndex({field: 1/-1/"text" }) index//Ascending = -1 
//Descending = 1
//For text based search = text
//for removing index
collectionName.dropIndex({field:value})

How does it work under the hood?

Imagine a collection of users, each document containing various fields, one such field is “score”.

Let’s say we want all users who scored 23.

When no index exists, MongoDB goes through each document to find the queried document (This is called as “Collection scan” / COLLSCAN )

How can we optimize this?

To optimize we will ask MongoDB to create an index, MongoDB will create a table with one column for score and another column for documentID. Technically Index is not a table but almost similar to Binary Tree.

Then MongoDB will load this Index on RAM. Now MongoDB only needs to scan this table rather than scanning the whole database. This is much faster.

So, for sake of simplicity, An Index is a binary tree stored in RAM with only the values of the particular column being queried.

MongoDB narrows the dataset using the Index. (This is called Index Scan / IXNSCAN)

Here is a visual representation of a score Index and its mapping.

Note: Performance improvement by Index is only visible when the number of documents crosses 100K.

You can compare it yourself by comparing two queries one with an indexed field and one without an Index

db.<collection name>.find(query).explain('executionStat')//object.winingPlan.stage will tell the type of scan COLLSCAN or IXNSCAN

executionTimeMillis will tell you the time taken in the execution

Downsides of Index

Indexes are not free, Indexes cost space. Indexes may increase read speed but whenever something is written, the index needs to be updated to resolve this we use B- tree we do some calculations before insertion so that it is faster.

B-Tree

Indexes are not like tables, conceptually it is actually a custom implementation of B-Tree (Binary Tree). SQL databases also use the similar kind of data structure for indexing.

This following Video best explains B Tree.

https://www.youtube.com/watch?v=aZjYr87r1b8

--

--