What is MongoDB Sharding? A Comprehensive Guide

Introduction

Sharding distributes data across multiple machines. As a dataset grows, it becomes challenging for a single server to handle it. This article explains what Sharding is in MongoDB, what its benefits are, and how to set up a MongoDB sharded cluster. We’ll also learn about MongoDB Count.

What is MongoDB Sharding

MongoDB Sharding is MongoDB’s approach to meeting the demands of data growth: it distributes data across multiple machines so that the database load is shared among different servers. Sharding supports deployments with very large datasets and high-throughput operations.

A MongoDB Sharded Cluster is the set of nodes comprising a sharded MongoDB deployment, and it consists of three components:-

Shards – A Shard holds a subset of the sharded data and the combination of the multiple Shards forms a complete dataset. Each Shard can be deployed as a replica set to provide high availability and data consistency.
Mongos – The Mongos acts as the query router, providing an interface between the client applications and the sharded cluster.
Config Servers – Config Servers store the metadata and configuration settings for the cluster.
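
To make the division of work between these components more concrete, here is a toy model in plain JavaScript. It is only a sketch: the chunk table, the shard names, the userId shard key, and the functions routeByShardKey and targetShards are hypothetical and are not MongoDB APIs. It assumes the config servers hold a table of shard key ranges and the router uses that table to pick a shard.

```javascript
// Illustrative only: a toy model of how a query router could pick a shard.
// The chunk table, shard names, and both functions are hypothetical.

// Metadata of the kind a config server might hold:
// each chunk covers the shard key range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// The router finds the chunk that contains the key and returns its shard.
function routeByShardKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// A query that includes the shard key (userId here) targets one shard;
// a query without it has to be sent to every shard.
function targetShards(query) {
  if (typeof query.userId === "number") {
    return [routeByShardKey(query.userId)];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ userId: 42 }));
console.log(targetShards({ name: "Ada" }));
```

The point of the sketch is that a query containing the shard key can be answered by a single shard, while a query without it fans out to all of them.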

What is MongoDB Count?

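MongoDB Count is used to count the number of documents in a collection. It returns the number of documents that match the selection criteria. It takes two arguments: the first is the selection criteria, and the second is an optional document of options. Note that in recent MongoDB versions count() is deprecated in favor of countDocuments() and estimatedDocumentCount().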

Syntax for MongoDB Count:-

db.Collection_Name.count(
  Selection_criteria,
  {
    limit: <integer>,
    skip: <integer>,
    hint: <string or document>,
    maxTimeMS: <integer>,
    readConcern: <string>,
    collation: <document>
  }
)

To count all the documents in a collection, use this syntax:-

db.Collection_Name.count()
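
To show how the selection criteria and the skip and limit options interact, here is a small in-memory sketch in plain JavaScript. The helper countMatching and the sample orders array are hypothetical and are not part of MongoDB. The helper only supports simple equality criteria, and it treats a limit of 0 as "no limit".

```javascript
// Toy in-memory model of count(); countMatching is a hypothetical helper.
function countMatching(docs, criteria = {}, options = {}) {
  const { skip = 0, limit = 0 } = options;
  // Keep documents whose fields equal every field in the criteria.
  const matched = docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );
  // Skipped documents are not counted.
  const afterSkip = Math.max(matched.length - skip, 0);
  // In this sketch a limit of 0 means "no limit".
  return limit > 0 ? Math.min(afterSkip, limit) : afterSkip;
}

// Made-up sample data.
const orders = [
  { id: 1, status: "shipped" },
  { id: 2, status: "pending" },
  { id: 3, status: "shipped" },
  { id: 4, status: "shipped" },
  { id: 5, status: "cancelled" },
];

console.log(countMatching(orders));                                   // all documents
console.log(countMatching(orders, { status: "shipped" }));            // matching documents
console.log(countMatching(orders, { status: "shipped" }, { skip: 1 }));
```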

Benefits of MongoDB Sharding:-

Sharding allows us to scale out our database to handle increased loads. Let’s have a look at some of the benefits of MongoDB Sharding:-

Increased Read/Write Throughput – MongoDB Sharding spreads the read and write workload among the shards in a sharded cluster by distributing the data set across them. Each shard processes only its own subset of the data, so both read and write throughput can be increased by adding more shards to the cluster.

High Availability – Deploying the config servers and the shards as replica sets provides increased availability. The sharded cluster can continue to perform partial reads and writes even when one or more shard replica sets are unavailable.

Storage Capacity – Each shard stores a subset of the total sharded data. Thus, as the data grows, we can add shards to increase the total storage capacity of the cluster.

Data Locality – With the help of Zone Sharding, you can create distributed databases that support geographically distributed apps, with policies that enforce data residency within specific regions. Each zone can have one or more shards.
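
The Data Locality idea can be sketched in a few lines of plain JavaScript. Everything here is made up for illustration: the zone names, shard names, country codes, and the function placementFor are hypothetical, and real zones are defined on ranges of the shard key rather than on a list of countries.

```javascript
// Hypothetical zone layout; zone names, shard names, and country codes are made up.
const zones = {
  EU: { shards: ["shard-eu-1", "shard-eu-2"], countries: ["DE", "FR"] },
  US: { shards: ["shard-us-1"], countries: ["US"] },
};

// Every shard in the cluster, collected from all zones.
const allShards = Object.values(zones).flatMap((zone) => zone.shards);

// Return the zone and the shards allowed to hold data for a country.
function placementFor(country) {
  for (const [name, zone] of Object.entries(zones)) {
    if (zone.countries.includes(country)) {
      return { zone: name, shards: zone.shards };
    }
  }
  // In this sketch, data outside every zone may live on any shard.
  return { zone: null, shards: allShards };
}

console.log(placementFor("FR"));
console.log(placementFor("JP"));
```

With a layout like this, documents for French users would stay on the EU shards only, which is the kind of residency policy zones are meant to enforce.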

How to implement MongoDB Sharding?

Let’s look at some of the actions that you need to perform to implement MongoDB Sharding:-

Set Up the Config Server

A config server replica set can have up to 50 members, and each mongod process in it must be started with the --configsvr option.

E.g.:-

mongod --configsvr --replSet <configReplSetName> --dbpath <path> --port 27019 --bind_ip localhost,<hostname(s)|ip address(es)>

Now connect to just one of the replica set members from there.

mongo --host <hostname> --port 27019

Now run rs.initiate() on just one of the replica set members.

rs.initiate(
  {
    _id: "<configReplSetName>",
    configsvr: true,
    members: [
      { _id : 0, host : "<cfg1.example.net:27019>" },
      { _id : 1, host : "<cfg2.example.net:27019>" },
      { _id : 2, host : "<cfg3.example.net:27019>" }
    ]
  }
)

Set Up Shards

With the config server set up and running, we can create the shards. The process is similar to the config server setup, but uses the --shardsvr option.

E.g.:-

mongod --shardsvr --replSet <shardReplicaSetName> --dbpath <path> --port 27018 --bind_ip <cluster hostname(s)|ip address(es)>

Now connect to just one of the replica set members from there.

mongo --host <hostname> --port 27018

Now run rs.initiate() on just one of the replica set members. Make sure you leave out the configsvr option.

rs.initiate(
  {
    _id: "<shardReplicaSetName>",
    members: [
      { _id : 0, host : "<shard-host1.example.net:27018>" },
      { _id : 1, host : "<shard-host2.example.net:27018>" },
      { _id : 2, host : "<shard-host3.example.net:27018>" }
    ]
  }
)
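
The two rs.initiate() documents above differ only in the replica set name, the host list, and the configsvr flag. As a rough sketch, a small helper can build either one. The function buildReplSetConfig and the names cfgRS and shardRS1 are hypothetical; the hosts are the example hosts used in this article. The helper only builds the document and does not talk to MongoDB.

```javascript
// Hypothetical helper that builds the document passed to rs.initiate().
function buildReplSetConfig(name, hosts, { configsvr = false } = {}) {
  const config = {
    _id: name,
    // Members get sequential _id values starting at 0.
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
  // Only config server replica sets carry the configsvr flag.
  if (configsvr) config.configsvr = true;
  return config;
}

// Config server replica set (the name cfgRS is made up).
const configServerConfig = buildReplSetConfig(
  "cfgRS",
  ["cfg1.example.net:27019", "cfg2.example.net:27019", "cfg3.example.net:27019"],
  { configsvr: true }
);

// Shard replica set (the name shardRS1 is made up); configsvr is left out.
const shardConfig = buildReplSetConfig("shardRS1", [
  "shard-host1.example.net:27018",
  "shard-host2.example.net:27018",
  "shard-host3.example.net:27018",
]);

console.log(JSON.stringify(configServerConfig, null, 2));
console.log(JSON.stringify(shardConfig, null, 2));
```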

Start the Mongos

Now finally, set up the Mongos and point it to your config server replica set:-

mongos --configdb <configReplSetName>/<cfg1.example.net:27019>,<cfg2.example.net:27019>,<cfg3.example.net:27019> --bind_ip localhost,<cluster hostname(s)|ip address(es)>

Configure and Turn on Sharding for the Database

First, connect to your Mongos:-

mongo --host <hostname> --port 27017

Then add your shards to the cluster:-

sh.addShard("<shardReplicaSetName>/<shard-host1.example.net:27018>,<shard-host2.example.net:27018>,<shard-host3.example.net:27018>")
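
Both the --configdb option and sh.addShard() take a string of the same shape: the replica set name, a slash, and a comma-separated list of host:port pairs. Here is a minimal sketch of a helper that builds such a string. The function replSetString and the name shardRS1 are hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: builds "<replSetName>/<host1>,<host2>,..." strings.
function replSetString(name, hosts) {
  if (hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  return `${name}/${hosts.join(",")}`;
}

// The replica set name shardRS1 is made up for this example.
const shardString = replSetString("shardRS1", [
  "shard-host1.example.net:27018",
  "shard-host2.example.net:27018",
  "shard-host3.example.net:27018",
]);

console.log(shardString);
```

Building the string from a list avoids typos such as a missing comma or a wrong port when the same hosts have to be written in several commands.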

Enable Sharding on your database:-

sh.enableSharding("<database>")

Finally, shard your collection using the sh.shardCollection() method. You can do this via Hashed Sharding:-

sh.shardCollection("<database>.<collection>", { <shard key field> : "hashed" , … } )

Or via Range-Based Sharding:-

sh.shardCollection("<database>.<collection>", { <shard key field> : 1, … } )
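
To get a feel for the difference between the two strategies, here is a toy simulation in plain JavaScript. It is only a sketch under stated assumptions: the three shards, the fixed key ranges, and all function names are hypothetical, and the FNV-1a hash is used purely as a stand-in, since MongoDB uses its own hash function for hashed shard keys.

```javascript
// FNV-1a 32-bit hash, used only as a stand-in for a real hashed shard key.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

const SHARD_COUNT = 3;

// Hashed: the shard is chosen from the hash of the key.
function hashedShard(key) {
  return fnv1a(String(key)) % SHARD_COUNT;
}

// Ranged: keys below 1000 go to shard 0, keys below 2000 to shard 1,
// and everything else to shard 2.
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Count how many keys land on each shard.
function distribution(keys, pickShard) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[pickShard(key)] += 1;
  return counts;
}

// Monotonically increasing keys, like counters or timestamps.
const keys = Array.from({ length: 3000 }, (_, i) => 2000 + i);

console.log("ranged:", distribution(keys, rangedShard));
console.log("hashed:", distribution(keys, hashedShard));
```

In this sketch every new key falls into the last range, so the ranged layout sends all inserts to one shard, while the hashed layout spreads them across shards. The trade-off is that ranged keys keep neighbouring values together, which helps range queries.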

Following the above steps, you can set up your sharded cluster.

Conclusion

MongoDB Sharding is one of the best methods to manage large datasets and distribute the workload among different servers. This article introduced MongoDB Sharding and MongoDB Count, the components of a sharded cluster, and the benefits Sharding offers. We have also shown the steps needed to implement MongoDB Sharding.

***************************

TechnologyTimesNow