Sharding refers to the process of distributing data across multiple servers which helps in retrieving it easily. When the data set is huge, it is said that sharding significantly improves the query response time.
Let us see how we can implement sharding. Sharding requires a minimum of three servers for test purpose. In production scenario, sharding may require more servers.
Sharding is implemented through the following :-
1) Config Servers – These servers contain the metadata as to where the data belongs to. Means, it holds data as to which shard each data goes. A minimum of 2 config servers in a huge databases system is recommended.
2) Query Router Servers – These servers are the ones that communicate with the application directly. Applications are unaware of sharding and they contact the query router, query router contacts the config servers, get the required data and gives it back to the application.
3) Shard Servers – These are the servers that are used to distribute data. When sharding is enabled for a collection, the data in that collection gets distributed across these shard servers.
I am using this as a testing platform, so I am using only 3 servers now. I am implementing config server and query router server into a single physical server and I am using 2 independent servers as shard servers – so a total of 3 servers.
It is important that the servers be able to communicate with each other. Hence, make sure that the hostnames are set correctly and that firewall doesnt block these servers. The ports that are mainly used are 27017 and 27019 and hence firewall shouldnt block these ports as well.
The hostnames that I chose for the servers are as follows.
Server 1 -Single Server that acts as config server and query router – configquery.mongotest.com
Server 2 – First Shard server – shard1.mongotest.com
Server 3 – Second Shard Server – shard2.mongotest.com
Install mongo in all servers. You can follow the steps mentioned here.
Setting up Config Server
Config server holds the metadata and runs on port 27019. Create a folder for the config server metadata and start config server as follows.
The above starts mongod and all logs go to /var/log/mongo/mongod.log. Whatever number of config servers you use, make sure you use the same settings in all servers and also make sure the dbpath and port are same in all servers.
Setting up Query Router Server
Service that Query Router server uses is mongos and it is the most important part of sharding. In order to start mongos, create a folder for the query router log, and connect with mongos to the config servers. In our case, we do as follows.
Mongos runs on port 27017 by default, and whatever number of query router servers you use, make sure to use same settings and same port, with difference only in the hostnames. All the actions in the database are performed through query router.
Setting up Shard Servers
Shard servers do not need any separate setup other than the default mongo installation. By default, the dbpath of mongo installation will be at /var/lib/mongo/. Once sharding is enabled for a database or collection, the collection data spreads across the multiple shard servers and details can be seen in their respective dbpaths.
Steps to enable sharding for a mongodb database and collection
1) First login to any shard server as root, and connect to the mongos instance from there.
In our example, the config server and query router server are both the same single server. The above command connects to the mongo instance installed in configquery.mongotest.com on port 27017.
2) Now you will get a mongos prompt. Note that that it is a mongos prompt and not a mongo prompt. You can connect to the individual shard servers mongo instances by simply typing mongo in the command prompt.
3) Now, we need to add the two shard servers to the mongos instance. This is how the config server, the query router and the shard servers get connected.
4) Now I am going to create a test database and enable sharding for that database.
5) Now, we need to create a collection that is to be sharded. To create a collection, simply save it with the desired field name and value. Here I am going to create a collection named test_collection with field name and values greproot1, greproot2 etc. This is the collection that I will shard later.
Now, the database test_db has a collection named test_collection with a field name and values greproot1,greproot2,greproot3 etc as follows.
6) Now we need to select a shard key. Mongodb shards the data based on this shard key.
Before selecting a field to be used as a shard key, the field has to be set as index field. In mongodb, every collection will have a default _id field as you see in the above output. I am going to use that as my index and the shard key.
7) Now, we are going to shard the collection.
The above command shards the collection, but you will not see the effect unless a new data is entered into the collection. Until new data is added, the collection resides in its primary shard. As soon as new data is entered, the data spreads across the multiple shards.
8) You can see the sharding status using the following command.
The above output displays the two shards, the databases present, which is the primary shard for each database, in which shards are the data distributed etc.
Hope this article gives you a brief introduction to start testing sharding. Enjoy !!!