Software Development

Setting up sharded mongodb cluster in localhost

I have been playing around with MongoDb, thanks to the M101J Course offered by Mongodb University. These NoSQL datastores are gaining popularity due to a number of reasons and one among them being the ease with which they can be scaled out i.e horizontal scaling. This horizontal scaling in MongoDB can be achieved by creating a sharded cluster of mongodb instancesYou might want to understand the concept of sharding before continuing. The MongoDB reference has a very clear explanation for the same here.

A sharded setup of mongodb requires the following:
 
 

  • Mongodb Configuration server – this stores the cluster’s metadata
  • mongos instance – this is the router and routes the queries to different shards based on the sharding key
  • Individual mongodb instances – these act as the shards.

The below is the architecture diagram of a sample mongodb sharded setup (Source: MongodDB Reference)

sharded-cluster-production-architecture

Lets create all of the above components in a single instance i.e on your localhost.

Creating Mongodb Config Server

$ mkdir \data\configdb
$ mongod --configsvr --port 27010
  1. First creates a data directory to store the cluster metadata
  2. Second launches the config server deamon on port 27010. The default port 27019, but I have overriden by using the --port command line option.

Setting up Query Routers (mongos instances)

This is the routing service for the sharding cluster where by it takes queries from the application and gets the data from the specific shards. Query routers can be setup by using the mongos command as shown below:

$ mongos -configdb localhost:27010 --port 27011

This outputs a number of things on the console starting with the following line:

2015-02-01T18:51:35.606+0300 warning: running with 1 config server should be done only for testing purposes and is not recommended for production

It is recommended to run with 3 configdb server for production so as to avoid a single point of failure. But for our testing, 1 configdb server should be fine.

--configdb command line option is used to let the Query router know about the config servers we have setup. It takes a comma separated: values like –configdb host1:port1,host2:port2. In our case we have only 1 config server.

Running mongodb shards

Now we need to run the actual mongodb instances which store the shared data. We will created 3 sharded instances of mongodb and run all of these on localhost on different ports and provide each mongodb instance its own --dbpath as shown below:

Mongodb Shard – 1

$ mongod --port 27012 --dbpath \data\db

Mongodb Shard – 2

$ mongod --port 27013 --dbpath \data\db2

Mongodb Shard – 3

$ mongod --port 27014 --dbpath \data\db3

Now we have three shards of mongodb running on localhost. For the database I will be using the students database having collection grades. The structure of the documents in grades is given below:

{ 
  "_id" : ObjectId("50906d7fa3c412bb040eb577"), 
  "student_id" : 0, 
  "type" : "exam", 
  "score" : 54.6535436362647 
}

You can choose any database of your choice.

Registering the shards with mongos

Now that we have created our two mongodb shards running at localhost:27012 and localhost:27013 respectively, we will go ahead and register these shards with our mongos query router, also define which database we need to shard and then enable sharding on the collection we are interested by providing the shard key. All these have to be carried out by connecting to the mongos query router as shown in the below commands:

$ mongo --port 27011 --host localhost
mongos> sh.addShard("localhost:27012")
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("localhost:27013")
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.enableSharding("students")
{ "ok" : 1 }
mongos> sh.shardCollection("students.grades", {"student_id" : 1})
{ "collectionsharded" : "students.grades", "ok" : 1 }
mongos>

In the sh.shardCollection we specify the collection and the field from the collection which is to be used as a shard key.

Adding data to the mongodb sharded cluster

Lets connect to mongos and run some code to populate data to the grades collection in students database.

for ( i = 200; i < 10000; i++ ) {
  db.grades.insert({student_id: i, type: "exam", score : Math.random() * 100 }); 
  db.grades.insert({student_id: i, type: "quiz", score : Math.random() * 100 }); 
  db.grades.insert({student_id: i, type: "homework", score : Math.random() * 100 });
}
WriteResult({ "nInserted" : 1 })

After inserting the data we would notice some activity in the mongos daemon stating that it is moving some chunks for specific shard and so on i.e the balancer will be in action trying to balance the data across the shards. The output will be something like:

2015-02-02T18:26:26.770+0300 [Balancer] moving chunk ns: students.grades moving 
( ns: students.grades, shard: shard0000:localhost:27012, lastmod: 1|1||000000000000000000000000, min: { student_id: MinKey }, max: { student_id: 200.0 }) 
shard0000:localhost:27012 -> shard0001:localhost:27013                                                    

2015-02-02T18:31:12.314+0300 [Balancer] moving chunk ns: students.grades moving 
( ns: students.grades, shard: shard0000:localhost:27012, lastmod: 2|2||000000000000000000000000, min: { student_id: 200.0 }, max: { student_id: 2096.0 }) 
shard0000:localhost:27012 -> shard0002:localhost:27014

Lets look at the status of the shards by connecting to the mongos. It can be achieved by using the sh.status() command.

$ mongo --port 27011 --host localhost
mongos> sh.status()
--- Sharding Status ---
sharding version: {
  "_id" : 1,
  "version" : 4,
  "minCompatibleVersion" : 4,
  "currentVersion" : 5,
  "clusterId" : ObjectId("54cf95d9d9309193f5fa0780")
}
shards:
  {  "_id" : "shard0000",  "host" : "localhost:27012" }
  {  "_id" : "shard0001",  "host" : "localhost:27013" }
  {  "_id" : "shard0002",  "host" : "localhost:27014" }
databases:
  {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
  {  "_id" : "blog",  "partitioned" : false,  "primary" : "shard0000" }
  {  "_id" : "course",  "partitioned" : false,  "primary" : "shard0000" }
  {  "_id" : "m101",  "partitioned" : false,  "primary" : "shard0000" }
  {  "_id" : "school",  "partitioned" : false,  "primary" : "shard0000" }
  {  "_id" : "students",  "partitioned" : true,  "primary" : "shard0000" }
    students.grades
      shard key: { "student_id" : 1 }
      chunks:
          shard0001       1
          shard0002       1
          shard0000       1
      { "student_id" : { "$minKey" : 1 } } -->> { "student_id" : 200 } on : shard0001 Timestamp(2, 0)
      { "student_id" : 200 } -->> { "student_id" : 2096 } on : shard0002 Timestamp(3, 0)
      { "student_id" : 2096 } -->> { "student_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(3, 1)
  {  "_id" : "task-db",  "partitioned" : false,  "primary" : "shard0000" }
  {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }

The above output shows that the database students is sharded and the sharded collection is the grades collection. It also shows the different shards available and the range of shard keys distributed across different shards. So on shard0001 we have student_id from minimum to 200, then on shard0002 we have student_id from 200 upto 2096 and the rest in shard0000.

We can also connect to individual shards and query to find out the max and minimum student ids available.

On shard0000

$ mongo --host localhost --port 27012
MongoDB shell version: 2.6.7
connecting to: localhost:27012/test
> use students
switched to db students
> db.grades.find().sort({student_id : 1}).limit(1)
{ "_id" : ObjectId("54cf97295a23cc67efa848c8"), "student_id" : 2096, "type" : "exam", "score" : 6.7372970981523395 }
> db.grades.find().sort({student_id : -1}).limit(1)
{ "_id" : ObjectId("54cf973b5a23cc67efa8a567"), "student_id" : 9999, "type" : "homework", "score" : 60.64519872888923 }

On shard0001

C:\Users\Mohamed>mongo --host localhost --port 27013
MongoDB shell version: 2.6.7
connecting to: localhost:27013/test
> use students
switched to db students
> db.grades.find().sort({student_id:1}).limit(1).pretty()
{
        "_id" : ObjectId("54cf97d05a23cc67efa8a568"),
        "student_id" : 1,
        "type" : "exam",
        "score" : 5.511052813380957
}
> db.grades.find().sort({student_id:-1}).limit(1).pretty()
{
        "_id" : ObjectId("54cf97d15a23cc67efa8a7bc"),
        "student_id" : 199,
        "type" : "homework",
        "score" : 51.78457708097994
}

On shard0002

$ mongo --host localhost --port 27014
MongoDB shell version: 2.6.7
connecting to: localhost:27014/test
> use students
switched to db students
> db.grades.find().sort({student_id:1}).limit(1).pretty()
{
  "_id" : ObjectId("54cf971f5a23cc67efa83292"),
  "student_id" : 200,
  "type" : "homework",
  "score" : 79.56434232182801
}
> db.grades.find().sort({student_id:-1}).limit(1).pretty()
{
  "_id" : ObjectId("54cf97295a23cc67efa848c7"),
  "student_id" : 2095,
  "type" : "homework",
  "score" : 62.75710032787174
}

Lets execute the same set of queries on the mongos query router and see that the results this time will include data from all the shards and not just individual shard.

$ mongo --port 27011 --host localhost
MongoDB shell version: 2.6.7
connecting to: localhost:27011/test
mongos> use students
switched to db students
mongos> db.grades.find().sort({student_id:-1}).limit(1).pretty()
{
  "_id" : ObjectId("54cf973b5a23cc67efa8a567"),
  "student_id" : 9999,
  "type" : "homework",
  "score" : 60.64519872888923
}
mongos> db.grades.find().sort({student_id:1}).limit(1).pretty()
{
  "_id" : ObjectId("54cf97d05a23cc67efa8a568"),
  "student_id" : 1,
  "type" : "exam",
  "score" : 5.511052813380957
}
mongos>

So this brings to the end of setting up sharded mongodb cluster on localhost. Hope it was informative and useful!

Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

7 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Ross
Ross
9 years ago

Dear Mohamed, thanks for the very easy to follow guide on content which has always appeared complicated to me. I too am on M101J as a new entrant to the world of NoSQL it is awesome to use.

Mohamed Sanaulla
9 years ago

Thanks for taking your time to read through the post. I have just scratched the surface of sharding. There is lot to it which we would encounter and learn as we are involved more in sharding.

Yes, NoSQL provides lot more flexibility and lot more power than the usual SQL

devplayerJoe
devplayerJoe
9 years ago

Nice and simple tutorial….I would be grateful if you let me know how to deploy custom load balancing before shards?

Mohamed Sanaulla
9 years ago
Reply to  devplayerJoe

Why would you want a load balancer before the shards? The routing is taken care by mongos and you can have one mongos running on each of your app instances. You need not worry about balancing between shards, as this is taken care of my the mongos with the help of config server and the shard key.

You can have replica sets running for each shard server and again your Mongo client will take care of commenting to primary of replica set and then updating itself with the new primary if the primary goes down.

Denis
Denis
9 years ago

Thanks for this tutorial. It was so helpful, but, i can’t understand one thing. How it is possible this to work when you do not have connected shards with config servers and also config servers with shards as shown in the schema? This is confusing me.

kiran
kiran
9 years ago

Can you write sample java program to connect with multiple query router (mongos). What i did : mongoClient = new MongoClient(Arrays.asList(new ServerAddress(“10.10.10.17”, 27015),new ServerAddress(“10.10.10.15”, 27015))); DB db = mongoClient.getDB(Constants.MongoDBProperties.MONGO_DB_NAME); dbCollection = db.getCollection(Constants.MongoDBProperties.MONGO_COLLECTION_NAME); In above code i am getting dbCollection abject but when i try to fetch data from collection i am getting below exception. com.mongodb.MongoException: can’t find a master at com.mongodb.DBTCPConnector.checkMaster(DBTCPConnector.java:518) at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:277) at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:257) at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310) at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295) at com.mongodb.DBCursor._check(DBCursor.java:368) at com.mongodb.DBCursor._hasNext(DBCursor.java:459) at com.mongodb.DBCursor.hasNext(DBCursor.java:484) at com.quickheal.andrcloudscan.daoImpl.MongoDBDAOImpl.getAppInfo(MongoDBDAOImpl.java:30) at com.quickheal.andrcloudscan.serviceImpl.DatabaseServiceImpl.getAppInfo(DatabaseServiceImpl.java:23) at com.quickheal.andrcloudscan.controller.QHWebServicesController.getAppInfo(QHWebServicesController.java:27) at com.quickheal.andrcloudscan.webservices.QHCloudWebServices.appInfo(QHCloudWebServices.java:29) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at… Read more »

Raj
8 years ago

Dear Mohamed – Short and Crisp!! , wonderful article. Thanks for publishing it

Back to top button