
Overview
This is the second part of our series on MongoDB shard setup. In this post, we discuss the basic concepts of implementing sharding with replica sets on a Windows instance. If you would like to use Linux, change the directory paths and binary names (for example, mongod.exe becomes mongod) accordingly. For a better understanding of the use case, please refer to our first blog post, "MongoDB Shard – Part I".
Production Setup
For a production environment, the following minimal configuration is needed:
- Config Servers [metadata]: Deploy config servers as a 3 member replica set.
- Shard [real data]: Deploy each shard as a 3 member replica set.
- Query Router [routing]: Deploy one or more mongos routers.
Prerequisites
MongoDB 3.4 installed in:
    E:\Softwares\MongoDB\Server\3.4
Use Case
    cd E:\Softwares\MongoDB\Server\3.4
Components
- 3 query routers
- 1 config server replica set (3 members)
- 3 shards (each a 3-member replica set)
Configuration Details
Shard (3x)
Port
- Shard1: Data1: 27017, replica-set (27117, 27217)
- Shard2: Data2: 28017, replica-set (28117, 28217)
- Shard3: Data3: 29017, replica-set (29117, 29217)
Directory
Shard1
    E:\Softwares\MongoDB\shard_data\shard1\data1
    E:\Softwares\MongoDB\shard_data\shard1\data2
    E:\Softwares\MongoDB\shard_data\shard1\data3
Shard2
    E:\Softwares\MongoDB\shard_data\shard2\data1
    E:\Softwares\MongoDB\shard_data\shard2\data2
    E:\Softwares\MongoDB\shard_data\shard2\data3
Shard3
    E:\Softwares\MongoDB\shard_data\shard3\data1
    E:\Softwares\MongoDB\shard_data\shard3\data2
    E:\Softwares\MongoDB\shard_data\shard3\data3
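mongod normally refuses to start when its --dbpath directory is missing, and the log directory is expected to exist as well, so the folders listed above should be created first. The following is a minimal sketch, not part of the original setup, that lists every directory this post uses and turns each one into a Windows mkdir command. The helper names requiredDirectories and mkdirCommands are made up for illustration; the paths come from the directory lists and commands in this post.

```javascript
// Base directory taken from the paths used throughout this post.
var BASE = "E:\\Softwares\\MongoDB\\shard_data";

// Hypothetical helper: returns every directory the cluster in this post uses:
// one log directory, three data directories per shard, three for the config server.
function requiredDirectories(shardCount) {
  var dirs = [BASE + "\\logs"];
  for (var s = 1; s <= shardCount; s++) {
    for (var d = 1; d <= 3; d++) {
      dirs.push(BASE + "\\shard" + s + "\\data" + d);
    }
  }
  for (var c = 1; c <= 3; c++) {
    dirs.push(BASE + "\\config_server1\\data" + c);
  }
  return dirs;
}

// Hypothetical helper: turns each path into a Windows "mkdir" command line.
function mkdirCommands(shardCount) {
  return requiredDirectories(shardCount).map(function (dir) {
    return 'mkdir "' + dir + '"';
  });
}
```

Calling mkdirCommands(3) yields 13 command lines that can be pasted into a command prompt or saved as a batch file.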
Commands to Start Shard
Shard1
    mongod.exe --shardsvr --port 27017 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard1_1.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard1\data1" --replSet shard1_replset
    mongod.exe --shardsvr --port 27117 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard1_2.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard1\data2" --replSet shard1_replset
    mongod.exe --shardsvr --port 27217 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard1_3.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard1\data3" --replSet shard1_replset
Shard2
    mongod.exe --shardsvr --port 28017 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard2_1.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard2\data1" --replSet shard2_replset
    mongod.exe --shardsvr --port 28117 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard2_2.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard2\data2" --replSet shard2_replset
    mongod.exe --shardsvr --port 28217 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard2_3.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard2\data3" --replSet shard2_replset
Shard3
    mongod.exe --shardsvr --port 29017 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard3_1.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard3\data1" --replSet shard3_replset
    mongod.exe --shardsvr --port 29117 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard3_2.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard3\data2" --replSet shard3_replset
    mongod.exe --shardsvr --port 29217 --logpath "E:\Softwares\MongoDB\shard_data\logs\shard3_3.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\shard3\data3" --replSet shard3_replset
Connect to One Member of Each Shard to Initiate the Replica Set [repeat for every shard]
    mongo.exe --host localhost --port 27017
    mongo.exe --host localhost --port 28017
    mongo.exe --host localhost --port 29017
Once the command below has been executed, the members of the replica set hold an election and one of them becomes the primary.
    rs.initiate(
      {
        _id: "shard1_replset",
        members: [
          { _id : 0, host : "localhost:27017" },
          { _id : 1, host : "localhost:27117" },
          { _id : 2, host : "localhost:27217" }
        ]
      }
    )
Note: Run the above command only once per shard, on a single member [not on every replica]. Repeat it for the other shards, changing only the fields below:
- _id – the replica-set id of the other shard (shard2_replset, shard3_replset)
- members – the hosts/ports of the other shard's members (28017, 28117, 28217 for Shard2; 29017, 29117, 29217 for Shard3)
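Since the three configuration documents differ only in the replica-set id and the ports, they can also be generated. The function below is a small sketch under that assumption; shardReplSetConfig is an illustrative name, and the port arithmetic simply mirrors the port scheme listed earlier (27017/27117/27217 for Shard1, and so on).

```javascript
// Hypothetical helper: builds the rs.initiate() document for shard number n (1, 2 or 3).
// Port scheme from this post: shard1 -> 27017, 27117, 27217; shard2 -> 28017, ...; shard3 -> 29017, ...
function shardReplSetConfig(n) {
  var basePort = 26017 + n * 1000; // 27017 for shard1
  var members = [];
  for (var i = 0; i < 3; i++) {
    members.push({ _id: i, host: "localhost:" + (basePort + i * 100) });
  }
  return { _id: "shard" + n + "_replset", members: members };
}
```

In the mongo shell connected to port 28017, for example, the result could be passed straight to the initiate call: rs.initiate(shardReplSetConfig(2)).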
ConfigServer (1x)
Port
- configserver1: 47017, replica-set (47117, 47217)
Directory
    E:\Softwares\MongoDB\shard_data\config_server1\data1
    E:\Softwares\MongoDB\shard_data\config_server1\data2
    E:\Softwares\MongoDB\shard_data\config_server1\data3
Command
    mongod.exe --configsvr --port 47017 --logpath "E:\Softwares\MongoDB\shard_data\logs\configsvr1_1.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\config_server1\data1" --replSet configserver1_replset
    mongod.exe --configsvr --port 47117 --logpath "E:\Softwares\MongoDB\shard_data\logs\configsvr1_2.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\config_server1\data2" --replSet configserver1_replset
    mongod.exe --configsvr --port 47217 --logpath "E:\Softwares\MongoDB\shard_data\logs\configsvr1_3.log" --logappend --dbpath "E:\Softwares\MongoDB\shard_data\config_server1\data3" --replSet configserver1_replset
Connect to the Config Server to Initiate the Replica Set [repeat for each config server replica set]
    mongo.exe --host localhost --port 47017
    rs.initiate(
      {
        _id: "configserver1_replset",
        configsvr: true, // indicates that this is a config server replica set
        members: [
          { _id : 0, host : "localhost:47017" },
          { _id : 1, host : "localhost:47117" },
          { _id : 2, host : "localhost:47217" }
        ]
      }
    )
Note: Run the above command only once per config server replica set, on a single member [not on every replica].
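After a replica set has been initiated, it is worth confirming that one member became primary and the others secondaries. The sketch below condenses the document returned by rs.status() into a short summary. The function name summarizeReplSet is illustrative; it relies only on the set, members[].name and members[].stateStr fields of the status document.

```javascript
// Hypothetical helper: condenses an rs.status() document into primary / secondaries / other.
function summarizeReplSet(status) {
  var summary = { set: status.set, primary: null, secondaries: [], other: [] };
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") {
      summary.primary = m.name;
    } else if (m.stateStr === "SECONDARY") {
      summary.secondaries.push(m.name);
    } else {
      // Members still starting up, recovering, unreachable, etc.
      summary.other.push(m.name + " (" + m.stateStr + ")");
    }
  });
  return summary;
}
```

In the mongo shell this could be used as printjson(summarizeReplSet(rs.status())). Right after rs.initiate() some members may still appear under other until the election has finished.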
QueryRouter (3x)
Port
- mongos1 (QueryRouter1): 1000
- mongos2 (QueryRouter2): 1001
- mongos3 (QueryRouter3): 1002
    mongos.exe --configdb configserver1_replset/localhost:47017,localhost:47117,localhost:47217 --port 1000
    mongos.exe --configdb configserver1_replset/localhost:47017,localhost:47117,localhost:47217 --port 1001
    mongos.exe --configdb configserver1_replset/localhost:47017,localhost:47117,localhost:47217 --port 1002
Perform Sharding [1 QueryRouter, 1 ConfigServer, 3 Shard (each 3 replica)]
Connect to Query Router
    mongo.exe --host localhost --port 1000
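Add Shard Servers to Config Server
- sh.addShard("localhost:27017")
- sh.addShard("localhost:28017")
- sh.addShard("localhost:29017")
OR (with replica-set id)
- sh.addShard("shard1_replset/localhost:27017,localhost:27117,localhost:27217")
- sh.addShard("shard2_replset/localhost:28017,localhost:28117,localhost:28217")
- sh.addShard("shard3_replset/localhost:29017,localhost:29117,localhost:29217")
Note: The first form applies to standalone shard servers. Each shard in this setup is a replica set, so use the second form here; mongos is expected to reject a bare host that belongs to a replica set and ask for the replica-set format.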
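Both the --configdb option of mongos and the replica-set form of sh.addShard() use the same string layout: the replica-set id, a slash, and a comma-separated list of host:port pairs. A minimal sketch of building that string follows; replSetConnectionString is an illustrative name, and localhost is assumed because every process in this post runs on one machine.

```javascript
// Hypothetical helper: builds "<setName>/<host1>,<host2>,..." for members on localhost.
function replSetConnectionString(setName, ports) {
  var hosts = ports.map(function (port) {
    return "localhost:" + port;
  });
  return setName + "/" + hosts.join(",");
}
```

In the mongo shell connected to a query router, the result could be used as sh.addShard(replSetConnectionString("shard1_replset", [27017, 27117, 27217])).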
Check Sharding Status
- sh.status()
Enable Sharding for Database
- sh.enableSharding("test_mds")
Enable Sharding for Collection
- sh.shardCollection("test_mds.company", {"company_name": 1});
Import Data
    mongoimport --host localhost --port 1000 --db test_mds --collection company "E:\Softwares\MongoDB\shard_data\input\test_mds\company.json"
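Note: The dump file (company.json) can hold the data of any collection. If there is no dump file, use the command below to load sample data into the collection. Run it in the mongo shell connected to the query router, after use test_mds. Because the collection is sharded on company_name, MongoDB 3.4 expects every inserted document to contain that field; the value used below is only a placeholder.
Load sample data: for(var i=0;i<10000;i++){db.company.insert({company_name: "company_" + i, id: Math.random(), count:i, date: new Date()})}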
Check Sharding Data
- Connect to Query Router: mongo.exe --port 1000
- use test_mds
- db.company.count()
- db.company.find()
- sh.status(): shows the sharded databases and the ranges at which the data was split
- Check number of records on each shard:
- Shard1
- Shard2
- Shard3
Note: Check the document count on the primary of each shard. To query a secondary member, log into it from the command line and run rs.slaveOk() before executing db.company.count(); this tells the shell that reading from a secondary is intended. Counts on a secondary can briefly differ from the primary, because a secondary sometimes takes a while to sync up with the primary.
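The per-shard counts can also be read through the query router instead of logging into each shard. The sketch below assumes that db.company.stats(), when run against a sharded collection through mongos, returns a shards field keyed by shard name with a count in each entry; countsPerShard is an illustrative name.

```javascript
// Hypothetical helper: extracts { shardName: documentCount } from a collection stats document.
// Assumes the stats document has the shape { shards: { <name>: { count: <n>, ... }, ... } }.
function countsPerShard(stats) {
  var result = {};
  Object.keys(stats.shards || {}).forEach(function (name) {
    result[name] = stats.shards[name].count;
  });
  return result;
}
```

In the mongo shell connected to port 1000, after use test_mds, this could be called as printjson(countsPerShard(db.company.stats())). The built-in db.company.getShardDistribution() prints a similar breakdown.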
Key Points to be Considered before Sharding
When a data set is large but only a small part of it is accessed frequently, and the I/O or CPU load does not grow with the size of the whole data set, sharding is an extra expense and not a useful option. For example: 250 GB of data on two shards is a lot more expensive than 500 GB of data on a single shard. For more details, refer to https://groups.google.com/forum/#!topic/mongodb-user/hcEfOhXxSlA
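Part of that extra expense is simply the number of processes to run and maintain. As a rough sketch based on the minimal production layout described in this post (3-member replica sets for every shard and for the config servers, plus the query routers), the count can be worked out as follows. The function name mongoProcessCount is illustrative.

```javascript
// Hypothetical helper: number of MongoDB processes in a sharded cluster where every
// shard and the config server are 3-member replica sets.
function mongoProcessCount(shards, routers) {
  var membersPerReplSet = 3;
  var shardProcesses = shards * membersPerReplSet; // mongod processes holding data
  var configProcesses = membersPerReplSet;         // mongod processes holding metadata
  return shardProcesses + configProcesses + routers; // plus the mongos routers
}
```

For the cluster built in this post, mongoProcessCount(3, 3) gives 15 processes. Even the smallest two-shard cluster with one router, mongoProcessCount(2, 1), needs 10, compared with 3 mongod processes for a single unsharded replica set.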
Hardware Usage
- CPU Contention (Speed, Cores): MongoDB rarely performs CPU-bound operations (building indexes, compression, aggregation/map-reduce queries, sorting).
- Storage (data loading & pattern, IOPS, Size): Use AWS Provisioned IOPS EBS volumes for disk I/O throughput. Use the mongoperf utility to check disk I/O performance.
- Network (Latency, Throughput): Consider the write concern and read preference settings. Use netperf to measure network performance.
- Memory (working set): Keep the working set in memory. Use db.serverStatus({ "workingSet": 1 }).