MongoDB Shard Setup – Part II

Overview

This is the second part of our series on MongoDB shard setup. In this post, we discuss the basic concepts of implementing a sharded cluster with replica sets on a Windows instance. If you would like to use Linux, adjust the directory paths and binary names (mongod.exe becomes mongod) accordingly. For a better understanding of the use case, please refer to our first blog post, MongoDB Shard – Part I.

Production Setup

For a production environment, the following minimum configuration is needed:

  • Config Servers [metadata]: Deploy config servers as a 3 member replica set.
  • Shard [real data]: Deploy each shard as a 3 member replica set.
  • Query Router [routing]: Deploy one or more mongos routers.

Prerequisites

MongoDB 3.4 installed in,

Use Case

Components

  • 3 Query Routers (mongos)
  • 1 Config Server replica set (3 members)
  • 3 Shards (each a replica set with 3 members)

Configuration Details

Shard (2x)

Port

  • Shard1: Data1: 27017, replica-set (27117, 27217)
  • Shard2: Data2: 28017, replica-set (28117, 28217)
  • Shard3: Data3: 29017, replica-set (29117, 29217)

Directory

Shard1

Shard2

Shard3

Commands to Start Shard

Shard1

Shard2

Shard3
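As a rough sketch, the start command for each member of a shard can be assembled as shown below. The ports and replica-set ids are the ones used in this post; the data directory layout (C:\data\<shard>\<port>) and the helper name shardStartCommands are hypothetical and only serve as an illustration, so substitute your own paths.

```javascript
// Sketch: build the mongod start commands for the three members of one shard.
// ASSUMPTION: the dbpath layout C:\data\<shard>\<port> is hypothetical.
// --shardsvr, --replSet, --port and --dbpath are standard mongod options.
function shardStartCommands(shardName, replSetId, ports) {
  return ports.map(function (port) {
    return "mongod.exe --shardsvr" +
      " --replSet " + replSetId +
      " --port " + port +
      " --dbpath C:\\data\\" + shardName + "\\" + port;
  });
}

// Shard1 uses ports 27017, 27117 and 27217.
var shard1Commands = shardStartCommands("shard1", "shard1_replset", [27017, 27117, 27217]);
// shard1Commands[0] is:
// mongod.exe --shardsvr --replSet shard1_replset --port 27017 --dbpath C:\data\shard1\27017
```

Each generated line is meant to be run in its own command prompt, and every data directory has to exist before mongod is started.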

Connect to One of the Shard Servers to Enable the Replica Set [Repeat for Each Shard]

Once the command is executed, the replica set holds an election and one of the nodes becomes the Primary.

Note: Run the above command only once per shard, on one member [not on each replica]. Repeat it for the other shards, changing only the following fields:

  • _id – to the other shard replica-set ids (shard2_replset, shard3_replset)
  • members – change hosts/ports to those of the other shard servers (ports 28xxx, 29xxx)
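A minimal sketch of the replica-set configuration document for Shard1 is given below, assuming all members run on localhost with the ports listed earlier. The helper name shardReplSetConfig is hypothetical; the resulting object has the _id and members fields that rs.initiate() expects.

```javascript
// Sketch: build the configuration document passed to rs.initiate() for one shard.
// ASSUMPTION: all members run on the same host (localhost in this post).
function shardReplSetConfig(replSetId, host, ports) {
  return {
    _id: replSetId, // must match the --replSet value the mongod processes were started with
    members: ports.map(function (port, index) {
      return { _id: index, host: host + ":" + port };
    })
  };
}

var shard1Config = shardReplSetConfig("shard1_replset", "localhost", [27017, 27117, 27217]);
// In a mongo shell connected to localhost:27017, run:
//   rs.initiate(shard1Config)
```

For Shard2 and Shard3, only the arguments change, for example shardReplSetConfig("shard2_replset", "localhost", [28017, 28117, 28217]).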

ConfigServer (4x)

Port

configserver1: 47017, replica-set (47117, 47217)

Command

Connect to the Config Server to Enable the Replica Set [Repeat for All Config Servers]

Note: Run the above command only once per config server replica set, on one member [not on each replica].
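A sketch of the configuration document for the config server replica set could look like this. The ports come from this post; the replica-set id configserver1_replset is a hypothetical name and must be replaced by whatever --replSet value the config servers were started with. The configsvr: true field marks the set as a config server replica set.

```javascript
// Sketch: replica-set configuration for the config servers.
// ASSUMPTION: "configserver1_replset" is a hypothetical id, not taken from the post.
var configServerConfig = {
  _id: "configserver1_replset",
  configsvr: true, // marks this replica set as a config server replica set
  members: [
    { _id: 0, host: "localhost:47017" },
    { _id: 1, host: "localhost:47117" },
    { _id: 2, host: "localhost:47217" }
  ]
};
// In a mongo shell connected to localhost:47017, run:
//   rs.initiate(configServerConfig)
```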

QueryRouter (1x)

Port

  • mongos1 (QueryRouter1): 1000
  • mongos2 (QueryRouter2): 1001
  • mongos3 (QueryRouter3): 1002
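As an illustration, the start commands for the three query routers can be assembled as follows. Each mongos is pointed at the config server replica set through --configdb. The config replica-set id configserver1_replset and the helper name mongosStartCommand are hypothetical; the ports are the ones listed in this post.

```javascript
// Sketch: build the mongos start command for each query router.
// ASSUMPTION: "configserver1_replset" is a hypothetical config replica-set id.
function mongosStartCommand(configReplSetId, configHosts, port) {
  return "mongos.exe --configdb " + configReplSetId + "/" + configHosts.join(",") +
    " --port " + port;
}

var configHosts = ["localhost:47017", "localhost:47117", "localhost:47217"];
var mongosCommands = [1000, 1001, 1002].map(function (port) {
  return mongosStartCommand("configserver1_replset", configHosts, port);
});
// mongosCommands[0] is:
// mongos.exe --configdb configserver1_replset/localhost:47017,localhost:47117,localhost:47217 --port 1000
```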

Perform Sharding [1 QueryRouter, 1 ConfigServer, 3 Shards (each with 3 replicas)]

Connect to Query Router

Add Shard Servers to Config Server

  • sh.addShard("localhost:27017")
  • sh.addShard("localhost:28017")
  • sh.addShard("localhost:29017")

OR (with replica-set id)

  • sh.addShard("shard1_replset/localhost:27017,localhost:27117,localhost:27217")
  • sh.addShard("shard2_replset/localhost:28017,localhost:28117,localhost:28217")
  • sh.addShard("shard3_replset/localhost:29017,localhost:29117,localhost:29217")
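The replica-set form of the shard address follows the pattern replSetId/host:port,host:port,... A small sketch that builds these strings from the ports used in this post is shown below; the helper name shardConnectionString is hypothetical.

```javascript
// Sketch: build the "replSetId/host:port,host:port,..." string accepted by sh.addShard().
function shardConnectionString(replSetId, host, ports) {
  var members = ports.map(function (port) {
    return host + ":" + port;
  });
  return replSetId + "/" + members.join(",");
}

var shardStrings = [
  shardConnectionString("shard1_replset", "localhost", [27017, 27117, 27217]),
  shardConnectionString("shard2_replset", "localhost", [28017, 28117, 28217]),
  shardConnectionString("shard3_replset", "localhost", [29017, 29117, 29217])
];
// In a mongo shell connected to a query router, run:
//   shardStrings.forEach(function (s) { sh.addShard(s); });
```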

Check the Sharding Status

  • sh.status()

Enable Sharding for Database

  • sh.enableSharding("test_mds")

Enable Sharding for Collection

  • sh.shardCollection("test_mds.company", {"company_name": 1})

Import Data

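Note: The dump file (company.json) can be an export of any collection. If there is no dump file, use the command below to load sample data into the collection:

Load sample data: for (var i = 0; i < 10000; i++) { db.company.insert({ company_name: "company_" + i, id: Math.random(), count: i, date: new Date() }) }

The company_name field is included because the collection is sharded on it, and MongoDB 3.4 rejects inserts that lack the shard key; the value "company_" + i is only a placeholder.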

Check Sharding Data

  • Connect to Query Router: mongo.exe --port 1000
  • use test_mds

  • db.company.count()
  • db.company.find()

  • sh.status() – shows the database and collection with their chunk split values

  • Check number of records on each shard:
    • Shard1

    • Shard2

    • Shard3

Note: Check the document count on the Primary of each shard. To run the count on a Secondary member, log in to that member from the command line and execute rs.slaveOk() before executing db.company.count(). rs.slaveOk() tells the shell that we intend to read from a Secondary and not from the Primary. Counts on a Secondary can briefly differ from the Primary, because a Secondary sometimes takes a while to sync up with the Primary.

Key Points to be Considered before Sharding

If a large data set is accessed infrequently, and the workload is I/O-bound or CPU-bound on only part of the data rather than on the size of the whole data set, sharding is an extra expense and not a useful option. For example, 250 GB of data on each of two shards is a lot more expensive than 500 GB of data on a single shard. For more details, refer to https://groups.google.com/forum/#!topic/mongodb-user/hcEfOhXxSlA
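To put a rough number on the extra expense, the sketch below counts the mongod processes needed under the production setup described earlier (3-member replica sets for each shard and for the config servers). Using the process count as a proxy for cost is an assumption made here for illustration; actual cost depends on the hardware behind each process. The helper name mongodProcessCount is hypothetical.

```javascript
// Sketch: count mongod processes in a sharded cluster (mongos routers not included).
// ASSUMPTION: process count is only a rough proxy for cost.
function mongodProcessCount(shards, membersPerShard, configMembers) {
  return shards * membersPerShard + configMembers;
}

var singleReplicaSet = 3;                          // one 3-member replica set, no sharding
var twoShardCluster = mongodProcessCount(2, 3, 3); // 2 * 3 + 3 = 9, plus at least one mongos
var thisPostCluster = mongodProcessCount(3, 3, 3); // 3 * 3 + 3 = 12, plus the 3 mongos routers
```

Under these assumptions, moving the same 500 GB from a single replica set to two shards raises the number of mongod processes from 3 to 9.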

Hardware Usage

  • CPU Contention (Speed, Cores): MongoDB is rarely CPU-bound; the CPU-heavy operations are building indexes, compression, aggregation/map-reduce queries, and sorting.
  • Storage (data loading & pattern, IOPS, Size): Use AWS Provisioned IOPS EBS for Disk IO throughput. Use the mongoperf utility to check Disk IO performance.
  • Network (Latency, Throughput): Affected by write concern and read preference settings. Use netperf to measure.
  • Memory (working set): Keep the working set in memory. Use db.serverStatus({ "workingSet": 1 }).
