
Friday, 24 June 2016

MongoDB inaccurate count() after crash

One of my MongoDB dev database servers crashed because of an abrupt power failure. I was running MongoDB 3.2.4 with the WiredTiger storage engine. I had one users collection in my test database, and at the time of the crash, a loop was inserting documents into this collection.
I restarted the server. MongoDB recovered from the last checkpoint and started fine:
2016-06-24T09:53:47.113+0530 W - [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2016-06-24T09:53:47.113+0530 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
After the server started, I wanted to check how many documents had been inserted into my users collection. So I ran db.collection.count(), and the result stunned me:
> db.users.count()
7
> db.users.find({}).count()
7
Then I ran db.collection.stats(), which also confirmed the previous result.
This result was not correct. I had 7 documents in the users collection before the insert loop started, and inserts were running at the time of the crash, so the number of documents should have increased. Where had the documents from my new inserts gone? Immediately I feared data loss or corruption.
I ran the aggregate() method with $group to count the documents, and the result was encouraging:
> db.users.aggregate({ $group: { _id: null, count: { $sum: 1 } } })
{ "_id" : null, "count" : 27662 }
My data was there, so the fear of data loss went away. The issue was with the count() and stats() results.
Then I checked the MongoDB documentation. It says that after an unclean shutdown of a mongod using the WiredTiger storage engine, the size and count statistics may be inaccurate.
To restore and correct the statistics after an unclean shutdown, we should run the validate command or the helper method db.collection.validate() on each collection of the mongod.
So I ran db.collection.validate() on my users collection. After that, count() gave me the correct result:
> db.users.count()
27662
Tip: Don’t blindly rely on count() or stats(). If in doubt, run aggregate() with $group to get the document count. The same advice applies on a sharded cluster, where count() without a query can also be inaccurate if orphaned documents exist or a chunk migration is in progress.
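This toy model in plain JavaScript (runnable with Node.js) shows why count() and the $group aggregation can disagree. It illustrates a simplifying assumption and is not how WiredTiger actually tracks statistics. In the model, the documents survive the crash, but a cached document count reverts to its value at the last checkpoint. That cached count stands in for the kind of metadata that count() with no query relies on. The class ToyCollection and its methods are hypothetical names.

```javascript
// Toy model, NOT WiredTiger's real implementation: documents are durable,
// but the cached count answered by a fast count() is only saved at
// checkpoints, so an unclean shutdown can leave it behind the real data.
class ToyCollection {
  constructor() {
    this.docs = [];           // the documents themselves (durable)
    this.cachedCount = 0;     // metadata counter answered by fastCount()
    this.checkpointCount = 0; // counter value saved at the last checkpoint
  }

  insert(doc) {
    this.docs.push(doc);
    this.cachedCount += 1;
  }

  checkpoint() {
    this.checkpointCount = this.cachedCount;
  }

  // Unclean shutdown: documents survive, the counter reverts to the
  // value saved at the last checkpoint.
  crashAndRecover() {
    this.cachedCount = this.checkpointCount;
  }

  // Analogous to count() with no query: reads metadata, scans nothing.
  fastCount() {
    return this.cachedCount;
  }

  // Analogous to $group { _id: null, count: { $sum: 1 } }: visits every document.
  scanCount() {
    let count = 0;
    for (const _doc of this.docs) count += 1;
    return count;
  }

  // Analogous to validate(): rescans the data and repairs the counter.
  validate() {
    this.cachedCount = this.scanCount();
  }
}

// Replaying the story with the numbers from this post.
const users = new ToyCollection();
for (let i = 0; i < 7; i++) users.insert({ i });
users.checkpoint();
for (let i = 7; i < 27662; i++) users.insert({ i });
users.crashAndRecover();
console.log(users.fastCount(), users.scanCount()); // 7 27662
users.validate();
console.log(users.fastCount()); // 27662
```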

Monday, 30 May 2016

JOIN in MongoDB? Or JOIN’s kin? Or something similar?

One of the most sought-after features in MongoDB was the ability to join collections. People who work with RDBMS are very familiar with joins and can hardly imagine working without them. Relations are the foundation of an RDBMS, and the join is one of its success factors. Joins are also one of the major performance problems in an RDBMS when there is a large amount of data.
MongoDB is based on the document model. Most of the time, all the data for a record is located in a single document, so if the data is modelled properly, joins can be avoided. For some requirements, like reporting and analytics, the data we need may be spread across multiple collections. As the MongoDB user base grew and more users from the RDBMS world adopted MongoDB, the demand for joins became strong. Starting with version 3.2, MongoDB added a new aggregation framework operator, $lookup. It performs an operation similar to a join (a left outer join): we can read data from one collection and merge it with data from another collection. Before MongoDB 3.2, similar work had to be implemented in application code.
Let’s get our hands dirty with an example.
Suppose we have two collections:
The users collection stores users’ information.
The activity collection stores users’ activities.
In RDBMS terms, we can think of the userID field in the users collection as the primary key and the userID field in the activity collection as the foreign key. So the link between the users and activity collections is the userID field.
Now suppose we get a requirement: “find the username and city of the user performing each activity”. The user details are stored in the users collection, so we have to join the activity and users collections on the userID field to extract the required data.
It’s time to use the power of the $lookup operator. Our aggregation query will be:
> db.activity.aggregate(
{
"$lookup": {
from : "users",
localField : "userID",
foreignField: "userID",
as : "userInfo" }
})

from: specifies the collection in the current database to join with. In our example, it is the users collection.
localField: specifies the field from the input documents. In our case, it is the userID field of the activity collection.
foreignField: specifies the field from the documents of the “from” collection. In our case, it is userID in the users collection.
as: specifies the name of the new array field added to each input document. The array contains the matching documents from the “from” collection, or is empty if there is no match. We are naming this array userInfo.
In the output, we can see that the whole matching users document is embedded in the userInfo array.
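Here is a minimal in-memory sketch of what $lookup does, in plain JavaScript (Node.js), to make the left outer join semantics concrete. The function lookup() and the sample documents are hypothetical. The sketch assumes simple scalar equality, whereas the real operator also handles array fields and missing fields.

```javascript
// Left outer join in the style of $lookup (simplified: === equality on
// scalar values only).
function lookup(inputDocs, fromDocs, { localField, foreignField, as }) {
  return inputDocs.map((doc) => ({
    ...doc,
    // Every input document is kept; unmatched ones get an empty array.
    [as]: fromDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample data shaped like the users/activity collections.
const users = [
  { userID: 1, username: "alice", city: "Mumbai" },
  { userID: 2, username: "bob", city: "Guwahati" },
];
const activity = [
  { userID: 1, activity: "login" },
  { userID: 3, activity: "logout" }, // no matching user
];

const joined = lookup(activity, users, {
  localField: "userID",
  foreignField: "userID",
  as: "userInfo",
});
console.log(JSON.stringify(joined, null, 2));
```

The unmatched "logout" activity is kept with an empty userInfo array. Keeping unmatched input documents is what makes $lookup a left outer join rather than an inner join.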
The data returned this way is not in the format we want. It would be nicer to get the data in the following format:
 UserID, Activity, UserName, City
For that, we need two more aggregation framework operators, $unwind and $project. Let’s rewrite our aggregation query:
> db.activity.aggregate(
{
"$lookup": {
from : "users",
localField : "userID",
foreignField: "userID",
as : "userInfo" }
},
{
"$unwind": "$userInfo"
},
{
"$project": {
"UserID":"$userID",
"UserName" : "$userInfo.username",
"City" : "$userInfo.city",
"activity" : 1,
"_id": 0 }
}
)

Voilà, the required data is ready!
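The two extra stages can be sketched the same way. The hypothetical unwind() follows the documented behaviour of $unwind without options in MongoDB 3.2: missing, null and empty arrays produce no output, and a non-array value is treated as a single-element array. projectActivityRow() hard-codes the $project stage above. The sample input is made up.

```javascript
// $unwind with no options: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    let elements;
    if (Array.isArray(value)) elements = value;
    else if (value === undefined || value === null) elements = [];
    else elements = [value]; // non-array value: single-element array
    return elements.map((el) => ({ ...doc, [field]: el }));
  });
}

// The $project stage from the query above as a plain function:
// rename/reshape fields and leave out _id.
function projectActivityRow(doc) {
  return {
    UserID: doc.userID,
    UserName: doc.userInfo.username,
    City: doc.userInfo.city,
    activity: doc.activity,
  };
}

// Hypothetical $lookup output: the second activity has no matching user.
const joined = [
  { _id: 1, userID: 1, activity: "login",
    userInfo: [{ userID: 1, username: "alice", city: "Mumbai" }] },
  { _id: 2, userID: 3, activity: "logout", userInfo: [] },
];

const rows = unwind(joined, "userInfo").map(projectActivityRow);
console.log(rows); // one row (alice, Mumbai); the "logout" activity is dropped
```

$unwind drops documents whose userInfo array is empty, so activities without a matching user disappear and the whole pipeline behaves like an inner join. To keep those activities, $unwind accepts the preserveNullAndEmptyArrays option (added in 3.2).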

Saturday, 21 May 2016

Index Filters in MongoDB

The MongoDB query optimizer processes queries and picks the most efficient query plan for each one, choosing the optimal index if suitable indexes are available. The query system then uses this plan each time the query runs.
The optimizer works very well, but sometimes we may know better which index a given query should use. We can call the hint() method on a query to override the optimizer’s index selection and tell the system which index to use. However, hint() has to be specified from the client side every time we want to override the index selection. Sometimes we also know which index a query should use and don’t want end users to override that choice by providing hint(). For all these cases, the solution is index filters.
An index filter tells MongoDB that a particular type of query should use particular indexes. It is temporary: index filters do not persist after the server shuts down. An index filter determines which indexes the optimizer evaluates for a query shape. A query shape is the combination of the query predicate, any sort criteria and any projection specification; the specific values in the predicate do not matter. So if we have specified an index filter for a query shape, we don’t have to add hint() to the matching queries. In fact, MongoDB ignores hint() when an index filter exists for the query shape.
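The following simplified sketch shows what “same query shape” means. It reduces a find() to a shape key by replacing literal values with a placeholder while keeping field names, operators, sort and projection. It is an approximation under stated assumptions (for example, it ignores $and/$or and other cases) and not MongoDB's actual shape computation. queryShapeKey() and the other helpers are hypothetical.

```javascript
// A condition is an "operator object" if all its keys start with "$",
// e.g. { $gte: 1003 }; anything else is treated as a literal value.
function isOperatorObject(value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((k) => k.startsWith("$"));
}

// Replace literal values with "?" but keep field names and operators.
function shapeOfPredicate(predicate) {
  const shape = {};
  for (const [field, condition] of Object.entries(predicate)) {
    if (isOperatorObject(condition)) {
      shape[field] = {};
      for (const op of Object.keys(condition)) shape[field][op] = "?";
    } else {
      shape[field] = "?";
    }
  }
  return shape;
}

// Shape = predicate shape + sort + projection (sort and projection are kept
// as given, since their fields and directions matter).
function queryShapeKey({ query = {}, sort = {}, projection = {} }) {
  return JSON.stringify({ query: shapeOfPredicate(query), sort, projection });
}

const filtered = queryShapeKey({
  query: { userID: { $gte: 1003 }, favDrink: "beer" }, sort: { city: 1 },
});
const newValues = queryShapeKey({
  query: { userID: { $gte: 1001 }, favDrink: "wine" }, sort: { city: 1 },
});
const newOperator = queryShapeKey({
  query: { userID: { $gt: 1003 }, favDrink: "beer" }, sort: { city: 1 },
});
console.log(filtered === newValues);   // true: only the values changed
console.log(filtered === newOperator); // false: $gte -> $gt changes the shape
```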
I will show one example to clarify the concept.
I have a users collection with the following data:
{ "userID" : 1001, "name" : "Pranab Sharma", "city" : "Mumbai", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1002, "name" : "Danish Khan", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1003, "name" : "Samir Das", "city" : "Mumbai", "favFood" : "Continental", "favDrink" : "milk" }
{ "userID" : 1004, "name" : "John Butler", "city" : "Mumbai", "favFood" : "Indian", "favDrink" : "vodka" }
{ "userID" : 1005, "name" : "Xi Xen", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "wine" }
{ "userID" : 1006, "name" : "Vladimir Pulaxy", "city" : "Guwahati", "favFood" : "Chinese", "favDrink" : "beer" }
{ "userID" : 1007, "name" : "Karina Ali", "city" : "Mumbai", "favFood" : "Mexican", "favDrink" : "beer" }

This collection has two user-defined indexes:
  • { "userID" : 1, "favDrink" : 1  }
  • { "userID" : 1, "city" : 1 }
Suppose we have a query that finds the users with a userID greater than or equal to 1003 who love to drink beer, and then sorts the result by the city field:
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1})
After running the query, we can check its execution stats in the MongoDB log. To have query execution stats written to the log, set the log level to 1 with the command db.adminCommand({setParameter: 1, logLevel: 1}). The log shows:
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, city: 1.0 } keysExamined:5 docsExamined:5 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
From the above log line, we can see that the query optimizer has chosen the index { "userID" : 1, "city" : 1 } for our query. It examined 5 index keys and 5 documents.
Suppose we know that using the index {"userID":1, "favDrink": 1 } for this query would require scanning fewer documents. Let’s run the query with this index provided in hint():
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1}).hint({"userID":1, "favDrink": 1 })
Now the execution stats for this query:
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 }, hint: { userID: 1.0, favDrink: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:5 docsExamined:2 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
This time MongoDB examined only 2 documents, 3 fewer than in our previous attempt without hint().
Say we want to enforce the use of the index {"userID":1, "favDrink": 1 } for the above query. Instead of telling everyone to pass this index in hint(), we can set an index filter.

To create an index filter, we can use the planCacheSetFilter command. This command has the following syntax:
db.runCommand(
   {
      planCacheSetFilter: <collection>,
      query: <query>,
      sort: <sort>,
      projection: <projection>,
      indexes: [ <index1>, <index2>, ...]
   }
)

So for our example, the command will be:
db.runCommand(
{
     planCacheSetFilter: "users",
     query: {"userID" : {"$gte": 1003}, "favDrink": "beer"},
     sort: {"city": 1},
     projection: {},
     indexes: [{"userID":1, "favDrink": 1 }]
} )


Our index filter is in place. Now let’s run the query without hint():
db.users.find({"userID" : {"$gte": 1003}, "favDrink": "beer"}).sort({"city": 1})
Checking the execution stats of the query:
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:5 docsExamined:2 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
Yes, we can see MongoDB used the index {"userID":1, "favDrink": 1 } for our query (as we specified in our index filter), so our index filter worked.


To check whether MongoDB really applied an index filter to our query, we can use the explain() method and check the indexFilterSet field. If it is true, MongoDB applied an index filter.

If we change the comparison values for the userID field’s $gte and for favDrink, our index filter still applies.
Let’s examine:
db.users.find({"userID" : {"$gte": 1001}, "favDrink": "wine"}).sort({"city": 1})
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1001.0 }, favDrink: "wine" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:6 docsExamined:1 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:221
We can see that MongoDB used the index that we specified in the index filter definition. Changing the values does not change the query shape, so the index filter still applies.
But if we change $gte to, say, $gt or $lt, the query shape changes. The index filter then no longer applies, and the optimizer is free to choose among all the indexes again.
Let’s examine:
db.users.find({"userID" : {"$gt": 1003}, "favDrink": "beer"}).sort({"city": 1})
command test.users command: find { find: "users", filter: { userID: { $gt: 1003.0 }, favDrink: "beer" }, sort: { city: 1.0 } } planSummary: IXSCAN { userID: 1.0, city: 1.0 } keysExamined:4 docsExamined:4 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:2 reslen:342
From the above log, we can see that MongoDB went back to the optimizer’s choice, the index { userID: 1, city: 1 }, because changing $gte to $gt changed the query shape.


Now let’s examine whether providing a hint works for a query that has an index filter in place:
 db.users.find({"userID" : {"$gte": 1001}, "favDrink": "wine"}).sort({"city": 1}).hint({ userID: 1, city: 1 })
We provided the { userID: 1, city: 1 } index as a hint to our query.
[conn1] command test.users command: find { find: "users", filter: { userID: { $gte: 1001.0 }, favDrink: "wine" }, sort: { city: 1.0 }, hint: { userID: 1.0, city: 1.0 } } planSummary: IXSCAN { userID: 1.0, favDrink: 1.0 } keysExamined:6 docsExamined:1 hasSortStage:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:1 reslen:221
From the log, it is clear that MongoDB used the { userID: 1, favDrink: 1 } index and ignored the index { userID: 1, city: 1 } that we specified in the hint.
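The precedence we observed can be summarised as a small function. An index filter restricts the candidate indexes and wins over hint(). Without a filter, a hint decides. This candidateIndexes() helper is hypothetical and models only the rule described in this post. It is not MongoDB's planner, which does much more (for example, ranking the candidate plans).

```javascript
// Which indexes the optimizer may consider, following the rules observed
// above. Index specs are compared by their JSON form (key order matters).
function candidateIndexes(allIndexes, { filterIndexes = null, hint = null } = {}) {
  const key = (spec) => JSON.stringify(spec);
  if (filterIndexes) {
    // An index filter exists for this query shape: only its indexes are
    // considered, and any hint() is ignored.
    const allowed = new Set(filterIndexes.map(key));
    return allIndexes.filter((ix) => allowed.has(key(ix)));
  }
  if (hint) {
    // No filter: hint() forces the given index.
    return allIndexes.filter((ix) => key(ix) === key(hint));
  }
  // Neither: the optimizer evaluates all indexes.
  return allIndexes.slice();
}

const byDrink = { userID: 1, favDrink: 1 };
const byCity = { userID: 1, city: 1 };
const indexes = [{ _id: 1 }, byDrink, byCity];
console.log(candidateIndexes(indexes, { filterIndexes: [byDrink], hint: byCity }));
// [ { userID: 1, favDrink: 1 } ]
```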


We can use the planCacheListFilters command to get the list of index filters for a given collection:
db.runCommand( { planCacheListFilters : "users" })

We can also run the planCacheClearFilters command to remove a specific index filter or all the index filters on a collection. To remove a specific index filter, we have to specify its query shape. For our example:
db.runCommand(
{
    planCacheClearFilters: "users",
    "query" : {"userID" : {"$gte" : 1003},"favDrink" : "beer"},
    "sort": {"city" : 1},
    "projection": {}
} )
To clear all index filters on a collection, just omit the query shape in the planCacheClearFilters command:
db.runCommand({planCacheClearFilters: "users"})
Index filters are a very handy tool for tuning MongoDB queries, so try them out and enjoy.

Friday, 20 May 2016

How to change MongoDB’s sort buffer size

When MongoDB cannot use an index to obtain the sort order for a query, it sorts the results in memory. If the sort operation consumes more than 32 megabytes, MongoDB returns an error:
"Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."

As the MongoDB documentation says, there are two ways to avoid this error. We can create an index that supports the sort operation, or we can use sort() together with limit(). With a limit, the in-memory sort only has to keep the top documents up to the limit.
The memory limit for in-memory sorts can also be configured with the internalQueryExecMaxBlockingSortBytes parameter, which takes a value in bytes. In the following example I am setting the limit to 128 MB (134217728 bytes):
> db.adminCommand({"setParameter": 1, "internalQueryExecMaxBlockingSortBytes" : 134217728})
Now my mongod will allow up to 128 MB of memory for sorts that cannot use an index.
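Here is a small sketch of the arithmetic involved. It converts megabytes to the byte value used above and gives a rough, hypothetical estimate of whether an in-memory sort would exceed the limit. The estimate (documents held multiplied by average document size) is an assumption for illustration; MongoDB's own memory accounting differs in detail. The sketch also shows why a limit() helps.

```javascript
const MB = 1024 * 1024;
const DEFAULT_SORT_LIMIT_BYTES = 32 * MB; // 33554432, as in the error message

function megabytesToBytes(mb) {
  return mb * MB;
}

// Hypothetical rough estimate: documents held in memory x average size.
// With a limit, only the top `limit` documents have to be held.
function wouldExceedSortLimit({ numDocs, avgDocBytes, limit = Infinity,
                                limitBytes = DEFAULT_SORT_LIMIT_BYTES }) {
  const docsHeld = Math.min(numDocs, limit);
  return docsHeld * avgDocBytes > limitBytes;
}

console.log(megabytesToBytes(128)); // 134217728
console.log(wouldExceedSortLimit({ numDocs: 100000, avgDocBytes: 500 })); // true
console.log(wouldExceedSortLimit({ numDocs: 100000, avgDocBytes: 500, limit: 100 })); // false
console.log(wouldExceedSortLimit({ numDocs: 100000, avgDocBytes: 500,
  limitBytes: megabytesToBytes(128) })); // false
```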




Thursday, 7 April 2016

MongoDB point-in-time recovery using oplog

The MongoDB oplog is a capped collection that records all operations that modify data in the databases. Because it is a capped collection, it has a fixed size, so it can hold database changes only for a limited period of time before older entries are overwritten.
Consider the following scenario. Our oplog can store changes for the last 24 hours, and our daily backup takes place at 3 AM. Suppose a user fires a drop command by mistake and drops a collection at 11 AM. To recover the dropped collection, we can restore the 3 AM backup and then apply the oplog up to 11 AM (stopping just before the drop statement).
Now suppose our oplog is small and can store changes for only 4 hours. In this case, we cannot perform the point-in-time recovery for the above scenario. By 11 AM, the oplog only covers changes since 7 AM, and the changes made between 3 AM and 7 AM have already been overwritten. So make sure that your oplog is large enough, or back up your oplog frequently, before entries get overwritten.
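The feasibility check above comes down to one condition. When we start the recovery, the oldest entry still in the oplog must be no newer than the backup. The following sketch assumes a fixed oplog window measured in hours; in reality, the window depends on the oplog size and the write volume. Times are counted in hours from midnight of the backup day, and canRecoverPointInTime() is a hypothetical helper.

```javascript
// Hours from midnight of the backup day: 3 AM = 3, 11 AM = 11,
// 6 AM on the next day = 30 (simplification for the example).
function canRecoverPointInTime({ backupHour, mistakeHour, nowHour, oplogWindowHours }) {
  // Entries older than this have already been overwritten.
  const oldestOplogHour = nowHour - oplogWindowHours;
  return backupHour <= mistakeHour &&
    mistakeHour <= nowHour &&
    oldestOplogHour <= backupHour; // oplog must still reach back to the backup
}

console.log(canRecoverPointInTime({ backupHour: 3, mistakeHour: 11, nowHour: 11, oplogWindowHours: 24 })); // true
console.log(canRecoverPointInTime({ backupHour: 3, mistakeHour: 11, nowHour: 11, oplogWindowHours: 4 }));  // false
```

Even with a 24-hour window, waiting too long also breaks the recovery. If we only start at 6 AM the next day (nowHour 30), the oldest entry is from 6 AM on the backup day, which is after the 3 AM backup.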
The oplog is enabled when we run MongoDB as a replica set. For a standalone mongod instance, if we need an oplog, we can start the instance as a master (master-slave replication) or as a single-node replica set.

Hands On:

Note: I have used MongoDB 3.2.4 on Ubuntu 14.04 for this tutorial.
In this example I am going to show the recovery process on a standalone mongod instance. I am going to enable the oplog on this instance by starting it as the master node of a master-slave replication. (Master-slave replication is not recommended; a replica set should be used instead. As this is just a tutorial, I am using master-slave replication here.)
To start MongoDB with an oplog in master-slave mode, use the --master option:
mongod --config /etc/mongod.conf --master
I will create a collection mycoll in the database mydb with 10000 documents.
At this point we take a backup; I am going to use mongodump for it.
After taking the backup, we make some more changes in the database. We insert another 10000 documents, update all the documents to increment their score by 10, and then add another 5000 documents.
We have 25,000 documents in our mycoll collection.
Now I will run a query that will remove 5000 documents.
Let’s suppose the above remove command was fired by mistake and we want to recover the deleted documents.
We have a mongodump backup, but it was taken when the collection had only 10000 documents, so this backup alone is not enough to recover the deleted documents. So we are going to use the oplog, our saviour.
Take a backup of the oplog using mongodump: MongoDB stores the oplog in the local database. The collection in which the oplog is stored depends on the replication type:
1) Using master-slave type replication: oplog is stored in oplog.$main collection
2) Using replica-set: oplog is stored in oplog.rs collection
For this example we are using master-slave replication, so I am taking a mongodump backup of the oplog.$main collection. I created an oplogbackup directory and dumped the oplog.$main collection into it.
Find the recovery point: as our mongod is running and the oplog is available, we can query the oplog.$main collection to find the recovery point. We executed the delete command on the mydb.mycoll collection, so our query will be as follows:
> d=db.getSiblingDB("local")
> d.oplog.$main.find({"ns":"mydb.mycoll","op":"d"}).sort({$natural: 1}).limit(5)

Our aim is to find the "ts" field of the first delete operation. In our case it is "ts" : Timestamp(1460026341, 1). Note this value as 1460026341:1. It is our recovery point, and we will pass it to mongorestore.
Alternatively, we can run the bsondump tool to convert our oplog backup to JSON and search the JSON output for the recovery point.
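The oplog query above can be sketched over an in-memory array: filter by ns and op: "d", take the first match in natural order, and format its ts as the <seconds>:<ordinal> string. The entries are modelled as plain objects with ts as { t, i }. Apart from the 1460026341:1 value taken from this post, the sample timestamps are made up, and findRecoveryPoint() is hypothetical.

```javascript
// Oplog entries in natural (insertion) order, modelled as plain objects.
// Only 1460026341:1 comes from this post; the other timestamps are made up.
const oplog = [
  { ts: { t: 1460026000, i: 1 }, op: "i", ns: "mydb.mycoll" },
  { ts: { t: 1460026200, i: 1 }, op: "u", ns: "mydb.mycoll" },
  { ts: { t: 1460026300, i: 1 }, op: "d", ns: "mydb.othercoll" },
  { ts: { t: 1460026341, i: 1 }, op: "d", ns: "mydb.mycoll" },
  { ts: { t: 1460026341, i: 2 }, op: "d", ns: "mydb.mycoll" },
];

// Equivalent of find({ ns, op: "d" }).sort({ $natural: 1 }).limit(1),
// returning the ts formatted as <seconds>:<ordinal> for --oplogLimit.
function findRecoveryPoint(entries, ns) {
  const firstDelete = entries.find((e) => e.ns === ns && e.op === "d");
  return firstDelete ? `${firstDelete.ts.t}:${firstDelete.ts.i}` : null;
}

console.log(findRecoveryPoint(oplog, "mydb.mycoll")); // 1460026341:1
```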

Recovering Database:
First I restore the mongodump backup of the database into a fresh mongod instance. The mongorestore command restored the 10,000 documents of our mydb.mycoll collection. The rest of the recovery will be done using the oplog backup.
We can replay the oplog using mongorestore’s --oplogReplay option. To replay the oplog, mongorestore requires the oplog backup file to be named oplog.bson. So I am moving our oplog backup file oplog.$main.bson into another directory and renaming it to oplog.bson.
For point-in-time recovery, mongorestore has another option, --oplogLimit, which takes a timestamp in <seconds>[:ordinal] format. It instructs mongorestore to include only the oplog entries before the provided timestamp. The replay therefore stops just before our first mistaken delete (1460026341:1).
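The --oplogLimit rule can be sketched like this: replay only entries strictly before the given timestamp, comparing seconds first and then the ordinal. entriesToReplay() is hypothetical, and the sketch assumes that an omitted ordinal means 0.

```javascript
// Parse "<seconds>[:ordinal]"; this sketch ASSUMES a missing ordinal is 0.
function parseOplogLimit(limit) {
  const [seconds, ordinal = "0"] = limit.split(":");
  return { t: Number(seconds), i: Number(ordinal) };
}

// Order timestamps by seconds, then by ordinal.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// Keep only the entries strictly before the limit, i.e. what gets replayed.
function entriesToReplay(entries, limit) {
  const stop = parseOplogLimit(limit);
  return entries.filter((e) => compareTs(e.ts, stop) < 0);
}

const oplog = [
  { ts: { t: 1460026200, i: 1 }, op: "u" },
  { ts: { t: 1460026341, i: 1 }, op: "d" }, // first mistaken delete
  { ts: { t: 1460026341, i: 2 }, op: "d" },
];
console.log(entriesToReplay(oplog, "1460026341:1").length); // 1
```

Because the comparison is strict, the delete at exactly 1460026341:1 is not replayed. That is why the recovery point is the first bad operation rather than the last good one.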
So we run mongorestore with --oplogReplay and --oplogLimit 1460026341:1.
The database recovery is complete, so let’s check the recovered data. All 25,000 documents are back, which means we recovered the 5,000 deleted documents.
The 20,000 updated documents, with their score incremented by 10, are also present.
So our point-in-time recovery was successful!

Tuesday, 23 February 2016

Securing MongoDB: Using x.509 Certificate

 

Enabling SSL x.509 certificate to authenticate the members of a replica set

Instead of using simple plain-text keyfiles, MongoDB replica set or sharded cluster members can use x.509 certificates to verify their membership.
The member certificate, used for internal authentication to verify membership of the sharded cluster or replica set, must have the following properties (Source: https://docs.mongodb.org/manual/tutorial/configure-x509-member-authentication/#member-x-509-certificate):
  • A single Certificate Authority (CA) must issue all the x.509 certificates for the members of a sharded cluster or a replica set.
  • The Distinguished Name (DN), found in the member certificate’s subject, must specify a non-empty value for at least one of the following attributes: Organization (O), the Organizational Unit (OU) or the Domain Component (DC).
  • The Organization attributes (O), the Organizational Unit attributes (OU) and the Domain Components (DC) must match those from the certificates for the other cluster members.
  • Either the Common Name (CN) or one of the Subject Alternative Name (SAN) entries must match the hostname of the server, used by the other members of the cluster.
  • If the certificate includes the Extended Key Usage (extendedKeyUsage) setting, the value must include clientAuth (“TLS Web Client Authentication”).
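Two of these rules can be checked mechanically on the subject strings, as printed by openssl with -nameopt RFC2253. The sketch below is simplified: it does not handle escaped commas or multi-valued RDNs. memberSubjectsConsistent() and orgAttributes() are hypothetical helpers. The server2 and server4 subjects come from the mongod log later in this post. The server3 subject is assumed to follow the same pattern.

```javascript
// Keep the O, OU and DC attributes (with non-empty values) of an
// RFC 2253 subject, in their original order.
function orgAttributes(subject) {
  return subject
    .split(",")
    .map((rdn) => rdn.trim())
    .filter((rdn) => /^(O|OU|DC)=.+/.test(rdn))
    .join(",");
}

// Rules checked: at least one of O/OU/DC is set, and every member has
// exactly the same O/OU/DC values.
function memberSubjectsConsistent(subjects) {
  const parts = subjects.map(orgAttributes);
  return parts.length > 0 && parts.every((p) => p !== "" && p === parts[0]);
}

const members = [
  "emailAddress=pranabksharma@gmail.com,CN=server2,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN",
  "emailAddress=pranabksharma@gmail.com,CN=server3,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN",
  "emailAddress=pranabksharma@gmail.com,CN=server4,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN",
];
console.log(memberSubjectsConsistent(members)); // true
```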

We have an existing MongoDB 3.2.1 replica set running on three Ubuntu 14.04 VMs named server2, server3 and server4. This replica set does not have authentication enabled. We are going to update it to use x.509 certificates for member authentication.
For this post I am going to use self-signed certificates. Self-signed certificates are less secure because no trusted third party vouches for their authenticity, but they are acceptable for an internal private setup like ours.
First we will create our own CA (certificate authority), which will sign the certificates of our MongoDB servers. To begin, I am creating the private key for my CA:
openssl genrsa -out mongoPrivate.key -aes256

Next I will create the CA certificate:
openssl req -x509 -new -extensions v3_ca -key mongoPrivate.key -days 1000 -out mongo-CA-cert.crt

Our CA is ready to sign certificates. Now we will create private keys for our MongoDB servers and generate a CSR (certificate signing request) file for each server. Then we will sign the CSR files with the CA certificate created above to generate the certificates for our mongod servers.
A single command creates both the private key and the CSR for a server. We have to create a private key and CSR on each of our three servers (server2, server3 and server4), so we run this command on all three, changing the file names accordingly:

openssl req -new -nodes -newkey rsa:2048 -keyout server2.key -out server2.csr

While generating the CSRs on our 3 servers, we keep all fields in the certificate request the same except the Common Name. We set the Common Name to the hostname of the server on which we run the command.
We are using the -nodes option, which leaves the private key unencrypted, so that we don’t have to enter or configure a password when starting our MongoDB server.
Creating Private Key and CSR for server2:
Now we sign server2’s CSR with our CA certificate to generate the public certificate of server2.
We repeat the same steps for server3 and server4: we create their private keys and CSRs, and then sign the CSRs with our CA to generate their certificates.
Once the certificates for all three servers are generated, we copy each server’s certificate and the CA certificate to the respective server.
Using the cat command, we concatenate the private key and the public certificate of each server into a single .pem file suitable for MongoDB.

Changes in MongoDB configuration file:

On all three MongoDB servers, we have to add the config options shown below. For PEMKeyFile and clusterFile, we use the .pem file of the respective server.
Config file in server3:
net:
  port: 27017
  bindIp: 0.0.0.0
  ssl:
        mode: preferSSL
        PEMKeyFile: /mongodb/config/server3.pem
        CAFile: /mongodb/config/mongo-CA-cert.crt
        clusterFile: /mongodb/config/server3.pem

security:
  clusterAuthMode: x509


Explanation:
mode: preferSSL
Connections between servers use TLS/SSL. For incoming connections, the server accepts both TLS/SSL and non-TLS/non-SSL. For details please read https://docs.mongodb.org/manual/reference/configuration-options/#net.ssl.mode
For the time being, we are configuring SSL connections only between our servers. Clients can still connect to the MongoDB servers without TLS/SSL.

PEMKeyFile: /mongodb/config/server3.pem
The .pem file that we created from the private key and certificate of that particular server.
Note: We created the server’s private key unencrypted, so we do not have to specify the net.ssl.PEMKeyPassword option. If the private key is encrypted and we do not specify net.ssl.PEMKeyPassword, mongod or mongos will prompt for a passphrase.

CAFile: /mongodb/config/mongo-CA-cert.crt
This option takes a .pem file containing the root certificate chain from the Certificate Authority. In our case the CA has only one certificate, so we specify the path of its .crt file, which is PEM-encoded.

clusterFile: /mongodb/config/server3.pem
This option specifies the .pem file that contains the x.509 certificate and key for membership authentication. If we omit this option, the cluster uses the .pem file specified in the PEMKeyFile setting. In our case both files are the same, so we could skip this option.

clusterAuthMode: x509
The authentication mode used for cluster authentication. See https://docs.mongodb.org/manual/reference/configuration-options/#security.clusterAuthMode for details.

Now we are ready to restart the MongoDB servers of our replica set. After starting each mongod, we can see lines like the following in the logs:
2016-02-24T10:20:39.424+0530 I ACCESS   [conn8]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=pranabksharma@gmail.com,CN=server2,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN" }
2016-02-24T10:20:40.254+0530 I ACCESS   [conn9]  authenticate db: $external { authenticate: 1, mechanism: "MONGODB-X509", user: "emailAddress=pranabksharma@gmail.com,CN=server4,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN" }
It seems x.509 authentication between our replica set members is working.
Now log in to the primary of the replica set and create the initial user administrator using the localhost exception. I am not going to cover user creation in this post. For details you may read http://pe-kay.blogspot.in/2016/02/update-existing-mongodb-replica-set-to.html and http://pe-kay.blogspot.in/2016/02/mongodb-enable-access-control.html.


Enabling client SSL x.509 certificate authentication

Each unique x.509 client certificate corresponds to a single MongoDB user, so we have to create a client x.509 certificate for every user that needs to connect to the database. We cannot use one client certificate to authenticate more than one user. First we will create the client certificates, and then we will add the corresponding MongoDB users.
While generating the CSR for a client, we have to remember the following points:
  • A single Certificate Authority (CA) must issue the certificates for both the client and the server.
  • A client x.509 certificate’s subject, which contains the Distinguished Name (DN), must differ from the subjects of the certificates that we generated for our MongoDB servers server2, server3 and server4. The subjects must differ in at least one of the following attributes: Organization (O), Organizational Unit (OU) or Domain Component (DC). If we do not change any of these attributes, we get an error ("Cannot create an x.509 user with a subjectname that would be recognized as an internal cluster member.") when we add the user for that client.
    For example, I created a client certificate that kept all the attributes the same as the member x.509 certificate except the Common Name. When I tried to create a user for that client, I got exactly this error.

So for my client certificate I am going to change the Organizational Unit attribute. For the server2, server3 and server4 certificates, we generated the CSRs with the Organizational Unit Technologies. For the client certificate I am going to use Technologies-Client. I am also setting the Common Name to root, the name of the user that is going to connect to the database. All the remaining attributes in the client CSR are kept the same as in the server certificates.
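The rule that tripped us up can be expressed in the same way: the O/OU/DC part of the client subject must differ from that of the member certificates. Here is a simplified, hypothetical check that again ignores escaped commas. orgAttributes() and clientSubjectAllowed() are illustrative names, and the subjects are the ones used in this post.

```javascript
// Keep the O, OU and DC attributes (with non-empty values) of an
// RFC 2253 subject, in their original order.
function orgAttributes(subject) {
  return subject
    .split(",")
    .map((rdn) => rdn.trim())
    .filter((rdn) => /^(O|OU|DC)=.+/.test(rdn))
    .join(",");
}

// A client subject must differ from the member subject in O, OU or DC;
// otherwise the x.509 user would look like an internal cluster member.
function clientSubjectAllowed(clientSubject, memberSubject) {
  return orgAttributes(clientSubject) !== orgAttributes(memberSubject);
}

const member =
  "emailAddress=pranabksharma@gmail.com,CN=server2,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN";
const sameOrgClient =
  "emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN";
const client =
  "emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN";
console.log(clientSubjectAllowed(sameOrgClient, member)); // false: only CN differs
console.log(clientSubjectAllowed(client, member));        // true: OU differs
```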
After that, we sign the client CSR with the same CA we used for our server certificates to generate the client certificate.
Then we concatenate the user’s private key and public certificate into a .pem file for that user.
Now that our client certificate is created, we have to add a MongoDB user for it. To authenticate with a client certificate, the MongoDB user must be named after the subject of the client certificate.
The subject of a certificate can be extracted using the below command:
openssl x509 -in mongokey/rootuser.pem -inform PEM -subject -nameopt RFC2253
Using the subject value emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN, we add a user to the $external database (users created on the $external database have their credentials stored outside MongoDB). So, using the subject, I created a user in the $external database with the root role on the admin database:
rep1:PRIMARY> db.getSiblingDB("$external").runCommand({
... createUser: "emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN",
... roles: [{role: "root", db: "admin"}]
... })

With the client certificate and the corresponding user created, we are ready to connect to MongoDB using the SSL certificate. First, we change the ssl mode option from preferSSL to requireSSL in the config files of all our mongod servers, so that the servers accept only SSL connections:
net:
  port: 27017
  bindIp: 0.0.0.0
  ssl:
        mode: requireSSL
        PEMKeyFile: /mongodb/config/server4.pem
        CAFile: /mongodb/config/mongo-CA-cert.crt

After changing mode: preferSSL to mode: requireSSL, if we try to connect to mongod in the usual way (without SSL), we get the following error:
Error: network error while attempting to run command 'isMaster' on host '<hostname>:<port>'
Since the ssl mode is now requireSSL, clients must connect over SSL. Because a CAFile is configured, they also have to present a certificate signed by our CA.
Let’s connect to mongod using our client certificate:
mongo --ssl --sslPEMKeyFile ./mongokey/rootuser.pem --sslCAFile ./ssl/CA/mongo-CA-cert.crt --host server2
After connecting, if we try to run any command, for example listing the collections, we get an authorization error.
The user has the root role and we connected to the admin database, so we should be able to get the list of collections. What is the issue?
If we run the connectionStatus command, we can see that the authentication information is empty:
rep1:PRIMARY> db.runCommand({'connectionStatus' : 1})
{
        "authInfo" : {
                "authenticatedUsers" : [ ],
                "authenticatedUserRoles" : [ ]
        },
        "ok" : 1
}

When our mongo shell connects to mongod using the SSL certificate as shown above, only the connection is established; the user is not authenticated yet. To authenticate, we have to call the db.auth() method on the $external database:

rep1:PRIMARY> db.getSiblingDB("$external").auth(
...   { mechanism: "MONGODB-X509",
... user: "emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN"
... }
... )

Once authenticated, we can run our queries and do admin tasks.
Now say we have another client certificate with the subject emailAddress=pranabksharma@gmail.com,CN=reader,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN, and we connect to the database using this certificate. After connecting, we have to authenticate. If we try to authenticate as a different user (emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN), we get an error:
Error: Username "emailAddress=pranabksharma@gmail.com,CN=root,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN" does not match the provided client certificate user "emailAddress=pranabksharma@gmail.com,CN=reader,OU=Technologies-Client,O=Pe-Kay,L=Mumbai,ST=Maharashtra,C=IN"
We have to authenticate as the same user whose certificate we used to connect to the MongoDB server. We can’t connect with one user’s certificate and then authenticate as a different user.