CopyDisable

Monday, 30 May 2016

JOIN in MongoDB ? or JOIN’s kin? or something similar?

One of the sought after features in MongoDB was to have the ability to join collections. People working on RDBMS were very much familiar with joins and could not even imagine working without joins. The base of RDBMS is the relations, and join is one of the success factors of RDBMS. Also the join is the one of the major performance issues in RDBMS when we have large amount of data. MongoDB is based on document model, most of the time all the data for a record is located in a single document. So if the data is properly modelled in MongoDB the need for Joins can be avoided. For some requirements like reporting, analytics etc. it is possible that the data we need may reside in multiple collections. As MongoDB user base is growing and more and more users from RDBMS world are using MongoDB, so requirement of Join came out strongly. Starting with MongoDB version 3.2, one new aggregation framework operator $lookup was added. The $lookup operator performs an operation similar to a Join (left outer join). We can read data from one collection and merge the data with data from another collection. Prior to MongoDB 3.2, similar work had to be implemented in application code.
Let’s get our hands dirty with an example.
Suppose we have two collections:
users collection stores user’s information.
image
activity collection stores users activities.
image
Referring to RDBMS, we may think userID field in users collection as the primary key and userID field in activity collection as the foreign key Smile. So the link between these users and activity collection is the userID field.
Now suppose we got a requirement: “find username and city of the user performing each activity”. But the user’s detail information is stored in users collection, so we have to join the activity and users collections using the userID field to extract the required data.
It’s the time to leverage the power of $lookup operator. So our aggregation query will be:
> db.activity.aggregate(
{
"$lookup": {
from : "users",
localField : "userID",
foreignField: "userID",
as : "userInfo" }
})

image



from: Specifies the collection from the current database to be joined, in our example it will be the users collection.
localField: Specifies the field from the input documents, in our case it will be userID field of activity collection.
foreignField: Specifies the field from the documents of the “from” collection, in our case it will be userID from users collection.
as: Specifies the name of the new array field, each array contains the matching documents from the “from” collection. We are naming this array as userInfo.
image
From above output, we can see that the whole users document is stored within the userInfo array.
The data returned above is not looking cool, this is not the format in which we wanted the data. If we get data in the following format, it would be nice:
 UserID, Activity, UserName, City
So for that we have to use two more aggregation framework operators, $unwind and $project, let’s rewrite our aggregation query:
> db.activity.aggregate(
{
"$lookup": {
from : "users",
localField : "userID",
foreignField: "userID",
as : "userInfo" }
},
{
"$unwind": "$userInfo"
},
{
"$project": {
"UserID":"$userID",
"UserName" : "$userInfo.username",
"City" : "$userInfo.city",
"activity" : 1,
"_id": 0 }
}
)

image
Voila, required data is ready Thumbs up