Vikas Gupta: Software architect

Why MongoDB?

Posted by Vikas Gupta on November 21, 2012

An application needs a database to persist its data. Relational databases have served this purpose well; they are mature and well understood. So why use a newer technology such as a document-based database to store web application data? Let’s explore the reasons by looking at the features of a commonly used document-based database, MongoDB.

1. Document Data Model: In a document-oriented database (DOD), the data model is represented by a document. A document is essentially a set of property names and their values. The values can be simple data types, such as strings, numbers, and dates, but they can also be arrays and even other documents.

In a relational database, by contrast, tables are essentially flat, so representing one-to-many relationships requires multiple tables. This has its own costs: for instance, to reassemble a single entity we may have to join several tables.
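The difference can be sketched in plain Python, with no database involved. The post and comment data below are made up for illustration; dictionaries stand in for rows and documents.

```python
# Relational shape: two flat "tables" linked by a foreign key (post_id).
posts = [{"id": 1, "title": "Why MongoDB?"}]
comments = [
    {"id": 10, "post_id": 1, "text": "Nice post"},
    {"id": 11, "post_id": 1, "text": "Thanks"},
]

def load_post_with_join(post_id):
    """Rebuild one post by joining the two tables in application code."""
    post = next(p for p in posts if p["id"] == post_id)
    texts = [c["text"] for c in comments if c["post_id"] == post_id]
    return {"title": post["title"], "comments": texts}

# Document shape: the one-to-many relationship is embedded as an array.
post_document = {
    "title": "Why MongoDB?",
    "comments": ["Nice post", "Thanks"],
}

joined = load_post_with_join(1)
print(joined == post_document)
```

The join rebuilds what the document already holds in one piece, which is the cost the paragraph above refers to.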

Additionally, a DOD does not require a fixed schema, whereas a relational table does. MongoDB groups documents into collections, containers that don’t impose any sort of schema. In theory, each document in a collection can have a completely different structure. Hence, it falls to the application code to enforce the data’s structure.
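As a rough illustration of what "application code enforces the structure" can mean, here is a minimal sketch in plain Python. The `REQUIRED_FIELDS` rules, the `validate` and `insert` helpers, and the list standing in for a collection are all hypothetical; they are not part of MongoDB or its drivers.

```python
# Hypothetical application-level rules: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "tags": list}

def validate(document):
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems

collection = []  # stand-in for a schema-free collection

def insert(document):
    """Check the document in application code before storing it."""
    problems = validate(document)
    if problems:
        raise ValueError("; ".join(problems))
    collection.append(document)

insert({"title": "Why MongoDB?", "tags": ["nosql"]})
# Extra fields are fine: the collection itself imposes no schema.
insert({"title": "Replication", "tags": [], "author": {"name": "Vikas"}})
```

The two stored documents have different shapes, and only the application's own checks decide what is acceptable.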

2. Replication: MongoDB uses replica sets to provide database replication for automatic failover. Replica sets consist of exactly one primary node and one or more secondary nodes. Like the master-slave replication that you may be familiar with from other databases, a replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. What makes replica sets unique is their support for automated failover: if the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.
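The failover behaviour described above can be sketched as a toy simulation. This is not how MongoDB elects a primary (the real election protocol is more involved); the `Node` and `ReplicaSet` classes are hypothetical, and the sketch simply promotes the first secondary that is still online.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.role = "secondary"
        self.online = True

class ReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.nodes[0].role = "primary"  # exactly one primary to start with

    def _get(self, name):
        return next(n for n in self.nodes if n.name == name)

    def primary(self):
        """Return the current online primary, or None if there is none."""
        return next((n for n in self.nodes if n.role == "primary" and n.online), None)

    def fail(self, name):
        """Take a node offline; if it was the primary, promote a secondary."""
        node = self._get(name)
        node.online = False
        if node.role == "primary":
            node.role = "secondary"
            candidates = [n for n in self.nodes if n.online]
            if candidates:
                candidates[0].role = "primary"  # simplification: first online node wins

    def recover(self, name):
        """Bring a node back online; it rejoins as a secondary."""
        self._get(name).online = True

rs = ReplicaSet(["a", "b", "c"])
rs.fail("a")
print(rs.primary().name)  # the promoted secondary
rs.recover("a")
print(rs._get("a").role)  # the former primary rejoins as a secondary
```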

3. Speed and Durability: In the realm of database systems there exists an inverse relationship between write speed and durability. Write speed can be understood as the volume of inserts, updates, and deletes that a database can process in a given time frame. Durability refers to the level of assurance that these write operations have been made permanent.

In MongoDB’s case, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. All writes, by default, are fire-and-forget, which means that these writes are sent across a TCP socket without requiring a database response. If users want a response, they can issue a write using a special safe mode provided by all drivers. This forces a response, ensuring that the write has been received by the server with no errors. Safe mode is configurable; it can also be used to block until a write has been replicated to some number of servers. For high-volume, low-value data (like clickstreams and logs), fire-and-forget-style writes can be ideal. For important data, a safe-mode setting is preferable.
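The practical difference between the two write styles can be sketched without a real server. Everything below is a hypothetical stand-in: `ToyServer` imitates a server that rejects duplicate `_id` values, and the two insert helpers are not driver APIs.

```python
class ToyServer:
    """Stand-in for a database server that rejects duplicate _id values."""

    def __init__(self):
        self.documents = {}

    def handle_insert(self, document):
        if document["_id"] in self.documents:
            return {"ok": False, "error": "duplicate key"}
        self.documents[document["_id"]] = document
        return {"ok": True, "error": None}

def insert_fire_and_forget(server, document):
    """Send the write and return at once; the server's reply is discarded."""
    server.handle_insert(document)
    return None

def insert_safe(server, document):
    """Send the write, inspect the reply, and raise if the server reported an error."""
    reply = server.handle_insert(document)
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply

server = ToyServer()
insert_safe(server, {"_id": 1, "event": "signup"})

# The duplicate is rejected by the server, but the caller never finds out.
insert_fire_and_forget(server, {"_id": 1, "event": "click"})

try:
    insert_safe(server, {"_id": 1, "event": "click"})
except RuntimeError as exc:
    print("safe write failed:", exc)
```

The fire-and-forget call is cheaper because it never waits on a reply, and that is also why the failed write goes unnoticed.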

4. Scaling: We can scale either up (vertical scaling) or out (horizontal scaling). When we can no longer scale up, we scale out. MongoDB has been designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as auto-sharding, which automatically manages the distribution of data across nodes. The sharding system handles the addition of shard nodes, and it also facilitates automatic failover. Individual shards are made up of a replica set consisting of at least two nodes, ensuring automatic recovery with no single point of failure. All this means that no application code has to handle these logistics; your application code communicates with a sharded cluster just as it speaks to a single node.
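Range-based partitioning can be sketched as a lookup over sorted split points. The `RangeRouter` class, the split points, and the shard names below are assumptions made for illustration; MongoDB's own routing and chunk balancing do considerably more.

```python
import bisect

class RangeRouter:
    """Route a key to a shard by finding which key range it falls into."""

    def __init__(self, split_points, shard_names):
        # N sorted split points define N + 1 contiguous key ranges.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = shard_names

    def shard_for(self, key):
        # bisect_right returns the index of the range the key belongs to;
        # a key equal to a split point goes to the range that starts there.
        return self.shard_names[bisect.bisect_right(self.split_points, key)]

# Ranges: keys below "h", keys from "h" up to (not including) "p", keys from "p" on.
router = RangeRouter(["h", "p"], ["shard0", "shard1", "shard2"])

for username in ["alice", "mongo", "zoe"]:
    print(username, "->", router.shard_for(username))
```

The application only ever calls `shard_for`; which node holds which range stays hidden behind the router, which mirrors the idea that application code talks to a sharded cluster as if it were a single node.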

According to its creators, MongoDB has been designed to combine the best features of key-value stores and relational databases. Key-value stores, because of their simplicity, are extremely fast and relatively easy to scale. Relational databases are more difficult to scale, at least horizontally, but admit a rich data model and a powerful query language. If MongoDB represents a mean between these two designs, then the reality is a database that scales easily, stores rich data structures, and provides sophisticated query mechanisms.

MongoDB vs other databases

1. Simple Key-value stores: Simple key-value stores index values based on a supplied key. A common use case is caching. There’s no enforced schema. You can put a new value and then use its key either to retrieve that value or delete it. Systems with such simplicity are generally fast and scalable.

The best-known simple key-value store is memcached. Memcached stores its data in memory only, so it trades persistence for speed. It’s also distributed; memcached nodes running across multiple servers can act as a single data store, eliminating the complexity of maintaining cache state across machines. Compared with MongoDB, a simple key-value store like memcached will often allow for faster reads and writes. But unlike MongoDB, these systems can rarely act as primary data stores. Simple key-value stores are best used as adjuncts, either as caching layers atop a more traditional database or as simple persistence layers for ephemeral services like job queues.
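A minimal sketch of the caching-layer pattern, assuming an in-memory dictionary as the key-value store. The `Cache` class, the `database` dictionary, and `load_user` are hypothetical names; this is not memcached's client API.

```python
class Cache:
    """In-memory key-value store with the three basic operations."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is absent

    def delete(self, key):
        self._data.pop(key, None)

database = {"user:1": {"name": "Vikas"}}  # stand-in for the primary data store
stats = {"db_reads": 0}                   # counts how often we fall through to it

def load_user(cache, key):
    """Read through the cache, hitting the database only on a miss."""
    value = cache.get(key)
    if value is None:
        stats["db_reads"] += 1
        value = database[key]
        cache.put(key, value)
    return value

cache = Cache()
load_user(cache, "user:1")  # miss: reads the database and fills the cache
load_user(cache, "user:1")  # hit: served from memory
print(stats["db_reads"])
```

Because the cache holds nothing that the database does not also hold, losing it costs speed but not data, which is why such stores work well as adjuncts.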

2. Sophisticated Key-Value Stores: Sophisticated key-value stores manage a relatively self-contained domain that demands significant storage and availability. Because of their masterless architecture, these systems scale easily with the addition of nodes. They opt for eventual consistency, which means that reads don’t necessarily reflect the latest write. But what users get in exchange for weaker consistency is the ability to write in the face of any one node’s failure. Cassandra is a sophisticated key-value store.
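Eventual consistency can be illustrated with two toy replicas and a deferred sync step. The `Replica` class, the `pending` queue, and the `sync` function are hypothetical simplifications and do not reflect how Cassandra replicates data.

```python
class Replica:
    def __init__(self):
        self.data = {}

replica_a, replica_b = Replica(), Replica()
pending = []  # writes accepted by one replica but not yet copied to the other

def write(replica, key, value):
    """Accept the write locally and queue it for later replication."""
    replica.data[key] = value
    pending.append((key, value))

def sync():
    """Apply every queued write to both replicas, then clear the queue."""
    for key, value in pending:
        replica_a.data[key] = value
        replica_b.data[key] = value
    pending.clear()

write(replica_a, "cart", ["book"])
stale_read = replica_b.data.get("cart")  # replica_b has not seen the write yet
sync()
fresh_read = replica_b.data.get("cart")  # after replication the replicas agree

print(stale_read, fresh_read)
```

The write succeeds even though only one replica took part, at the price of a window in which another replica returns old data.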

3. Relational Databases: MongoDB and MySQL are both capable of representing a rich data model, although where MySQL uses fixed-schema tables, MongoDB has schema-free documents. MySQL and MongoDB both support B-tree indexes, and those accustomed to working with indexes in MySQL can expect similar behavior in MongoDB. MySQL supports both joins and transactions, so if you must use SQL or if you require transactions, then you’ll need to use MySQL or another RDBMS. That said, MongoDB’s document model is often rich enough to represent objects without requiring joins. And its updates can be applied atomically to individual documents, providing a subset of what’s possible with traditional transactions. Both MongoDB and MySQL support replication. As for scalability, MongoDB has been designed to scale horizontally, with sharding and failover handled automatically. Any sharding on MySQL has to be managed manually, and given the complexity involved, it’s more common to see a vertically scaled MySQL system.

4. Other Document Databases: Few databases identify themselves as document databases. As of this writing, the only well-known document database apart from MongoDB is Apache’s CouchDB. CouchDB’s document model is similar, although data is stored in plain text as JSON, whereas MongoDB uses the BSON binary format. Like MongoDB, CouchDB supports secondary indexes; the difference is that the indexes in CouchDB are defined by writing map-reduce functions, which is more involved than the declarative syntax used by MySQL and MongoDB. They also scale differently. CouchDB doesn’t partition data across machines; rather, each CouchDB node is a complete replica of every other.

Use Cases of MongoDB

1. Web Applications: MongoDB is well suited as a primary data store for web applications because it provides a rich data model and query mechanisms.

2. Agile Development

3. Analytics and Logging

4. Caching

Limitations

1. MongoDB should be run on 64-bit machines.

2. It is important to run MongoDB with replication, especially if you’re not running with journaling enabled. Because MongoDB uses memory-mapped files, any unclean shutdown of a mongod not running with journaling may result in corruption.
