MySQL vs. MongoDB: Choosing a Data Management Solution
Table Of Contents
- 1. Introduction
- 2. The dominance of RDBMS
- 3. A new era of NoSQL movement
- 4. Tables vs Documents vs Graphs vs Key/Values
- 5. MySQL and MongoDB: Conscious Decision
- 5.1. Enforced Schema vs Schemaless
- 5.2. Normalization vs Duplication
- 5.3. Relations vs References
- 5.4. Transactions vs Atomic Updates
- 5.5. SQL vs JSON
- 5.6. Stored Procedures vs Scripting
- 5.7. GROUP BY vs Aggregations
- 5.8. Clustering and Sharding/Partitioning
- 5.8.1. Replication
- 5.9. Full text search
- 5.10. Development
- 5.10.1. Deployment
- 5.11. Monitoring & Diagnostics
- 5.12. Security
- 5.13. Trade-offs, not battles
- 5.14. MySQL or MongoDB?
- 5.15. MySQL and MongoDB: Nowadays
- 6. Conclusions
1. Introduction
It would be fair to say that as IT professionals we are living in the golden age of data management. As our software systems become more complex and more distributed, driven by myriad scalability and availability requirements, the traditional solutions used for years have begun to fall behind.
The goal of this tutorial is to provide a fair overview of two data management solutions: the mature and widely used relational database represented by MySQL, and the new kid on the block, MongoDB, which in turn represents a new generation of data stores. By no means is it a battle with a clear winner at the end. Instead, the intention is to help developers make a choice and find the right fit for their applications by going through the features of MySQL and MongoDB, discussing the different guarantees and limitations each one has.
2. The dominance of RDBMS
The history and evolution of data management software is not a particularly exciting topic to spend time on. For quite some time relational database management systems held the leadership, and for years the choice was obvious: MySQL, PostgreSQL or Oracle, just to name a few. SQL (and its vendor-specific dialects) was the de-facto standard for querying relational data stores, and by definition every backend software developer had to learn and understand at least some basics of it. It worked surprisingly well until recently, when a whole family of new data management systems emerged, a phenomenon also known as the NoSQL (or Not Only SQL) movement.
3. A new era of NoSQL movement
The challenges which modern software systems face nowadays with respect to data volumes and request/transaction throughput revealed that relational data stores often become a bottleneck, imposing limits on overall system scalability. Traditionally, the way to solve this problem was simply to buy a bigger box (so-called vertical scalability); however, at some point the price becomes very, very high, making the whole system prohibitively expensive and impractical.
The industry was actively looking into cheaper ways to build complex distributed systems, making use of horizontal scalability instead. It also meant coming up with alternatives to relational data stores which could scale horizontally as well. That was the moment the NoSQL movement began.
4. Tables vs Documents vs Graphs vs Key/Values
The relational data model represents all data in terms of tuples and relations (better known as tables). Structured data fits this model very well, and for a long time no other viable alternative existed. With the NoSQL movement, many alternative data models were developed, giving birth to a whole bunch of specialized data storage systems.
The NoSQL solutions could be classified in a few different categories. Document data stores are designed for storing, querying and managing documents (semi-structured data). The more mature representatives of this category include CouchDB, Couchbase, MongoDB, OrientDB and HyperDex. Key/Value data stores are designed for storing, querying and managing associative arrays (also known as dictionaries or hashes).
The most widely used representatives of this category include DynamoDB, FoundationDB, HyperDex, MemcacheDB, Redis, Riak, Aerospike and OrientDB. Graph data stores are designed for efficiently storing and operating graph structures. The well-known representatives of this category include Neo4J, InfiniteGraph, GraphBase and OrientDB. Last but not least, wide-column data stores are taking a hybrid approach (combining some characteristics of key/value data stores and traditional relational data stores). The most advanced representatives of this category include Accumulo, Cassandra and HBase.
Please note that the list of different NoSQL data stores presented above is far from complete; it contains only the most well-known and widely used names, but there are many more out there.
5. MySQL and MongoDB: Conscious Decision
Enough with the introduction, so let us move on to more practical things. In this part of the tutorial we are going to take a look at all aspects of the application development process using MySQL and MongoDB, also spending some time on deployment and monitoring. The goal is to discuss the trade-offs and design decisions each of those data stores makes and to analyze how they change the way we develop applications. Another purpose this tutorial serves is to assist in deciding when MongoDB may be a better choice than MySQL (and vice versa), taking into account application architecture, data access patterns and storage requirements.
As we already know, MongoDB is a document data store. It stores JSON-style documents which are grouped into collections. At the top of the MongoDB data model hierarchy is the database (please refer to the official documentation for comprehensive details). The current production-ready version of MongoDB is 3.0.4.
On the other side, MySQL is a relational data store. The data is stored in tables, which contain columns, and the tables are grouped into databases (please refer to the official documentation for comprehensive details). The current production-ready version of MySQL is 5.6.25.
MySQL supports multiple storage engines, each of which serves its own purpose and has some features not available in other storage engines. Throughout this tutorial we assume the InnoDB storage engine is used, as it is the default and most general-purpose one, recommended for all but specialized use cases.
5.1. Enforced Schema vs Schemaless
MySQL, as a relational data store, requires a strict schema for its data model: all tables should be created with columns defined; only then can the data be stored and queried using the SQL language. This somewhat complicates the development and deployment process, as every time the data model needs to change, the schema has to be changed first and the data migrated afterwards. Here is an example of the typical steps required to create a new database, create a new table and insert one row into it using the MySQL shell and the SQL language:
CREATE DATABASE tutorial;
USE tutorial;

CREATE TABLE chapters (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL
);

INSERT INTO chapters (title) VALUES ("MySQL and MongoDB: trade-offs, not battles");
In contrast, MongoDB does not impose any schema on the documents being stored in a collection. It becomes the responsibility of the application to deal with that; the only thing MongoDB restricts is the set of supported data types. This significantly speeds up the development process, as MongoDB can be used right away to store JSON documents of any shape.
The structure of the documents can evolve and change over time, and different generations of documents may coexist in the same collection. It is up to the application developer to decide how to perform data cleanup or migration if necessary. Let us take a look at how the same steps of creating a new database, creating a new collection and inserting one document into it can be done using the MongoDB shell:
use tutorial
db.chapters.insert({
    "title": "MySQL and MongoDB: trade-offs, not battles"
})
It is worth mentioning that the upcoming 5.7 release of MySQL (currently in the release candidate phase) introduces native JSON support. With the introduction of this new JSON data type it will be possible to combine enforced schema and schemaless data in MySQL.
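As a rough illustration of what that combination might look like (a hypothetical sketch based on MySQL 5.7 release candidate features; the metadata column and its contents are invented for this example), a JSON column can live next to regular, strictly typed columns:

CREATE TABLE chapters (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    metadata JSON
);

INSERT INTO chapters (title, metadata) VALUES
    ("MySQL and MongoDB: trade-offs, not battles", '{"tags": ["mysql", "mongodb"], "pages": 20}');

-- extract a single attribute from the schemaless part of the row
SELECT title, JSON_EXTRACT(metadata, '$.pages') FROM chapters;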
5.2. Normalization vs Duplication
Normalization in the world of relational databases is the process of organizing tables to minimize data redundancy. It involves decomposing a table into less redundant (and smaller) tables without losing data. It also assumes defining foreign keys in the old tables referencing the primary keys of the new ones. For example, this is the way to create two tables in MySQL, where the books table references the authors table:
CREATE TABLE authors (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE books (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    author_id INT NOT NULL,
    FOREIGN KEY (author_id) REFERENCES authors(id)
);

INSERT INTO authors (name) VALUES ("Michael Kofler");
INSERT INTO books (title, author_id) VALUES ("The Definitive Guide to MySQL 5", 1);
Normalization as such does not apply to MongoDB. Instead, careful data modeling and a certain level of data duplication is the way to get the most out of a document data store. It also becomes the responsibility of the application to decide on the level of data repetition and to keep all data fragments in sync in case of modifications. For example:
db.books.insert({
    "title": "The Definitive Guide to MySQL 5",
    "author": {
        "name": "Michael Kofler"
    }
})
5.3. Relations vs References
In a normalized database, a SQL JOIN clause is usually used to combine records from two or more MySQL tables (there are other techniques as well, but by and large they lead to essentially the same results). For example:
SELECT * FROM books b INNER JOIN authors a ON a.id = b.author_id;
MongoDB supports document references but does not support joins: to resolve a reference, an additional query (or queries) has to be executed, leading to the N+1 query problem, infamous in the relational world. For example, in the snippet below we are inserting a reference to the authors collection from the books collection instead of embedding the author details:
db.authors.insert({
    "name": "Michael Kofler",
    "_id": 1
})

db.books.insert({
    "title": "The Definitive Guide to MySQL 5",
    "author": {
        "$ref": "authors",
        "$id": 1
    }
})
The necessary round-trips are the price to pay for the flexibility of going schemaless. However, as already mentioned in the Normalization vs Duplication section, a certain level of data duplication and careful analysis of data access patterns may mitigate the problem.
5.4. Transactions vs Atomic Updates
A transaction is a logical unit of work performed by a data management system against its data. The importance of a transaction is captured by its foundational properties: a transaction must be atomic, consistent, isolated and durable (also known as the ACID model). MySQL is a fully transactional relational data store, which makes it a perfect choice for mission-critical applications that cannot tolerate data loss or inconsistency. However, transactions are not free: they significantly limit the horizontal scalability of data stores, and MySQL is not an exception.
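To make the discussion a bit more concrete, here is a minimal sketch of a transaction in the MySQL shell, reusing the authors and books tables from the earlier examples (the author and title values are just sample data): either both INSERT statements become visible together on COMMIT, or neither does if ROLLBACK is issued instead.

START TRANSACTION;
INSERT INTO authors (name) VALUES ("Paul DuBois");
INSERT INTO books (title, author_id) VALUES ("MySQL Cookbook", LAST_INSERT_ID());
COMMIT;
-- issuing ROLLBACK instead of COMMIT would discard both inserts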
MongoDB does not support transactional guarantees. Instead, it supports atomic updates at the single-document level. This means that when a single update operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not, and other operations may interleave. It is worth mentioning that it is possible to implement transaction-like semantics on top of MongoDB (using the two-phase commit technique), but it requires a rather complex implementation.
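As an illustration, here is a minimal sketch using the books collection from the earlier examples (the inStock and copies fields are invented for this example): the first update modifies a single document atomically, while the second one, with the multi flag, modifies each matching document atomically but not the whole set as one unit.

// atomic update of a single document
db.books.update(
    { "title": "The Definitive Guide to MySQL 5" },
    { "$set": { "inStock": true }, "$inc": { "copies": 10 } }
)

// each matching document is updated atomically, but the operation as a whole is not
db.books.update(
    { },
    { "$inc": { "copies": 1 } },
    { multi: true }
)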
5.5. SQL vs JSON
SQL is a special-purpose programming language designed for managing data held in a relational database management system. It is the only query language supported by MySQL, enriched with a few vendor-specific extensions. Despite its simplicity, SQL is a very powerful language which essentially consists of two parts: a data definition language (DDL) and a data manipulation language (DML). We have seen quite a few examples already, but let us take a look at the simplest and probably most useful example of selecting rows from a table:
SELECT * FROM books;
SELECT * FROM books WHERE title LIKE "%MySQL%";
In MongoDB, everything is a document, represented as JSON. MongoDB does not have a dedicated language to manage the documents and takes a very different approach instead. Document manipulation and querying are described using a very rich set of operators, composed with each other in a JSON structure, just like any other document. This unified representation is quite powerful, expressive and easy to understand, plus it does not require learning yet another programming language. Most of the operators are self-explanatory and intuitive, for example:
db.books.find();
db.books.find({ "title": { "$regex": "MySQL" } });
5.6. Stored Procedures vs Scripting
A stored procedure is a special kind of routine available on the server side of a relational data store. Stored procedures can be implemented in different programming languages; however, most of the time they are written in SQL or its vendor-specific dialect, which is the case with MySQL. Typically, stored procedures help increase the performance of applications by executing some logic directly on the server, requiring less information to be sent between the server and the client. For example, here is a very simple MySQL stored procedure:
DELIMITER //
CREATE PROCEDURE find_books (IN pattern CHAR(255))
BEGIN
    SELECT COUNT(*) FROM books WHERE title LIKE pattern;
END //
DELIMITER ;

CALL find_books("%MySQL%");
MongoDB supports server-side scripting in the form of executing JavaScript code on the server. Those server-side scripts may resemble stored procedures, but the implementation goes beyond simple scripting, as it supports the map/reduce paradigm, first popularized by Google and widely adopted for big data analytics. Essentially, by supplying scripts to the map/reduce command, it becomes possible to parallelize the processing of large and very large datasets across many MongoDB instances. Let us take a look at a very naïve map/reduce example which counts how many books have the “MySQL” string in their title:
db.runCommand({
    mapReduce: "books",
    map: function() {
        if (this.title.indexOf("MySQL") >= 0) {
            emit("MySQL", 1);
        }
    },
    reduce: function(key, counters) {
        var count = 0;
        for (var index = 0; index < counters.length; ++index) {
            count += counters[index];
        }
        return count;
    },
    out: { inline: 1 }
})
Compared to the MySQL stored procedure example, the MongoDB one looks overly verbose and complicated. However, once the concepts of the map/reduce paradigm become clearer, the example starts to look quite simple, opening up a whole new frontier for applying data processing, exploration and analysis techniques.
5.7. GROUP BY vs Aggregations
In the data management domain, an aggregate function is a function where the values of multiple data records are grouped together according to certain aggregation criteria, producing a single value of more significant meaning or measurement. The SQL language defines specific clauses for performing aggregations: GROUP BY and HAVING. The set of standard grouping functions includes, among many others, AVG(), COUNT(), MAX(), MIN(), STD() and SUM(). Let us take a look at a GROUP BY example in the MySQL shell:
CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    publisher VARCHAR(255) NOT NULL,
    product VARCHAR(255) NOT NULL,
    price NUMERIC(15,2) NOT NULL
);

INSERT INTO orders (product, publisher, price) VALUES
    ("The Definitive Guide to MySQL 5", "APress", 37.36),
    ("MySQL Cookbook", "O’Reilly", 49.65),
    ("Learning MySQL", "O’Reilly", 29.48);

SELECT publisher, SUM(price) FROM orders GROUP BY publisher;
MongoDB offers a specialized set of aggregation operations which process documents and return computed (aggregated) results. Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. Aggregations in MongoDB can be done in two ways: using the aggregation pipeline or using single-purpose aggregation methods and commands.
Let us take a look at the same example we prepared to demonstrate MySQL aggregations, this time using MongoDB’s aggregation pipeline:
db.books.insert({
    "title": "The Definitive Guide to MySQL 5",
    "publisher": "APress",
    "price": 37.36
})

db.books.insert({
    "title": "MySQL Cookbook",
    "publisher": "O’Reilly",
    "price": 49.65
})

db.books.insert({
    "title": "Learning MySQL",
    "publisher": "O’Reilly",
    "price": 29.48
})

db.books.aggregate([
    { $match: {} },
    { $group: { "_id": "$publisher", "price": { "$sum": "$price" } } }
]);
The map/reduce command, which we discussed in the Stored Procedures vs Scripting section, could also be considered a way to do aggregations.
5.8. Clustering and Sharding/Partitioning
Nowadays, data volumes are growing exponentially, and a single physical data store instance often is not able to persist and manage such a mass of data at an acceptable pace. Clustering is a technology that brings many individual computing instances together to work as one. With respect to data storage, clustering goes side by side with data sharding/partitioning, a technique for splitting a large amount of data across multiple data store instances.
For quite a while, MySQL has been available in the MySQL Cluster edition, which claims to be a write-scalable, real-time, ACID-compliant transactional data store. MySQL Cluster is built on top of a distributed, multi-master architecture which supports auto-sharding/partitioning and scales horizontally to serve read- and write-intensive workloads.
Historically, MySQL Cluster has a reputation of being too complex and quite hard to configure, monitor and maintain. In contrast to standalone MySQL deployments, the data schema has to be designed in such a way as to take data sharding/partitioning into account, otherwise the performance of the data store will suffer a lot. Lastly, MySQL Cluster has many limitations compared to the regular MySQL distribution.
The MongoDB document data store supports sharding/partitioning out of the box using the concept of sharded clusters. The strong side of MongoDB sharding/partitioning is its simple and straightforward configuration. It scales horizontally pretty well, but there are quite a few limitations to be aware of.
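As a brief sketch of that configuration (assuming a sharded cluster with a mongos router is already up and running, and using a hypothetical publisher-based shard key), enabling sharding for a database and a collection boils down to a couple of commands in the mongo shell:

// run against the mongos router
sh.enableSharding("tutorial")
sh.shardCollection("tutorial.books", { "publisher": 1 })

// print the current state of the sharded cluster
sh.status()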
5.8.1. Replication
Replication is an essential technique for keeping data safe (by replicating it across many data store instances) and, in many cases, for improving the scalability and fault tolerance of the applications dealing with this data. MySQL supports traditional master/slave replication, which is asynchronous by default, but semi-synchronous and delayed replication modes are also possible.
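As a rough sketch of how a slave is attached to a master (assuming binary logging is enabled on the master and both servers have distinct server-id values; the host, credentials and binary log coordinates below are placeholders):

CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'replication',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;

START SLAVE;

-- verify the replication status
SHOW SLAVE STATUS\G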
MongoDB approaches replication by introducing replica sets. Basically, it is master/slave replication, but MongoDB uses slightly different terminology. The master, called the primary, receives all write operations, and the slaves, called secondaries, apply operations from the primary. One of the greatest features of replica sets is automatic failover: when a primary does not communicate with the other members of the replica set, the replica set will attempt to elect another member to become the new primary.
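A minimal sketch of bootstrapping a three-member replica set from the mongo shell (host names are placeholders, and each mongod instance is assumed to be started with the same --replSet rs0 option) might look like this:

rs.initiate({
    "_id": "rs0",
    "members": [
        { "_id": 0, "host": "mongodb0.example.com:27017" },
        { "_id": 1, "host": "mongodb1.example.com:27017" },
        { "_id": 2, "host": "mongodb2.example.com:27017" }
    ]
})

// check which member is currently the primary
rs.status()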
To be fair, it is possible to configure MySQL master/slave replication with automatic failover as well but there are some concerns and discussions about this feature in MySQL community.
5.9. Full text search
MySQL has long had support for full-text indexing and searching, implemented using a special type of index. It is important to mention that full-text indexing support became available for the InnoDB storage engine only with the MySQL 5.6 release branch.
Interestingly, full-text search in MySQL is possible using natural language search (phrase search), boolean search (term search), where the words to search for may be marked as “must be present” or “must be absent”, and query expansion search (a slight modification of natural language search). However, at the moment full-text indexing is not supported in clustered MySQL deployments (please see the Clustering and Sharding/Partitioning section for a brief discussion of MySQL Cluster). Let us take a look at a quick example:
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    product VARCHAR(255) NOT NULL,
    description VARCHAR(255) NOT NULL,
    FULLTEXT product_description (product, description)
);

INSERT INTO products (product, description) VALUES
    ("The Definitive Guide to MySQL 5", "This book shows readers how to connect to MySQL via all of the major APIs, including PHP, Perl, Java, JSP, and C#"),
    ("MySQL Cookbook", "Ideal for beginners and professional database and web developers"),
    ("Learning MySQL", "Book travels far into MySQL's subtleties, including complex queries and joins");

SELECT * FROM products
    WHERE MATCH (product, description) AGAINST ('database guide' IN NATURAL LANGUAGE MODE);
Full-text search support was introduced in MongoDB not so long ago. Similarly to MySQL, it is implemented using a special type of index on string content (or arrays of strings). MongoDB also supports phrase search, term search and boolean search as a combination of those. It is an easy-to-use and elegantly implemented feature, but not without limitations. Unfortunately, at the moment MongoDB gives no way to specify the subset of fields to run a full-text search against: it always matches all the fields included in the full-text index. Let us take a look at MongoDB full-text search in action:
db.products.insert({
    "product": "The Definitive Guide to MySQL 5",
    "description": "This book shows readers how to connect to MySQL via all of the major APIs, including PHP, Perl, Java, JSP, and C#"
})

db.products.insert({
    "product": "MySQL Cookbook",
    "description": "Ideal for beginners and professional database and web developers"
})

db.products.insert({
    "product": "Learning MySQL",
    "description": "Book travels far into MySQL's subtleties, including complex queries and joins"
})

db.products.createIndex({ product: "text", description: "text" })

db.products.find({ $text: { $search: "database guide" } })
5.10. Development
When it comes to development, MySQL provides a wide set of connectors which can be used to communicate with the server from most mainstream programming languages. With respect to Java development, MySQL includes a JDBC driver which conforms to the JDBC 3.0 and JDBC 4.0 specifications.
Although it is quite rare to encounter code which uses raw JDBC interfaces to communicate with a relational data store (many frameworks have been built to facilitate that), it is good to know what is involved. Let us take a look at the following code snippet, which connects to MySQL and executes a single SELECT statement:
Connection connection = null;
try {
    connection = DriverManager.getConnection("jdbc:mysql://localhost/tutorial");
    Statement statement = null;
    ResultSet resultSet = null;
    try {
        statement = connection.createStatement();
        resultSet = statement.executeQuery("SELECT * FROM books");
        while (resultSet.next()) {
            // Now do something with the ResultSet ...
        }
    } catch (SQLException ex) {
        // Log exception
    } finally {
        if (resultSet != null) {
            try { resultSet.close(); } catch (SQLException ex) { /* ignore */ }
        }
        if (statement != null) {
            try { statement.close(); } catch (SQLException ex) { /* ignore */ }
        }
    }
} catch (SQLException ex) {
    // Log exception
} finally {
    if (connection != null) {
        try { connection.close(); } catch (SQLException ex) { /* ignore */ }
    }
}
It looks quite verbose and full of exception handling. MongoDB in turn has a rich ecosystem of drivers which, aside from the mainstream languages, includes drivers for Erlang, Go, Node.js and Scala. Having the freedom to come up with its own specifications, MongoDB is in the process of developing the next generation of drivers and quickly adapting the existing implementations to the latest trends in the industry (a good example of that is ReactiveMongo: Reactive Scala Driver for MongoDB).
As an example, let us take a look at one of the ways to connect to MongoDB using Java and the Morphia driver (assuming we have Book and Author data model classes):
final MongoClient client = new MongoClient("localhost", 27017);

final Morphia morphia = new Morphia();
final Datastore dataStore = morphia
    .map(Book.class, Author.class)
    .createDatastore(client, "tutorial");

final Query<Book> query = dataStore.createQuery(Book.class);
final List<Book> books = query.asList();
Historically, JSON support in Java has been quite good because of a rich set of community-supported libraries, making conversions of JSON documents to Java classes (and back) a no-brainer.
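For example, with the community-supported Jackson library (a minimal sketch, assuming the Book class from the Morphia example exposes a matching title property; checked exception handling is omitted for brevity), the conversion in both directions takes just a couple of lines:

import com.fasterxml.jackson.databind.ObjectMapper;

final ObjectMapper mapper = new ObjectMapper();

// JSON document -> Java object
final Book book = mapper.readValue(
    "{ \"title\": \"The Definitive Guide to MySQL 5\" }", Book.class);

// Java object -> JSON document
final String json = mapper.writeValueAsString(book);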
It would be unfair not to mention that the Java community has developed quite a lot of different object-relational mapping frameworks (ORMs) to abstract away from the low-level JDBC interfaces, which essentially allow writing code as concise as the MongoDB example. However, it is important to understand that they add quite a lot of complexity to existing applications.
5.10.1. Deployment
Both MySQL and MongoDB are available on most of the major operating systems. In most cases MySQL is installed from platform-specific packages and requires privileged access to the system. Although downloadable archives are also available, depending on the operating system, configuration and edition (for example, MySQL Cluster), the installation may get quite complex and non-intuitive (although it may not require privileged access to the system).
In contrast, MongoDB in most cases is distributed as a downloadable archive which can be unpacked and used right away. Sensible defaults play a great role here, requiring minimal configuration: just run the MongoDB server and start filling the data store with documents.
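For example, after unpacking the archive, getting a standalone instance up and running usually boils down to something like the following (the data directory location is just an example):

mkdir -p /data/db
./bin/mongod --dbpath /data/db

# from another terminal, connect to it with the shell
./bin/mongo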
It is fair to mention that with containerization and Docker, the landscape of deployment procedures we got so used to is changing very rapidly. Moving from traditional package-based distributions to preconfigured containers flattens the installable-package and download-unpack-run models: everything becomes an image, run as an isolated container, available in milliseconds.
Not to mention that many cloud providers (AWS, Cloud Foundry, OpenShift, just to name a few) have software-as-a-service offerings for MySQL and/or MongoDB, taking care of all configuration, infrastructure and scalability details.
5.11. Monitoring & Diagnostics
Clearly, monitoring is a critical component of any data store administration procedure. There are quite a number of commercial solutions for monitoring the data store of your choice, but let us discuss the ones available as part of the MySQL and MongoDB distributions.
MySQL has several ways to perform monitoring and diagnostics. First of all, there is the SHOW ENGINE <ENGINE> STATUS command. Secondly, MySQL supports the Performance Schema, a feature for monitoring MySQL server execution at a low level. And lastly, MySQL supports DTrace probes (not supported by every operating system), which are designed to provide information about the execution of queries within the MySQL server and the different areas of the system being utilized during that process.
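For instance, the following statements are often the first stop when looking into what an InnoDB-based server is doing; the Performance Schema query is just one example of the many tables it exposes:

SHOW ENGINE INNODB STATUS\G
SHOW GLOBAL STATUS LIKE 'Threads%';

-- one of many Performance Schema views: statements aggregated by normalized query text
SELECT * FROM performance_schema.events_statements_summary_by_digest LIMIT 5;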
The MongoDB distribution includes a number of command line utilities that make it possible to quickly retrieve statistics about the performance and activity of each data store instance. Not only that, every running MongoDB instance also provides a web interface that exposes diagnostic and monitoring information through a simple web page. And lastly, MongoDB includes a number of commands for getting insights into the state of the database by connecting to it through the shell.
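For example, the mongostat and mongotop command line utilities ship with the distribution, and the following commands can be issued from the mongo shell against any running instance:

// server-wide counters, memory and connection statistics
db.serverStatus()

// storage statistics for the current database
db.stats()

// operations currently in progress
db.currentOp()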
5.12. Security
MySQL uses a privilege-based security model. The primary function of MySQL’s privilege system is to authenticate a user and to associate that user with privileges on a particular database (such as CREATE, DROP, SELECT, INSERT, UPDATE, DELETE and others).
As of now, there are a few things that are not supported by the MySQL privilege system: the inability to explicitly specify that a given user should be denied access, and the inability to specify that a user has privileges to create or drop tables in a database but not to create or drop the database itself. On the transport layer, MySQL supports secure (encrypted) connections between clients and the server using SSL (Secure Sockets Layer) with the TLS v1.0 protocol.
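As a quick sketch of the privilege system in action (the user name, host and password are placeholders), creating a user and granting it a specific set of privileges on the tutorial database looks like this:

CREATE USER 'librarian'@'localhost' IDENTIFIED BY 'secret';
GRANT SELECT, INSERT, UPDATE, DELETE ON tutorial.* TO 'librarian'@'localhost';

-- inspect the privileges assigned to the user
SHOW GRANTS FOR 'librarian'@'localhost';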
MongoDB security features include authentication, authorization and auditing. Its foundation is role-based access control with a flexible set of privileges. It is worth mentioning that MongoDB provides a basis for user-defined roles by defining a set of built-in roles. It is also possible to use TLS/SSL (Transport Layer Security/Secure Sockets Layer) to encrypt all of MongoDB’s network traffic to ensure that it is only readable by the intended client.
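A comparable sketch on the MongoDB side (user name, password and database names are placeholders, and two of the built-in roles are used) could look like this in the mongo shell:

use tutorial
db.createUser({
    "user": "librarian",
    "pwd": "secret",
    "roles": [
        { "role": "readWrite", "db": "tutorial" },
        { "role": "read", "db": "reporting" }
    ]
})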
5.13. Trade-offs, not battles
As we have seen, MySQL and MongoDB both serve the same common goal: to be a comprehensive data management solution for their users. However, the ways in which they approach this goal are very, very different. On one side, we see a quite mature and battle-tested relational data store. On the other side, we meet a young but quickly maturing document data store, well suited for certain classes of modern web applications.
As was stated several times throughout this tutorial, it is not a battle and we are not looking for a winner. Rather, we are looking for the solution which better fits the needs of your applications. Even more, a heterogeneous data storage infrastructure is becoming the norm rather than the exception nowadays: for example, MongoDB may perfectly fit analytics and reporting demands, whereas MySQL may store billing transactions. Luckily, with the help of this tutorial you will be able to pick the right tool to do the job.
In many respects it would be fair to say that MongoDB was built for the web: embracing JSON, a very short learning curve, a foundation for rapid development, new features added every release. MySQL is battle-tested, conservative and old-fashioned; however, things are changing very fast as relational data stores try to adapt to meet modern application needs.
5.14. MySQL or MongoDB?
A lot has already been said about MySQL and MongoDB. It would be good to finish the discussion with a set of recommendations on when MongoDB is preferable to MySQL and vice versa.
First of all, if your data is critical to your business, very likely MySQL is the safer choice: ACID properties are there for a reason. But every application is different. Content management systems, log management, analytics, forums and blogs, event stores, product catalogs, inventory management: those kinds of applications may benefit from MongoDB as a data store.
The schemaless data model is an enabler of rapid development: just introduce new properties as you go, without the need to perform schema evolutions and data migrations. Arguably, MongoDB’s style of manipulating documents and running queries is much more developer-friendly (moreover, it does not require learning an additional language such as SQL). Configuring MongoDB’s replica sets and sharded clusters is really easy and fast, compared to MySQL Cluster configuration (and management).
Modern architectural patterns promote splitting an application’s read and write paths (for example, Command and Query Responsibility Segregation, or CQRS). With that, applications can be designed in such a way that the write path is served by MySQL, while MongoDB serves as the data store for the read path, where data pieces are composed into meaningful documents.
5.15. MySQL and MongoDB: Nowadays
It is worth mentioning that two companies, Twitter and Facebook, successfully operate some of the largest MySQL deployments out there. They happily share their experience through numerous blog posts: MySQL at Twitter, Another look at MySQL at Twitter and incubating Mysos, How Twitter Stores 250 Million Tweets a Day Using MySQL, WebScaleSQL: A collaboration to build upon the MySQL upstream, and more.
On the other side, adoption of MongoDB is growing every day, with more and more companies deploying it at larger and larger scale. For example, eBay shared How We Built eBay’s First Node.js Application, while Craigslist followed up with MongoDB at Craigslist: 1 Year Later. No doubt, we are going to see even wider adoption in the future.
6. Conclusions
In this short tutorial we tried to take a look at two very popular data stores: the traditional relational database, represented by MySQL, and the document data store, represented by MongoDB. We briefly walked through the most important features and data guarantees each of them has, trying to assess how application scalability and availability might be affected. We have omitted a lot of low-level intrinsic details, as both MySQL and MongoDB are each worth several books alone. Instead, we did a high-level overview, picking the most interesting and important areas.
I have been tracking NoSQL databases for several years, collecting public data on skills, vendors, benchmarks, security vulnerabilities, and so on. The overall market for NoSQL databases is still tiny. The slides are regularly updated with new data as I find it, with a summary of skills and vendor funding in section 2: https://speakerdeck.com/abchaudhri/considerations-for-using-nosql-technology-on-your-next-it-project-1
Hi Akmal,
Thank you very much for sharing such an invaluable presentation, it is by all means a tremendous work.
Thank you a lot!
Best Regards,
Andriy Redko
Good article for beginners who want to migrate from an RDBMS to a NoSQL database. I am currently using MongoDB in my new project with the Grails framework. The only problem I face with MongoDB is the missing multi-commit transactions. If there is a good workaround for multi-commit in the NoSQL movement then the market share will go up.
Have a nice day
Hi Sakar,
Thank you for the comment. Transaction-like behavior is a topic often discussed in the NoSQL community. Some NoSQL data stores do provide some form of support for multi-commits (e.g. Redis has MULTI/EXEC, http://redis.io/topics/transactions), some have transaction support as a commercial offering (e.g. HyperDex, http://hyperdex.org/). So everyone is picking the right trade-offs for their applications.
Thank you.
Best Regards,
Andriy Redko
i think most of us are wrong to face DB problems. If we use 90% or more of completeness and 100% speed ,we can doin the more.
Future —> Once more Refresh,better than 5s to wait.
Nice tutorial on MongoDB vs MySql, Check this out!
http://www.thejavabook.com/2015/08/mysql-vs-mongodb-which-is-better/
What about Derby?
It seems that every generation thinks they have come up with a new discovery, a bit like every youngster thinks they have discovered sex. The so-called new NoSQL databases are just different names for stuff that was around before relational databases were developed to address the shortcomings of most of them. (SQL has nothing to do with relational.) Take ISAM, or any navigational/graph/network or hierarchical database from before the 1980s such as CODASYL IDMS, IMS, PICK, U2, dBase and many more; a lot of them are still used today. An ISAM database can do anything that a so-called NoSQL database… Read more »
Yeah, you’re absolutely right! Reinventing the wheel is in fact what characterizes the “modern” era… That’s the explanation for the so-called “knowledge explosion”.
I think the most important part of the article is “First of all, if your data is critical to your business, very likely MySQL is a safer choice”. That’s why I often find it weird when I see people building web apps with MongoDB. I mean, isn’t your users’ data of value to you?! Also, the most often mentioned reason to choose Mongo is “scalability”, but as you have nicely shown, even Twitter can build on MySQL. With advancing cloud solutions like Amazon Aurora it’s not really a problem… Mongo may be justified for special use cases, but not for… Read more »
Hi Daniel,
Thank you for quite a thoughtful comment. Indeed, MongoDB may or may not fit the needs of a particular application, taking into account the numerous choices out there. Also, things do change, and very fast. MongoDB has evolved a lot since the article was published, and it is a more robust, scalable and reliable data store these days. Still, it is always necessary to evaluate and pick every part of your stack responsibly, and MongoDB is not an exception.
Best Regards,
Andriy Redko