MySQL Is a Great NoSQL
NoSQL is a set of database technologies built to handle massive amounts of data or specific data structures foreign to relational databases. However, the choice to use a NoSQL database is often based on hype, or a wrong assumption that relational databases cannot perform as well as a NoSQL database. Operational cost is often overlooked by engineers when it comes to selecting a database. At Wix engineering, we’ve found that in most cases we don’t need a NoSQL database, and that MySQL is a great NoSQL database if it’s used appropriately.
When building a scalable system, we found that an important factor is using proven technology so that we know how to recover fast if there’s a failure. For example, you can use the latest and greatest NoSQL database, which works well in theory, but when you have production problems, how long does it take to resume normal activity? Pre-existing knowledge and experience with the system and its workings—as well as being able to Google for answers—is critical for swift mitigation. Relational databases have been around for over 40 years, and there is a vast industry knowledge of how to use and maintain them. This is one reason we usually default to using a MySQL database instead of a NoSQL database, unless NoSQL is a significantly better solution to the problem—for example, if we need a document store, or to handle high data volume that MySQL cannot handle.
However, using MySQL in a large-scale system may have performance challenges. To get great performance from MySQL, we employ a few usage patterns. One of these is avoiding database-level transactions. Transactions require that the database maintains locks, which has an adverse effect on performance.
Instead, we use logical application-level transactions, thus reducing the load and extracting high performance from the database. For example, let’s think about an invoicing schema. If there’s an invoice with multiple line items, instead of writing all the line items in a single transaction, we simply write line by line without any transaction. Once all the lines are written to the database, we write a header record, which has pointers to the line items’ IDs. This way, if something fails while writing the individual lines to the database, and the header record was not written, then the whole transaction fails. A possible downside is that there may be orphan rows in the database. We don’t see it as a significant issue though, as storage is cheap and these rows can be purged later if more space is needed.
Here are some of our other usage patterns to get great performance from MySQL:
Do not have queries with joins; only query by primary key or index.
Do not use sequential primary keys (auto-increment) because they introduce locks. Instead, use client-generated keys, such as GUIDs. Also, when you have master-master replication, auto-increment causes conflicts, so you will have to create key ranges for each instance.
Any field that is not indexed has no right to exist. Instead, we fold such fields into a single text field (JSON is a good choice).
We often use MySQL simply as a key-value store. We store a JSON object in one of the columns, which allows us to extend the schema without making database schema changes. Accessing MySQL by primary key is extremely fast, and we get submillisecond read time by primary key, which is excellent for most use cases. So we found that MySQL is a great NoSQL that’s ACID compliant.
In terms of database size, we found that a single MySQL instance can work perfectly well with hundreds of millions of records. Most of our use cases do not have more than several hundred million records in a single instance.
One big advantage to using relational databases as opposed to NoSQL is that you don’t need to deal with the eventually consistent nature displayed by most NoSQL databases. Our developers all know relational databases very well, and it makes their lives easy.
Don’t get me wrong, there is a place for NoSQL; relational databases have their limits—single host size and strict data structures. Operational cost is often overlooked by engineers in favor of the cool new thing. If the two options are viable, we believe we need to really consider what it takes to maintain it in production and decide accordingly.
Reference: | MySQL Is a Great NoSQL from our JCG partner Aviran Mordo at the Aviran’s Place blog. |
There are some other good examples where MySQL has been used very successfully in large projects:
o http://stackshare.io/wix/scaling-wix-to-60m-users—from-monolith-to-microservices/
o http://www.techrepublic.com/article/etsy-goes-retro-to-scale/
o https://eng.uber.com/mezzanine-migration/
You’re using MySQL as a schemaless, constraintless, joinless, transactionless, queryless key value store. Wouldn’t something like Redis, Couchbase, Aerospike or an in memory data grid make more sense? These aren’t really new technologies anymore.
These are all viable solutions if you have the knowledge to recover and respond quickly to problems.
Maybe is was not clear from the article, but the way we use MySQL is not completely schema-less, we do take advantage of the secondary indexes when needed, however for data that you don’t need to query for (in a where clause) is being put in a JSON blob, while on redis and other key-value stores you do not have that flexibility of using secondary indexes.
I hope that didn’t come across wrong. You have valid points I just think that as each day passes the NoSQL solutions become more mature and the reason to use MySQL like this becomes less persuasive. To pick on the Etsy example a couple quotes stand out to me – “At the time, he continues, NoSQL databases were still very early in their lifecycle.” – “With now CEO Chad Dickerson and Elliott-McCrea arriving from Flickr, they decided to go with what they knew worked, rather than hope for the best with shiny new technology.” That sums it up for me.… Read more »
Don’t get me wrong. If you have scalability issues where a single (or a couple) MySQL server(s) can’t handle then NoSQL is probably the better way to go, anything that will lower your DevOps cost.
But in many cases a single MySql server can handle the data very good (especially in a microservices architecture where you can easily “shard” your MySql databases by microservice), it would be better to go with MySQL and have all the advanteges of a relational database then to manage clusters of NoSQL servers unnecessarily, deal with eventual consistency and paradigm shift
How about postgresql for json + sql + indexability ?
A viable option if you know how to operate it and get a quick time to recovery.