Scylla DB vs Apache Cassandra Overview
1. Introduction
In this article, let’s take a quick look at the similarities and differences between two NoSql databases, ScyllaDB and Apache Cassandra DB.
2. What’s Scylla DB?
Scylla DB is a NoSQL database written in C++ and is Apache Cassandra’s drop-in replacement database. ScyllaDB provides low latency and high throughput at a fraction of the cost of other NoSQL databases. ScyllaDB has reimplemented Cassandra in C++ to increase the performance and to utilize the multi-core serves. Scylla DB used the Seastar framework in the implementation.
3. Scylla DB Architecture
ScyllaDB’s Architecture is similar to Cassandra DB Architecture in terms of writing to Commit log, Memtable, Flushing Memtable to SSTable, and SSTables compaction.
3.1 CAP Theorem
All highly distributed and massively scalable databases adhere to CAP Theorem.
- Consistency
- Availability
- Partition Tolerance
- CP – Redis, MongoDB, HBase, BT
- AP – Cassandra, Scylla, DynamoDB
- CA – Not Possible in Distributed systems
3.2 Consistent Hashing
ScyllaDB utilizes the Consistent Hashing mechanism like other NoSQL databases.
- Consistent hashing is a very useful strategy for distributed database systems. It allows us to distribute data across a cluster of nodes in such a way that will minimize the redistribution of the data when new nodes are added to the cluster or nodes removed from the cluster.
- In Consistent Hashing, when the distributed hash table is resized, only a minimal number of keys are to be remapped.
3.3 ScyllaDB Terminology
- Consistent Hashing – Hashing mechanism to distribute the load evenly across the cluster.
- Commit Log – Append-only log of all mutations to a Node.
- Memtable – In-memory data-structure holding data before they are flushed to SST files.
- SSTable – File of key/value string pairs, sorted by keys.
- Gossip – Protocol for Internode communication.
- Compaction – Mechanism to evict tombstones, and consolidate SSTables.
- Partitioner – Determines how the data is distributed across the cluster.
- Coordinator – Selected node in the cluster to interact with DB.
- Seastar – Open-source C++ framework for high-performance server applications.
4. Scylla DB vs Cassandra – Similarities
- Both are NoSql databases.
- ScyllaDB is written in C++ and Cassandra is written in Java.
- Both databases share almost identical architecture.
- Both scales linearly as more nodes are added.
- Both favor Availability over Consistency.
- Compatible drivers between these two databases.
5. Scylla DB vs Cassandra – Differences
- Cassandra is written in Java and ScyllaDB is written in C++.
- Scylla DB uses Seastar Framework .
- Scylla has a thread-per-core design.
- Scylla uses asynchronous Networking.
- Scylla has Future/Promise based APIs.
- Garbage collection occurs in Cassandra, No Garbage collection in Scylla.
- Scylla is a drop-in alternative to Cassandra.
- Cassandra is based on Multithreading, ScyllaDB is based on the thread-per-core model.
- Seed Node simplification in new Scylla release.
- With Cassandra, need side/external cache to be in place for improving performance. Scylla Offers In-memory cache.
- ScyllaDB has fewer configurations compared to Cassandra and utilizes the “Workload Conditioning” to auto-tune the Memtable flushing and SSTable compaction based on deep-learning techniques.
- Cassandra needs Seed nodes to bootstrap the new node joining the cluster. ScyllaDB is implemented with seedless architecture.
6. Scylla DB vs Cassandra – Performance
- Performance – Scylla is a 10x performant on the same hardware.
- CPU Resources – Scylla uses reduced CPU resources by avoiding program loading into JVM.
- Memory Management – Scylla is more flexible and has complex memory management.
- Latency – Scylla offers low latency at higher percentiles due to share-nothing and lockless novel design.
- Cache – Scylla Reduces the need for external cache and offers in-memory cache.
- LWT (Lightweight Transactions) – Scylla commonly uses fewer rounds than Cassandra to complete LWT. Cassandra requires read before writing for Lightweight Trsnacations.
- Handling Spikes – “Workload Prioritization” can be used in Scylla to handle occasional spikes in traffics.
7. Performance Metrics
- 2.5X reduction in AWS EC2 costs -Dramatic improvement in availability through 10X reduction in cluster size.
- Up to 11X improvement in 99th percentile latency.
- Amazon EC2 Bare Metal Instances are a great platform for Scylla.
8. Summary
ScyllaDB is a high performant, low-latency and cost optimized database. It addresses some of the pitfalls that Apache Cassandra has, by implementing it in C++ and leveraging Seastar framework. This article provides quick comparison of features, optimizations, and performance between Apache Cassandra and Scylla DB.