Software Development

CAP is not just for your head

Today i like to write about an important theorem in distributed computer systems. I’m sure you notice the subject of this post is about CAP  theorem (also known as Brewer’s theorem). Eric brewer is the man who proposed CAP theorem in 2000.

CAP is the acronym of three words:

Consistency: All nodes must read the latest changed data, In the other word every node in our distributed system should read same data. If a write operation occured in one of the nodes, Reading same data from the other node must return the latest write ( When system received something newer, then must not return any of the older data items )

Availability: There must not be any request what is blocked with any of nodes, All of the requests must have an response about the status of request.

Partition Tolerance: The system continues its convenient tasks even any of the messages lost or there are some failure parts in system.

CAP theorem is about the impossibility of having all of these attributes together in a system.  ٍEvery distributed system at most can have two of these three attributes. Most of the references introduce CAP as an triangle which a distributed system can have just two of its angles.

CAP Triangle
CAP Triangle

Examples of Consistency + Availability are:

  • Single-node Databases
  • Cluster Databases
  • LDAP
  • xFS file system

Examples of Consistency + Partition Tolerance are:

  • Distributed Databases
  • Distributed Locking
  • Majority Locking

Examples of Availability + Partition Tolerance are:

  • Coda
  • Web caching
  • DNS

You can read the formal proof of CAP theorem in : Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web. It is not very hard and with reading this article can clear everything for you.

But in this post i proof this concept with some justifications.

Assumption: In a simple distributed system we have two nodes: NODE A & NODE B.  a client writes “DataItem1″ on NODE A and at the same time a client request to read the “DataItem1″ on NODE B.

Assume we have CA environment, so all the data in all nodes are consistent and all of the nodes can execute every query, If all the messages between nodes fail then a query to node B cannot have the latest value of data item. As you see there are some situations that we can not have “CA” environment with “P”.

Assume we have CP environment, so all the data in all nodes are consistent and there is partition tolerance attribute. Now if before writing “DataItem1″ on NODE A connection between two nodes break, Requesting to node B cannot execute our query, So web missed availability. Node B wants to synch its data with NODE A but the connection is broken so response cannot be available.

At lat assume we have PA environment.  So every request from nodes will have a response and partition tolerance permits our system to continue its tasks with any message and system failures. If client writes “DataItem1″ on NODE A and in the same time other client send request for “DataItem1″  to Node B and the connection between two nodes are broken then client will read old version of “DataItem1″.

Note: It is possible to have delay in communication and  synchronizing between the nodes, It is the most important reason which a PA system cannot have consistency at all. In these environments we have partial consistency between our nodes.
 

Reference: CAP is not just for your head from our JCG partner Soroosh Sarabadani at the Just another Java blog blog.

Soroosh Sarabadani

A worm who loves to have JAVA and share his findings with others.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button