Apr 4, 2016

Consistency considerations in distributed data stores

In a large distributed system, network partitions are a given. This means we cannot achieve both consistency and availability at the same time (CAP theorem). So our choices are either to relax consistency so the system stays highly available under partitions, or to make consistency the priority and accept that the system will be unavailable under certain conditions. Either way, developers need to be aware of what the system is offering.

For example, if the system emphasizes consistency, then the developer has to deal with availability issues. So if an update fails because the system is unavailable, the developer needs a plan for what to do with that update.
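As a sketch of what that planning might look like: retry the write a few times with backoff, and if the store stays unavailable, hand the update back to the caller to queue or surface to the user. The names here (`Unavailable`, `FlakyStore`, `write_with_retry`) are illustrative, not any particular store's API:

```python
import time

class Unavailable(Exception):
    """Raised when the store cannot reach a quorum (a CP system under partition)."""

def write_with_retry(store, key, value, retries=3, backoff=0.01):
    """Try the write a few times; if the store stays unavailable,
    report failure so the caller can queue or surface the update."""
    for attempt in range(retries):
        try:
            store.put(key, value)
            return True
        except Unavailable:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    return False  # caller decides: queue locally, or report failure to the user

class FlakyStore:
    """Toy store that is unavailable for the first few calls."""
    def __init__(self, failures):
        self.failures = failures
        self.data = {}
    def put(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise Unavailable()
        self.data[key] = value

store = FlakyStore(failures=2)
assert write_with_retry(store, "balance:alice", 100) is True
assert store.data["balance:alice"] == 100
```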

On the other hand, if the system emphasizes availability, then the developer should assume there will be times when reads do not return the latest updates. The application needs to be tolerant, i.e., able to work with slightly stale data.


Most distributed systems have to cope with network partitions, and write availability is a must for many internet-based applications. So distributed data stores have to offer weaker forms of consistency to maximize availability, while still providing the guarantees that are essential for the application.

Typically in any distributed storage system there are two perspectives on data consistency -

  • data-centric consistency and 
  • client-centric consistency. 

The provider (i.e., data-centric) focus is typically on how to synchronize data among the replicas, how to order operations, how updates flow through the system, and what guarantees the system can provide with respect to updates.

The client-centric perspective views the system from the outside, as a black box. The client focus is on what guarantees the distributed storage system provides, which can be captured in an SLA. In short, the client-centric perspective deals with how and when clients observe data updates in the system.

Whenever we talk about replication & consistency, there are two dimensions we need to consider - staleness of updates and ordering of updates.

Staleness of updates basically describes by how much a replica is lagging behind the source. The staleness can be measured either in time or in versions. Generally, staleness is not an issue if the corresponding real-world operation has the same or higher lag outside the storage system.


For example, say Alice wires money to Bob. Alice's account is debited immediately. On the other hand, Bob's account is not credited for some time; typically the window is 1-3 days. That window is far longer than any delay replica synchronization might take, so small delays often go unnoticed.

Ordering of updates, on the other hand, is more critical. Executing all updates in strict chronological order is very hard in a distributed database. One option is to use locks, but that brings down both the performance and the availability of the system. Another alternative is to use timestamps, but then replicas might disagree about time, creating clock synchronization issues.
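Logical clocks are one standard way around wall-clock disagreement: instead of physical timestamps, each node stamps events with a counter that captures only ordering, not real time. A minimal Lamport clock sketch (the class and names are illustrative):

```python
class LamportClock:
    """Logical clock: order events without synchronized wall clocks."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Advance on a local event or before sending a message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # On receive, jump ahead of both our clock and the sender's.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t1 = a.tick()       # node a performs a write at logical time 1
t2 = b.receive(t1)  # node b learns of it and moves to time 2
assert t2 > t1      # causal order is preserved, no wall clocks involved
```

Lamport clocks give a consistent ordering for causally related events, but two events with different timestamps are not necessarily causally related; that stronger property needs vector clocks, discussed below under causal consistency.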

Given the above, there are various consistency models that relax ordering requirements while keeping the guarantees that are essential for applications. These models can typically be ordered by the strictness of their ordering requirements.

Client-centric Consistency Models...
Monotonic Read Consistency (MRC) - Basically this model guarantees that if a client reads version n, then later that client will only read versions >= n. This matters because, from an application perspective, it is better for versions to become visible in chronological order even if there is a delay.

For example, say Alice wires money to Bob. Bob sees the money credited in his account. Now Bob tries to transfer funds to Wendy, but gets an "insufficient funds" error because that read went to a replica that has not seen the credit yet. At a minimum this will cause a good bit of irritation, if not more.
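One common way to enforce MRC is on the client side: remember the highest version seen so far and skip replicas that are behind it. A sketch, assuming a hypothetical replica API that returns `(version, value)` pairs:

```python
class Replica:
    """Toy replica at a fixed version (illustrative)."""
    def __init__(self, version, value):
        self.version, self.value = version, value
    def get(self, key):
        return self.version, self.value

class MonotonicReadClient:
    """Client-side MRC: never accept a read older than one already seen."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_seen = 0
    def read(self, key):
        for r in self.replicas:
            version, value = r.get(key)
            if version >= self.last_seen:  # this replica is caught up enough
                self.last_seen = version
                return value
        raise RuntimeError("no replica is caught up to version %d" % self.last_seen)

fresh, stale = Replica(5, "credited"), Replica(3, "pending")
client = MonotonicReadClient([fresh, stale])
assert client.read("balance:bob") == "credited"  # version 5 observed
client.replicas = [stale, fresh]                 # routing now hits the stale replica first
assert client.read("balance:bob") == "credited"  # version 3 replica is skipped, not re-read
```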

Read Your Writes Consistency (RYWC) - This model guarantees that if a client writes version n, then afterwards that client will read only version n or later updates.

For example, say Wendy deposits money and sees the updated balance. Later Wendy checks again and finds the deposit has disappeared from her account. Not a good place for the bank.

Generally this guarantee helps users or applications avoid situations where the same request is issued multiple times. The application might issue the same request again because it gets the impression that the request failed. For idempotent operations, a re-issue is not a problem beyond the additional load on the system. For non-idempotent operations, it could create a severe problem.
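RYWC can likewise be approximated on the client side, by remembering the versions of one's own writes and rejecting reads from replicas that have not applied them yet. A sketch against a hypothetical versioned store (all names illustrative):

```python
class Store:
    """Toy versioned store: every put bumps a global version counter."""
    def __init__(self):
        self.version = 0
        self.data = {}
    def put(self, key, value):
        self.version += 1
        self.data[key] = (self.version, value)
        return self.version
    def get(self, key):
        return self.data[key]

class ReadYourWritesClient:
    """Track this client's own writes; only accept reads that include them."""
    def __init__(self, store):
        self.store = store
        self.my_writes = {}  # key -> version this client wrote
    def write(self, key, value):
        self.my_writes[key] = self.store.put(key, value)
    def read(self, key):
        version, value = self.store.get(key)
        if version < self.my_writes.get(key, 0):
            raise RuntimeError("replica has not applied my write yet; retry elsewhere")
        return value

store = Store()
client = ReadYourWritesClient(store)
client.write("balance:wendy", 500)
assert client.read("balance:wendy") == 500   # the deposit is visible to its writer
store.data["balance:wendy"] = (0, 0)         # simulate routing to a stale replica
try:
    client.read("balance:wendy")
    assert False, "stale read should have been rejected"
except RuntimeError:
    pass  # rejected instead of showing the deposit as vanished
```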

Session Consistency - This is a practical version of the above model. Basically the guarantee here is that for the duration of a session, the client is provided with read-your-writes guarantees.

Monotonic Writes Consistency (MWC) - This model guarantees that two updates from the same client will be serialized in the order they arrive at the storage system. This guarantee helps applications avoid lost updates, e.g., where an application first writes and then updates, but the update is executed first.

For example, continuing our bank scenario, say Bob corrected the account number before finalizing the transfer to Wendy. If the MWC guarantee is not there, the money can easily end up in some other person's account. Not a good place for the bank, again.
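One way a replica can provide MWC is with per-client sequence numbers, buffering any write that arrives ahead of its predecessor. A sketch (classes and the bank data are illustrative):

```python
class MonotonicWriteReplica:
    """Apply each client's writes in that client's issue order;
    buffer writes that arrive before their predecessors."""
    def __init__(self):
        self.applied_seq = {}  # client -> last applied sequence number
        self.pending = {}      # (client, seq) -> (key, value)
        self.data = {}

    def receive(self, client, seq, key, value):
        self.pending[(client, seq)] = (key, value)
        # Apply as many consecutive writes from this client as possible.
        next_seq = self.applied_seq.get(client, 0) + 1
        while (client, next_seq) in self.pending:
            k, v = self.pending.pop((client, next_seq))
            self.data[k] = v
            self.applied_seq[client] = next_seq
            next_seq += 1

r = MonotonicWriteReplica()
# Bob's writes arrive out of order: the correction (seq 2) before the original (seq 1).
r.receive("bob", 2, "transfer:acct", "12345-corrected")
assert "transfer:acct" not in r.data                  # buffered, not applied yet
r.receive("bob", 1, "transfer:acct", "12354-typo")
assert r.data["transfer:acct"] == "12345-corrected"   # applied as 1 then 2
```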

Write Follows Read Consistency (WFRC) - This model guarantees that any update following a read of version n will only execute on replicas that are at least at version n. This guarantee helps applications avoid seemingly lost updates.
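A sketch of the replica-side check: the client attaches the version it last read, and a replica accepts the write only if it has caught up to that version (the API is hypothetical):

```python
class VersionedReplica:
    """Toy replica that tracks how many updates it has applied."""
    def __init__(self, version=0):
        self.version = version
        self.data = {}

    def write(self, key, value, depends_on):
        # Accept the write only if this replica has already seen
        # version `depends_on` (the version the client last read).
        if self.version < depends_on:
            return False  # client should retry on a caught-up replica
        self.version += 1
        self.data[key] = value
        return True

behind = VersionedReplica(version=3)
ahead = VersionedReplica(version=7)
read_version = 5  # the client previously read version 5
assert behind.write("acct", "updated", depends_on=read_version) is False
assert ahead.write("acct", "updated", depends_on=read_version) is True
```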

Data-centric Consistency Models...

Weak Consistency (WC) - This model basically means: do not expect any consistency guarantees from the system. In other words, think of this model's guarantee as - the replicas might, by chance, become consistent.

For example, think of a browser cache. It is updated from time to time, but the replicas are rarely (if ever) consistent. That does not mean the browser cache is not useful.
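A browser-cache-style sketch: entries are served until a TTL expires, with no guarantee they match the origin in between (the class and names are illustrative):

```python
import time

class TTLCache:
    """Weak consistency in miniature: serve whatever is cached until it
    expires; the cache and the origin may disagree in the meantime."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key, fetch):
        now = time.monotonic()
        entry = self.entries.get(key)
        if entry and entry[0] > now:
            return entry[1]       # possibly stale, and that is acceptable here
        value = fetch(key)        # refresh from the origin
        self.entries[key] = (now + self.ttl, value)
        return value

origin = {"logo.png": "v1"}
cache = TTLCache(ttl_seconds=60)
assert cache.get("logo.png", origin.get) == "v1"
origin["logo.png"] = "v2"                         # origin moves on
assert cache.get("logo.png", origin.get) == "v1"  # cache serves stale data anyway
```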

Eventual Consistency (EC) - This model is a bit stricter than Weak Consistency. Basically, in EC, if there are no failures and no new updates are made, then eventually all replicas converge to the same value. When there are no failures, the inconsistency window is typically determined by factors like communication delays and load on the system.

While there are certainly some use cases where EC cannot be applied, it is generally sufficient, as the real world itself is inherently eventually consistent. The challenge with EC is that more conflict resolution is needed at the application layer.
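Last-writer-wins is the simplest such application-level conflict resolution: each value carries a `(timestamp, node_id)` tag, and the node id breaks timestamp ties so every replica deterministically picks the same winner. A sketch (note that LWW silently discards the losing update, which is not acceptable for every application):

```python
def lww_merge(a, b):
    """Last-writer-wins merge for two concurrent versions of a value.
    Each version is a (timestamp, node_id, value) tuple."""
    (ts_a, node_a, _), (ts_b, node_b, _) = a, b
    return a if (ts_a, node_a) >= (ts_b, node_b) else b

v1 = (100, "replica-a", "nickname: ali")
v2 = (100, "replica-b", "nickname: alice")  # concurrent write, same timestamp
assert lww_merge(v1, v2) == v2              # tie broken by node id, same everywhere
assert lww_merge(v2, v1) == v2              # merge result is order-independent
```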

Causal Consistency (CC) - This is the strictest consistency level we can achieve in an always available data store. In CC, basically the requests with causal dependencies need to be executed in the same order on all replicas.

Typically vector clocks are used to identify potential causal dependencies. One challenge with vector clocks, though, is that with more updates the dependency trees can become fairly large. This in turn increases overhead and staleness, as the dependency trees need to be evaluated before applying updates.
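A minimal vector clock sketch: each node keeps a counter per node, one clock "happens before" another when every component is less than or equal, and two clocks are concurrent when neither dominates (function names are illustrative):

```python
def vc_increment(clock, node):
    """Return a new clock with `node`'s counter bumped (a local event)."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a potential conflict

c1 = vc_increment({}, "a")   # {'a': 1}
c2 = vc_increment(c1, "b")   # {'a': 1, 'b': 1}: causally after c1
c3 = vc_increment(c1, "c")   # {'a': 1, 'c': 1}: concurrent with c2
assert vc_compare(c1, c2) == "before"
assert vc_compare(c2, c3) == "concurrent"
```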

Sequential Consistency (SC) - This is a stricter consistency model than CC and cannot be achieved in always-available systems. This model basically requires that all replicas execute all updates in the same order.

Operations can be interleaved as long as they appear to have happened in some sequential order. The system is free to choose that order, provided it does not violate a set of rules (e.g., no two operations can hold the same lock at the same time). This model is not possible in the presence of failures: the system either becomes unavailable or violates the consistency model.

Linearizability (LIN) - Linearizable operations are atomic with respect to a single object: you don't have multiple operations running concurrently against that object and potentially corrupting its state. This is hard to implement in a distributed system because of clock synchronization issues.

Finally, while consensus protocols (e.g., Paxos, Raft) can guarantee that all replicas serialize requests in the same order, they cannot guarantee that all requests execute in the chronological order in which they arrived. For that we would need distributed locking, which in turn leads to poor performance.

Wrap Up...
As you can see from the above, quite a few variations and scenarios are possible. Ultimately it comes down to the applications the distributed data store is meant to serve, and whether or not those applications can deal with the consequences.

References -

  • Don’t Settle for Eventual Consistency, Wyatt Lloyd, Michael J, Michael Kaminsky, David G
  • Rethinking Eventual Consistency, Philip A Bernstein, Sudipto Das 
  • Consistency in Distributed Storage Systems, David Bermbach and Jorn Kuhlenkamp 
  • Eventually Consistent, Werner Vogels