Introduction

Causal consistency [1] is one of the consistency criteria that can be used on distributed databases as consistency criteria.

Distributed database provides causal consistency if read and write operations that are causally related are seen by every node of the distributed system in the same order. Concurrent writes may be seen in different order in diffrent nodes.  Causal consistency is waker than sequential consistency [2] but stronger than eventual consistency [3]. See earlier blog for more detailed description on eventual consistency https://blog.mariadb.org/eventually-consistent-databases-state-of-the-art/.

When a transaction performs a read operation followed later by a write operation, even on different object, the first read is said to be causally ordered before the write. This is because the value created by the write may have been dependent upon the result of the read operation. Similarly, a read operation is causally ordered after the earlier write on the same object that stored the data retrieved by the read. Also, even two write operations performed by the same node are defined to be causally ordered, in the order they were performed. Intuitively, after writing value v into object x, a node knows that a read of x would give v, so a later write could be said to be (potentially) causally related to the earlier one. Finally, causal order is transitive: that is, if operation A is (causally) ordered before B, and B is ordered before C, A is ordered before C.

Operations that are not causally related, even through other operations, are said to be concurrent. Consider following example:

This execution is causally consistent because read operations r3[x] is causally dependent on write w2[x] and read operation r4[x] is causally dependent on write w1[x]. Note that concurrent write operations may be seen on different order on different transactions or nodes.

Implementing Causal Consistency

Causal consistency can be reached by using Lamport clocks [4] or version vectors [5]. The causal consistency model is implemented by using multi-part timestamps, which are assigned to each object. These timestamps are stored on a vector that contains the version number of the object at each replica. This vector must be included (in the form of dependencies) in all update and query requests so that operations respect the causal ordering: an operation A can only be processed at a given node if all operations, on which the operation A causally depends, have already been applied at that node.

Each element in the vector clock corresponds to one host and the value of element indicates the number of messages sent from that host. The vector clock information is used to order or delay the delivery of messages if necessary, thus maintaining the required consistency. However, for maintaining the consistency of data items, we need information about writes on each data item, and maintaining a clock per data item can help. Therefore, instead of a vector clock of size N (number of hosts), we maintain a vector of size M (number of objects). The value of v[i] in the vector v contains the number of writes on data item i.

Each host maintains a vector of size M. Whenever it updates the value of an item, it increments the corresponding element and sends the vector along with the message of data item and new value to every site, which has a copy of the replica. When a host receives an update message, it delays the delivery of a message till each element in its vector is greater than or equal to the one that is piggybacked. After that, the updates to the data items are applied. In this case, the message overhead is O(M) and thus is independent of the number of hosts in the system.

If each message is an update message, it carries the new value of the data item rather than instructions. Then the delivery of an update on a data-item does not need not wait for the previous updates on the same item. This would not have been possible if vector clocks had been used. In that case, the delivery of a massage would have been delayed even for previous messages that are causally overwritten.

Applications and Databases using Causal Consistency

COPS and Eiger

COPS system [6] (Clusters of Order-Preserving Servers) introduces the causal+ consistency and is designed to support complex online applications that are hosted in a small number of large-scale data-centers, each of which is composed of front-end servers (clients of COPS) and back-end key-value data stores. Eiger [7] has a similar design but a different implementation.

COPS and Eiger support causality through a client library. Both systems replicate writes to geographically distributed data centers and enforce observed ordering. The observed ordering is enforced by delaying the write operations until all causally previous operations have been already applied at that data center.

COPS executes all read and write operations in the local data center in a linearizable [8] fashion, and then replicates data across data centers in a causal+ consistent order in the background. Figure 1 describes the high level architecture of COPS.

 

cops

Figure 1: Architecture of COPS and Eiger [6].

Similarly, the Eiger system provides the linearizability inside each data center and the causally-consistent data store based on a column-family data model to achieve better performance in a geo-distributed setting. All operations from clients are served from the local data center using a client library. The library mediates access to nodes in the local data center, executes the read and write transaction algorithms, and tracks causality and attaches dependencies to write operations. Each replica stores full replica of the database, and operations are handled locally. After an operation is executed locally, the operation is asynchronously pushed to remote data centers, but committed only after all causally dependent operations have been previously committed.

Bolts-on

Bailis et. al. in [9] propose a client-side middle-ware software called Bolt-on. This middle-ware guarantees only application-defined dependencies as an alternative to the causal consistency. Figure 2 describes the architecture of the Bolt-on middle-ware.

 

bolt

Figure 2: Architecture of Bolts-on [9].

The Bolt-on architecture assumes that the underlying data store handles most aspects of data management, including replication, availability, and convergence. In the architecture, the underlying data store locally uses the eventual consistency and allows a large space of read and write histories; the middle-ware handles causal dependencies, and consists of a set of rules, which limit the possible histories to the histories that obey the desired consistency model.

Application: MMORPG

In Massively Multi-player Online Role-Playing Games (MMORPG), players can cooperate with others in a virtual game world, and both players and different game words are naturally distributed. These systems manage large amounts of data, and the biggest problem is how to support data consistency. According to the CAP theorem, we have to sacrifice one of two properties: consistency or availability [10]. If an online game does not guarantee the availability, players’ requests may fail. If data is inconsistent, players may get data not conforming to the game logic, and this data can affect their operations. Therefore, it is important for the MMORPG environment to find a balance between the data consistency and the system availability. For this reason, we must analyze the data consistency requirements of MMORPG so as to find the balance [10].

Diao [10] has studied different consistency models for MMORPG and found that there indeed are part of data, where the causal consistency is an appealing choice: Game data. The game data contains e.g. the world appearance, the meta-data of non-player characters (the characters are created by game developers and controlled only by the game logic), the system configuration and game rules. This data is used by players and the game engine in the entire game, but can be only modified by the game developers. Consistency requirements for the game data are not so strict compared e.g. to the account data. Because e.g. a change of non-player character name or of the duration of bird animation may not be noticed by players.

Furthermore, some change of the game data needs to be delivered to all online players synchronously, e.g. a change of the word appearance, the weapon power, non-player characters, game rules and scripts. If there is inconsistency on these areas, it will cause errors on game display and logic errors on players. Therefore, some data needs to be stored on the server side and some on the client side. The game data on the client side could only synchronize with servers when a player logs in to or starts a game. For this reason, the causal consistency is required [10].

This could mean that when a player A uses the browser to connect with the game server, the game server will check the current local data and update the game data of the client side in the form of data packets. After updating, all future local accesses will return the updated value. Player B, who has not communicated with the game server, will still retain the outdated game data.

Game servers maintain the primary version of game data, and transfer it to client sides. Additionally, players on different game words cannot discuss to each other. Thus, the only need is to make sure that the game data is consistent in one game word in a time so that all players on that game word are handled equally. This requires using the strong consistency locally in the game word and the causal consistency among different game words. When the game data is modified by developers, the update value should be delivered synchronously to all replicates on that game word, and asynchronously to other game words.

While the possibility of using the causal consistency on MMORPG has been identified on research [10] to the authors’ knowledge there is no actual publications or other information that the causal consistency is actually used on MMORPG games (e.g. Lord of the Rings Online).

Application: Facebook

Facebook is an online Social networking service. After registering to use the site, users can create a user profile, add other users as friends, exchange messages, post status updates and photos, share videos and receive notifications when others update their profiles.

When you log into your account on Facebook, the server will show your own status messages and your friends’ status messages at that point in time. Status messages on Facebook may contain pictures, shared links and stories or your own messages. Naturally, your account data requires a strong consistency, but for status data the weaker consistency models are acceptable. During the time the user is online, the status updates of a user’s friends and of the user do not need to be strictly ordered, and the causal ordering is enough.

Thus when a user A sends a status update and a user B replies to that update, there is a causal order on the two updates. However, when users C and D do a totally unrelated update, the order these updates appear to users A and B is not relevant. This is because users A and B do not know in which order updates are performed.

The reason why the eventual consistency is not enough for Facebook status updates is that the eventual consistency does not require any ordering between writes. Consider a case, where the user A first sends a status update, and after few seconds A updates the first status update. With the eventual consistency, all friends of A could see only the first update, because the eventual consistency does not guarantee that first update is performed before the second one. In the causal consistency, as there is a read (by user A) of first update and then write (updated status from user A), these are causally related and all user A’s friends will naturally see second update.

Although the causal consistency is the possible consistency model for Facebook status updates and several similar distributed services containing status updates like LinkedIn, Twitter and Yahoo, to author’s knowledge there is not scientific or other literature that would show the causal consistency being really used.

Conclusions

The causal consistency model can be enforced with Lamport clocks. Transactions using the causal consistency are executed in an order that reflects their causally-related read/write operations’ order. Concurrent operations may be committed in different orders and their results can be read also in different orders.

Actually, the causal consistency can solve many problems, which cannot be solved in the eventual consistency, such as ordering operations. The causal consistency ensures that every sees operations in the same causal order, and this makes the causal consistency stronger than the eventual consistency. However, the causal consistency cannot support e.g. distributed integrity constraints.

Although there are a few promising systems developed on research to the authors’ knowledge there is no commercial or mature systems using the causal consistency model. For more thorough discussion see [11].

References

[1] Mustaque Ahamad , Gil Neiger , James E. Burns , Prince Kohli , P.W. Hutto: Causal Memory: Definitions, Implementation and Programming, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.3356

[2] Leslie Lamport: How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Trans. Comput. C-28,9 (Sept. 1979), 690-691.

[3] Werner Vogels: Eventually consistent. Communications of the ACM 52: 40. doi:10.1145/1435417.1435432

[4] Leslie Lamport: Time, clocks, and the ordering of events in a distributed system, Communications of the ACM, 21: 7, pp. 558–565, 1978.

[5] C. J. Fidge: Timestamps in message passing systems that preserve the partial ordering, in Theoretical Computer Science, 1988.

[6] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen: Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS, in Proceedings of the 23rd ACM Symposium on Operating Systems Principles, pp. 401–416, Portugal, October 23-26, 2011.

[7] W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen: Stronger semantics for low-latency geo-replicated storage,” in Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, pp. 313–328, Lombard, IL, USA, April 2-5, 2013.

[8] M. Herlihy and J. M. Wing, Linearizability: A correctness condition for concurrent objects, ACM
Trans. Program. Lang. Syst., 12:3, pp. 463–492, 1990.

[9] . Bailis, A. Ghodsi, J. M. Hellerstein, and I. Stoica,: Bolt-on causal consistency,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 761–772, New York, NY, USA, June 22-27, 2013

[10] Z. Diao, “Consistency models for cloud-based on-line games: the storage system’s perspective,” in
25th Workshop on Grundlagen von Datenbanken, Computing, Portland, Oregon, USA, July 16-19,
pp. 16–21, Ilmenau, Germany, May 28 – 31, 2013.

[11] Mawahib Musa Elbushra, Jan Lindström: Causal Consistent Databases, Open Journal of Databases (OJDB), 2:1, http://www.ronpub.com/publications/OJDB_2015v2i1n02_Elbushra.pdf

On February 26th 2015 the first meeting of Germany’s Rhein/Neckar User group will take place in Heidelberg.

Georg Richter, lead of MariaDB’s connectors team will give a talk “Don’t! No! Never!- common mistakes in relational database systems”.

Two special guests (Sanja Byelin, MariaDB Server Core Developer and Lawrin Novitsky, lead developer of MariaDB’s Connector/ODBC) will be available too for answering your questions and discussing/collecting your ideas around MariaDB, Percona, MySQL related and other database management systems.

Celebrate free speech, but also free beer (sponsored by MariaDB Corporation) and join on us on Feb. 26th in Heidelberg!

Location:
Pub “Gunnars”
Kettengasse 9
69117 Heidelberg Altstadt

Link: http://www.meetup.com/MariaDB-Rhein-Neckar-Usergroup-Meetup/events/220524015/