Evaluating the performance of database systems is a demanding task. There are many hard choices to be made, e.g.:

  • Which operating system and operating system version to use
  • Which configuration setup to use
  • Which benchmarks to use, and how long the warm-up and measurement periods should be
  • Which test setups to use
  • Which version of the database management system to use
  • Which storage engine to use

While performance evaluation is mostly machine time, there is still a lot of hard work for the human monitoring the tests. In this blog post we have made the following choices:

  • We’re using an Intel Xeon E5-2690 @ 2.9GHz CPU with 32 cores, Linux 3.4.12, and 132 GB of main memory. The database is stored on a Fusion-IO ioDrive2 Duo 2.41TB (Firmware v7.2.5, rev 110646, Driver 3.3.4 build 5833069). The database file system is NVMFS, and all test logs and outputs are stored on EXT4.
  • We have selected the following benchmarks:
    • LinkBench [1] which is based on traces from production databases that store social graph data at Facebook, a major social network. LinkBench provides a realistic and challenging test for persistent storage of social and web service data.
    • Percona provides an on-line transaction processing (OLTP) benchmark that implements a TPC Benchmark C [2]-like workload, which simulates a warehouse application. Approved in July 1992, TPC Benchmark C is an on-line transaction processing (OLTP) benchmark. TPC-C is more complex than previous OLTP benchmarks such as TPC-A because of its multiple transaction types, more complex database, and overall execution structure. TPC-C involves a mix of five concurrent transactions of different types and complexity, either executed on-line or queued for deferred execution. The database comprises nine types of tables with a wide range of record and population sizes. TPC-C is measured in transactions per minute (tpmC).
  • We are using LinkBench with a 10x database size, i.e. about 100 GB, and TPC-C with 1000 warehouses. The InnoDB buffer pool is set to 50 GB, so in both benchmarks the database does not fit in main memory.
  • We selected alpha versions of MariaDB 10.1.0 and MySQL 5.7.4-labs-tplc because we are interested in how their development releases perform. We selected exactly these versions because they both include support for FusionIO hardware features: atomic writes [3,4] and page compression [5].
  • We decided to use the InnoDB storage engine on both servers tested.
  • We use FusionIO hardware atomic writes [3,4] in all tests.

Our reasons for these choices are the following:

  • We want to evaluate performance on modern hardware, not some old, soon-to-be-deprecated hardware. The selected hardware is something that could be used when new hardware is acquired today for a database server.
  • We do not want to evaluate performance in corner cases, e.g. a single-threaded client, all data in main memory, a ridiculously high or low number of client threads, or purely read-only or write-only workloads. These cases are interesting for some applications, but not all.
  • We want to use benchmarks that are well known and approved by the database community.
  • XtraDB is not available for MySQL, and we have evidence that XtraDB does not perform that well on FusionIO hardware.
  • We use atomic writes in all tests because they provide better performance. Consequently, we have disabled the doublewrite buffer, which is unnecessary when atomic writes are used.

MySQL/MariaDB configuration

We can’t use exactly the same configuration file (my.cnf) for both MariaDB and MySQL because some parameter names differ. Below, the common variables and their values are shown first, followed by MySQL-specific parameters under the tag oracle and MariaDB-specific parameters under the tag mariadb. All parameters are selected to be fair and representative, but we do not claim that they are optimal values for this hardware or these benchmarks.
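The full variable listing did not survive in this copy of the post. As an illustrative sketch only, not the authors’ exact file, a my.cnf fragment consistent with the choices described above might look like:

```ini
# Illustrative subset -- variable names follow MariaDB 10.1 naming;
# the complete benchmark configuration is not reproduced here.
[mysqld]
innodb_buffer_pool_size      = 50G   # stated in the text; DB does not fit in memory
innodb_doublewrite           = 0     # disabled because atomic writes are used
innodb_use_atomic_writes     = 1     # FusionIO atomic writes [3,4]
innodb_compression_algorithm = lz4   # page compression method used in the tests [5]
```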

Benchmark setup

As mentioned before, for LinkBench we use a 10x database size, which creates a database of about 100 GB. The load command is as follows:
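The original command listing was not preserved here. A typical LinkBench load invocation, with illustrative paths, properties file name and maxid1 value (the property that scales the database size), would be:

```shell
# Hypothetical invocation -- paths and the maxid1 value are illustrative.
./bin/linkbench -c config/LinkConfigMysql.properties \
    -D maxid1=100000001 \
    -l
```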

The LinkBench measure phase uses 64 client threads, a maxtime of 86300 seconds (roughly 24 hours), and the default warm-up of 180 seconds. The measure command is as follows:
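Again, the original listing was lost; a hypothetical request-phase invocation, with the property values taken from the text (64 client threads, 86300 s maxtime, 180 s warm-up) and illustrative paths, would be:

```shell
# Hypothetical invocation -- property values follow the text.
./bin/linkbench -c config/LinkConfigMysql.properties \
    -D requesters=64 \
    -D maxtime=86300 \
    -D warmup_time=180 \
    -r
```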

For TPC-C we use 1000 warehouses; the load command is as follows:
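The listing itself is missing from this copy. A hypothetical invocation of Percona’s tpcc-mysql loader, where the host, credentials and database name are placeholders, would be:

```shell
# Hypothetical invocation -- host, credentials and DB name are placeholders.
./tpcc_load -h 127.0.0.1 -d tpcc1000 -u root -p "" -w 1000
```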

We use the default number of threads, i.e. 64 client threads (unless otherwise mentioned). We ran TPC-C with a time limit of 10800 seconds, i.e. 3 hours, and a warm-up of 30 seconds. The measure command is as follows:
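The measure command did not survive either. A hypothetical tpcc_start invocation, with flag values following the text (1000 warehouses, 64 connections, 30 s ramp-up, 10800 s run time) and placeholder host and credentials, would be:

```shell
# Hypothetical invocation -- flag values follow the text.
./tpcc_start -h 127.0.0.1 -d tpcc1000 -u root -p "" \
    -w 1000 -c 64 -r 30 -l 10800
```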


Performance results: LinkBench

Both servers include support for atomic writes [3,4] and page compression [5] for FusionIO hardware. Below we compare the operations (transactions) per second reported when using this feature:


Both servers provide stable performance after the buffer pool is warm and full. In both servers, performance drops after the buffer pool is full and pages really need to be fetched from the FusionIO hardware. However, MariaDB provides clearly superior performance after the buffer pool is full. In this test we used the LZ4 compression method, which is provided by both servers. MariaDB provides about 30% better performance, and the difference at the end of the test is 5654 operations/second.

Both servers provide the following compression methods if available on the system:


MariaDB also provides support for the following compression methods if available on the system:

Below we present the same results using uncompressed tables on both servers.


Both servers start with very good performance; however, performance slowly drops once the buffer pool is full and write amplification starts to slow down I/O performance. Write amplification (WA) can happen on flash memory and solid-state drives (SSDs), where the actual amount of physical data written is a multiple of the logical amount intended to be written. Because flash memory must be erased before it can be rewritten, performing these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD, which shortens the time it can reliably operate. The increased writes also consume bandwidth to the flash memory, which mainly reduces random write performance [6]. MariaDB provides about 8% better performance, and at the end of the tests the difference is 2225 operations/second.

The next research point is row-compressed tables. Because both MySQL and MariaDB contain the same, or almost the same, implementation, we do not expect any significant performance differences. Results below:


The results are almost identical, as expected, and the small differences seen are clearly within statistical variation. Finally, let’s combine all previous results to see the differences between the uncompressed, row-compressed, and page-compressed results.


As expected, the best performance is achieved using uncompressed tables. Note that this result does not contradict the results presented in [5], because here we use atomic writes and have disabled the doublewrite buffer in all tests. At the end of the measure time, page-compressed performance is about 22% lower than uncompressed performance, a difference of 5500 operations/second. Similarly, at the end of the measure time, row-compressed performance is about 68% lower than uncompressed performance and about 37% lower than page-compressed performance. Finally, the difference is 11800 operations/second between uncompressed and row-compressed, and 6568 operations/second between page-compressed and row-compressed.

For our final test, we measured the load time used to create the 10x LinkBench database using MariaDB 10.1.



Clearly, creating a page-compressed database takes the shortest time, followed by an uncompressed database. Creating a row-compressed database is significantly slower than either of those.

Performance results: TPC-C

We again start by presenting the TpmC results for the page compression feature [5] using different numbers of client threads:



Again, MariaDB provides clearly better performance in all tests. The server used has 32 cores, which is why performance numbers drop when the number of client threads is increased beyond 32. Let’s look at the performance results for New Order transactions using the default number of client threads:



While MySQL can occasionally provide a higher number of completed New Order transactions, its variation is significantly higher and its median significantly lower than MariaDB’s results. MariaDB provides quite stable performance over time after the buffer pool is warm. Clearly, the MySQL flush implementation can’t keep up with the very fast FusionIO hardware. In all tests in this post, the bottleneck is not the FusionIO hardware but the MySQL/MariaDB implementation.

In the following graph we have compared the results using uncompressed tables and page compressed tables while varying the number of TPC-C test suite client threads from 4 to 512:


With between 4 and 32 client threads, MariaDB provides clearly better performance for both uncompressed and page-compressed tables. In fact, MariaDB’s performance with page-compressed tables is the same as or even better than MySQL’s performance with uncompressed tables. When the number of client threads is increased further, MariaDB’s performance drops earlier than MySQL’s, but not as steeply as MySQL’s does in the final results with 512 client threads. With high numbers of client threads, MariaDB page compression can offer significantly better performance compared with uncompressed tables.

Conclusions

First of all, both alpha versions of these database servers were very stable. We did not encounter even a single crash or other problem while the performance evaluations were being done. This is a very good sign for the quality of these development releases.

Secondly, there is clearly a need to identify and fix new bottlenecks introduced by the significantly faster FusionIO storage device. In all tests we could not come even close to using the full throughput provided by the FusionIO hardware. FusionIO hardware can be seen as a new level in the memory hierarchy [7] that requires new kinds of optimizations and research, in a similar way as main memory has [8]. Some early research results can be seen in [9].

Finally, MariaDB can clearly offer better performance to applications with workloads similar to those represented by LinkBench and TPC-C.


[1] Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: a database benchmark based on the Facebook social graph. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13). ACM, New York, NY, USA, 1185-1196.

[2] TPC Benchmark C Standard Specification, Transaction Processing Performance Council.


[3] MariaDB Introduces Atomic Writes.

[4] Xiangyong Ouyang, David W. Nellans, Robert Wipfel, David Flynn and Dhabaleswar K. Panda: Beyond block I/O: Rethinking traditional storage primitives. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp. 301-311.

[5] Significant performance boost with new MariaDB page compression on FusionIO.

[6] Write amplification.

[7] Memory hierarchy.

[8] Stefan Manegold, Peter A. Boncz, and Martin L. Kersten. 2000. Optimizing database architecture for the new bottleneck: memory access. The VLDB Journal 9, 3 (December 2000), 231-246. DOI=10.1007/s007780000031

[9] Seok-Hoon Kang, Dong-Hyun Koo, Woon-Hak Kang and Sang-Won Lee: A Case for Flash Memory SSD in Hadoop Applications,
International Journal of Control and Automation, Vol. 6, No. 1, February, 2013.


Eventual consistency [1] is a consistency model used in many large distributed databases. Such databases require that all changes to a replicated piece of data eventually reach all affected replicas. Furthermore, conflict resolution is not handled by these databases; in the event of conflicting updates, the responsibility is pushed up to the application authors. Eventual consistency is a specific form of weak consistency: the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value [1]. If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme. We earlier studied whether MariaDB is eventually consistent and found that in most configurations it can be.

DEFINITION: Eventual consistency.

  • Eventual delivery: An update executed at one node eventually executes at all nodes.
  • Termination: All update executions terminate.
  • Convergence: Nodes that have executed the same updates eventually reach an equivalent state (and stay in it).

EXAMPLE: Consider a case where data item R=0 on all three nodes. Assume we have the following sequence of writes and commits in node 0: W(R=3) C W(R=5) C W(R=7) C. Now a read on node 1 could return R=5 and a read on node 2 could return R=7. This is eventually consistent as long as reads from all nodes eventually return the same value. Note that this final value could be R=5: eventual consistency does not restrict the order in which the writes must be executed.
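The example above can be sketched in code. This toy simulation assumes one particular realization of eventual consistency, a totally ordered replication log delivered asynchronously from node 0, so here the replicas always converge to the last write; it is an illustration, not the mechanism of any database discussed below.

```python
# Three replicas of data item R, initially 0. Writes commit on node 0 and
# propagate to the other replicas with a lag; before convergence, reads on
# different nodes may return different stale values.
class Replica:
    def __init__(self):
        self.value = 0          # data item R, initially 0 on every node
        self.applied = 0        # how many log entries this node has applied

    def read(self):
        return self.value

log = []                        # totally ordered update log from node 0
nodes = [Replica() for _ in range(3)]

def write(value):
    """Commit a write on node 0 and append it to the replication log."""
    log.append(value)
    nodes[0].value = value
    nodes[0].applied = len(log)

def deliver(node, upto):
    """Asynchronously apply log entries [applied, upto) on one replica."""
    for v in log[node.applied:upto]:
        node.value = v
    node.applied = max(node.applied, upto)

write(3); write(5); write(7)    # W(R=3) C W(R=5) C W(R=7) C on node 0
deliver(nodes[1], 2)            # node 1 has only seen W(R=3), W(R=5)
deliver(nodes[2], 3)            # node 2 has seen all three writes
stale = [n.read() for n in nodes]       # [7, 5, 7] -- inconsistent reads

for n in nodes:                 # eventually every update reaches every node
    deliver(n, len(log))
converged = [n.read() for n in nodes]   # [7, 7, 7] -- convergence reached
```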

In this blog we briefly introduce database management systems using eventual consistency and evaluate the reviewed databases based on popularity, maturity, consistency, and use cases. Based on this, we present the advantages and disadvantages of eventual consistency. A longer and more thorough version of this research has been published in [13].

Databases using eventual consistency

MongoDB [2] is a cross-platform document-oriented NoSQL database system that uses BSON as its data model. MongoDB is free and open-source software and has official drivers for a variety of popular programming languages and development environments. The web programming language Opa also has built-in support for MongoDB and offers a type-safety layer on top of it. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.

CouchDB [3] is an open-source NoSQL database that uses JSON as its data model, JavaScript as its query language, and HTTP as its API. CouchDB was first released in 2005 and became an Apache project in 2008. One of CouchDB’s distinguishing features is multi-master replication.

Amazon SimpleDB [4] is a distributed database written in Erlang by Amazon. It is used as a web service together with Amazon Elastic Compute Cloud (EC2) and Amazon S3, and is part of Amazon Web Services. It was announced on December 13, 2007.

Amazon DynamoDB [5,6] is a fully managed proprietary NoSQL database service offered by Amazon as part of the Amazon Web Services portfolio. DynamoDB uses a data model similar to Dynamo’s, and also derives its name from Dynamo, but has a different underlying implementation: DynamoDB has a single-master design. DynamoDB was announced by Amazon CTO Werner Vogels on January 18, 2012.

Riak [7] is an open-source, fault-tolerant key-value NoSQL database that implements the principles from Amazon’s Dynamo. Riak uses consistent hashing to distribute data across nodes, and buckets to store data.

DeeDS [8] is a prototype of a distributed, active real-time database system. It aims to provide data storage for real-time applications, which may have hard or firm real-time requirements. As its database, DeeDS uses OBST (Object Management System of STONE) with TDBM (DBM with transactions), which replaces the OBST storage manager. One main reason for introducing TDBM was to add support for nested transactions to DeeDS. TDBM is a transaction-processing data store with a layered architecture, and provides DeeDS with:

  • Nested Transactions
  • Volatile and persistent databases
  • Support for very large data items

The ZATARA database [9] is a distributed database engine that features an abstract query interface and pluggable internal data structures. ZATARA is designed to be flexible enough to be used by any software application, and it guarantees data integrity while achieving high performance and scalability.

Both DeeDS and ZATARA are results of research projects and are not yet mature enough for production use.

We use the following criteria to evaluate the database systems that support eventual consistency:

  • Popularity
  • Maturity
  • Consistency
  • Use cases

Popularity

We evaluate the popularity of the presented database systems based on the DB-Engines Ranking, which ranks database management systems according to their popularity:

  • MongoDB: ranked 7th (score 96.1) at the beginning of 2014; 5th (score 231.44) in June 2014.
  • CouchDB: ranked 16th at the beginning of 2014; 21st (score 22.78) in June 2014.
  • Riak: ranked 27th at the beginning of 2014; 30th (score 10.82) in June 2014.
  • DynamoDB: ranked 35th (score 7.20) at the beginning of 2014; 32nd (score 9.58) in June 2014.
  • SimpleDB: ranked 46th at the beginning of 2014; 52nd (score 2.94) in June 2014.

According to this ranking, MongoDB is clearly the most popular and widely known database system supporting eventual consistency. As a reference, MySQL is ranked 2nd in June 2014 and MariaDB is ranked 28th.

Maturity

Based on the authors’ research, MongoDB is clearly the most mature database system using eventual consistency. It has a large user and customer base and is actively developed. MongoDB has official drivers for several popular programming languages and development environments. There are also a huge number of unofficial or community-supported drivers for other programming languages and frameworks.

Riak is available for free under the Apache 2 License. In addition, Basho Technologies offers commercial licenses with subscription support and Multi Data Center (MDC) Replication. Riak has official drivers for Ruby, Java, Erlang, Python, PHP, and C/C++. There are also many community-supported drivers for other programming languages and frameworks.

CouchDB uses JSON to store data and supports MapReduce query functions in JavaScript and Erlang. Its replication and synchronization features make it ideal for mobile devices, where a network connection is not guaranteed but the application must keep working offline. Master-master replication is an especially interesting feature of CouchDB, as it allows easy multi-site deployments. CouchDB is clearly a mature system and is used in production environments.

DynamoDB is a commercial managed NoSQL database service offered by Amazon as part of the Amazon Web Services portfolio. There is also a local development version of DynamoDB with which developers can test DynamoDB-backed applications locally. The programming languages with DynamoDB bindings include Java, Node.js, .NET, Perl, PHP, Python, Ruby, and Erlang. DynamoDB is thus a mature, production-quality service.

Amazon SimpleDB is in the beta phase, and thus we do not suggest its use in production. ZATARA and DeeDS are in the research phase and there are no publicly available systems for testing. Therefore, they are at most in the alpha phase, and we do not recommend their use in production either.

Consistency

From earlier research, we know that Amazon SimpleDB’s inconsistency window for eventually consistent reads was almost always less than 500 ms [10], while another study found that Amazon S3’s inconsistency window lasted up to 12 seconds [10]. However, to the authors’ knowledge, there is no widely known and accepted workload for measuring the consistency of databases using eventual consistency. Therefore, the comparison of consistency must be based solely on system features. From a consistency point of view, Riak offers the most configurable consistency, allowing the consistency level to be selected.

MongoDB, SimpleDB, and DynamoDB offer the possibility to read the latest version of a data item, thus providing strong consistency as well as eventual consistency. All other systems reviewed offer only eventual consistency, and may return an old version of the data when performing read operations.

Use Cases

MongoDB has been successfully used in operational intelligence, especially for storing log data, creating pre-aggregated reports, and hierarchical aggregation. Furthermore, MongoDB has been used in product management systems to store product catalogs and manage inventory and category hierarchies. In content management systems, MongoDB is used to store metadata, manage assets, and store user comments on content such as blog posts and media.

Riak has been successfully used in simple high read/write applications for session storage, serving advertisements, and storing log and sensor data. Furthermore, Riak has been used in content management and social applications for storing user accounts, user settings and preferences, user events and timelines, and articles and blog posts.

The replication and synchronization capabilities of CouchDB are well suited to mobile environments, where a network connection is not guaranteed but the application must keep working offline. CouchDB is also ideal for applications with accumulating, occasionally changing data, on which pre-defined queries are to be run, and where versioning is important; CRM and CMS systems are examples of such applications. CouchDB’s master-master replication is an especially interesting feature that allows easy multi-site deployments.

SimpleDB is well suited for logging, online games, and metadata indexing. However, one cannot use SimpleDB for aggregate reporting: there are no aggregate functions such as SUM, AVERAGE, or MIN in SimpleDB. Metadata indexing is a particularly good use case: because a SimpleDB item is limited in size, one can store larger objects, such as images and videos, in S3 and use SimpleDB domains to store pointers to those S3 objects along with searchable information about them. Another class of applications for which SimpleDB is well suited is sharing information between isolated components of an application, since SimpleDB provides a way to share indexed, i.e. searchable, information.

Advantages and Disadvantages

Advantages

Eventual consistency is easy to achieve and provides some consistency for the clients [11]. Building an eventually consistent database has two advantages over building a strongly consistent one: (1) it is much easier to build a system with weaker guarantees, and (2) database servers separated from the larger database cluster by a network partition can still accept writes from applications. Unsurprisingly, the second justification is the one given by the creators of the first-generation NoSQL systems that adopted eventual consistency.

Moreover, eventual consistency is often strongly consistent in practice: several recent projects have verified the consistency of real-world eventually consistent stores [10].

Disadvantages

While eventual consistency is easy to achieve, the current definition is not precise [11]. Firstly, from the current definition it is not clear what the eventual state of the database is. A database always returning the value 42 is eventually consistent, even if 42 was never written. One possible refinement is that eventually all accesses return the last updated value, so that the database cannot converge to an arbitrary value [1]. Even this new definition has another problem: what values can be returned before the eventual state of the database is reached? If replicas have not yet converged, what guarantees can be made about the data returned? In this case, the only possible solution would be to return the last known consistent value. The problem then is how to know which version of a data item has converged to the same state on all replicas [1].

Eventual consistency requires that writes to one replica will eventually appear at other replicas, and that if all replicas have received the same set of writes, they will have the same values for all data. This weak form of consistency does not restrict the ordering of operations on different keys in any way, thus forcing programmers to reason about all possible orderings and exposing many inconsistencies to users. For example, under eventual consistency, after Alice updates her profile, she might not see that update after a refresh. Or, if Alice and Bob are commenting back and forth on a blog post, Carol might see a random non-contiguous subset of that conversation. When an engineer builds an application on an eventually consistent database, the engineer needs to answer several tough questions every time data is accessed:

  • What is the effect on the application if a database read returns an arbitrarily old value?
  • What is the effect on the application if the database sees modification happen in the wrong order?
  • What is the effect on the application if a client is modifying the database as another tries to read it?
  • And what is the effect of my database updates on other clients that are trying to read the data?

That is a hard list, and developers must work very hard to answer these questions. Essentially, an engineer needs to manually do the work to ensure that multiple clients do not introduce inconsistency between nodes.

One way to address these questions at least partly is to use a stronger version of eventual consistency.

DEFINITION: Strong eventual consistency.

  1. Eventual delivery: An update executed at one node eventually executes at all nodes.
  2. Termination: All update executions terminate.
  3. Strong Convergence: Nodes that have executed the same updates have equivalent state.

To the authors’ knowledge, there is currently no database system that uses strong eventual consistency. This could be because it is harder to implement. Eventual consistency represents a clear weakening of the guarantees that traditional databases provide, and places a burden on software developers: designing applications that maintain correct behavior even when the accuracy of the database cannot be relied upon is hard.
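Although no reviewed database implements it, strong convergence is achievable in principle with conflict-free replicated data types (CRDTs), a well-known construction from the research literature. The sketch below uses a grow-only counter to illustrate the definition above: nodes that have merged the same updates have equivalent state, regardless of delivery order. This is an illustration only, not a feature of any system discussed in this post.

```python
# A state-based grow-only counter CRDT: each node increments only its own
# slot, and merge takes the element-wise maximum. Because max is commutative,
# associative and idempotent, merge order and repetition cannot cause
# divergence -- satisfying the Strong Convergence property.
class GCounter:
    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.counts = [0] * n_nodes   # one slot per node

    def increment(self):
        self.counts[self.node_id] += 1

    def merge(self, other):
        """Element-wise max of the two state vectors."""
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()      # two updates on node 0
b.increment()                     # one update on node 1
a.merge(b); b.merge(a)            # exchange state, in either order
assert a.counts == b.counts       # equivalent state: strong convergence
assert a.value() == 3             # total of all updates
```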

In fact, Google addressed the pain points of eventual consistency in a recent paper on its F1 database [12] and noted: “We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”

Conclusions

Clearly, there are several very mature and popular database systems using eventual consistency. Most of them are actively developed and have strong communities behind them. We believe we will see more database systems using eventual consistency or strong eventual consistency in the future.


[1] Vogels, W.: Eventually Consistent. ACM Queue, vol. 6, no. 6, pp. 14-16, October 2008.

[2] MongoDB: Agile and Scalable.

[3] Anderson, C. J., Lehnardt, J., and Slater, N.: CouchDB: The Definitive Guide. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. January 2010, First Edition

[4] Amazon SimpleDB,

[5] DynamoDB,

[6] Sivasubramanian, S.: Amazon DynamoDB: a seamlessly scalable non-relational database service. In Proceedings of the SIGMOD International Conference on Management of Data, ACM, New York, NY, USA, 2012, pp. 729-730.

[7] Riak,

[8] F. Andler, J. Hansson, J. Mellin, J. Eriksson, and B. Eftring: An overview of the DeeDS real-time database architecture. In Proc. of 6th International Workshop on Parallel and Distributed Real-Time Systems, 1998

[9] Bogdan Carstoiu and Dorin Carstoiu: Zatara, the Plug-in-able Eventually Consistent Distributed Database. AISS, 2(3), 2010.

[10] Bermbach, D. and Tai, S.: Eventual Consistency: How soon is eventual? Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, pp. 1-5, 2011.

[11] Bailis, P. and Ghodsi, A.: Eventual consistency today: limitations, extensions, and beyond. Communications of the ACM, vol. 56, no. 5, pp. 55-63, May 2013.

[12] Shute, Jeff; Vingralek, Radek; Samwel, Bart; Handy, Ben; Whipkey, Chad; Rollins, Eric; Oancea, Mircea; Littlefield, Kyle; Menestrina, David; Ellner, Stephan; Cieslewicz, John; Rae, Ian; Stancescu, Traian; Apte, Himani: F1: A Distributed SQL Database That Scales. VLDB, 2013.

[13] Mawahib Musa Elbushra, Jan Lindström: Eventual Consistent Databases: State of the Art, OJDB 2014, 1(1), Pages 26-41, 2014,

Today marks a milestone for the MariaDB project: going forward, the project plans to use GitHub and git for source code management. The migration is from Launchpad and the bzr tool.

The 10.1 server development (under heavy development now) will happen on GitHub. You can check it out here: Feel free to watch, star or even fork the code, and send us contributions!

Previous maria-captains should now provide their GitHub IDs so that they can be accorded similar status. Send the IDs to the maria-developers mailing list.

The project eventually wants to move the 10.0, 5.5, 5.3, 5.2, and 5.1 trees to Github, but in the meantime, fixes still go into the 10.0 or 5.5 trees on Launchpad using bzr. This may change in the future.

This completes a top voted feature, MDEV-5240. We’re not the only project interested in this of course – there was definitely inspiration from Emacs and from Mozilla!