This article describes how I tuned MariaDB for the best write throughput on SSD-based storage.

When you have a write-heavy application writing into InnoDB, you will probably experience the InnoDB Checkpoint Blues. The effect manifests as stalls: short periods of time where the throughput falls to zero and I/O activity goes crazy. The phenomenon is well known and described, e.g., here. More background about checkpointing can be found here.

The XtraDB fork of the InnoDB engine (and heart of Percona Server) contains some patches intended to overcome this odd behavior. MariaDB uses XtraDB as its default InnoDB implementation, so we can configure some extra variables and hopefully avoid the checkpoint blues.

The first and most important setting is innodb_io_capacity. This is the approximate number of write operations per second that your hardware can sustain. If you don’t know that number, you can easily find it out: wait until you experience a stall and run iostat -x. You should see fairly high numbers for wrqm/s and w/s on the device holding your InnoDB tablespaces. w/s is the number of write requests per second that hit the device; wrqm/s is the number of write requests per second that were merged. The sum of both is what InnoDB was effectively using.
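The sum described above can be read off the iostat output by a small script. The following is only a sketch: the function name and the sample text are made up for illustration, and it assumes that the device table has a header row starting with "Device" and containing columns named w/s and wrqm/s, as printed by iostat -x in the sysstat versions I know of.

```python
def estimate_io_capacity(iostat_output, device):
    """Return w/s + wrqm/s for `device` from text printed by `iostat -x`."""
    header = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row of the device table starts with "Device" or "Device:".
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        if header is not None and fields[0] == device:
            # Map column names to the values of this device row.
            row = dict(zip(header[1:], fields[1:]))
            return float(row["w/s"]) + float(row["wrqm/s"])
    raise ValueError("device %r not found in iostat output" % device)


# Made-up sample text for illustration only, not a real measurement.
SAMPLE = """\
Device:  rrqm/s  wrqm/s   r/s    w/s       rkB/s  wkB/s
sda      0.00    12.00    1.00   30.00     4.00   168.00
sdb      0.00    5000.00  10.00  20000.00  40.00  100000.00
"""

print(estimate_io_capacity(SAMPLE, "sdb"))
```

Keep in mind that the first report iostat prints covers the time since boot, so the numbers should come from an interval report taken while the stall is happening.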

In my case the disk is a RAID-0 of 3 SAS SSDs and I have seen up to 25,000 writes per second. So I set innodb_io_capacity=20000 and left everything else at the default:

Here we can already see that the stall is gone. However, after ~370 seconds we see a rather heavy I/O spike. This is the point where XtraDB starts eager flushing, because the checkpoint age has reached 75% of the log group capacity.
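To put a number on that threshold, here is a back-of-the-envelope sketch. The function names are made up for illustration. The 4G log capacity is the value given in the disclaimer at the end of this article, and the growth rate is only a rough average derived from the ~370 seconds seen in the graph, assuming the checkpoint age started near zero.

```python
GIB = 1024 ** 3


def eager_flush_threshold(log_capacity_bytes, fraction=0.75):
    """Checkpoint age (in bytes) at which eager flushing starts."""
    return int(log_capacity_bytes * fraction)


def average_age_growth(log_capacity_bytes, seconds_to_spike, fraction=0.75):
    """Average net growth of the checkpoint age in bytes per second,
    assuming it started near zero and hit the threshold after
    `seconds_to_spike` seconds."""
    return eager_flush_threshold(log_capacity_bytes, fraction) / seconds_to_spike


threshold = eager_flush_threshold(4 * GIB)
print(threshold // GIB, "GiB")  # 75% of 4 GiB is 3 GiB

rate = average_age_growth(4 * GIB, 370)
print(round(rate / 1024 ** 2, 1), "MiB/s average net growth")
```

The rate is a net figure: redo is written faster than that, because background flushing advances the checkpoint all the time.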

Now let’s see if we can do better. The Percona Server documentation says about innodb_adaptive_flushing_method=keep_average: “… is designed for use with SSD cards.” So let’s try that next:

FAIL! Let’s quickly revert that to the default.

The next thing to look at is neighbor page flushing. It works like this: InnoDB pages are organized in blocks of 64 pages. When the checkpointing algorithm has picked a dirty page to be written to disk, it checks whether there are more dirty pages in the same block and, if so, writes all of those pages at once. The rationale is that with rotating disks the most expensive part of a write operation is head movement. Once the head is over the right track, it does not make much difference whether we write 10 or 100 sectors.
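The page selection can be illustrated with a toy model. This is a sketch of the idea, not XtraDB’s actual code: the function name and the example page numbers are made up, and the behavior of the other two values of innodb_flush_neighbor_pages that come up below (cont and none) is my reading of the Percona Server documentation, including the assumption that cont never crosses a block boundary.

```python
BLOCK_SIZE = 64  # pages per block, as described above


def pages_to_flush(page, dirty, mode="area"):
    """Return the sorted page numbers written when `page` is picked for flushing.

    `dirty` is the set of all dirty page numbers. `mode` mimics the values
    of innodb_flush_neighbor_pages in this simplified model.
    """
    if page not in dirty:
        raise ValueError("the picked page must be dirty")
    if mode == "none":
        # No neighbors at all: write just the picked page.
        return [page]
    start = (page // BLOCK_SIZE) * BLOCK_SIZE
    end = start + BLOCK_SIZE  # exclusive upper bound of the block
    if mode == "area":
        # Every dirty page in the same 64-page block.
        return [p for p in range(start, end) if p in dirty]
    if mode == "cont":
        # Only the unbroken run of dirty pages around the picked page.
        lo = page
        while lo - 1 >= start and (lo - 1) in dirty:
            lo -= 1
        hi = page
        while hi + 1 < end and (hi + 1) in dirty:
            hi += 1
        return list(range(lo, hi + 1))
    raise ValueError("unknown mode: %r" % mode)


dirty_pages = {64, 70, 71, 72, 100, 130}
print(pages_to_flush(71, dirty_pages, "area"))  # [64, 70, 71, 72, 100]
print(pages_to_flush(71, dirty_pages, "cont"))  # [70, 71, 72]
print(pages_to_flush(71, dirty_pages, "none"))  # [71]
```

Page 130 is never included because it lives in the next block. The example also shows why area can turn one page write into several scattered writes, which is cheap on a rotating disk but buys nothing on an SSD.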

The default setting is innodb_flush_neighbor_pages=area. For SSDs, however, there is no head movement penalty, so it makes sense to experiment with the other settings. Let’s try innodb_flush_neighbor_pages = cont next:

We still see a write spike at around 400 s of run time, but it’s less pronounced than last time. Now let’s finally try innodb_flush_neighbor_pages=none:

And here we are! For SSD-based storage the best parameter combination seems to be: innodb_flush_neighbor_pages = none, innodb_io_capacity = what_your_hardware_can_do and all others at the defaults.
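For reference, the combination above could be written to an option file fragment like this. It is a minimal sketch: the helper name is made up, the value 20000 is the one from my hardware and must be replaced with your own measurement, and placing the settings in the [mysqld] section is the usual convention for server options. Remember that innodb_flush_neighbor_pages is an XtraDB variable.

```python
import configparser
import io


def render_mysqld_options(io_capacity):
    """Render the two non-default settings as an option file fragment."""
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        # What your hardware can do; measure it, do not copy this number.
        "innodb_io_capacity": str(io_capacity),
        # No neighbor page flushing on SSD-based storage.
        "innodb_flush_neighbor_pages": "none",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_mysqld_options(20000))
```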

If the storage uses rotating disks, we are probably better off with innodb_flush_neighbor_pages = cont or even innodb_flush_neighbor_pages = area, but at the moment I don’t have the hardware to test that.

Disclaimer: the above graphs use the sysbench OLTP multi-table read/write benchmark. Data set size was 10G, InnoDB buffer pool 16G, InnoDB total log capacity 4G.

A few days ago MariaDB, MySQL and Percona all released new versions of the 5.5 server. So I decided it’s time to run sysbench once more and compare the OLTP performance. The test candidates are:

  • MariaDB-5.5.24, using either XtraDB (default) or InnoDB
  • MySQL-5.5.25
  • Percona Server 5.5.24-26.0

For the benchmarks I used our trusty old pitbull machine, which has 24 CPU cores, 24G of RAM and a nice RAID-0 composed of 3 SAS SSDs.

The benchmark was sysbench-0.5 multi-table OLTP, using 8 tables with a total of 10G of data. InnoDB buffer pool was 16G, InnoDB log group capacity 4G (the maximum for MySQL). The mysqld process was constrained to 16 cores, sysbench to the other 8.
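The split of the cores between server and benchmark client can be sketched as follows. This only illustrates the idea and is not the script used for the benchmark: the function names are made up, the assignment of core numbers 0 to 15 to mysqld is an assumption, and os.sched_setaffinity is available on Linux only.

```python
import os


def split_cores(total_cores, server_cores):
    """Partition core numbers into a server set and a client set."""
    if not 0 < server_cores < total_cores:
        raise ValueError("server_cores must be between 1 and total_cores - 1")
    server = set(range(server_cores))
    client = set(range(server_cores, total_cores))
    return server, client


def pin_process(pid, cores):
    """Restrict an already running process to the given cores (Linux only)."""
    os.sched_setaffinity(pid, cores)


mysqld_cores, sysbench_cores = split_cores(24, 16)
print(sorted(mysqld_cores))    # cores 0..15 for mysqld
print(sorted(sysbench_cores))  # cores 16..23 for sysbench
```

With the process IDs of mysqld and sysbench at hand, pin_process(pid, mysqld_cores) and pin_process(pid, sysbench_cores) would apply the split. Keeping the two sets disjoint ensures that the load generator does not steal CPU time from the server under test.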

The only nonstandard tuning was for I/O scalability: innodb_io_capacity = 20000 and innodb_flush_neighbor_pages = none (for Percona Server and MariaDB/XtraDB). See my recent article on I/O tuning for an explanation.

The result for read-only OLTP is nicely balanced. This plot shows the median throughput:

The response time is also nicely flat (the plot shows the 99% quantile, meaning that at most 1% of the requests were slower than that):

In fact the behavior shows nearly perfect scaling: up to 16 threads (= number of CPU cores for mysqld) the response time is constant. For higher concurrency we see a straight line in the double-logarithmic plot.
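For readers who want to reproduce the two statistics used in these plots, here is a small sketch. The sample numbers are made up for illustration, and the nearest-rank definition of the quantile is my choice for the example; sysbench may compute its percentile differently.

```python
import statistics


def percentile_nearest_rank(samples, percent):
    """Smallest sample such that at least `percent` % of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of percent * n / 100 avoids floating point surprises.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up response times in milliseconds: 1, 2, ..., 100.
response_times_ms = list(range(1, 101))
print(percentile_nearest_rank(response_times_ms, 99))  # 99

# Made-up per-interval throughput with one stalled interval.
tps_samples = [980, 1010, 1000, 400, 1005]
print(statistics.median(tps_samples))  # 1000
```

The median hides a single stalled interval in the throughput numbers, which is why a high quantile of the response time is the better place to look for stalls.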

For read/write OLTP the picture looks quite similar, if we look at the transactions per second graph first:

The picture changes when we look at the response time (99% quantile again). The InnoDB-based servers exhibit severe stalls caused by checkpointing as soon as we approach 16 client threads:

The XtraDB-based servers handle the write load much better due to the better checkpointing algorithm.

Conclusion: if your workload is read-mostly OLTP, then it does not make much difference performance-wise whether you choose MariaDB, MySQL or Percona Server. So you can base your decision on other factors: costs, additional features (in MariaDB and Percona Server) or your internal know-how.

The servers based on InnoDB (MySQL, MariaDB optionally) show slightly higher throughput, but suffer checkpointing stalls at high write load. MariaDB gives you the highest flexibility here, as you can choose between XtraDB and InnoDB.

Disclaimer: the raw benchmark results as well as the scripts and config files used can be found in the public mariadb-benchmarks repository at Launchpad. There are also additional plots.

If you’re not on the announce mailing list (low traffic) or following us on Facebook, you’ve missed the announcement of MariaDB 5.5.24 (release notes, changelog). Highlights include:

  • Includes all changes up to MariaDB 5.3.7 and MySQL 5.5.24
  • We’re providing RPM packages for RHEL/CentOS 5 & 6, as well as a YUM repository
  • Ubuntu 12.04 has been released, so we have “Precise” packages
  • There are now BSD 9 and Mac OS X 10.5 binaries as well

There are many new ways to get and use MariaDB, so download MariaDB 5.5.24 and let us know how you like it.