When dealing with high performance, low latency storage devices, such as SSD cards, one finds bottlenecks in new places. This is a story about such a bottle neck and how to work around it. Continue reading
Oracle has now launched MySQL-5.6.10-GA, so it is time to come up with some new benchmark results. The test candidates in this benchmark run are
The 5.5 versions are in because I wanted to check for any regressions. In the past we have often seen performance regressions in newer versions which were caused by new features.
This time the benchmark was run on a different box. The main difference is that this box does not have SSD but a high performance RAID-5 with 512M of battery-backed cache. Besides that the machine has 16 cores out of which 12 were used for mysqld and the other 4 for sysbench.
The benchmark uses sysbench-0.5 OLTP with 8 tables and 10G worth of data. InnoDB buffer pool was 16G, InnoDB log group capacity 4G (the maximum for MySQL-5.5). The different disk system required different InnoDB configuration:
- innodb_io_capacity = 1000 (was 20000 for SSD)
- innodb_flush_neighbors = 1 (was 0 for SSD)
Now for the results. OLTP read only comes first:
And here is the first surprise: MySQL-5.6 behaves significantly different. It competes well up to 8 threads, it even wins 16 threads. But at higher concurrency performance drops off rapidly, even compared to MySQL-5.5. MariaDB-10.0 shows also a slight drop in performance compared to MariaDB-5.5, but it’s much less pronounced.
The response time graph is nice and smooth though:
Both MySQL-5.6 and MariaDB-10.0 look a little better which means they distribute cpu cycles more evenly on concurrent requests.
Disclaimer: no thread pool was used in this benchmark. The Oracle implementation of the thread pool is closed source and thus cannot be benchmarked or used by anybody. It seemed a bit unfair to use the MariaDB thread pool under those cirumstances.
If you want to see the impact of the MariaDB thread pool, have a look at the benchmarks published previously:
Next stop: OLTP read/write:
The picture is very similar. Both MySQL-5.6 and MariaDB-10.0 show a performance drop, compared to the 5.5 versions. For MySQL the drop is more than 10% and thus rather heavy.
However it’s a well known fact that MySQL-5.5 exhibits severe write stalls under high load when InnoDB starts synchronous flushing. The response time graph is good to spot this:
This is the good news. While the 5.5 versions both show heavy write stalls at 64 threads and more, the behavior is much less pronounced with MySQL-5.6 and MariaDB-10.0. So it seems the new adaptive flushing algorithm is working well.
There is however one problem here: if you use multiple buffer pool instances, then you see write stalls more often. For the above results I have run the read-only tests with 16 buffer pools and the read-write tests with only 1.
- MySQL-5.6 shows a rather severe performance regression, especially at higher concurrency levels. This does not match the results published by Oracle. I can only speculate why the results are so different, but I guess it’s the (closed source) thread pool and maybe the fact that Oracle benchmarks were done on much bigger hardware.
- with a single buffer pool you don’t have to be afraid of write stalls any more. Also MySQL-5.6 allows now up to 512G redo log capacity which further reduces the odds to run into synchronous flushing (MariaDB-5.5 lifted this limit with XtraDB already)
As always the scripts used for the benchmark as well as the results are available from launchpad:
I invite anybody to rerun this benchmark and share the results.
This article describes how I tuned MariaDB to give the best write throughput with SSD based storage.
When you have a write-heavy application writing into InnoDB, you will probably experience the InnoDB Checkpoint Blues. The effect manifests as stalls – short periods of time where the troughput falls to zero and I/O activity goes crazy. The phenomenon is well known and described i.e. here. More background about checkpointing can be found here.
The XtraDB fork of the InnoDB engine (and heart of Percona Server) contains some patches with the goal to overcome this odd behavior. MariaDB uses XtraDB as default InnoDB implementation, so we can configure some extra variables and hopefully avoid the checkpoint blues.
The first and most important setting is innodb_io_capacity. This is the approximate number of write operations that your hardware can do. If you don’t know that number, then you can easily find it out. Wait until you experience a stall and run iostat -x. You should see fairly high numbers for wrqm/s and w/s on the device holding your InnoDB table spaces. W/s is the number of write requests that hit the device, wrqm/s is the number of requests that could be merged. The sum of both is what InnoDB was effectively using.
In my case the disk is a RAID-0 of 3 SAS SSDs and I have seen up to 25.000 writes per second. So I set innodb_io_capacity=20000 and leave everything else at the default:
Here we can already see that the stall is gone. However after ~370 seconds we see a rather heavy I/O spike. This is, when XtraDB starts eager flushing when the checkpoint age reaches 75% of the log group capacity.
Now lets see if we can do better. The Percona Server documentation claims: innodb_adaptive_flushing_method=keep_average: … is designed for use with SSD cards. So lets try that next:
FAIL! Lets quickly revert that to the default.
Next thing to look at, is neighbor page flushing. This works like so: InnoDB pages are organized in blocks of 64 pages. When the checkpointing algorithm has picked a dirty page to be written to disk, it checks if there are more dirty pages in the block and if yes, writes all those pages at once. The rationale is, that with rotating disks the most expensive part of a write operation is head movement. Once the head is over the right track, it does not make much difference if we write 10 or 100 sectors.
The default setting is innodb_flush_neighbor_pages=area. For SSD there is however no head movement penalty. So it makes sense to experiment with the other settings. Lets try innodb_flush_neighbor_pages = cont next:
We still see a write spike around 400s run time, but it’s less pronounced than last time. Now lets finally try innodb_flush_neighbor_pages=none
And here we are! For SSD based storage the best parameter combination seems to be: innodb_flush_neighbor_pages = none, innodb_io_capacity = what_your_hardware_can_do and all others at the defaults.
If the storage uses rotating disks, we are probably better off with innodb_flush_neighbor_pages = cont or even innodb_flush_neighbor_pages = area, but at the moment I don’t have the hardware to test that.
Disclaimer: the above graphs use the sysbench OLTP multi table read/write benchmark. Data set size was 10G, InnoDB buffer pool 16G, InnoDB total log capacity 4G.