When dealing with high performance, low latency storage devices, such as SSD cards, one finds bottlenecks in new places. This is a story about such a bottleneck and how to work around it.
Oracle has now launched MySQL-5.6.10-GA, so it is time to come up with some new benchmark results. The test candidates in this benchmark run are
The 5.5 versions are in because I wanted to check for any regressions. In the past we have often seen performance regressions in newer versions which were caused by new features.
This time the benchmark was run on a different box. The main difference is that this box does not have an SSD but a high performance RAID-5 with 512M of battery-backed cache. Besides that, the machine has 16 cores, of which 12 were used for mysqld and the other 4 for sysbench.
The benchmark uses sysbench-0.5 OLTP with 8 tables and 10G worth of data. InnoDB buffer pool was 16G, InnoDB log group capacity 4G (the maximum for MySQL-5.5). The different disk system required different InnoDB configuration:
- innodb_io_capacity = 1000 (was 20000 for SSD)
- innodb_flush_neighbors = 1 (was 0 for SSD)
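As a rough sketch, the settings named above could be written out as a my.cnf fragment. The Python below only renders text. The `[mysqld]` section and the variable name `innodb_buffer_pool_size` are assumptions about how these values map onto a config file, and the redo log settings are left out because the post does not say how the 4G was split into files. Whether `innodb_flush_neighbors` is accepted under that name depends on the server version.

```python
# Values come from the text above; the mapping to my.cnf names is an assumption.
RAID_SETTINGS = {
    "innodb_buffer_pool_size": "16G",  # "InnoDB buffer pool was 16G"
    "innodb_io_capacity": 1000,        # was 20000 on the SSD box
    "innodb_flush_neighbors": 1,       # was 0 on the SSD box
}


def render_mycnf(settings, section="mysqld"):
    """Render a dict of settings as a my.cnf-style section."""
    lines = [f"[{section}]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mycnf(RAID_SETTINGS))
```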
Now for the results. OLTP read only comes first:
And here is the first surprise: MySQL-5.6 behaves significantly differently. It competes well up to 8 threads and even wins at 16 threads. But at higher concurrency performance drops off rapidly, even compared to MySQL-5.5. MariaDB-10.0 also shows a slight drop in performance compared to MariaDB-5.5, but it’s much less pronounced.
The response time graph is nice and smooth though:
Both MySQL-5.6 and MariaDB-10.0 look a little better, which means they distribute CPU cycles more evenly across concurrent requests.
Disclaimer: no thread pool was used in this benchmark. The Oracle implementation of the thread pool is closed source and thus cannot be benchmarked or used by anybody. It seemed a bit unfair to use the MariaDB thread pool under those circumstances.
If you want to see the impact of the MariaDB thread pool, have a look at the benchmarks published previously:
Next stop: OLTP read/write:
The picture is very similar. Both MySQL-5.6 and MariaDB-10.0 show a performance drop, compared to the 5.5 versions. For MySQL the drop is more than 10% and thus rather heavy.
However, it’s a well known fact that MySQL-5.5 exhibits severe write stalls under high load when InnoDB starts synchronous flushing. The response time graph makes this easy to spot:
This is the good news. While the 5.5 versions both show heavy write stalls at 64 threads and more, the behavior is much less pronounced with MySQL-5.6 and MariaDB-10.0. So it seems the new adaptive flushing algorithm is working well.
There is however one problem here: if you use multiple buffer pool instances, then you see write stalls more often. For the above results I have run the read-only tests with 16 buffer pools and the read-write tests with only 1.
- MySQL-5.6 shows a rather severe performance regression, especially at higher concurrency levels. This does not match the results published by Oracle. I can only speculate why the results are so different, but I guess it’s the (closed source) thread pool and maybe the fact that Oracle benchmarks were done on much bigger hardware.
- With a single buffer pool you don’t have to be afraid of write stalls any more. Also, MySQL-5.6 now allows up to 512G of redo log capacity, which further reduces the odds of running into synchronous flushing (MariaDB-5.5 already lifted this limit with XtraDB).
As always the scripts used for the benchmark as well as the results are available from launchpad:
I invite anybody to rerun this benchmark and share the results.
We haven’t posted any Windows benchmarks for a while, and MariaDB for Windows contains some specific improvements which might not be widely known since we haven’t talked much about them yet. This post is an attempt to fix that. We’ll also share current MySQL 5.5 numbers.
My setup is an 8 core 2 socket server (yes, a little bit dated for today, but it is the best machine I have at my disposal), 10K SAS disks with RAID1. I ran sysbench 0.4 single table / 1,000,000 records. I ran the benchmark over a network, with the number of concurrent clients ranging from 4 to 4096.
Here is what OLTP-readonly throughput looks like:
- For most of the tests, MariaDB’s throughput is approx 10% higher than MySQL’s
- For 4096 concurrent clients, MariaDB’s throughput is better than MySQL 5.5 by 476% (2382 vs 413 TPS).
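The percentage in the last bullet can be checked with a couple of lines of arithmetic. This is only a sketch using the two TPS figures quoted above; the helper name `percent_better` is made up for illustration.

```python
def percent_better(new_tps, old_tps):
    """Relative improvement of new_tps over old_tps, in percent."""
    return (new_tps - old_tps) / old_tps * 100


# Figures quoted above for 4096 concurrent clients.
mariadb_tps, mysql_tps = 2382, 413

print(f"{percent_better(mariadb_tps, mysql_tps):.1f}% higher throughput")
print(f"{mariadb_tps / mysql_tps:.2f}x the throughput")
```

The result lands between 476% and 477%, so the figure in the bullet is the truncated value. Put differently, MariaDB delivers a bit less than six times the MySQL 5.5 throughput at that concurrency.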
Many people rightfully remark that throughput does not necessarily represent performance as a whole, and that in OLTP benchmarks, fairness also matters. It actually often matters more than throughput. I agree, so below are the results for the 95th percentile response time, meaning that 95% of transactions finished within the given time.
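To make the metric concrete, here is a minimal sketch of a 95th percentile computed with the nearest-rank method. The sample response times are invented for illustration, and sysbench's own calculation may differ in detail; the point is only that a few very slow transactions barely move the mean-like throughput number but show up clearly in the percentile.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, for illustration only.
response_ms = [12, 15, 11, 14, 13, 16, 12, 250, 13, 14,
               15, 12, 11, 13, 14, 12, 16, 13, 15, 900]

p95 = percentile_nearest_rank(response_ms, 95)
median = percentile_nearest_rank(response_ms, 50)
print(f"median = {median} ms, 95th percentile = {p95} ms")
```

In this made-up data set the median stays in the low teens while the 95th percentile is dominated by the stalled requests, which is exactly the kind of unfairness the metric is meant to expose.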
OLTP Readonly response time, 95th percentile
As the above table shows, MariaDB 5.5 outperforms MySQL 5.5 in both raw throughput and fairness. The fairness (response time) results are in fact more impressive than the raw throughput results.
But why does MariaDB perform better than MySQL 5.5 on Windows in readonly tests? Why is it better than 5.5 at all? It is, after all, based on the same codebase, so performance should be about equal. The answer is not our optimizer enhancements, nor XtraDB serving as InnoDB in MariaDB. The answer is the MariaDB threadpool. This is my feature in 5.5 and I’m glad it performs so well, and I’m glad that it is switched “on” by default on Windows (so, Windows users, you do not have to do anything to switch it on).
Still, why does using threadpool result in such better performance? The answer is that the implementation delegates all responsibility for sizing the pool and running callbacks to the Windows native threadpool, so it is a sort of black box inside the OS that just delivers good results. The heart of the Windows native threadpool, if you want to know, is the good old IO completion port, something that Windows has had since the NT 3.5 days. It’s a unique Windows feature, and something that any server application running on this platform is using or should be using. And the tricks that make it run well are:
- It does not let too many threads run “on CPU” at the same time (this reduces context switches). Reducing context switches is, in my experience, the single most important factor to increasing throughput.
- It activates threads waiting on completions in LIFO order: hot threads remain hot, which reduces cache misses.
- IO completions are processed in FIFO order, which is why fairness is so good.
- And yes, it naturally reduces contention on hot locks.
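The first three points can be illustrated with a toy model. This is not the Windows API and not MariaDB's implementation, just a sketch of the scheduling idea: a cap on running workers, a LIFO stack of parked workers, and a FIFO queue of completions. The class and method names are invented for this example, and lock contention is not modelled at all.

```python
from collections import deque


class ToyCompletionPort:
    """Toy model only: not the Windows API, just the scheduling idea."""

    def __init__(self, max_running):
        self.max_running = max_running  # cap on workers "on CPU" at once
        self.completions = deque()      # FIFO queue of finished IO events
        self.parked = []                # stack of waiting workers (LIFO)
        self.running = {}               # worker -> completion being handled

    def park(self, worker):
        """A worker finishes whatever it was doing and waits on the port."""
        self.running.pop(worker, None)
        self.parked.append(worker)

    def post(self, completion):
        """An IO completion arrives and is queued."""
        self.completions.append(completion)

    def dispatch(self):
        """Hand out work for as long as the concurrency cap allows it."""
        started = []
        while (self.completions and self.parked
               and len(self.running) < self.max_running):
            worker = self.parked.pop()               # most recently parked first
            completion = self.completions.popleft()  # oldest completion first
            self.running[worker] = completion
            started.append((worker, completion))
        return started


port = ToyCompletionPort(max_running=2)
for w in ("w1", "w2", "w3"):
    port.park(w)
for c in ("req-a", "req-b", "req-c"):
    port.post(c)

first = port.dispatch()   # only two workers may run; req-c has to wait
port.park("w3")           # w3 finishes and parks again, on top of the stack
second = port.dispatch()  # the still-hot w3 picks up the waiting req-c
print(first, second)
```

In this run the worker `w1` never wakes up: the most recently active worker keeps getting the work, while requests are still served strictly in arrival order.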
Enough about readonly performance already, and about threadpool.
The next interesting question is whether MariaDB can do as well on write workloads. So let’s run sysbench in the ultimate “write-only” mode, also known as update_non_index (every query only increments a non-indexed int column). To maximize the write throughput, I set the innodb_flush_log_at_trx_commit parameter to 0, so the log is flushed once a second rather than at each transaction commit.
Here are the results:
OLTP write-only (update_non_index/flush_log=0) throughput:
This looks pretty good. The difference could be a combination of many factors. XtraDB write performance, group commit, and threadpool could all have a positive (for MariaDB) influence on the results. However I think the Windows version of MariaDB’s asynchronous IO optimization (it, again, uses completion ports) is the main reason for the big difference here. I, actually, first saw the difference when I was implementing the feature.
All IO-related parameters and InnoDB parameters in the test above are at their defaults. The result looks almost too good to be true, so I really wanted to bring MySQL to the point where it can achieve better IO throughput by varying innodb_io_capacity and/or innodb_write_io_threads, but to no avail. Does someone know the right trick here?
OLTP write-only (update_non_index/flush_log=0) response time, 95th percentile
For completeness, here are database parameters I’m using.