Over the past few days, extensive conversations have taken place around a new security vulnerability in MariaDB and MySQL.

It all started as a chain reaction when Monty Program publicly disclosed information about the flaw they had found and about how to make sure your MariaDB and MySQL installations are fixed. The issue was assigned the security vulnerability identifier CVE-2012-2122, and the disclosure can be read, for example, at http://seclists.org/oss-sec/2012/q2/493. In short, on builds where memcmp() can return values outside the range -128..127, the return value is truncated when cast to char during password checking, so a wrong password is accepted with a probability of roughly 1 in 256.

The bug was found two months ago, on April 4th.

Before disclosing the information publicly, given the seriousness of this bug and considering the millions of MySQL and MariaDB installations deployed worldwide, Monty Program informed the biggest distributors of MySQL and MariaDB as a precaution.

On April 6th, Monty Program informed Oracle about it in bug report http://bugs.mysql.com/bug.php?id=64884 and provided a suggested fix.

The other big distributors of MySQL and MariaDB are the major Linux distributions, which were also alerted in April and provided with a fix for old MySQL versions no longer supported by Oracle. This gave Oracle and the Linux distributions some lead time to check whether their MySQL and/or MariaDB builds were vulnerable and to apply the provided fix if needed.

Whether your MySQL or MariaDB installation is vulnerable depends on where and how the binaries you use were built.

The official MariaDB binaries provided by Monty Program, the official MySQL binaries provided by Oracle, and Percona's binaries of their server have all been tested. All these vendors have confirmed that the vulnerability is not present in their binaries, and that due to the way the binaries are built, it never has been.

All binaries listed on the SkySQL website are either official Oracle or official MariaDB binaries mirrored from dev.mysql.com/downloads and downloads.mariadb.org.

If you built your binaries yourself, you (or your database administrator) can easily test whether your installation is vulnerable by following the instructions found, for example, at http://ronaldbradford.com/blog/repost-a-tragically-comedic-security-flaw-in-mysql-2012-06-11/.
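The widely circulated test boils down to hammering the server with wrong-password login attempts; here is a minimal shell sketch along the lines of the linked instructions, where the host, user, password, and attempt count are placeholders to adapt to your setup:

for i in `seq 1 1000`; do
  # on a vulnerable build, one of these attempts will eventually be
  # accepted despite the wrong password and drop you into a mysql prompt
  mysql -u root --password=wrong -h 127.0.0.1 2>/dev/null
done

If every attempt is rejected, your build is most likely not affected.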

If you build, or have built, your own binaries, another useful piece of information is that the fix (which removes the problem regardless of how you build) was first released in MariaDB 5.5.23 on April 10. Oracle followed with the fix in MySQL 5.5.24 on May 7.

Most of us MariaDB and MySQL users do not need to build binaries on our own; that is, we are on a platform for which the official MariaDB and MySQL binaries are provided, and we do not have patches of our own that would need to be applied before producing binaries.

For most of us it is therefore recommended to get the binaries from official channels, such as your Linux distribution's repository or the database's official download channel, which in the case of MariaDB is http://downloads.mariadb.org.

Also, if you want to make sure that you're on the latest version of MariaDB and you're running CentOS, Fedora, Debian, Red Hat or Ubuntu, you should consider adding MariaDB's official repository, http://downloads.mariadb.org/mariadb/repositories/.
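As a hypothetical example of what the repository configurator produces for Ubuntu (the exact mirror URL, signing key, and release name come from the page itself, so the placeholders below must be replaced):

sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com <key-id-from-the-page>
sudo add-apt-repository 'deb http://<your-mirror>/mariadb/repo/5.5/ubuntu precise main'
sudo apt-get update
sudo apt-get install mariadb-server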

MariaDB and/or MySQL packagers (such as Linux distributions) should make sure they sign up for the MariaDB packagers mailing list at https://lists.askmonty.org/cgi-bin/mailman/listinfo/packagers to receive important notifications, including early disclosure of security vulnerabilities like this one.

We haven't posted any Windows benchmarks for a while, and MariaDB for Windows contains some specific improvements which might not be widely known, since we haven't talked much about them yet. This post is an attempt to fix that. We'll also share current MySQL 5.5 numbers.

My setup is a 2-socket, 8-core server (yes, a little dated for today, but it is the best machine I have at my disposal) with 10K RPM SAS disks in RAID 1. I ran sysbench 0.4 with a single table of 1,000,000 records. The benchmark ran over a network, with the number of concurrent clients ranging from 4 to 4096.
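For reference, a sketch of the kind of sysbench 0.4 invocation this describes; host, credentials, and run length are placeholders, not the exact values used:

# create the single 1,000,000-row test table
sysbench --test=oltp --mysql-host=<server> --mysql-user=root \
  --oltp-table-size=1000000 prepare

# read-only OLTP run, repeated for each client count
for n in 4 8 16 32 64 128 256 512 1024 2048 4096; do
  sysbench --test=oltp --oltp-read-only=on --oltp-table-size=1000000 \
    --mysql-host=<server> --mysql-user=root \
    --num-threads=$n --max-time=60 --max-requests=0 run
done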

Here is what OLTP-readonly throughput looks like:

  • For most of the tests, MariaDB’s throughput is approximately 10% higher than MySQL’s.
  • For 4096 concurrent clients, MariaDB’s throughput is better than MySQL 5.5’s by 476% (2382 vs 413 TPS).

Many people rightfully remark that throughput does not necessarily represent performance as a whole, and that in OLTP benchmarks fairness also matters; it often matters more than throughput. I agree, so below are the 95th-percentile response times, meaning that 95% of transactions finished within the given time.

OLTP read-only response time, 95th percentile (ms)

Concurrent clients      4      8     16     32     64     128     256     512    1024     2048      4096
MariaDB              4.87   6.81   8.83  12.35  22.12   43.56   90.35  180.57  619.05  1003.88   1965.77
MySQL                4.86   7.14   9.96  16.21  37.39  101.33  238.89  499.63  971.07  2241.83  25215.29

As the above table shows, MariaDB 5.5 outperforms MySQL 5.5 in both raw throughput and fairness. The fairness (response time) results are in fact more impressive than the throughput results.

But why does MariaDB perform better than MySQL 5.5 on Windows in read-only tests? Why is it better than 5.5 at all? It is, after all, based on the same codebase, so performance should be about equal. The answer is not our optimizer enhancements, nor XtraDB serving as InnoDB in MariaDB. The answer is the MariaDB threadpool. This is my feature in 5.5, and I’m glad it performs so well, and I’m glad that it is switched “on” by default on Windows (so, Windows users, you do not have to do anything to enable it).
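If you want to confirm which connection-handling model your server is using, one quick sketch (assuming the mysql client can reach the server):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'thread_handling'"
# Windows builds of MariaDB 5.5 should report pool-of-threads;
# on other builds it can be enabled with thread_handling=pool-of-threads in my.cnf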

Still, why does using the threadpool result in such markedly better performance? The answer is that the implementation delegates all responsibility for sizing the pool and running callbacks to the Windows native threadpool, so it is a sort of black box inside the OS that just delivers good results. The heart of the Windows native threadpool, if you want to know, is the good old IO completion port, something Windows has had since the NT 3.5 days. It is a unique Windows feature, and something that any server application running on this platform is using, or should be using. The tricks that make it run well are:

  • It does not let too many threads run “on CPU” at the same time, which reduces context switches. Reducing context switches is, in my experience, the single most important factor in increasing throughput.
  • It activates threads waiting on completion in LIFO order, so hot threads remain hot, which reduces cache misses.
  • IO completions are processed in FIFO order, which is why the fairness results are so good.
  • And yes, it naturally reduces contention on hot locks.

Enough about readonly performance already, and about threadpool.

The next interesting question is whether MariaDB can do as well on write workloads. So let’s run sysbench in the ultimate “write-only” mode, also known as update_non_index (every query just increments a non-indexed int column). To maximize write throughput, I set the innodb_flush_log_at_trx_commit parameter to 0, so the log is flushed once a second rather than at each transaction commit.
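A sketch of such a run; note that sysbench 0.4 spells this non-transactional mode update_nokey (update_non_index is the name of the corresponding sysbench 0.5 Lua script), and the host and run length below are placeholders:

sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_nokey \
  --oltp-table-size=1000000 --mysql-host=<server> --mysql-user=root \
  --num-threads=<clients> --max-time=60 --max-requests=0 run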

Here are the results:

OLTP write-only (update_non_index/flush_log=0) throughput:

This looks pretty good. The difference could be a combination of many factors: XtraDB write performance, group commit, and the threadpool could all have a positive (for MariaDB) influence on the results. However, I think the Windows-specific asynchronous IO optimization in MariaDB (which, again, uses completion ports) is the main reason for the big difference here. In fact, I first saw the difference while implementing that feature.

All IO-related and InnoDB parameters in the test above are at their defaults. The result looks almost too good to be true, so I really wanted to bring MySQL to the point where it could achieve better IO throughput by varying innodb_io_capacity and/or innodb_write_io_threads, but to no avail. Does someone know the right trick here?

OLTP write-only (update_non_index/flush_log=0) response time, 95th percentile (ms)

Concurrent clients      4      8     16     32     64     128     256     512    1024     2048
MariaDB              0.33   0.63   0.75   1.06   1.94    3.85    8.25   21.38  129.79   274.40
MySQL                0.32   0.61   0.73   1.61   7.62   26.82   96.45  219.29  661.19  2723.36

For completeness, here are the database parameters I used.

[mysqld]
sql-mode="NO_ENGINE_SUBSTITUTION"
back_log=500
user=root
port=3306
# sized for up to 4096 benchmark clients
max_connections=4096
max_connect_errors=5000
max_prepared_stmt_count=50000
table_cache=2048
transaction_isolation=REPEATABLE-READ
loose-skip-external-locking
innodb_status_file=0
innodb_data_file_path=ibdata1:200M:autoextend
innodb_buffer_pool_size=1G
innodb_additional_mem_pool_size=20M
innodb_log_file_size=650M
innodb_log_buffer_size=100M
# durability deliberately relaxed for benchmarking
innodb_support_xa=0
innodb_doublewrite=0
innodb_flush_log_at_trx_commit=0
# query cache disabled so it does not skew the results
query-cache-size=0
query-cache-type=0
symbolic-links=0
# authentication skipped for benchmark convenience only
skip-grant-tables

This article describes how I tuned MariaDB for the best write throughput with SSD-based storage.

When you have a write-heavy application writing into InnoDB, you will probably experience the InnoDB checkpoint blues. The effect manifests as stalls: short periods of time where the throughput falls to zero and I/O activity goes crazy. The phenomenon is well known and described, for example, here. More background about checkpointing can be found here.

The XtraDB fork of the InnoDB engine (and the heart of Percona Server) contains some patches aimed at overcoming this odd behavior. MariaDB uses XtraDB as its default InnoDB implementation, so we can configure some extra variables and hopefully avoid the checkpoint blues.

The first and most important setting is innodb_io_capacity. This is the approximate number of write operations per second that your hardware can do. If you don’t know that number, you can easily find it out: wait until you experience a stall and run iostat -x. You should see fairly high numbers for wrqm/s and w/s on the device holding your InnoDB table spaces. w/s is the number of write requests that hit the device; wrqm/s is the number of requests that could be merged. The sum of both is what InnoDB was effectively using.
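For example, a quick sketch of that measurement:

iostat -x 1
# watch the device holding the InnoDB table spaces while a stall is happening;
# adding up the w/s and wrqm/s columns approximates the write rate
# that InnoDB was effectively using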

In my case the disk is a RAID 0 of 3 SAS SSDs, and I have seen up to 25,000 writes per second. So I set innodb_io_capacity=20000 and leave everything else at the default:

Here we can already see that the stall is gone. However, after ~370 seconds we see a rather heavy I/O spike. This is when XtraDB starts eager flushing, because the checkpoint age has reached 75% of the log group capacity.
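Incidentally, innodb_io_capacity is a dynamic variable, so values can be tried at runtime before settling on one; a sketch:

mysql -e "SET GLOBAL innodb_io_capacity = 20000"
# keep the same setting in my.cnf so it survives a restart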

Now let’s see if we can do better. The Percona Server documentation claims that innodb_adaptive_flushing_method=keep_average “… is designed for use with SSD cards.” So let’s try that next:

FAIL! Let’s quickly revert that to the default.

The next thing to look at is neighbor page flushing. It works like this: InnoDB pages are organized in blocks of 64 pages. When the checkpointing algorithm has picked a dirty page to be written to disk, it checks whether there are more dirty pages in the block, and if so, writes all those pages at once. The rationale is that with rotating disks the most expensive part of a write operation is head movement. Once the head is over the right track, it does not make much difference whether we write 10 or 100 sectors.

The default setting is innodb_flush_neighbor_pages=area. With SSDs, however, there is no head movement penalty, so it makes sense to experiment with the other settings; the variable is dynamic, so each value can be tried at runtime, as shown in the sketch below. Let’s try innodb_flush_neighbor_pages=cont next:
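A sketch of switching the setting on a running server (assuming the XtraDB variable behaves as a dynamic global, as the Percona documentation describes it):

mysql -e "SET GLOBAL innodb_flush_neighbor_pages = cont"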

We still see a write spike around 400s of run time, but it’s less pronounced than last time. Now let’s finally try innodb_flush_neighbor_pages=none:

And here we are! For SSD-based storage the best parameter combination seems to be innodb_flush_neighbor_pages=none, innodb_io_capacity = what_your_hardware_can_do, and all others at the defaults.
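A minimal my.cnf sketch of that combination (the io_capacity number is from my hardware; measure your own as described above):

[mysqld]
innodb_io_capacity = 20000
innodb_flush_neighbor_pages = none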

If the storage uses rotating disks, we are probably better off with innodb_flush_neighbor_pages = cont or even innodb_flush_neighbor_pages = area, but at the moment I don’t have the hardware to test that.

Disclaimer: the above graphs use the sysbench OLTP multi-table read/write benchmark. Data set size was 10G, InnoDB buffer pool 16G, InnoDB total log capacity 4G.