Back when the first version of the MariaDB Java Client was released, someone asked in the comments how the driver's performance compared to ConnectorJ. I answered with some hand-waving: as long as nobody does anything stupid, the performance of the drivers should be roughly the same. But I promised to measure it one day and tell the world. Now that day has come: the day when three MySQL JDBC drivers (ConnectorJ, MariaDB JDBC, and Drizzle JDBC) are compared against each other. Unlike the server, which gets benchmarking attention all the time, connectors have no standard benchmark, so I had to improvise while keeping the server-side overhead minimal. So I started with something very primitive and used my two favorite queries:

  • DO 1 — this one does not retrieve a result set, and thus can be seen as a small “update”.
  • SELECT 1 — the minimal SELECT query.

The test program runs a query N times and calculates the queries-per-second value. If the query is a SELECT, it also retrieves every value from the result set using ResultSet.getObject(i). (The best thing is that the test program is single-threaded, and how often does one get to run single-threaded tests? :) The test was run on my own workstation, which runs Windows Server 2008 R2, and I used useConfigs=maxPerformance in the URL for ConnectorJ.
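The measurement loop described above could look roughly like the following plain-JDBC sketch. It is not the actual benchmark program, which is on Launchpad. The class name QpsBench, its methods, and the use of System.nanoTime() for timing are assumptions. A driver and a running server are only needed when run() is called. The QPS arithmetic lives in its own method so it can be checked separately.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of a single-threaded QPS loop; not the original benchmark code.
final class QpsBench {

    /** Queries per second for `queries` executions that took `elapsedNanos` in total. */
    static double qps(long queries, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return queries * 1_000_000_000.0 / elapsedNanos;
    }

    /**
     * Executes `sql` n times on one connection (unprepared). If the statement
     * returns a result set (e.g. SELECT 1), every column of every row is read
     * with getObject(i); DO 1 returns no result set, so nothing is read.
     */
    static double run(Connection conn, String sql, int n) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        int columns = rs.getMetaData().getColumnCount();
                        while (rs.next()) {
                            for (int c = 1; c <= columns; c++) {
                                rs.getObject(c);
                            }
                        }
                    }
                }
            }
            return qps(n, System.nanoTime() - start);
        }
    }
}
```

For ConnectorJ, a call might look like QpsBench.run(DriverManager.getConnection("jdbc:mysql://localhost:3306/?user=root&useConfigs=maxPerformance"), "SELECT 1", 100000). Fetching the column count from getMetaData() on every iteration is a simplification, and it adds a little work inside the timed loop.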

Results (queries per second, unprepared)

Query      ConnectorJ-5.1.24   MariaDB-JDBC-1.1.2   Drizzle-JDBC-1.3-SNAPSHOT
DO 1       19543               22104                15288
SELECT 1   17004               19305                13410

[Figure: jdbc_fast_queries (queries per second per driver)]

MariaDB JDBC appears to be a little faster than ConnectorJ, whose QPS is ~12% lower. It is much faster than Drizzle JDBC, whose QPS is ~30% lower.
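The relative figures can be recomputed from the table. The small helper below uses illustrative class and method names. It also shows why the baseline matters: the same pair of numbers gives a different percentage depending on which driver you divide by.

```java
// Illustrative helper for recomputing relative QPS differences from the results table.
final class RelativeQps {

    /** How much lower `other` is than `baseline`, in percent of the baseline. */
    static double percentBelow(double baseline, double other) {
        return (baseline - other) / baseline * 100.0;
    }

    /** How much higher `after` is than `before`, in percent of `before`. */
    static double percentGain(double before, double after) {
        return (after - before) / before * 100.0;
    }
}
```

For the DO 1 row, percentBelow(22104, 19543) is about 11.6 (ConnectorJ relative to MariaDB JDBC), while percentGain(19543, 22104) is about 13.1 (MariaDB JDBC relative to ConnectorJ). For Drizzle JDBC, the corresponding figures are about 30.8 and 44.6.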

Can ConnectorJ do better? I bet it can. Profiler output (CPU profiling in NetBeans, instrumentation mode) for a test that executes “SELECT 1” in a loop shows com.mysql.jdbc.StatementImpl.findStartOfStatement() taking 7.5% of the runtime. OK, instrumentation results should be taken with a grain of salt. However, the only reason for this string search is to reject an update (DML) statement passed to Statement.executeQuery() with an exception. I believe this can be done differently. If the exception is absolutely necessary, it can be delayed until the client finds out that the server sent an OK packet instead of a result set.
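The deferred check suggested above can be sketched as a classification of the first byte of the server's response to a query. This is a simplified, hypothetical sketch, not ConnectorJ code, and the class and enum names are made up. It assumes the classic MySQL protocol: an OK packet starts with 0x00, an ERR packet with 0xFF, and a LOCAL INFILE request with 0xFB. Anything else is the column count that opens a result set.

```java
import java.sql.SQLException;

// Hypothetical sketch: decide what the server sent instead of scanning the SQL text up front.
final class ResponseKind {

    enum Kind { OK, ERROR, LOCAL_INFILE, RESULT_SET }

    // Simplified classification of the first payload byte of a query response
    // (classic protocol only; protocol extensions are ignored).
    static Kind classify(byte firstByte) {
        switch (firstByte & 0xFF) {
            case 0x00: return Kind.OK;
            case 0xFF: return Kind.ERROR;
            case 0xFB: return Kind.LOCAL_INFILE;
            default:   return Kind.RESULT_SET; // column count of a result set
        }
    }

    // Deferred executeQuery() check: only an OK packet (no result set) raises the exception.
    static void requireResultSet(byte firstByte) throws SQLException {
        if (classify(firstByte) == Kind.OK) {
            throw new SQLException("executeQuery() was given a statement that returns no result set");
        }
    }
}
```

This design has a trade-off. By the time the client sees the OK packet, the server has already executed the statement. A DML statement passed to executeQuery() would therefore take effect before the exception is thrown.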

Even more interesting is the case of Drizzle JDBC. In theory, since the MariaDB driver has Drizzle JDBC heritage, the two should have similar performance characteristics. They do not, so there must be a bug somewhere. It turns out to be very easy to find. According to the profiler, 50.2% of CPU time (take that number with a big grain of salt) is spent in a function that constructs a hexdump from a byte buffer. Looking at the source code, we find the following line, which is executed unconditionally:

log.finest("Sending : " + MySQLProtocol.hexdump(byteHeader, 0));

The result of the hexdump is never used unless the logging level is FINEST. Even so, the dump string is always created using relatively expensive Formatter routines, concatenated with the string “Sending : ”, and then thrown away. In Markus’ defense, hexdump() is not his fault: it was contributed 3 years ago. But it remained undetected for those 3 years. The bug is now filed as https://github.com/krummas/DrizzleJDBC/issues/21 [UPDATE: this bug was resolved within hours after reporting]

So, let’s check how much we gain by putting the offending code inside an if (log.getLevel() == java.util.logging.Level.FINEST) condition.
The QPS for “DO 1” rises from 15288 to 19968 (30%), and for “SELECT 1” it increases from 13410 to a respectable 16824 (25%). Not bad for a single-line fix.
[Figure: jdbc_fast_queries_drizzle_fix (results with the Drizzle JDBC logging fix)]
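The single-line fix can be sketched with java.util.logging. In the hypothetical class below, hexdump() is only a stand-in that counts its calls; it is not the real MySQLProtocol.hexdump(). The guard uses Logger.isLoggable(Level.FINEST) rather than comparing getLevel() with Level.FINEST. The reason is that getLevel() returns null when a logger has no level of its own and inherits one from a parent.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch contrasting unguarded and guarded debug logging.
final class GuardedLogging {
    static final Logger LOG = Logger.getLogger("GuardedLogging.sketch");
    static int hexdumpCalls = 0; // counts how often the expensive dump is built

    // Stand-in for an expensive formatter such as MySQLProtocol.hexdump().
    static String hexdump(byte[] bytes) {
        hexdumpCalls++;
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString();
    }

    // Unguarded: the dump is built and concatenated even when FINEST is disabled.
    static void sendUnguarded(byte[] header) {
        LOG.finest("Sending : " + hexdump(header));
    }

    // Guarded: isLoggable() also honours a level inherited from a parent logger.
    static void sendGuarded(byte[] header) {
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest("Sending : " + hexdump(header));
        }
    }
}
```

On Java 8 and later, the Supplier overload LOG.finest(() -> "Sending : " + hexdump(header)) gives the same laziness without an explicit if.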

The one-liner makes Drizzle JDBC faster, roughly on par with ConnectorJ: slightly ahead for “DO 1” and slightly behind for “SELECT 1”. It is still not as fast as MariaDB JDBC.

Since the fork, a couple of performance improvements have been made in the MariaDB JDBC connector. One of the early ones was to avoid copying data unnecessarily when sending, and to reduce the number of byte buffers. Another came recently, after profiling showed that parsing Field packets is expensive, mostly because of constructing Strings for column names, aliases, etc. The fix was lazy parsing, which delays string construction and avoids it entirely in most cases. For example, if column names are not used and rows are accessed by integer index with ResultSet.getXXX(int i), the metadata is never fully parsed. There were perhaps some other fixes that I do not remember anymore. :)
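Lazy parsing of Field packets can be illustrated with a column-definition wrapper. It keeps the raw bytes and builds the column-name String only on first access. This is a hypothetical sketch, not the MariaDB driver's code, and it rests on several assumptions. It assumes the Protocol::ColumnDefinition41 layout, in which catalog, schema, table, org_table, name and org_name are consecutive length-encoded strings. It handles only one-byte length prefixes, and it assumes the names are UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of lazy column-definition parsing.
final class LazyColumn {
    static int decodes = 0; // counts String constructions, for illustration

    private final byte[] packet; // raw column definition payload, kept as-is
    private String name;         // decoded on first use only

    LazyColumn(byte[] packet) {
        this.packet = packet;
    }

    // Reads a one-byte length prefix; longer length encodings (0xFB and above) are not handled here.
    private int lengthAt(int pos) {
        int len = packet[pos] & 0xFF;
        if (len >= 0xFB) {
            throw new UnsupportedOperationException("multi-byte length encoding not handled in this sketch");
        }
        return len;
    }

    String getName() {
        if (name == null) {
            int pos = 0;
            for (int i = 0; i < 4; i++) { // skip catalog, schema, table, org_table
                pos += 1 + lengthAt(pos);
            }
            name = new String(packet, pos + 1, lengthAt(pos), StandardCharsets.UTF_8);
            decodes++;
        }
        return name;
    }

    // Test helper: builds a payload of consecutive one-byte-length-prefixed strings.
    static byte[] lenencStrings(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : parts) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            if (b.length >= 0xFB) {
                throw new IllegalArgumentException("string too long for this sketch");
            }
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }
}
```

A caller that only uses getXXX(int i) never invokes getName(), so no String is ever built for that column.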

Can we increase the QPS further?

We can try. First, statements can be prepared. MariaDB and Drizzle JDBC so far only provide client-side prepared statements, while ConnectorJ can do both client-side and server-side ones. Even client-side preparation saves converting the query to bytes and doing JDBC escape processing on every execution. From now on I’ll stick with “DO 1”, which proved to be the fastest query. On the MariaDB driver, preparing gives only a minimal QPS increase: 22104 (not prepared) vs 22183 (prepared), or 0.3%. The gain is slightly larger on ConnectorJ (19543 vs 20096, or 2.8%). Nothing revolutionary so far.
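Client-side preparation can be pictured as splitting the SQL once at its ? placeholders and encoding the fixed parts to bytes. Each execution then only appends parameter values. The class below is a hypothetical sketch, not the MariaDB, Drizzle or ConnectorJ implementation. It only accepts long parameters, which sidesteps string escaping, and it naively treats every ? as a placeholder, even inside string literals or comments.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a client-side prepared statement.
final class ClientPrepared {
    private final List<byte[]> parts = new ArrayList<>(); // SQL fragments between placeholders, pre-encoded

    private ClientPrepared(String sql) {
        int start = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') { // naive: ignores quoting and comments
                parts.add(sql.substring(start, i).getBytes(StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        parts.add(sql.substring(start).getBytes(StandardCharsets.UTF_8));
    }

    static ClientPrepared prepare(String sql) {
        return new ClientPrepared(sql);
    }

    int parameterCount() {
        return parts.size() - 1;
    }

    // Builds the query bytes for one execution from the pre-encoded fragments.
    byte[] bind(long... params) {
        if (params.length != parameterCount()) {
            throw new IllegalArgumentException("expected " + parameterCount() + " parameters");
        }
        if (params.length == 0) {
            return parts.get(0); // e.g. DO 1: reuse the bytes encoded at prepare time (do not modify)
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < params.length; i++) {
            byte[] fixed = parts.get(i);
            out.write(fixed, 0, fixed.length);
            byte[] value = Long.toString(params[i]).getBytes(StandardCharsets.US_ASCII);
            out.write(value, 0, value.length);
        }
        byte[] tail = parts.get(params.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }
}
```

From application code, this corresponds to calling conn.prepareStatement("DO 1") once and then execute() in the loop. With ConnectorJ, server-side prepared statements are selected with the useServerPrepStmts=true connection property.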

But we still have not used all the options in this (admittedly silly) quest to maximize the performance of “DO 1”. Recall that ConnectorJ supports named pipes on Windows, which are allegedly much faster than TCP connections. Restart the server with named pipes enabled and set the JDBC URL to “jdbc:mysql:///?socketFactory=com.mysql.jdbc.NamedPipeSocketFactory&namedPipePath=\\\\.\\Pipe\\MySQL&user=root&useConfigs=maxPerformance”. Then rerun the test with 1000000 prepared queries. Now the QPS grows to 29542! That is a strong result: a 33% improvement over the best one seen so far (22183). Unfortunately, it is still close but no cigar, because the JVM dumps a stack trace when the named pipe connection is closed. This is MySQL Bug#62518, closed as “Won’t fix” and chalked up to a JVM problem, and it renders named pipe support almost useless. Maybe there is a trick to silence the JVM in this case, but I do not know of one.
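This assumes the URL above is written as a Java string literal. In that case, each doubled backslash stands for one real backslash, so the pipe path ConnectorJ receives is \\.\Pipe\MySQL. The hypothetical holder class below builds the same URL from its parts. Opening the connection needs ConnectorJ on the classpath and a server started with named pipes enabled.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical holder for the named-pipe JDBC URL from the text.
final class NamedPipeUrl {
    // Java literal with doubled backslashes; the real path is \\.\Pipe\MySQL (14 characters).
    static final String PIPE_PATH = "\\\\.\\Pipe\\MySQL";

    static final String URL = "jdbc:mysql:///?socketFactory=com.mysql.jdbc.NamedPipeSocketFactory"
            + "&namedPipePath=" + PIPE_PATH
            + "&user=root&useConfigs=maxPerformance";

    // Requires ConnectorJ and a server listening on the named pipe.
    static Connection open() throws SQLException {
        return DriverManager.getConnection(URL);
    }
}
```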

How fast is the C client library in comparison?

Out of curiosity, I also tested how the native C client compares to JDBC. Over TCP, it does slightly better than the fastest JDBC driver (MariaDB, prepared), but not by a huge margin: 24063 QPS vs 22183 (an 8.5% difference). I believe the Java drivers could still improve.
With named pipes, the C client reaches 33122 QPS, ~12% better than the 29542 that ConnectorJ achieved (if pipes worked properly there).

Accessing the benchmark program

I put the benchmark program on Launchpad, together with the drivers. Say you’re on Windows, have a server running on port 3306, and the ‘root’ user has no password. Then you can just branch the repository and run bench_all.bat. If you use another operating system, I trust you can quickly rewrite the batch files as shell scripts.

Oracle has now released MySQL-5.6.10-GA, so it is time for some new benchmark results. The candidates in this benchmark run are:

  • MySQL-5.5.29
  • MySQL-5.6.10
  • MariaDB-5.5.28a
  • MariaDB-10.0.1

The 5.5 versions are in because I wanted to check for any regressions. In the past we have often seen performance regressions in newer versions which were caused by new features.

This time the benchmark was run on a different box. The main difference is that this box has no SSD but a high-performance RAID-5 with 512M of battery-backed cache. The machine has 16 cores, of which 12 were used for mysqld and the other 4 for sysbench.

The benchmark uses sysbench-0.5 OLTP with 8 tables and 10G worth of data. The InnoDB buffer pool was 16G and the InnoDB log group capacity 4G (the maximum for MySQL-5.5). The different disk system required a different InnoDB configuration:

  • innodb_io_capacity = 1000 (was 20000 for SSD)
  • innodb_flush_neighbors = 1 (was 0 for SSD)

Now for the results. OLTP read-only comes first:

[Figure: 20130213-sb-ro-tps (OLTP read-only, transactions per second)]

And here is the first surprise: MySQL-5.6 behaves significantly differently. It competes well up to 8 threads and even wins at 16 threads. But at higher concurrency, its performance drops off rapidly, even compared to MySQL-5.5. MariaDB-10.0 also shows a slight drop in performance compared to MariaDB-5.5, but it is much less pronounced.

The response time graph is nice and smooth though:

[Figure: 20130213-sb-ro-rt (OLTP read-only, response time)]

Both MySQL-5.6 and MariaDB-10.0 look a little better, which means they distribute CPU cycles more evenly across concurrent requests.

Disclaimer: no thread pool was used in this benchmark. The Oracle implementation of the thread pool is closed source and thus cannot be benchmarked or used by everybody. It seemed a bit unfair to use the MariaDB thread pool under those circumstances.

If you want to see the impact of the MariaDB thread pool, have a look at the benchmarks published previously:

Next stop: OLTP read/write:

[Figure: 20130213-sb-rw-tps (OLTP read/write, transactions per second)]

The picture is very similar: both MySQL-5.6 and MariaDB-10.0 show a performance drop compared to the 5.5 versions. For MySQL the drop is more than 10%, and thus rather heavy.

However, it’s a well-known fact that MySQL-5.5 exhibits severe write stalls under high load when InnoDB starts synchronous flushing. The response time graph makes this easy to spot:

[Figure: 20130213-sb-rw-rt (OLTP read/write, response time)]

This is the good news. Both 5.5 versions show heavy write stalls at 64 threads and more, but the behavior is much less pronounced with MySQL-5.6 and MariaDB-10.0. So it seems the new adaptive flushing algorithm works well.

There is, however, one problem here: with multiple buffer pool instances, write stalls show up more often. For the above results, I ran the read-only tests with 16 buffer pools and the read-write tests with only 1.
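The InnoDB settings described above, including the buffer pool instances used for each workload, can be collected in one place. The Java helper below is hypothetical and not part of the published benchmark scripts. It simply renders the settings as a [mysqld] option-file section. The option names are the ones given in the text. The 4G redo log capacity is left out because the text does not say how it was split across log files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders the InnoDB settings from the text as a [mysqld] section.
final class BenchConfig {

    static String mysqldSection(boolean readWrite) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("innodb_buffer_pool_size", "16G");
        // 16 buffer pool instances for the read-only runs, 1 for the read/write runs.
        opts.put("innodb_buffer_pool_instances", readWrite ? "1" : "16");
        // RAID-5 with battery-backed cache (the SSD box used 20000 and 0).
        opts.put("innodb_io_capacity", "1000");
        opts.put("innodb_flush_neighbors", "1");
        // Redo log settings omitted: the split of the 4G capacity is not given in the text.
        StringBuilder sb = new StringBuilder("[mysqld]\n");
        for (Map.Entry<String, String> e : opts.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```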

Conclusions:

  • MySQL-5.6 shows a rather severe performance regression, especially at higher concurrency levels. This does not match the results published by Oracle. I can only speculate why the results are so different. My guess is the (closed-source) thread pool, and maybe the fact that the Oracle benchmarks were done on much bigger hardware.
  • With a single buffer pool, you don’t have to be afraid of write stalls anymore. Also, MySQL-5.6 now allows up to 512G of redo log capacity, which further reduces the odds of running into synchronous flushing (MariaDB-5.5 already lifted this limit with XtraDB).

As always, the scripts used for the benchmark, as well as the results, are available from Launchpad:

http://bazaar.launchpad.net/~ahel/maria/mariadb-benchmarks/revision/20

I invite anybody to rerun this benchmark and share the results.