Let me start with a little story. You sit in your house near the fireplace in the living room and need a book from the library… Eh, no, sorry, wrong century. You’re building a robotic arm that will open your beer or brew your coffee or supply you with whatever other drinks of your choice… while you’ll be building the next robotic arm. So, you — soldering iron in one hand and Arduino in another — ask your little brother to bring a box with specific resistors (that you unexpectedly run out of) from the cellar. The problem — your brother is small and cannot tell a resistor from a respirator. You explain that it’s small thing with two wires sticking out of it. And he starts going back and forth brining you boxes after boxes of different small things with two wires.

This is approximately where we were in MySQL when NDB Cluster was just added. The use wants to find a row, say WHERE number_of_wires=2 AND size='small' AND type='resistor' and ohm=10000. And the optimizer devises an execution plan to use the “ref” access for the number_of_wires column. And NDB Cluster engine starts retrieving all rows that have number_or_wires=2 from all nodes and send all these rows over the network to the server only to see them rejected over and over, because they don’t satisfy the rest of the WHERE condition.

Thus, you see, originally engine condition pushdown was implemented for NDB Cluster. With engine condition pushdown the server can send the complete WHERE clause to NDB, and the latter will send it to all the nodes, so only rows that actually satisfy the complete WHERE clause will be sent over the network back to the server. If the WHERE clause is too complex for nodes to understand (for example, it involves a stored function) the NDB can push only a part of it and let the server enforce the rest.

It wasn’t all perfect, though. If engine condition pushdown is used the server needs to prepare a condition for pushing. In a join it means extracting parts of the WHERE clause that don’t depend on data from tables which weren’t read yet. It creates a sizable overhead for every query even if it doesn’t use NDB Cluster. And most queries don’t. So, engine condition pushdown was disabled by default.

Fast forward to MariaDB. MariaDB doesn’t have NDB Cluster, but it has many other storage engines. For some of them, for example, for SphinxSE, engine condition pushdown is not just a useful optimization, but a requirement, SphinxSE simply cannot function properly if engine condition pushdown is disabled. We’ve solved it in MariaDB 5.2 by implementing an override — the engine could activate engine condition pushdown for its tables, even if it would be disabled for all other tables. No overhead for queries that don’t use SphinxSE. Later other engines started to use this override too.

That’s how we ended up in a strange situation, engine condition pushdown is disabled globally by default, but de facto all engines that can use engine condition pushdown, use it anyway, via this override. And if a user enables engine condition pushdown, it adds no additional optimizations or functionality, but only an overhead of the server preparing pushdown conditions for engines that cannot use them.

This is why in MariaDB 10.1.1 we deprecate engine_condition_pushdown flag. You can still write

but it will not do anything besides printing a deprecation warning. Starting from MariaDB 10.1.1 engine condition pushdown is always enabled for engines that support it and always disabled for engines that do not support it.

I don’t think it’ll surprise anybody if I say that MariaDB or MySQL server knows a lot more about server system variables, then just their values. Indeed, every variable can be session or global only, read-only or writable, it has an associated help text (that is printed on mysqld --help --verbose), certain variables only accept values from a given set of strings (this set of allowed values is also printed in mysqld --help --verbose since MariaDB 10.1.0), numeric variables have lower and upper range boundaries of valid values (that are never printed anywhere), and so on. I always thought it’s kind of a waste that there is no way to query this information. That could’ve been very convenient, in particular for various GUI clients — they could show the help in tooltips, validate values and so on.

But recently we’ve got our users asking for it — precisely, for system variable metadata, whether a variable is read-only, how it was set and so on. You ask for it — you got it. Let me introduce the INFORMATION_SCHEMA.SYSTEM_VARIABLES table. For every system variable it shows both its global and session values, where the global value comes from (config file or a command line, assigned from SQL, auto-configured by the server, compile-time default), the type and the scope of the variable, its default value, range of values for numeric variables, set of allowed values for ENUM/SET variables, whether a variable is read-only, whether it can be set from the command line. And a help text, of course.
It is said that a picture is worth a thousand words, so here you are:

Note that READ_ONLY only means “cannot be changed run-time”, the variable might still be writable from the command line or a config file. Here’s the list of all truly read-only variables that can not be modified at all:

You might wonder what LOG_BIN is doing in this list. It’s because on the command line you can never specify this boolean read-only variable, what you actually specify is a command-line option --log-bin that takes a string. It happens that this new SYSTEM_VARIABLES table is good at highlighting dark corners in the MariaDB/MySQL internal implementation of system variables.


When you e.g. delete rows, these rows are just marked as deleted not really physically deleted from indexes and free space introduced is not returned to operating system for later reuse. Purge thread will physically delete index keys and rows, but still free space introduced is not returned to operating system and this operation can lead holes on page. If you have variable length rows, this could lead to situation where this free space can’t be used for new rows (if these rows are larger than old ones). User may use OPTIMIZE TABLE or ALTER TABLE <table> ENGINE=InnoDB to reconstruct the table.

Unfortunately, running OPTIMIZE TABLE against an InnoDB table stored in the shared table-space file ibdata1 does two things:

  • Makes the table’s data and indexes contiguous inside ibdata1
  • Makes ibdata1 grow because the contiguous data and index pages are appended to ibdata1

New defragmentation

In MariaDB 10.1 we have merged Facebooks defragmentation code prepared for MariaDB by Matt, Seong Uck Lee from Kakao. Only major difference to Facebooks code and Matt’s patch is the fact that in MariaDB we decided not to introduce new literals to SQL and no changes to server code. Instead we use already existing OPTIMIZE TABLE and all code changes are inside InnoDB/XtraDB storage engines. To enable this new feature you need to add following to my.cnf (this requirement is to keep the original behavior of OPTIMIZE TABLE for those users that need it).

This new defragmentation feature works inplace, thus no new tables are created and there is no need to copy data from old table to new table. Instead this feature loads n pages and tries to move records so that pages would be full of records and frees pages that are fully empty after the operation.

New configuration variables

  • innodb_defragment: Enable/disable InnoDB defragmentation. When set to FALSE, all existing defragmentation will be paused. And new defragmentation command will fail. Paused defragmentation commands will resume when this variable is set to TRUE. Default value FALSE.
  • innodb_defragment_n_pages: Number of pages considered at once when merging multiple pages to defragment. Range of 2–32 and default is 7.
  • innodb_defragment_stats_accuracy: How many defragment stats changes there are before the stats are written to persistent storage. Set to 0 meaning disable defragment stats tracking. Default 0.
  • innodb_defragment_fill_factor_n_recs:  How many records of space defragmentation should leave on the page. This variable, together with innodb_defragment_fill_factor, is introduced so defragmentation won’t pack the page too full and cause page split on the next insert on every page. The variable indicating more defragmentation gain is the one effective. Range of 1–100 and default 20.
  • innodb_defragment_fill_factor: A number between [0.7, 1] that tells defragmentation how full it should fill a page. Default is 0.9. Number below 0.7 won’t make much sense. This variable, together with innodb_defragment_fill_factor_n_recs, is introduced so defragmentation won’t pack the page too full and cause page split on the next insert on every page. The variable indicating more defragmentation gain is the one effective.
  • innodb_defragment_frequency: Do not defragment a single index more than this number of time per second.This controls the number of time defragmentation thread can request X_LOCK on an index. Defragmentation thread will check whether 1/defragment_frequency (s) has passed since it worked on this index last time, and put the index back to the queue if not enough time has passed. The actual frequency can only be lower than this given number.

New status variables

  • Innodb_defragment_compression_failures: Number of defragment re-compression failures
  • Innodb_defragment_failures: Number of defragment failures.
  • Innodb_defragment_count: Number of defragment operations.


After CREATE TABLE and INSERT operations we can see following from INFORMATION_SCHEMA:

Now if we delete 3/4 of the records that will leave holes in pages and then we optimize table to execute defragmentation:

After this we can see that some pages are freed and some pages merged:


WebScaleSQL Git repository https://github.com/webscalesql/webscalesql-5.6

Facebook Percona Live presentation: https://www.google.fi/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCQQFjAB&url=https%3A%2F%2Fwww.percona.com%2Flive%2Fmysql-conference-2014%2Fsites%2Fdefault%2Ffiles%2Fslides%2Fdefragmentation.pdf&ei=UgNKVNnZMcHhywP7qwI&usg=AFQjCNGREUpen21jCcy0bchUa6Ro83ol_A&sig2=MDZU2Ue9sX1kB9OusvdiFA