A MariaDB Howto authored by: Erkan Yanar.

This is a Howto about installing MariaDB Galera Cluster on Debian/Ubuntu. Because a lot of people were having problems installing MariaDB Galera Cluster, elenst from #maria on freenode forced me to write this Howto :)

Installing MariaDB Galera Cluster is in fact quite easy and actually kind of boring in the end. This Howto is written for (and tested on) on Debian 7.1 (Wheezy) and Ubuntu 12.04 (Precise).

What we need

In our setup we assume 3 nodes (node01, node02, node03) with one interface each. We assume following IP addresses: 172.16.8.5, 172.16.8.6, and 172.16.8.4. We need three packages installed on all nodes:

  • rsync
  • galera
  • mariadb-galera-server

As Galera does not ship with the distribution repositories, go for the repo configurator and follow the instructions to include the repository fitting your system. Keep in mind to Choose “5.5″ in Step 3 (Choose a Version). Doing this you can jump directly to Install Packages

Adding the Repository

Alternatively you can just take following steps.

Debian Wheezy

Ubuntu Precise

Yes, they are nearly identical. :)

Install Packages

(Just another shortcut for the impatient)

After installing the packages you will have a running MariaDB server on each node. But none of them will be configured to run as a node in a MariaDB Galera Cluster.

Configuring Galera

So we have to do some configuration next. There is a MariaDB configuration part and one part to configure Galera (starting with wsrep_). As we do the most basic and simple installation in this Howto, it is sufficient you just change the IP’s (Remember: 172.16.8.5, 172.16.8.6, 172.16.8.4) with your IP’s.
This will be needed to define the wsrep_cluster_address Variable (the list of nodes a starting mysqld contacts to join the cluster).

The following configuration file has to be distributed on all nodes. We use a separate configuration file /etc/mysql/conf.d/galera.cnf with the following settings:

FYI: The shared library for wsrep_provider is provided by the installed galera package.

We could also change the cluster name by changing the value of wserp_cluster_name to fit our style. This setting also works as a shared secret to control the access to the cluster. With wsrep_cluster_address you see the IP addresses of our setup. The wsrep_sst_method tells what method to use to synchronise the nodes. While there are also mysqldump and xtrabackup available, I prefer rsync because it is easy to configure (i.e. it does not need any credentials set on the nodes). If you are considering using the xtrabackup method, don’t forget to install xtrabackup.

Starting the Galera Cluster

First we stop mysqld on all nodes.

The configuration file (galera.cnf) is already distributed to all nodes, so we next start the first mysqld. This node initializes/starts the cluster (creates a GTID).

To have a look and see if everything really worked we’ll check the cluster size status variable.

If you see the above, great! That’s what we would expect. Now that the Cluster already exists, we let the next nodes just start and join the cluster.

We can ignore the above error for now. This node is still starting fine.

Let’s pause here and do a quick check. As we are running a cluster it is not important if we execute the following on node01 or node02.

If you see the above, very nice! Now let’s start the third node:

Ok we are finished. We have a running MariaDB Galera Cluster \o/

Having fun with Debian/Ubuntu init scripts

But we’ve got to fix some things because of some Debian/Ubuntu oddities.

Remember the error we saw when starting node02 and node03? What happened? Well, Debian/Ubuntu uses a special user ('debian-sys-maint'@'localhost') in their init script and the credentials for that user are stored in /etc/mysql/debian.cnf. This user is used to make some checks starting MySQL. Checks I don’t think belong into a service script anyway.

We could simply ignore it, but the user user is also used to shutdown mysqld. This is also not required, as a SIGTERM is sufficient to shutdown the mysqld :/

As we copied the data from node01 to all other nodes, the credentials in /etc/mysql/debian.cnf don’t match on node02 and node03. So we will not be able to shutdown mysqld on either of these nodes.

So we’ve got to fix it, by copying /etc/mysql/debian.cnf from the first node (node01) to all other nodes. So the data and configuration files have the same data again.

After that we are able to shutdown the daemon:

Great.

So if we would have a proper init script the Howto would have been even shorter ;)

Follow the Bug :)

Enjoy your MariaDB Galera Cluster and have fun!

– Erkan Yanar

Thx to teuto.net for providing me an OpenStack tenant, so I could run the tests for this Howto.

Lets start by considering a scenario where records are being inserted in a single auto-increment table via different nodes of a multi-master cluster. One issue that might arise is ‘collision’ of generated auto-increment values on different nodes, which is precisely the subject of this article.

As the cluster is multi-master, it allows writes on all master nodes. As a result of which a table might get same auto-incremented values on different nodes on INSERTs. This issue is discovered only after the writeset is replicated and that’s a problem!

Galera cluster suffers with the similar problem.

Lets try to emulate this on a 2-node Galera cluster :

As expected, the second commit could not succeed because of the collision.

So, how do we handle this issue? Enter @@auto_increment_increment and @@auto_increment_offset! Using these two system variables one can control the sequence of auto-generated values on a MySQL/MariaDB server. The trick is to set them in such a way that every node in the cluster generates a sequence of non-colliding numbers.

For instance, lets discuss this for a 3-node cluster (n=3):

As you can see, by setting each node’s auto_increment_increment to the total number of nodes (n) in the cluster and auto_increment_offset to a number between [1,n], we can assure that auto-increment values, thus generated, would be unique across the cluster, thus, would avoid any conflict or collision.

In Galera cluster this is already taken care of by default. As and when a node joins the cluster, the two auto-increment variables are adjusted automatically to avoid collision. However, this capability can be controlled by using wsrep_auto_increment_control variable.

With this setting the last COMMIT in the above example would succeed.

I’ve continued building on my MariaDB GIS and node.js example application that I wrote about two weeks back, https://blog.mariadb.org/node-js-mariadb-and-gis/. The application shows how to load GPX information into MariaDB, using some MariaDB GIS functionality, and making use of the node.js platform together with MariaDB’s non-blocking client.

With the GPX data converted into GIS points in the MariaDB database, I wanted to further expand a little on both the GIS aspect and also look at how some additional data could be shown in the application by using jQuery’s Ajax calls to update a piece of the web based application UI.

To start with, an interesting thing to do when you have a bunch of GIS points in a database table is to do distance calculation with the end result being to get the complete distance of the track formed by the points. There of course exists a bunch of different formulas for this, but since MariaDB yet doesn’t have the third coordinate in GIS, which is altitude (or elevation), I chose to use the concept of the Great Circle Distance, http://en.wikipedia.org/wiki/Great-circle_distance  and the Haversine approach. The algorithm for counting the distance between two points in this way is:

Image1: Haversine formula from http://en.wikipedia.org/wiki/Great-circle_distance

Since we need to make the distance calculations for all distances between the points it makes sense to create a database function for counting the distance between two points:

With the function in place in the database, we can test it with a query that actually will be the base of the query the application will be using to retrieve the distance for the whole track:

In the SELECT query I’ve given the id of the first point of the track and then done an INNER JOIN over the same table to be able to get the second point and calculating the distance between the points. I’ve made sure when inserting the points into the database that they are ordered in such a way that the next point on a track after the previous one always has the following pointId, so that I for example now in this case can tell the query that the join is done on pointId + 1. The outcome of running the SELECT –query is:

The next step is to do this for every point on the track and sum the distances together to get the full length of the track. This is done by changing the query slightly so that there is no specific pointId restriction and making use of the SUM function to sum all the point distances:

With that in place, let’s move to the application side. For the background of setting the whole application up in node.js, please refer to my previous blog post, https://blog.mariadb.org/node-js-mariadb-and-gis/. On the application side I’ll start with defining a new URL mapping through which the distance information will be retrieved. This happens in app.js where a new row is added to the URL mapping section:

In the URL mapping the common.trackInfo –function is called, which is a really simple function just calling data methods for getting a database connection, querying the distance and closing the database connection. Inside the data method [name of method for retrieving distance] the SELECT query for summing up the distances between the points can be seen. The only parameter being given to the query is trackId, which is read from the URL querystring in the normal node.js way of req.param(“trackId”).

On the UI side, let’s make use of the page that plots the track onto Google Maps. In that web page Google’s implementation of jQuery is used, which can be seen in the source code:

Also notice in the source code how the points that are being plotted onto the map are requested through a $.getJSON function call. I’m going to do something similar for getting the distance and displaying it. First though let’s create a link through which the retrieval of the distance is fired and add a placeholder for the distance information by adding a div –block in the HTML: 

With that in place let’s take a look at the jQuery function that will get us the distance:

The function calls the url /trackinfo URL mapping, which returns a JSON formatted output holding the distance information. The distance is easily picked out from JSON and finally placed in the div –block.

Image2: A part of the web page where the distance is shown

As you can see, it’s very straight forward to use MariaDB with node.js and make use of jQuery. Also there are many interesting things that can be done by using the GIS capabilities of MariaDB. In MariaDB 10.1 the third coordinate will be present, which then means that the height differences could be considered in the distance calculations and the outcome would be more precise. In this particular case we were looking at a GPX track from a half marathon run I did. I had a running watch on me, which gathered the information for the track. It showed the final distance of the run to be 21.23 kilometers, while the distance counted with the Haversine -algorithm approach was 21.15 kilometers. The difference could very well be that the watch actually included the altitude into the calculations, but I’m of course not sure about that since I don’t know what algorithms it uses.

This example application’s source code is available on Github. Try it out if you’re interested in MariaDB GIS or using node.js and jQuery with MariaDB.