MariaDB connect engine XML

In January I wrote my first post about node.js and MariaDB . In February I continued with a second post about using jQuery and some GIS calculations. Now it is time for the third and this time the main focus is not so much on GIS functionality, but instead on the capabilities MariaDB has for handling piles of unstructured data. In this case I’ll be focusing on crunching a pile of XML files without importing the XML data itself.

MariaDB 10.0, which had its GA launch in early April, includes the Connect engine. It has been developed to dynamically access all kinds of data sources, from other databases over ODBC connections to web log files. The Connect engine establishes a connection to the data source and exposes the data source to MariaDB as a table. Let’s get started!

To begin with, check that you do not have the Connect engine installed, which you shouldn’t if you haven’t played with it before. It’s a separate install. To verify if it’s installed (or not) run the SHOW ENGINES command. The Connect engine is simply called CONNECT in the output. I’m running this demo on Ubuntu 13.10 so the following instructions focus on that platform, but I’ll indicate how it’s done on other platforms as well.

Make sure you’ve configured APT to use the MariaDB repository. Instructions for that are found in the MariaDB repository configuration tool part of downloads.mariadb.org. Once done with adding the repository the Connect engine can be installed with a normal apt-get command. For example:

Notice that on RPM platforms you would do something similar with yum, i.e. yum install MariaDB-connect-engine .

Next, tell MariaDB to enable the Connect engine:

Now we’re ready with all prerequisites to actually start implementing the solution. Since I’m continuing on the development demo I built earlier, which is about showing running tracks on Google Maps I’m going to continue with that. Also in this case I will use GPX files. GPX files are stored by many GPS devices including running watches and smartphones. GPX itself is a data representation format with a specific XML schema:

For the demo I’ll use a directory /home/rasmus/nodegpxmariadb/public/uploads/ which I’ve preloaded with a set of GPX files. Now starts the interesting part. I want to make the data in these XML formatted files available to MariaDB. It’s pretty straight forward by creating a new table with the Connect engine:

In the CREATE TABLE statement you can see that Connect is specified as the engine, the table_type option is set to XML, multiple is set to 1 to allow for multiple files and file_name includes the path and the pattern of the XML files that I want to make accessible. In the tabname option the XML node under which the data resides is given. Data in this node will be represented as table rows. In this case the node is called trkseg. This node includes all the trkpt -nodes that we’re interested in. A single trkpt node represents a track point stored at certain intervals while running. One track point includes longitude, latitude, elevation and time. By combining these trackpoints you’ll have a representation of the running track. That is what I’m going to do.

In the CREATE TABLE statement there is one more thing to notice. The option tabname and the table field ele both include namespace directives to give to the XML parser so that it knows how to get hold of the right node. This was a bit tricky, but it has to do with the use of the namespaces of GPX and the XML parser that is used. The Connect engine uses libxml2 on Linux.

With the table GPXSource now in place let’s try to do a SELECT over it:

Voila! We can see that it works and I get out the number of trackpoints per GPX file.

Let’s try another SELECT that I will make use of in the actual application. We want to get the duration of each of the runs by calculating the difference between first and last trackpoint for each GPX file:

Now I have all the data I need to be able to create my sample application.  In this blog post I won’t dive into the Node.js application code part, but I have made all source code available in the same way as before on Github, https://github.com/rasmushoj/nodejs-gpx-mariadb . Hopefully you find it useful. The end result however accessing the XML files through MariaDB without importing the data looks like this:

MariaDB connect engine XML

I’ve continued building on my MariaDB GIS and node.js example application that I wrote about two weeks back, https://blog.mariadb.org/node-js-mariadb-and-gis/. The application shows how to load GPX information into MariaDB, using some MariaDB GIS functionality, and making use of the node.js platform together with MariaDB’s non-blocking client.

With the GPX data converted into GIS points in the MariaDB database, I wanted to further expand a little on both the GIS aspect and also look at how some additional data could be shown in the application by using jQuery’s Ajax calls to update a piece of the web based application UI.

To start with, an interesting thing to do when you have a bunch of GIS points in a database table is to do distance calculation with the end result being to get the complete distance of the track formed by the points. There of course exists a bunch of different formulas for this, but since MariaDB yet doesn’t have the third coordinate in GIS, which is altitude (or elevation), I chose to use the concept of the Great Circle Distance, http://en.wikipedia.org/wiki/Great-circle_distance  and the Haversine approach. The algorithm for counting the distance between two points in this way is:

Image1: Haversine formula from http://en.wikipedia.org/wiki/Great-circle_distance

Since we need to make the distance calculations for all distances between the points it makes sense to create a database function for counting the distance between two points:

With the function in place in the database, we can test it with a query that actually will be the base of the query the application will be using to retrieve the distance for the whole track:

In the SELECT query I’ve given the id of the first point of the track and then done an INNER JOIN over the same table to be able to get the second point and calculating the distance between the points. I’ve made sure when inserting the points into the database that they are ordered in such a way that the next point on a track after the previous one always has the following pointId, so that I for example now in this case can tell the query that the join is done on pointId + 1. The outcome of running the SELECT –query is:

The next step is to do this for every point on the track and sum the distances together to get the full length of the track. This is done by changing the query slightly so that there is no specific pointId restriction and making use of the SUM function to sum all the point distances:

With that in place, let’s move to the application side. For the background of setting the whole application up in node.js, please refer to my previous blog post, https://blog.mariadb.org/node-js-mariadb-and-gis/. On the application side I’ll start with defining a new URL mapping through which the distance information will be retrieved. This happens in app.js where a new row is added to the URL mapping section:

In the URL mapping the common.trackInfo –function is called, which is a really simple function just calling data methods for getting a database connection, querying the distance and closing the database connection. Inside the data method [name of method for retrieving distance] the SELECT query for summing up the distances between the points can be seen. The only parameter being given to the query is trackId, which is read from the URL querystring in the normal node.js way of req.param(“trackId”).

On the UI side, let’s make use of the page that plots the track onto Google Maps. In that web page Google’s implementation of jQuery is used, which can be seen in the source code:

Also notice in the source code how the points that are being plotted onto the map are requested through a $.getJSON function call. I’m going to do something similar for getting the distance and displaying it. First though let’s create a link through which the retrieval of the distance is fired and add a placeholder for the distance information by adding a div –block in the HTML: 

With that in place let’s take a look at the jQuery function that will get us the distance:

The function calls the url /trackinfo URL mapping, which returns a JSON formatted output holding the distance information. The distance is easily picked out from JSON and finally placed in the div –block.

Image2: A part of the web page where the distance is shown

As you can see, it’s very straight forward to use MariaDB with node.js and make use of jQuery. Also there are many interesting things that can be done by using the GIS capabilities of MariaDB. In MariaDB 10.1 the third coordinate will be present, which then means that the height differences could be considered in the distance calculations and the outcome would be more precise. In this particular case we were looking at a GPX track from a half marathon run I did. I had a running watch on me, which gathered the information for the track. It showed the final distance of the run to be 21.23 kilometers, while the distance counted with the Haversine -algorithm approach was 21.15 kilometers. The difference could very well be that the watch actually included the altitude into the calculations, but I’m of course not sure about that since I don’t know what algorithms it uses.

This example application’s source code is available on Github. Try it out if you’re interested in MariaDB GIS or using node.js and jQuery with MariaDB.

Screenshot 3: The GPX track shown on Google Maps

The availability of the node.js binding for MariaDB’s non-blocking client library together with the GIS capabilities of MariaDB inspired me to make an example of using node.js and MariaDB to import so-called GPX tracks to a MariaDB database and then show them on a map. GPX tracks are what are stored by many GPS devices including running watches and smartphones.

My project makes use of MariaDB’s non-blocking client library together with the node.js platform and on top of that uses the GIS functionality found in MariaDB 5.5 and 10.0.

To start with let’s go through the software and components I’m using:

  • Node.js – The popular Node.js platform built on Chrome’s JavaScript runtime. An event-driven and non-blocking architecture
  • Express.js – a web application framework for node.js (similar to Sinatra in Ruby). Puts a nice structure on the application and makes configuration easy.
  • Jade – A node.js template engine for outputting in most cases HTML, but can also be used for other XML structured languages
  • node-mariasql – A node.js binding to MariaDB’s non-blocking (MySQL-compatible) client library created by Brian White
  • xml2js – Node module for parsing XML
  • Google Maps – To show the end result on a map

The source code for my application can be found on Github: https://github.com/rasmushoj/nodejs-gpx-mariadb
I’ve licensed it under the MIT license, so feel free to grab it, copy it, reuse it or distribute it. If you build on top of it, it would of course be nice to know and it would appreciated if you dropped an email to me or added a comment here.

Installing the platform

I chose to use Ubuntu 13.10 x64 as the OS for the application. Below is outlined the steps for installing each of the components needed for this application. Before I begin going through them it’s always good to create a new OS user and not run as root. adduser is handy for the purpose. Remember also to add the user to the sudo group if you’re going to make the installation with this new user. In my case I chose to “sudo adduser rasmus sudo”.

Install MariaDB 10.0 by following the instructions found on the downloads repository for MariaDBRemember to choose the right Ubuntu release on that page, which in my case is 13.10 “saucy”. Then follow the instructions on the page. While doing this tutorial the version of MariaDB getting installed was 10.0.7. The application uses MariaDB’s root user with password password1 so you might want to do the same to get this example working without code changes. Create a database named running with the following commands:

In the GitHub repository for the application you’ll find running.sql in the root. Run that with the following command on the server to create the schema for the running -database:

Prepare the environment:
Update the apt repository: sudo apt-get update
Install a compiler: sudo apt-get install build-essential
Install curl which will be needed when installing node.js: sudo apt-get install curl

Install node.js and Node Package Manager (NPM):
sudo apt-get install nodejs
sudo apt-get install nodejs-legacy
sudo apt-get install npm

Install Express.js and Jade:
npm install express
npm install jade

Install the MariaDB non-blocking client:
npm install mariasql
(there might be a few compiler warnings, I at least had a few)

Finally install the xml2js module which we will need:
npm install xml2js

With that done, the platform is set up and the next step is to create an Express.js web application by running the command express nodegpxmariadb, where nodegpxmariadb is the name of the web application. I did it in the home directory of my user. Go to the nodegpxmariadb -directory and make sure all required modules are installed by running npm install.

Configuring the application

In app.js a few different things are specified.

All module dependencies are listed. For my application the two additions to the normal modules set up by express.js is a data.js and a common.js module in which I’ve put all the database handling. The module dependency is defined by adding these rows:

Now the common and data modules are available throughout the application. Common follows the principles of the express.js framework and is done as a so-called route which include the logic for url path mappings mentioned below.

The default  express.js environment configurations are in use like the server running on port 3000. Make sure JSON support is on. It should be there by default:

Another configuration needed is to tell express.js where to store uploaded files:

In addition app.js is the place where all the URL path mappings are defined. The following are needed:

As you can see most of the url path mappings point to the common module, where it’s defined what to do when there is an incoming request for the url path. For /showmap there is no server side logic needed so an html file can just be read and sent to the client.

Start the application

With the configuration in place the application can be started:

You should see a “Welcome to Express” -message if you open up the http://[server ip]:3000 in a browser.

Upload GPX file

Let’s take a closer look at what actually is done in the common.js module. Let’s start with the upload part. Exports.fileForm and exports.fileUpload are handling the displaying of an upload form and respectively actually doing the upload of the file chosen by the user.

node_upload
Screenshot 1: File upload


Parse GPX file

Once a file has been uploaded it gets more interesting. This is when the parsing of a GPX file starts. GPX files are full of tracking points with a certain interval. For example:

As seen above each track point is its own XML node, which holds the longitude and latitude coordinates, elevation, time and some other interesting information such as heart rate. An easy way to handle XML of this type (or in general) is to make use of the node.js library xml2js which parses XML and outputs JSON. It’s very straightforward. I make use of it in in the parseGPX –function.

When we have the GPX content in JSON it’s time to extract the track points and insert them into a MariaDB database. We start by calling the connect –method of the data –module to get a connection to MariaDB. After that we iterate over the JSON contents:

The lon variable holds the longitude coordinate of the trackpoint and lat the latitude. Unfortunately MariaDB doesn’t yet support the 3rd coordinate in GIS, elevation, otherwise we could have added it also. The lon and lat are given to the data.insertPoint –method which stores them as a GIS Point in the database:

Make sure you check out the data.js file for how to connect and disconnect to MariaDB.

Screenshot 2: Parsing the GPX file
Screenshot 2: Parsing the GPX file


Show on map

The final thing I wanted to do was to see how easily I could plot the stored GIS points on a map. To start with we need to retrieve all the GIS points from the database, which is done is data.js in the following way:

From the above we can see that an array is built which is passed directly as an http response and of course by default in JSON format. This is very good for Google Maps which now can point to a URL, which in this case is /track and read the JSON formatted array of points from there and plot a so-called Polyline connecting each of the points to a nice track.

Screenshot 3: The GIS points shown as a polyline on Google Maps
Screenshot 3: The GIS points shown as a polyline on Google Maps

A lot more could of course be done when having the GIS points stored in the database, including:

  • distances could be counted
  • I could view which of my runs that intersect with each other (well this is actually a MariaDB 10.1 feature since INTERSECT will be introduced there)
  • count how many of my runs are inside a certain area
  • time could be added so that duration and pace could be measured

To sum it up, I would say that whenever you think about creating your next app, being it a web application or a smartphone app with the need of a back-end node.js + MariaDB is a great choice and all the pieces needed are available. You’ll have a full blown and proven RDBMS backing up your application.

Also location awareness and other geographical functionality seems to be almost mandatory in new development. MariaDB includes quite a nice (and properly implemented) set of spatial data types, GIS functions and other relevant functionality.