In the #maria IRC channel the other day I fielded a question someone had about a DATETIME column where they wanted to SELECT by the year. The answer (which is probably obvious to many of you) is to use the YEAR() function like so:

(The above assumes the table is named “t1” and the DATETIME column is named “d“.)

In my reply I provided a link to the Date and Time Functions section of the AskMonty Knowledgebase, but when I looked at the entry for the YEAR() function, I noticed that the example given (which originated from the file scripts/fill_help_tables.sql which is found in the MySQL and MariaDB source) was not very helpful:

The above is certainly a valid use of the YEAR() function, but it’s not a real-world use (at least, I don’t know why anyone would use the above SELECT statement in a real application). So I added an example which I feel is more useful, especially to someone unfamiliar with using SQL functions:

Looking at the entries for other functions in both the Date and Time functions section and elsewhere there is a pattern of examples that are useful, but — like the original YEAR() example — useless in the real world.

I have now added better examples to several of the date and time functions but they can probably be improved even more and there are many more entries in the Knowledgebase that would also benefit from better examples. The good news is adding examples to Knowledgebase entries is very easy to do. The bad news is that if I think of all of the examples on my own they will be similar and probably boring, so I’d like to ask for some help.

If you’ve ever wanted to contribute to MariaDB, but didn’t know where to begin, I have a suggestion: Look under the Functions and Operators section of the AskMonty Knowledgebase, find a function or two in need of better examples, and add some (either directly or via a comment). Don’t worry if you are unfamiliar with editing the Knowledgebase. I can clean things up and make your examples look pretty after the fact. Just make sure your awesome new examples actually work. :)

Thanks!

  • http://datacharmer.blogspot.com Giuseppe Maxia

    Your code (select * from t1 where YEAR(d) = 2011) looks useful at first sight, but it is actually something that should not be in a reference manual, because it sets a bad example.
    Using a function in the WHERE clause often means that your query will not use an index. If your table had an index on column ‘d’ and 1 million rows, the database would call the function YEAR 1 million times.
    Sadly, the efficient query in this case should be
    select * from t1 where d between ‘2011-01-01′ and ‘2011-12-31 23:59:59′;
    The end result will be the same, but this query will run much faster if there is an index on column ‘d’.

    mariadb > select version()G
    *************************** 1. row ***************************
    version(): 5.2.6-MariaDB
    1 row in set (0.00 sec)

    mariadb > alter table t1 add key(d);
    Query OK, 6 rows affected (0.00 sec)
    Records: 6 Duplicates: 0 Warnings: 0

    mariadb > explain select * from t1 where YEAR(d) = 2011G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: index
    possible_keys: NULL
    key: d
    key_len: 9
    ref: NULL
    rows: 6
    Extra: Using where; Using index
    1 row in set (0.00 sec)

    # Notice that, even though it says “using index”, it doesn’t.
    # Try adding a few thousand rows, and you will always get a
    # full table scan with this query.

    mariadb > explain select * from t1 where d between ‘2011-01-01′ and ‘2011-12-31 23:59:59’G
    *************************** 1. row ***************************
    id: 1
    select_type: SIMPLE
    table: t1
    type: range
    possible_keys: d
    key: d
    key_len: 9
    ref: NULL
    rows: 3
    Extra: Using where; Using index
    1 row in set (0.00 sec)

  • Anders Karlsson

    Should the YEAR function not be used, because using it on an indexed column will prohibit the index in question to be used? In some cases, yes, but I really do think it woul be a mistake to always choose performance over readable, possibly easier to maintain code. And if there IS no index, or if the benefits of having or using one is close to 0, then I would sure go for the easier to maintain and read code.
    And NOT using an index is sometimes faster! In the example above, lets say we add another column “c”, i.e.:
    CREATE TABLE t1 (d DATETIME, c INT, KEY(d));

    Then using an index on “d” might not be that useful. If all, or a wast majoritry of, the rows have a value in “d” which is in 2011, then a FULL table scan would be more effective than using the index as:
    1) We would hit most of the rows anyway.
    and
    2) The column “c” would have to be gotten from the data blocks anyway, as it isn’t part of the index.

    Which is not to say I’m not pro-performance, but always vote for performance above usability would be a mistake in most cases. And in the cases where that is NOPT the case, then why use an RDBMS at all? But the trick Giuseppe shows is still useful of course, when this particular SQL really is a performance bottleneck and there is a useable index on the column “d” and the data distribution is such that the index will be useful. But performance isn’t everything! And better performance isn’t always a better application! But sometimes it is. But often you also want maintainability and ease of use.

    /Karlsson