8 MySQL gotchas worth a rant

A schema of gripes about (nearly) everyone’s favorite relational DB

8 MySQL gotchas worth a rant

MySQL is easy to install, relatively fast, and loaded with features. If that's not enough, it’s also one of the most prominent flagships of the open source movement, the big success story that showed us that a winning company could be built around open source code.

Yet anyone who has worked with the bits has shaken a fist at the screen on any number of occasions for any number of reasons. You can’t build a technology that stores a bazillion new rows of Internet blather each second and not have a few cracks show.

In the interest of being cranky this summer, we’ve rolled up eight reasons why the open source relational database of choice sometimes gets us grumbling. Not all of the reasons given below are limited to MySQL alone. Some are broader salvos aimed at relational databases in general. But if we don't think clearly about relational databases and MySQL, we'll be stuck in the 1990s forever. We need to tear down to build up. (Or switch to a newfangled database that hasn't existed long enough for us to develop a list like this.)

Deep-seated bugs and oddities

Any big software package has bugs. But dig a little deeper, and MySQL’s bugs take on a life of their own. Suddenly, you have to pay attention because it's not clear that NULL will always behave the same way, nor will the foreign key constraints be enforced as you might expect ... nor will auto increment do the right thing.

There are dozens of small issues, and they don't always get fixed. That's why some people keep gotcha lists. At least MySQL maintains a good bug reporting system, so we can know we're not imagining things. Others are experiencing the same frustrations.

The inflexibility of relation tables

Tables bring discipline and discipline is good -- until it forces programmers to fudge or shoehorn data into the inflexible columns defined by schema. One of the big reasons that NoSQL is becoming popular is that it gives programmers enough flexibility to enhance the data model on the fly. If one street address needs another line, well, you can insert it easily into a NoSQL document. If you want to add an entire new block of who knows what, the document model is also ready to accept your data for what it is, not what it wants your data to be.

Imagine you build a table full of ZIP codes stored as integers. It's wonderfully efficient, and it enforces the rules well. Then someone sends along a nine-digit ZIP with a hyphen. Or maybe you get a customer from Canada who has letters in the postal code.

Everything breaks. The boss wants the website working in hours. There's no time to re-architect the solution. What does the programmer do? Maybe create some hack that treats the Canadian postal code as a Base64 number and converts it back to base 10? Or set up some secondary table with a special escape code that indicates the real postal code is elsewhere? Who knows? There are dozens of hacks and all of them are dangerous. But you have mere minutes to figure it out.

MySQL’s relational rules keep everyone honest and careful, but it can force us to hide our troubles through hacks and dodges.


For a time, the idea of breaking data into multiple tables was the greatest notion in computer science. Not only were the resulting tables dramatically smaller, but they enforced simplicity for everyone. But that discipline, and pleasure, comes at a price in the form of JOIN statements.

No part of SQL sends developers into fits of confusion and despair more than trying to build a complex sequence of JOINs. Then the storage engine needs to figure out the optimal way to unpack the JOINs efficiently. The developers build an elaborate query and the database unpacks it.

This is why many developers who want speed give up on this grand experiment and  denormalize their tables. Instead of splitting up items, they dump it all into one big, wide table and side-step the complexity. Everything is faster, and the server doesn't run out of memory (as easily).

Disk space is cheap now. There are 8TB drives on the market, and bigger ones are coming. We no longer need to be pulling out what’s left of our hair working with JOINs.

Fork confusion

Yes, the existence of a solid, well-supported fork of MySQL brings competition and choice, but it also breeds confusion and chaos. To make matters worse, MariaDB, the fork, is run by Monty Widenius, the same guy who helped make MySQL what it is. Is MariaDB the true sovereign that deserves our allegiance? Or is it MySQL? Should we stick with the central code run by the organization that built the database originally? Or should we join the rebels who are certainly clever and often cooler?

And how do we unpack the message about compatibility? On one hand, we're assured that MariaDB and MySQL are pretty much interchangeable. On the other, we have to believe there are differences -- why else would everyone be sparring? Maybe they're in the range of performance and our queries work the same way in both camps? But maybe they don't -- or won’t in the future.

Storage engine chaos

MySQL isn't really one database; it's several, all squirreled away under a surface that hides most of the details. In the beginning, there was MyISAM, an engine that was fast and not particularly careful with consistency. That was OK because sometimes you need speed and can live with inconsistent results.

When the people wanted more, InnoDB appeared with full transaction support. But that wasn't enough. Now there may be 20 storage engine options -- more than enough to drive a DBA insane. Sure, there will be some times when it's nice to switch between the engines without rewriting your SQL, but there will always be confusion later. Did I choose MyISAM or InnoDB for that table? Or did I decide to write out the data in CSV format instead?

The profit motive

For all its success as an open source product, MySQL is still a business full of professional developers who earn a paycheck. While most users continue to enjoy the best features of the open source license, there's no doubt the company must continue to earn enough money to keep the lights on. This leads to a strange tension between the free code distributed as a "community edition" and the full product that is sold to businesses.

Should you pay? Do you get much for the money? Is it unfair to run a business on the community edition? Are the extra features in the enterprise edition simply gimmicks to lure us into paying forever? At the very least, this is another set of questions that demands an answer. Which edition? Which license? Which feature set?

Lack of native JSON support

The best way to see the age of MySQL is to install it, then realize that you need to add more drivers to make it useful. MySQL generally talks on port 3306, and it generally spits out data in its own inscrutable format. If you want your code to talk with it, you must add another layer of code that converts the MySQL-speak into something useful. These layers of code, distributed as libraries, often send people to buy a commercial license.

Modern data storage layers usually speak JSON directly. While MySQL and MariaDB now have the ability to parse JSON as part of SQL, this isn't the same as a nice, native JSON interface like the ones that can be found on CouchDB, MongoDB, or any of the latest tools.

The rise of closed-source, proprietary modules

Did I say that MySQL is open source? It is -- except for some of the newer, closed-source, proprietary modules built around the "open core." Programmers need to eat. Oracle needs to monetize its hard work. This is one of the realities of business. It's not like the hospitals that use MySQL are giving away medical care. It's not like the farmers who use MySQL are giving away food.

It's a bit unfair to hold MySQL to a higher standard, but open source success can be a trap. Just because it starts off free doesn't mean that it can remain so. If businesses want many of the newer features, they'll have to pay one way or another. Sometimes paying Oracle is much cheaper than writing the code in-house. Sometimes commercial, closed-source code makes sense. If only it were easy to figure out when this is true.

Related articles


Copyright © 2015 IDG Communications, Inc.