While NoSQL may be getting all the buzz, in many cases an old-fashioned relational database, such as MySQL, may work just as well if not better. That was the message from a number of MySQL users who presented their stories at Oracle's first MySQL Connect conference, held Saturday and Sunday in San Francisco.
On Sunday, engineers and executives from Twitter, PayPal and Verizon discussed their use of MySQL or MySQL Cluster. In each case, MySQL was being used for large, high-volume, distributed workloads that have increasingly been thought of as the province of NoSQL data stores such as MongoDB and Cassandra.
"A lot of people think they have a big data problem, and a lot of times they don't," said Daniel Austin, who is the chief architect for PayPal. "They have an urge to find a big data solution to a problem, because it looks good."
Austin admits that the typical relational database management system (RDBMS) architecture has not scaled very well to meet the influx of new data in many organizations, but he cast doubt on the idea that NoSQL data stores provide the answer. "You don't have to give up your relational model to have big data," Austin said.
PayPal itself deals with large amounts of data for its global payment transaction system, which must be fast and accurate. The system must be able to manage 100TB of fixed storage, and once data is written to the system, it must be readable from anywhere else in the world in less than a second. This can be a challenge given that the fastest data can travel between the two most distant places on Earth is about 67 milliseconds, thanks to the hard limit of the speed of light. "So that puts a lower bound of how fast things can go," Austin said. The company uses Amazon Web Services, spread across six different locations. All the live data is handled in-memory, rather than being immediately written to disk.
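The 67-millisecond figure can be checked with a back-of-the-envelope calculation. The sketch below assumes that the two most distant points on Earth are half the equatorial circumference apart and that the signal travels at the speed of light in a vacuum. Both constants are standard approximations rather than numbers quoted by Austin, and real networks, which route through fiber and switching equipment, are slower.

```python
# Rough check of the one-way light-speed limit between antipodal points.
# Assumed constants (not from Austin's talk): Earth's equatorial
# circumference and the speed of light in a vacuum.
EARTH_CIRCUMFERENCE_KM = 40_075.0
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def min_one_way_latency_ms(distance_km: float) -> float:
    """Return the minimum travel time in milliseconds over distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


# The farthest apart two surface points can be is half the circumference.
antipodal_distance_km = EARTH_CIRCUMFERENCE_KM / 2
latency_ms = min_one_way_latency_ms(antipodal_distance_km)
print(f"{latency_ms:.1f} ms")
```

Under these assumptions the result rounds to 67 milliseconds one way, which leaves most of PayPal's one-second budget for replication and processing rather than for the wire itself.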
PayPal uses MySQL Cluster for a number of reasons, most notably its true high availability (HA), which ensures that all the data entered into the system is captured immediately. Another advantage MySQL Cluster offered was scalability.
"We had to think about how to build the architecture a little bit," Austin said. The approach they used, called architectural tiling, was designed to "build a system that scales to an arbitrary number of users. And we did that with SQL," Austin said. "We feel confident we can scale beyond 100 million users no problem."
Another big user of MySQL is Twitter. Currently, Twitter has over 140 million active users, who issue about 400 million Twitter messages every day, all of which have to be stored, indexed and annotated with metadata. The company uses a modified version of MySQL 5.5 to handle this load. The company has six full-time database administrators to maintain "a few thousand database servers," said Jeremy Cole, the chief database architect for Twitter. The company also employs one full-time MySQL developer.
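To put those volumes in per-second terms, the rough calculation below uses only the two figures quoted above. It assumes messages are spread evenly across the day and across users, which is a simplification: real traffic is bursty, so peak rates would be higher than this average.

```python
# Average message rates implied by the figures quoted in the article.
MESSAGES_PER_DAY = 400_000_000  # "about 400 million" messages per day
ACTIVE_USERS = 140_000_000      # "over 140 million" active users
SECONDS_PER_DAY = 24 * 60 * 60

# Simplifying assumption: traffic is uniform over the day and over users.
avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
avg_messages_per_user_per_day = MESSAGES_PER_DAY / ACTIVE_USERS

print(f"{avg_messages_per_second:,.0f} messages/second on average")
print(f"{avg_messages_per_user_per_day:.1f} messages/user/day on average")
```

That works out to roughly 4,600 new messages per second on average, before counting the reads, indexing and metadata writes each message triggers.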
Cole spoke to why Twitter uses MySQL even when NoSQL databases would seemingly be better suited for such a heavy workload. "It comes down to a few key points," Cole said.
One is basic familiarity. "We have very extensive at-scale knowledge. We know how MySQL works internally. We know how to upgrade it, we know how to fix bugs and push out new releases," Cole said.
The company also appreciates MySQL's performance. With a bit of tuning, most of Twitter's MySQL servers are running "tens of thousands of queries per second," Cole said. Query latency must be on the order of microseconds. Cole said that he gets pitches from NoSQL vendors claiming that NoSQL is faster than a relational SQL database. "Often that is not true," Cole said.
Data safety is another crucial component for Twitter. InnoDB, the MySQL storage engine that Twitter uses, "does not lose our data," Cole said. Another advantage is a strong ecosystem, including support and development from companies like Oracle and Percona.
While the company uses MySQL for many things, it also uses other data storage technologies for those cases where MySQL won't fit, and it builds its own layers on top of MySQL. For instance, the company developed sharding and replication software, called Gizzard, that runs on top of MySQL. "I prefer to treat MySQL as a building block -- use it as a really strong core of features that we understand, and build solutions on top of that core," Cole said.
The online gaming company Playful Play was another customer that testified to its success with high-volume MySQL usage. The Mexico-based company recently found itself with a huge hit on its hands. The company's "La Vecindad de El Chavo," based on the popular long-running Mexican comedy series "El Chavo," has attracted over 3 million users since its launch in March, a number that is growing by 30,000 subscribers daily. "We had 100,000 in the first day and we got very scared. We didn't know what we had on our hands," said Ricardo Rocha, Playful Play CEO.
The company initially used the free version of MySQL Cluster, but when traffic suddenly spiked so that its need for servers exceeded Oracle's free licensing, the company contracted with Oracle for support. In fact, the growth in traffic was so sudden that when the company experienced performance issues, Oracle flew support personnel to the company's headquarters before the support contract was actually signed.
Today the company runs MySQL Cluster Carrier Grade Edition across 24 servers to capture data on user and avatar profiles, as well as play and advertising data. Twelve are for production, and the rest are for marketing analysis, pre-production and load-balancing.
The company is currently expanding its infrastructure to support as many as 100 million users, as part of its plans to expand throughout Latin America, as well as to Turkey, Spain, the Philippines and Malaysia, countries where the show is also aired.