Mr. Sulu steers Facebook to MySQL solution

Facebook database engineer reveals daily struggles with MySQL to keep the social networking site running smoothly

Facebook's engineering team has given a public nod to "Star Trek" celebrity George Takei for helping them fix a site performance problem that could be traced to MySQL. Judging by a recent account from Facebook engineer Mark Callaghan, however, the site's database infrastructure team struggles daily to wring snappy, reliable performance out of a database that, by some accounts, is simply ill-suited for a site like Facebook.

In a recent post to the Facebook Engineering Notebook, Callaghan credited Takei -- who played Mr. Sulu on "Star Trek" and currently has over 1.2 million Facebook followers -- with posting an update about an inconsistency in his Facebook experience. In a post dated Feb. 23, Takei observed that his posts hadn't been showing up in some of his followers' news feeds.

"George Takei has a lot of fans with us and since we've all Liked his Page, a while back some of us saw an update from him about an inconsistency in his Facebook experience. We realized what he was experiencing was an issue we were already trying to fix on the database side, so when we saw him post, it gave us more information that helped us get closer to resolving the issue," Callaghan wrote, praising the power of crowdsourcing.

Callaghan did not specify what, exactly, the fix entailed -- but his post did reveal the challenges he and his team face day to day in getting MySQL to scale and perform reliably as Facebook's traffic grows. One takeaway from his account: It takes a lot of sweat and elbow grease to squeeze the kind of performance out of MySQL that Facebook needs. Callaghan said he spends half of his programming time "fixing things that stall MySQL, and the other half is devoted to making MySQL faster."

"At Facebook, the quality of service we get from MySQL is much better than what you might expect if you just read the manual from MySQL. Our operations team is able to work around MySQL's imperfections in a way that allows engineering to move really fast," he wrote.

One challenge, for example, has been getting the database to scale on the site's multicore servers and fast storage. "In the past, the database servers that MySQL were using wouldn't do any more than a thousand evictions per second, but now we need them to do 10,000 per second. Any inefficiencies in that part of the server are magnified now, and we're still trying to figure out how to remove them."

Another feat for Callaghan and his team was modifying MySQL when they upgraded servers, as the machines' newer CPUs offered a potentially better way to check database pages: using CRC32 for checksums. "The hard part there was upgrading the servers on the fly from using the old checksums to the new checksums without taking the site down," he wrote.
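Callaghan doesn't describe how Facebook's patch worked, but the general idea behind switching checksum algorithms without downtime can be sketched: during a transition window, a server accepts a page if its stored checksum matches either the old or the new algorithm, so pages written before and after an upgrade both verify. This is a simplified illustration, not Facebook's or MySQL's code. It assumes the CPU feature in question is the SSE4.2 CRC32 instruction, which computes CRC-32C, so CRC-32C is implemented here in software. The 4-byte checksum header, the helper names, and the use of zlib's Adler-32 as a stand-in for the "old" algorithm are all hypothetical; InnoDB's real legacy checksum is different.

```python
import struct
import zlib

# Reflected form of the Castagnoli polynomial used by CRC-32C
# (the variant computed by the SSE4.2 CRC32 instruction).
CRC32C_POLY = 0x82F63B78


def _make_table() -> list[int]:
    """Precompute the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Software CRC-32C; hardware would do this with a CPU instruction."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def legacy_checksum(payload: bytes) -> int:
    """Hypothetical stand-in for the old algorithm (NOT InnoDB's real one)."""
    return zlib.adler32(payload) & 0xFFFFFFFF


def stamp_page(payload: bytes, use_crc32c: bool) -> bytes:
    """Write a page as a hypothetical 4-byte big-endian checksum + payload."""
    checksum = crc32c(payload) if use_crc32c else legacy_checksum(payload)
    return struct.pack(">I", checksum) + payload


def page_is_valid(page: bytes) -> bool:
    """Accept either checksum so old and upgraded servers can coexist."""
    (stored,) = struct.unpack_from(">I", page, 0)
    payload = page[4:]
    return stored in (crc32c(payload), legacy_checksum(payload))
```

Once every page has been rewritten with the new checksum, the fallback to the old algorithm could be dropped so that reads verify only CRC-32C.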

Callaghan said that his team is trying to make use of InnoDB compression. The problem is that InnoDB isn't well suited to Facebook's demanding, complex workload, with its continuous writes as status updates, Likes, shares, and uploads flow into the database. "We're trying to adapt InnoDB so it works on a more challenging workload and on faster storage devices," wrote Callaghan. "This presents interesting challenges, since the bottlenecks and pileups that occur at the serialization point are exaggerated because there's more work being done at that choke point than there used to be."

This story, "Mr. Sulu steers Facebook to MySQL solution," was originally published by InfoWorld.