Expert interview: How to scale Django

Eventbrite's John Shuping and Simon Willison reveal all about scaling Django, the Python Web framework developers love

Page 2 of 5

Willison: The Django ORM has a few low-level hooks for letting you switch to a different database connection, but out of the box it won't solve sending inserts to one place and updates to another. We've written custom code at Eventbrite for that.
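The hooks Willison mentions are exposed in Django as database routers: plain classes listed in the DATABASE_ROUTERS setting. The following is a minimal sketch of a read/write split, not Eventbrite's custom code. The class name and the alias names ("default", "replica1", "replica2") are assumptions and would have to match the keys in a project's DATABASES setting.

```python
import random


class PrimaryReplicaRouter:
    """Hypothetical router: writes go to the master, reads go to a random slave."""

    # Assumed alias names; they must match keys in settings.DATABASES.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the slaves.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Every write goes to the single master.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so any relation is allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run schema migrations on the master only; replication carries them over.
        return db == self.primary
```

A router like this always sends a read to a slave, even one that follows a write by milliseconds, so it does nothing about replication lag on its own.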

Another trick is we have separate slaves for things like long-running reporting queries. Expensive SQL queries aren't running on anything serving production traffic.

Freeman: What about database problems in production that required immediate changes?

Shuping: One really interesting thing came up probably three years ago. When you buy a ticket to an event, the purchase is written to the database, and the next page may need to read that information back from the database to generate your confirmation email. We noticed that if a slave lagged even half a second behind the master, you could write to the master, read from a slave shortly afterward, and find that the data wasn't there yet.

So we devised two layers of protection around this database slave lag. The first is within Django; it's what we call DB pinning. Basically, it means if your code writes to the master, then any subsequent reads that it does for say, two seconds, are going to go to the master.
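The pinning rule Shuping describes can be sketched in a few lines. This is an illustration under stated assumptions, not Eventbrite's implementation: the class and method names are hypothetical, the client key could be any per-visitor identifier, and an in-process dict stands in for whatever shared store a multi-server deployment would need. The clock is injectable so the two-second window can be tested without sleeping.

```python
import time


class PinningRouter:
    """Hypothetical sketch: after a write, a client's reads stay on the master briefly."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._pinned_until = {}  # client key -> time at which the pin expires

    def record_write(self, client_key):
        # Call this whenever the client's request writes to the master.
        self._pinned_until[client_key] = self.clock() + self.pin_seconds

    def db_for_read(self, client_key):
        expires = self._pinned_until.get(client_key)
        if expires is not None and self.clock() < expires:
            return "master"  # still inside the pin window
        # Pin expired or never set: forget it and read from a slave.
        self._pinned_until.pop(client_key, None)
        return "slave"
```

For example, after `router.record_write("guest-1")`, calls to `router.db_for_read("guest-1")` return "master" for the next two seconds, while other clients keep reading from slaves.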

We also use a set of HAProxy load balancers in front of our database slaves, and the Django config actually points at those. The load balancers watch all the slaves and run a real-time health check on each of them. If a check detects that one of the slaves has more than a two-second lag, that slave is taken out of the pool of available slaves and doesn't serve any traffic until it catches back up.
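The pool decision itself is simple to state in code. The sketch below is not HAProxy configuration and not Eventbrite's health check; the function name and host names are hypothetical, and the interview does not say how lag is measured. It assumes each slave reports its lag in seconds, with None meaning the lag is unknown (for example, because replication has stopped).

```python
def healthy_slaves(lag_by_host, max_lag_seconds=2.0):
    """Return the hosts that should stay in the pool, sorted by name.

    lag_by_host maps a host name to its replication lag in seconds,
    or to None when the lag cannot be determined.
    """
    return sorted(
        host
        for host, lag in lag_by_host.items()
        # Unknown lag is treated as unhealthy; lag over the threshold is removed.
        if lag is not None and lag <= max_lag_seconds
    )
```

Rerunning the check on every polling interval lets a removed slave rejoin the pool automatically once its lag drops back under the threshold.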

Freeman: With DB pinning, are you storing in the session? It sounds sort of like sticky sessions.

Shuping: We definitely do not do sticky sessions, which, to Simon's point, makes it really easy to scale horizontally. But you're right, for DB pinning, we use memcache. We have a cluster of four memcache servers that we consistently hash across, storing DB pinning tokens for your guest cookie, or whatever it may be, in memcache.
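Consistent hashing across a small cache cluster can be sketched as a hash ring. This is a generic illustration, not the client library Eventbrite uses; the class name, the server names, the key format, and the choice of SHA-256 with 100 virtual points per server are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring mapping cache keys to servers."""

    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per server on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first point past the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["mc1", "mc2", "mc3", "mc4"])
server = ring.node_for("pin:guest-42")  # the same key always maps to the same server
```

The point of the ring is that losing one server remaps only the keys that lived on it; keys on the other three servers stay where they were.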

Freeman: On the Lanyrd side, you switched from MySQL to PostgreSQL, right?

Willison: Yes, that's right. We made that switch about a year ago for a few reasons. People have written huge amounts of stuff about MySQL versus PostgreSQL and so on, but the one killer feature we cared about is that in PostgreSQL you can add a new column to the table without locking up the whole table. You can't do this in MySQL. We were getting to the point in Lanyrd that some of our larger tables were large enough that it became painful adding new columns to those tables. The big benefit we got from PostgreSQL is that, having moved over, it was much easier to make modifications to our database tables.

[Note: Lanyrd's transition from MySQL to PostgreSQL was done in two hours with the site up, but in a read-only mode.]

Willison: Eventbrite runs on MySQL and uses a tool called pt-online-schema-change to add new columns to MySQL tables. This means you can modify your tables at runtime without any downtime to the site. But there are features in MySQL you can't use, such as foreign key constraints at the database level, because those aren't compatible with the way we do replication and the way we do online schema changes.

Shuping: To jump ahead to your point about maintenance, we have an operational goal of not having any planned maintenance or downtime or read-only mode; pt-online-schema-change is one key thing that contributes to us being able to do that.
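The general idea behind an online schema change is to build an altered copy of the table, copy rows across in small chunks, and then swap the copy in. The sketch below illustrates only that idea, using SQLite so it runs anywhere; it is not pt-online-schema-change and not MySQL. The table and column names are hypothetical, and the triggers the real tool uses to capture writes that arrive during the copy are left out.

```python
import sqlite3


def copy_in_chunks(conn, src, dst, columns, chunk_size=2):
    """Copy rows from src to dst in id order, chunk_size rows at a time.

    Assumes "id" is the first entry in columns and is the primary key.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    last_id = 0
    while True:
        rows = conn.execute(
            f"SELECT {cols} FROM {src} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany(f"INSERT INTO {dst} ({cols}) VALUES ({marks})", rows)
        conn.commit()  # one short transaction per chunk
        last_id = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, buyer TEXT)")
conn.executemany(
    "INSERT INTO tickets (buyer) VALUES (?)",
    [("ann",), ("bob",), ("cy",), ("dee",), ("eve",)],
)
conn.commit()

# 1. Create an empty shadow table that already has the new column.
conn.execute("CREATE TABLE tickets_new (id INTEGER PRIMARY KEY, buyer TEXT, seat TEXT)")
# 2. Copy the existing rows across in small chunks.
copy_in_chunks(conn, "tickets", "tickets_new", ["id", "buyer"])
# 3. Swap the shadow table into place and drop the old one.
conn.execute("ALTER TABLE tickets RENAME TO tickets_old")
conn.execute("ALTER TABLE tickets_new RENAME TO tickets")
conn.execute("DROP TABLE tickets_old")
conn.commit()
```

Copying in small chunks keeps each transaction short, which is what lets the original table keep serving traffic while the change is in progress.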
