How to make PHP apps scale

The most popular language for Web apps, PHP tends to buckle under heavy loads -- unless you opt for cloud scaling and a NoSQL back end

The power of PHP and an RDBMS is the ability to nail the major features of an application with inexpensive developers in record time. Unfortunately, the default runtime environment used by PHP is simply an unscalable mess.

A lot of the folks I've worked with do not care about maintainability; their PHP applications are throwaway, but heavily loaded and highly concurrent. For example, I worked with a company that developed a PHP marketing application with an Oracle back end, where you bought its products and could exchange your "points" for features of an online game. It worked great -- until it reached a few million users.


The truth is that if you have enough servers and enough database servers, you don't have contention. But with a PHP Web app on top, an RDBMS like Oracle just can't be scaled cost-effectively to deliver both good read and write performance.

As it turns out, there's a modern solution to the problem: the cloud plus NoSQL. Cloud infrastructure gives us the ability to spin up enough servers, and a NoSQL database enables us to shard our data effectively. But first, let's examine why PHP's runtime environment is such a dog to begin with.

Why PHP's runtime environment sucks

The most common runtime for PHP is the Apache Web Server in prefork mode, which means that the Web server runs a series of separate subprocesses to support concurrent requests. When you combine this concurrency characteristic with the use of a traditional relational database like MySQL, PostgreSQL, or Oracle, this choice implies unpooled database connections because database connection pooling requires a shared memory space.

Native threads, on the other hand, have a shared memory space as part of their master process. Subprocesses do not have a shared memory space unless you use a specific operating system area called "shared memory." This isn't as fast as being able to pass memory by reference -- besides, the Apache Web Server's "prefork" module doesn't support the use of shared memory for this purpose anyhow. It is sometimes possible to run PHP with native threads, aka worker mode, but this is heavily dependent on the modules you use and whether those modules are "thread safe."
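The difference between threads and subprocesses can be seen in a few lines. What follows is a minimal sketch in Python rather than PHP, chosen only because it is short; the function names are made up for illustration, and `os.fork` is Unix-only. A thread's write to a shared dictionary is visible to its parent, while a forked child's write lands in the child's private copy, which is why one prefork worker cannot hand an in-memory connection pool to another.

```python
import os
import threading


def bump_in_thread(state):
    """Increment state["hits"] from a second thread in the same process."""
    def work():
        state["hits"] += 1  # same address space: the caller sees this write

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    return state["hits"]


def bump_in_child_process(state):
    """Increment state["hits"] in a forked child, then return the parent's view."""
    pid = os.fork()  # Unix-only; a prefork server creates its workers by forking
    if pid == 0:
        state["hits"] += 1  # modifies the child's private copy only
        os._exit(0)         # leave the child at once, skipping cleanup handlers
    os.waitpid(pid, 0)      # wait for the child to finish
    return state["hits"]    # the parent's copy is unchanged


if __name__ == "__main__":
    state = {"hits": 0}
    print("after thread:", bump_in_thread(state))
    print("after child process:", bump_in_child_process(state))
```

The thread call reports 1 because the increment happened in shared memory. The child-process call also reports 1, not 2, because the child's increment never reached the parent.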


The PHP concurrency model has a major impact on vertical scalability when using a traditional RDBMS. While it's possible to open thousands of unshared concurrent connections to MySQL or Oracle, every open connection ties up resources on the database server, which limits the number of concurrent requests the system can handle. A typical PHP application -- indeed, any Web application -- consists of logic along these lines:

request -> getData -> doStuff -> getMoreData -> doMoreStuff -> writeData -> sendResponse

In this type of code, there are relatively long periods of time where the application is not actually interacting with the database and another request could "share" the same database connection -- if only database connections could be pooled. Since the PHP process model precludes this, you are forced to make a decision: Hold the connection for the duration of the request/response cycle, or release it each time the application is done with the database and reconnect when it needs data again.
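To put rough numbers on that trade-off, here is a back-of-the-envelope model, again sketched in Python. Every timing in `PHASES` is a hypothetical value picked for illustration, not a measurement, and the function names are invented for this sketch. It assumes an ideal pool that hands a connection over the instant it is free, and it assumes the release-each-time strategy opens a fresh connection for every database phase.

```python
# Hypothetical phase timings in milliseconds for the request flow above.
# The third field marks whether the phase actually talks to the database.
PHASES = [
    ("getData", 5, True),
    ("doStuff", 20, False),
    ("getMoreData", 5, True),
    ("doMoreStuff", 20, False),
    ("writeData", 10, True),
    ("sendResponse", 40, False),
]


def db_time(phases):
    """Return (milliseconds spent in the database, total request milliseconds)."""
    busy = sum(ms for _, ms, uses_db in phases if uses_db)
    total = sum(ms for _, ms, _ in phases)
    return busy, total


def connections_needed(concurrent_requests, phases, pooled):
    """Estimate average open connections for a given number of in-flight requests."""
    if not pooled:
        # Hold-for-the-whole-request: one connection per in-flight request.
        return concurrent_requests
    busy, total = db_time(phases)
    # Ideal pool: connections are occupied only during DB phases (rounded up).
    return (concurrent_requests * busy + total - 1) // total


def reconnect_penalty_ms(phases, connect_ms):
    """Extra latency per request if every DB phase after the first reconnects."""
    db_phases = sum(1 for _, _, uses_db in phases if uses_db)
    return max(db_phases - 1, 0) * connect_ms


if __name__ == "__main__":
    print(db_time(PHASES))
    print(connections_needed(1000, PHASES, pooled=False))
    print(connections_needed(1000, PHASES, pooled=True))
    print(reconnect_penalty_ms(PHASES, connect_ms=3))
```

With these made-up timings the database is busy for 20 of every 100 milliseconds, so 1,000 in-flight requests hold 1,000 connections without pooling but would average about 200 with an ideal pool. Releasing and reconnecting instead adds two extra connection setups per request. Real workloads are burstier than this average-case estimate suggests.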
