Several weeks ago there was a notable bit of controversy over some comments made by James Gosling, father of the Java programming language. He has since addressed the flame war that erupted, but the whole ordeal got me thinking seriously about PHP and its scalability and performance abilities compared to Java. I knew that several hugely popular Web 2.0 applications were written in scripting languages like PHP, so I contacted Owen Byrne - Senior Software Engineer at digg.com to learn how he addressed any problems they encountered during their meteoric growth. This article addresses the all-to-common false assumptions about the cost of scalability and performance in PHP applications.
At the time Gosling’s comments were made, I was working on tuning and optimizing the source code and server configuration for the launch of Jobby, a Web 2.0 resume tracking application written using the WASP PHP framework. I really hadn’t done any substantial research on how to best optimize PHP applications at the time. My background is heavy in the architecture and development of highly scalable applications in Java, but I realized there were enough substantial differences between Java and PHP to cause me concern. In my experience, it was certainly faster to develop web applications in languages like PHP; but I was curious as to how much of that time savings might be lost to performance tuning and scaling costs. What I found was both encouraging and surprising.
What are Performance and Scalability?
Before I go on, I want to make sure the ideas of performance and scalability are understood. Performance is measured by the output behavior of the application. In other words, performance is whether or not the app is fast. A good performing web application is expected to render a page in around or under 1 second (depending on the complexity of the page, of course). Scalability is the ability of the application to maintain good performance under heavy load with the addition of resources. For example, as the popularity of a web application grows, it can be called scalable if you can maintain good performance metrics by simply making small hardware additions. With that in mind, I wondered how PHP would perform under heavy load, and whether it would scale well compared with Java.
My first concern was raw horsepower. Executing scripting language code is more hardware intensive because to the code isn’t compiled. The hardware we had available for the launch of Jobby was a single hosted Linux server with a 2GHz processor and 1GB of RAM. On this single modest server I was going to have to run both Apache 2 and MySQL. Previous applications I had worked on in Java had been deployed on 10-20 application servers with at least 2 dedicated, massively parallel, ultra expensive database servers. Of course, these applications handled traffic in the millions of hits per month.
To get a better idea of what was in store for a heavily loaded PHP application, I set up an interview with Owen Byrne, cofounder and Senior Software Engineer at digg.com. From talking with Owen I learned digg.com gets on the order of 200 million page views per month, and they’re able to handle it with only 3 web servers and 8 small database servers (I’ll discuss the reason for so many database servers in the next section). Even better news was that they were able to handle their first year’s worth of growth on a single hosted server like the one I was using. My hardware worries were relieved. The hardware requirements to run high-traffic PHP applications didn’t seem to be more costly than for Java.
Next I was worried about database costs. The enterprise Java applications I had worked on were powered by expensive database software like Oracle, Informix, and DB2. I had decided early on to use MySQL for my database, which is of course free. I wondered whether the simplicity of MySQL would be a liability when it came to trying to squeeze the last bit of performance out of the database. MySQL has had a reputation for being slow in the past, but most of that seems to have come from sub-optimal configuration and the overuse of MyISAM tables. Owen confirmed that the use of InnoDB for tables for read/write data makes a massive performance difference.
There are some scalability issues with MySQL, one being the need for large amounts of slave databases. However, these issues are decidedly not PHP related, and are being addressed in future versions of MySQL. It could be argued that even with the large amount of slave databases that are needed, the hardware required to support them is less expensive than the 8+ CPU boxes that typically power large Oracle or DB2 databases. The database requirements to run massive PHP applications still weren’t more costly than for Java.
PHP Coding Cost
Lastly, and most importantly, I was worried about scalability and performance costs directly attributed to the PHP language itself. During my conversation with Owen I asked him if there were any performance or scalability problems he encountered that were related to having chosen to write the application in PHP. A bit to my surprise, he responded by saying, “none of the scaling challenges we faced had anything to do with PHP,” and that “the biggest issues faced were database related.” He even added, “in fact, we found that the lightweight nature of PHP allowed us to easily move processing tasks from the database to PHP in order to deal with that problem.” Owen mentioned they use the APC PHP accelerator platform as well as MCache to lighten their database load. Still, I was skeptical. I had written Jobby entirely in PHP 5 using a framework which uses a highly object oriented MVC architecture to provide application development scalability. How would this hold up to large amounts of traffic?
My worries were largely related to the PHP engine having to effectively parse and interpret every included class on each page load. I discovered this was just my misunderstanding of the best way to configure a PHP server. After doing some research, I found that by using a combination of Apache 2’s worker threads, FastCGI, and a PHP accelerator, this was no longer a problem. Any class or script loading overhead was only encountered on the first page load. Subsequent page loads were of comparative performance to a typical Java application. Making these configuration changes were trivial and generated massive performance gains. With regard to scalability and performance, PHP itself, even PHP 5 with heavy OO, was not more costly than Java.
Jobby was launched successfully on its single modest server and, thanks to links from Ajaxian and TechCrunch, went on to happily survive hundreds of thousands of hits in a single week. Assuming I applied all of my new found PHP tuning knowledge correctly, the application should be able to handle much more load on its current hardware.
Digg is in the process of preparing to scale to 10 times current load. I asked Owen Byrne if that meant an increase in headcount and he said that wasn’t necessary. The only real change they identified was a switch to a different database platform. There doesn’t seem to be any additional manpower cost to PHP scalability either.
It turns out that it really is fast and cheap to develop applications in PHP. Most scaling and performance challenges are almost always related to the data layer, and are common across all language platforms. Even as a self-proclaimed PHP evangelist, I was very startled to find out that all of the theories I was subscribing to were true. There is simply no truth to the idea that Java is better than scripting languages at writing scalable web applications. I won’t go as far as to say that PHP is better than Java, because it is never that simple. However it just isn’t true to say that PHP doesn’t scale, and with the rise of Web 2.0, sites like Digg, Flickr, and even Jobby are proving that large scale applications can be rapidly built and maintained on-the-cheap, by one or two developers.