Several weeks ago there was a notable bit of controversy over some comments made by James Gosling, father of the Java programming language. He has since addressed the flame war that erupted, but the whole ordeal got me thinking seriously about PHP and its scalability and performance abilities compared to Java. I knew that several hugely popular Web 2.0 applications were written in scripting languages like PHP, so I contacted Owen Byrne - Senior Software Engineer at digg.com to learn how he addressed any problems they encountered during their meteoric growth. This article addresses the all-too-common false assumptions about the cost of scalability and performance in PHP applications.
At the time Gosling’s comments were made, I was working on tuning and optimizing the source code and server configuration for the launch of Jobby, a Web 2.0 resume tracking application written using the WASP PHP framework. I really hadn’t done any substantial research on how to best optimize PHP applications at the time. My background is heavy in the architecture and development of highly scalable applications in Java, but I realized there were enough substantial differences between Java and PHP to cause me concern. In my experience, it was certainly faster to develop web applications in languages like PHP; but I was curious as to how much of that time savings might be lost to performance tuning and scaling costs. What I found was both encouraging and surprising.
What are Performance and Scalability?
Before I go on, I want to make sure the ideas of performance and scalability are understood. Performance is measured by the output behavior of the application. In other words, performance is whether or not the app is fast. A well-performing web application is expected to render a page in about 1 second or less (depending on the complexity of the page, of course). Scalability is the ability of the application to maintain good performance under heavy load with the addition of resources. For example, as the popularity of a web application grows, it can be called scalable if you can maintain good performance metrics by simply making small hardware additions. With that in mind, I wondered how PHP would perform under heavy load, and whether it would scale well compared with Java.
Hardware Cost
My first concern was raw horsepower. Executing scripting language code is more hardware intensive because the code isn’t compiled. The hardware we had available for the launch of Jobby was a single hosted Linux server with a 2GHz processor and 1GB of RAM. On this single modest server I was going to have to run both Apache 2 and MySQL. Previous applications I had worked on in Java had been deployed on 10-20 application servers with at least 2 dedicated, massively parallel, ultra expensive database servers. Of course, these applications handled traffic in the millions of hits per month.
To get a better idea of what was in store for a heavily loaded PHP application, I set up an interview with Owen Byrne, cofounder and Senior Software Engineer at digg.com. From talking with Owen I learned digg.com gets on the order of 200 million page views per month, and they’re able to handle it with only 3 web servers and 8 small database servers (I’ll discuss the reason for so many database servers in the next section). Even better news was that they were able to handle their first year’s worth of growth on a single hosted server like the one I was using. My hardware worries were relieved. The hardware requirements to run high-traffic PHP applications didn’t seem to be more costly than for Java.
Database Cost
Next I was worried about database costs. The enterprise Java applications I had worked on were powered by expensive database software like Oracle, Informix, and DB2. I had decided early on to use MySQL for my database, which is of course free. I wondered whether the simplicity of MySQL would be a liability when it came to trying to squeeze the last bit of performance out of the database. MySQL has had a reputation for being slow in the past, but most of that seems to have come from sub-optimal configuration and the overuse of MyISAM tables. Owen confirmed that using InnoDB tables for read/write data makes a massive performance difference.
There are some scalability issues with MySQL, one being the need for large numbers of slave databases. However, these issues are decidedly not PHP related, and are being addressed in future versions of MySQL. It could be argued that even with the large number of slave databases that are needed, the hardware required to support them is less expensive than the 8+ CPU boxes that typically power large Oracle or DB2 databases. The database requirements to run massive PHP applications still weren’t more costly than for Java.
PHP Coding Cost
Lastly, and most importantly, I was worried about scalability and performance costs directly attributed to the PHP language itself. During my conversation with Owen I asked him if there were any performance or scalability problems he encountered that were related to having chosen to write the application in PHP. A bit to my surprise, he responded by saying, “none of the scaling challenges we faced had anything to do with PHP,” and that “the biggest issues faced were database related.” He even added, “in fact, we found that the lightweight nature of PHP allowed us to easily move processing tasks from the database to PHP in order to deal with that problem.” Owen mentioned they use the APC PHP accelerator platform as well as MCache to lighten their database load. Still, I was skeptical. I had written Jobby entirely in PHP 5 using a framework which uses a highly object oriented MVC architecture to provide application development scalability. How would this hold up to large amounts of traffic?
My worries were largely related to the PHP engine having to effectively parse and interpret every included class on each page load. I discovered this was just my misunderstanding of the best way to configure a PHP server. After doing some research, I found that by using a combination of Apache 2’s worker threads, FastCGI, and a PHP accelerator, this was no longer a problem. Any class or script loading overhead was only encountered on the first page load. Subsequent page loads were of comparable performance to a typical Java application. Making these configuration changes was trivial and generated massive performance gains. With regard to scalability and performance, PHP itself, even PHP 5 with heavy OO, was not more costly than Java.
Conclusion
Jobby was launched successfully on its single modest server and, thanks to links from Ajaxian and TechCrunch, went on to happily survive hundreds of thousands of hits in a single week. Assuming I applied all of my newfound PHP tuning knowledge correctly, the application should be able to handle much more load on its current hardware.
Digg is in the process of preparing to scale to 10 times current load. I asked Owen Byrne if that meant an increase in headcount and he said that wasn’t necessary. The only real change they identified was a switch to a different database platform. There doesn’t seem to be any additional manpower cost to PHP scalability either.
It turns out that it really is fast and cheap to develop applications in PHP. Scaling and performance challenges are almost always related to the data layer, and are common across all language platforms. Even as a self-proclaimed PHP evangelist, I was very startled to find out that all of the theories I was subscribing to were true. There is simply no truth to the idea that Java is better than scripting languages at writing scalable web applications. I won’t go as far as to say that PHP is better than Java, because it is never that simple. However, it just isn’t true to say that PHP doesn’t scale, and with the rise of Web 2.0, sites like Digg, Flickr, and even Jobby are proving that large scale applications can be rapidly built and maintained on-the-cheap, by one or two developers.
Further Reading
Scalability:
Performance:

3 webservers with *8* database slaves? Information on how this configuration was arrived at would be interesting, as well as the run-time management of this setup...
Great article. Hopefully it helps dispel the "scalability" myth. I get to hear this one all the time. Of course it's not only aimed at PHP -- you also hear the same argument applied to Python, Ruby, Cold Fusion, Classic ASP, etc. And it usually comes from the Java / .NET crowd. (I don't know why -- maybe it's just in defense of all the extra code these guys need to write! No, just kidding... JOKE! IT'S JUST A JOKE!)
And it obviously isn't true. Unless you're blind you can see large and complex applications written in almost any popular scripting language: PHP (Flickr), Python (Google), Cold Fusion (Lockheed Martin E-STARS®), and Classic ASP (Dell until recent times). And of course you see plenty of great apps on the web written in both Java and .NET.
This isn't a Java or .NET bash. PangoMedia (my company) develops in Java, .NET, Python, Cold Fusion and even Visual Basic. We also happen to do a lot of PHP development using Brian's excellent WASP framework. And in seven years of working with all of these technologies we've never had a language-related performance bottleneck. That's not to say that we haven't had to work on performance tuning in some cases. But it's never been a big issue. And the better part of this sort of work tends to happen at the database layer anyway.
I do have one small bone to pick with this article, however. It suggests the common MySQL "MyISAM bad / InnoDB good" misconception I often see these days. MySQL is somewhat unusual in that it offers multiple storage engines. Many developers seem to get confused by this. These storage engines are optimized for different tasks. InnoDB is certainly a better general-purpose table type, but it doesn't allow for the use of FULLTEXT indexes. This feature is a critical component of MySQL scalability and it's really one of the best arguments for using MySQL in the first place. It's both easy to use and insanely fast.
This article provides a nice introduction to the feature:
http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html
Here's a link to a presentation by the Flickr folks on their scalability adventures. Note how important the use of MyISAM tables is for their "little" app:
http://www.niallkennedy.com/blog/uploads/flickr_php.pdf
Anyway, as said before -- nice article Brian. I just wanted to point out that one small error.
"3 webservers with *8* database slaves? Information on how this configuration was arrived at would be interesting, as well as the run-time management of this setup..."
From what I found, the typical way to scale MySQL databases is to initially start with 1 master DB, and then replicate to a growing number of slaves. The master handles all writes, and replicates down to the slaves, which can be used to load balance reads. There are obvious shortcomings to this, the main one being there isn't a good way to load balance writes.
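The master/slave split described above can be sketched in a few lines. This is only an illustration of the routing idea, written in Python to stay language-neutral; the class name, the server names, and the rule of classifying a statement by its first keyword are all assumptions made for the example, not how Digg or LiveJournal actually do it.

```python
import itertools


class ReplicationRouter:
    """Pick a server name for each SQL statement: writes go to the
    single master, reads rotate across the slaves (round robin)."""

    # Statements that modify data and therefore must hit the master.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves yet, reads simply fall back to the master.
        self._read_cycle = itertools.cycle(slaves or [master])

    def server_for(self, sql):
        # Classify the statement by its first keyword (a simplification).
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._read_cycle)


# Hypothetical server names, for illustration only.
router = ReplicationRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.server_for("UPDATE stories SET diggs = diggs + 1 WHERE id = 7"))
print(router.server_for("SELECT title FROM stories WHERE id = 7"))
```

Adding a slave adds read capacity, but every write still lands on the one master, which is the shortcoming noted above.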
Brad Fitzpatrick from LiveJournal put out some slides outlining their growth process (pdf). Very interesting read.
What database platform are they moving to? Why are they moving from MySQL?
You and I have different definitions of "large scale applications" obviously. I wouldn't consider any of those "large-scale". Furthermore, anything that can be "rapidly built and maintained on-the-cheap, by one or two developers" would not meet my definition of "large scale".
Maybe they handle a lot of traffic, but their business requirements and level of functionality are dead simple.
"You and I have different definitions of "large scale applications" obviously. I wouldn't consider any of those "large-scale". Furthermore, anything that can be "rapidly built and maintained on-the-cheap, by one or two developers" would not meet my definition of "large scale".
Maybe they handle a lot of traffic, but their business requirements and level of functionality are dead simple."
Fair enough. Certainly Digg and Flickr don't represent large scale "enterprise" applications, however it doesn't seem too much of a stretch to say that it would be feasible to write one in PHP, considering that it is in fact scalable.
Jason, I think you missed the point. Scalability != "large scale".
It seems strange to me that PostgreSQL was not mentioned when talking about a scalable DBMS. Certainly for a big site like Digg it would be wiser to use it rather than using MySQL et al.
Of course ORACLE is a prime candidate, but with a very small improvement over PostgreSQL. Mostly in the feature department and less in the scalability department. Plus, the dough you have to produce just by looking at ORACLE.
Normally you would put up a PHP page which accesses MySQL and then hit it with a stress tester. A simple one, which would do the job, is ApacheBench. Much easier and faster than interviewing digg.com. But of course then you would not be able to write an article about it and mention your own site a couple of hundred times in your own article.
This must be the most obvious and rotten plug I have seen in a long time for a new web-site. Especially considering that you wrote WASP yourself. This is low man! And the article sucks.
Interesting article :)
At www.last.fm we use Memcached, a distributed memory object caching system that has apis for many languages, including PHP and Java. Worth a look if the bottleneck is the database. You can reduce your database reads by 90% easily, provided you have a clean API in the first place..
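A rough sketch of the read-through caching pattern this comment describes, in Python for neutrality. A plain dictionary stands in for the memcached client here, and the class name, the loader function, and the read counter are hypothetical names chosen for the example; a real deployment would talk to memcached over the network and set expiry times.

```python
class CacheAside:
    """Check the cache first; only on a miss go to the database and
    store the result for the next reader."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # function that performs the real query
        self._cache = {}            # stand-in for a memcached client
        self.db_reads = 0           # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after a write so the next read refetches fresh data.
        self._cache.pop(key, None)


# Hypothetical loader standing in for a SELECT against the database.
profiles = CacheAside(lambda user_id: {"id": user_id, "name": "user%d" % user_id})
for _ in range(10):
    profiles.get(42)
print(profiles.db_reads)
```

In this toy run ten reads of the same key cost a single database read. The actual savings depend entirely on how often the same data is requested and how often it changes.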
I am a PHP developer who's been plagued by the 'scalability' question and constantly being looked down on by arrogant Java "enterprise" developers.
Throughout the years, I have asked myself many times if PHP is up to par and if it really does have scalability issues.
After reading some documents and noticing that some huge sites are PHP based (e.g. riteaid.com, flickr.com), I felt much better and I was confident enough to brush the 'not scalable' argument aside.
If only I can find a way to handle those Java snobs.
:(
Does PHP run reliably with apache worker threads? I thought that wasn't supported?
Totally great article. I thoroughly enjoyed it.
Keep up the great work all.
Very interesting article. You might also like to explore what applications other Digg Tools are using.
"Throughout the years, I have asked myself many times if PHP is up to par and if it really do have scalability issues.
After reading some documents and noticing that some huge sites are PHP based (e.g. riteaid.com, flickr.com), I felt much better and I was confident enough to brush the 'not scalable' argument aside.
If only I can find a way to handle those Java snobs."
Speaking as a former "Java snob", it's a shame that it has to work that way. Experience has made me much less likely to engage in a my-language-is-better-than-your-language debate. Frankly, such arguments are childish and miss the point. Languages are a tool for getting work done, and you should always pick the best tool for the job at hand (despite how painfully cliche that sounds).
If I were writing an application that needed to do intensive reporting, I'd probably use a language like Java. I might even use a PECL extension to have PHP call the Java code.
Don't forget Yahoo! They are migrating to PHP from Apache C modules and get far more hits than Digg.
Hi. You mentioned Digg is moving to a new db platform. Anyone know what platform?
"Hi. You mentioned Digg is moving to a new db platform. Anyone know what platform?"
When I interviewed Owen for the article, they were considering the idea. I have since emailed him to get clarification and he replied that they are now planning on staying with MySQL, but changing their architecture to use memcached and "shards".
I'm thinking about doing research for an article that goes into detail about what database architectures work best for these sorts of PHP applications.
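For readers unfamiliar with the term, "shards" generally means splitting one large data set across several independent databases, so that each server holds only a slice of the rows and handles both reads and writes for that slice. Below is a minimal Python sketch of one common way to pick a shard, hashing a key. The function name, the shard names, and the choice of CRC32 are assumptions for illustration; nothing here reflects Digg's actual design.

```python
import zlib


def shard_for(key, shard_count):
    """Map a key (for example a user id) to a shard index using a
    stable hash, so the same key always lands on the same shard."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


# Hypothetical shard names, for illustration only.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

user_id = 1234
print(SHARDS[shard_for(user_id, len(SHARDS))])
```

Unlike the master/slave setup, this spreads writes as well as reads. The price is that queries spanning many keys must visit several shards, and changing the shard count with this simple modulo scheme moves most keys to a different shard.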
Jobby is a waste of space.
Thank you for putting effort into this. This is a great article.
(newbie question) I am also interested in another form of scaling. If I were writing a web application serving data in multiple presentation formats (for example HTML, WML) would PHP still be a suitable platform? Are there any other businesses using PHP in a similar situation.
"(newbie question) I am also interested in another form of scaling. If I were writing a web application serving data in multiple presentation formats (for example HTML, WML) would PHP still be a suitable platform? Are there any other businesses using PHP in a similar situation."
Yes, provided you use a solid MVC architecture. WASP, for example, uses a template manager to handle its UI, and the choice of which template to use can be made at runtime. You could have a set of HTML templates alongside a set of WML templates and choose which one to render based on the URL.
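To make the idea concrete, here is a small Python sketch of choosing a template set from the URL at runtime. It is not WASP's API; the template strings, the `render` function, and the convention of reading the format from the URL's extension are all assumptions for the example.

```python
# One template per presentation format; the same data feeds both.
TEMPLATES = {
    "html": "<html><body><h1>{title}</h1></body></html>",
    "wml": "<wml><card><p>{title}</p></card></wml>",
}


def render(path, data):
    """Pick the template from the URL's extension, defaulting to HTML
    when there is no extension or the format is unknown."""
    _, dot, ext = path.rpartition(".")
    fmt = ext if dot and ext in TEMPLATES else "html"
    return TEMPLATES[fmt].format(**data)


print(render("/jobs/list.wml", {"title": "Open positions"}))
print(render("/jobs/list", {"title": "Open positions"}))
```

The controller and model code stay the same for every format; only the final rendering step differs.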
This was a nice article and I enjoyed reading it, however, I would like to know how well you think PHP compares to Java when dealing with issues like:
1) Keeping multiple data centers in sync with each other, in real time
2) Dealing with complex access control systems
3) Dealing with complex ecommerce systems
I don't doubt that PHP is scalable or capable of performing well at the 2 million hits per month range on a simple web application, but I do have concerns when it comes down to the available tool sets to handle enterprise level problems in PHP versus J2EE.
I'm very curious why they are moving away from MySQL? Any update is appreciated.
Sorry guys, what do you mean by 'shards'?? I'm not that good at techie slang ;D
nice blogpost. the links are very useful. thank you.
In my experience of C++, Delphi (pascal) and VB, PHP, as a language, is a jolly awkward customer. It's got so many nooks and crannies that it's barely usable in my opinion. At least C++ makes a token nod at simplifying. Interestingly MySQL is exactly the same: riddled with idiosyncratic nonsense. They are a perfect match for each other, even if the result is hell-like. Granted they are accessible to beginners, unlike Java or many other languages. Perhaps performance-wise PHP scales, but I can't believe that they would scale language-wise (i.e. for big projects).
However the author of this article gives himself away when we find out that he didn't know how to optimise PHP in order to 'compile' only once. I can't trust his opinion. It's a good job he referred to someone with more authority, but he has no ability to evaluate what the guy has said and then to give us an interesting and informed opinion. Ah well.
In any case it's actually all the fault of ducks. That's right. If you didn't know then you certainly should. Kill those ducks!
http://www.javatag.com find php doc by javadoc styles
I'm under the impression that discussions about which language is faster/better/more scalable are pointless, because, as pointed out, the bottleneck is in the database (i.e., disk access).
As you can see at any pure processing speed benchmark (http://www.timestretch.com/FractalBenchmark.html), C is about 10 times faster than Java, and Java is about 30 times faster than PHP.
At this point, processors are so fast that it doesn't really matter anymore what kind of language you are using. You could even use assembler or C, and I suspect that you would not gain in performance/scalability, why? Because faster languages give you gains of microseconds, while your real bottleneck is in the database I/O.
That means that even "slow", interpreted languages like Ruby and PHP perform quite well doing database processing.
This article provides a good lesson, but the value of a language is related to its purpose. It seems to me that these days the main problem is not execution scalability, but "development scalability". When the problem size and complexity increase, does the development time grow linearly or exponentially? Quite frankly, I don't know enough to give an answer, but it seems like this is the problem that we need to worry about at this point.
Luis
Please consider either fixing or removing the links to Large Scale PHP and High performance PHP referred to in this article so they work in IE. There is no reason, other than some obnoxious developer making things difficult, for these links to work the way they do. Those of us who actually make our livings providing web based applications to customers work in IE because outside the developer community 99% of users use IE.
"Hosted Linux server" - does that mean the one that you can get from any dedicated hosting company? How do you move to your own multiple servers at a different datacenter without shutting down the site?
A resume tracking application with a bunch of searches for information appears to be much more aggressive on PHP and DB resources than Digg with its get/update DB entry behavior. It would be interesting to hear about further developments with Jobby.
A thorough article to increase the performance of apache & php:
http://kevin.vanzonneveld.net/techblog/article/survive_heavy_traffic_with_your_webserver