The PHP Scalability Myth
Subject:   Scalability
Date:   2003-10-17 10:29:40
From:   anonymous2
The first incorrect premise of this article is that performance is related to scalability. As an architectural choice, choosing scalability inevitably means sacrificing some performance. There is a tradeoff between performance and scalability in the same way that there is a tradeoff between speed and memory. It is one thing to be able to create a nice web application that can handle 10,000 concurrent users. It's a completely different thing to build one that can handle 5,000,000 concurrent users. If you are expecting 5,000,000 concurrent active sessions, you need to build a lot of code to handle them properly, and all of that handling slows down each individual transaction.

The defining point of scalability is eliminating architectural bottlenecks that generally appear only when the overall throughput is massive. If each session requires 32K in core, then with 10,000 sessions it's easy. With 5,000,000 sessions, a single computer is probably going to thrash itself into a crash.
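To put rough numbers on the 32K-per-session example, here is a back-of-the-envelope sketch. It assumes "32K" means 32 KiB per session and that every session stays resident in memory; the function name is just for illustration.

```python
# Back-of-the-envelope memory needed to keep every session in core.
# Assumption: "32K" means 32 KiB (32 * 1024 bytes) per session.
BYTES_PER_SESSION = 32 * 1024


def session_memory_gib(sessions: int, bytes_per_session: int = BYTES_PER_SESSION) -> float:
    """Return the total resident session memory in GiB."""
    return sessions * bytes_per_session / 2**30


for sessions in (10_000, 5_000_000):
    print(f"{sessions:>9,} sessions -> {session_memory_gib(sessions):8.2f} GiB")
```

That works out to roughly 0.3 GiB for 10,000 sessions and roughly 153 GiB for 5,000,000, before counting anything else the machine has to hold.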

A great study in understanding scalability is comparing the Teradata RDBMS to Oracle. Teradata is implemented in hardware and is slow as molasses, but it's virtually impossible to issue enough queries to significantly change the throughput of all of the queries. Oracle, on the other hand, splats its databases over more and more inter-networked machines, and it will eventually reach a maximum where adding a new machine won't significantly boost the overall throughput of the system. There is clearly a limit to the number of queries that Oracle can handle. The limit for Teradata is likely at least an order of magnitude above the Oracle limit.

The second incorrect premise of the article is that scalability matters. Most of the existing systems out there have been so crippled by poor algorithms (N^2 or worse) and bad code that it really doesn't matter whether the underlying architecture could scale. As well, most businesses are so effectively limited by their own business models that the technological issues really don't matter. If a company is running a system that handles 10,000 users now, then it's more than likely that usage will predictably grow slowly enough that someone can rewrite the system to support 50,000 people. If they don't, then the existing technology will essentially throttle the users and keep the number in and around 10,000. This happens because when there are too many users, they get pissed off at the performance and go elsewhere.
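As an illustration of the kind of N^2 code meant here, consider a hypothetical duplicate check written two ways. The function names and the task itself are assumptions chosen for the example, not something taken from a real system.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: roughly N*(N-1)/2 comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Single pass with a set: roughly N hash lookups."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def pair_comparisons(n):
    """Worst-case number of comparisons made by the quadratic version."""
    return n * (n - 1) // 2


for users in (10_000, 50_000):
    print(f"{users:>6,} items -> {pair_comparisons(users):>13,} worst-case comparisons")
```

Going from 10,000 to 50,000 items is five times the input but about twenty-five times the work for the quadratic version, which is why code like this swamps whatever the underlying architecture could do.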

In truth, I've seen no evidence that the J2EE architecture could in fact handle some massive number like 5,000,000 reasonably sized queries concurrently. While I'm not entirely sure that it is necessary, I know that when Sun claims Java is scalable, it is really just using the word as a buzzword for "we can run this architecture for 50,000 concurrent users, when the other guys can only get to 20,000". It is clear that when they were developing the specifications for the EJB architecture, they were explicitly trying to address the scalability issues. If, at the end of the day, PHP, which didn't try to address these issues, can be considered an alternative to the EJB design, it doesn't prove that PHP is scalable. What it really implies is that Java isn't!

  • Scalability
    2003-10-18 22:44:52  anonymous2

    >It is one thing to be able to create a nice web application that can handle 10,000 concurrent users. It's a completely different thing to build one that can handle 5,000,000 concurrent users.

    First of all, your numbers are off. But never mind. The trick isn't to increase complexity, though up to a point that might work. Complexity doesn't scale. At some point (long before it actually pays off with EJBs) you hit a wall of complexity that cannot be maintained. In _any_ system, this happens long before 5 million concurrent users.

    The solution is not to increase the number of concurrent sessions available, but to reduce the number needed. A stateless session protocol is the answer. That's why the internet is so popular.

    An EJB-style solution, no matter what, fails to do this. You are increasing the number of sessions and trying to use some generic passivation algorithm to keep them down, in the hope that someday hardware will catch up. Let the users help you decide how long their session is.

    It turns out that reinitializing the session is less expensive than maintaining it.
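
One way to picture the stateless approach described in this reply: the server keeps nothing between requests and rebuilds the session from a signed token that the client sends back each time. This is only a minimal sketch under stated assumptions. The function names, the token layout, and the secret key are hypothetical, and the reply does not say which mechanism it had in mind.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def issue_token(state: dict) -> str:
    """Serialize the session state and sign it; the client stores the result."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def restore_state(token: str) -> dict:
    """Rebuild the session from the token on each request; reject tampering."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid session token")
    return json.loads(base64.urlsafe_b64decode(payload))


token = issue_token({"user": "alice", "cart": [1, 2]})
print(restore_state(token))
```

The cost per request is one HMAC and one JSON decode, and the server holds no per-user memory between requests, so the number of live sessions it has to track stays at zero no matter how many users are out there.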