The PHP Scalability Myth
Subject:   Two-tier vs. three-tier
Date:   2003-10-17 12:12:03
From:   anonymous2
To begin with, the author apparently doesn't understand the reasons behind three-tiered architecture. It has nothing to do with "code factoring". It has to do with scalability. Not performance. Scalability. They are completely different issues, especially in the context of complex business applications.

In enterprise applications, it's not good enough that end-to-end performance be good for a single request; it's also important that the system keep servicing requests quickly as the load grows, which is where a two-tiered architecture fails (and which is why people came up with the idea of a three-tiered architecture in the first place). This is the difference between performance and scalability.

For instance, suppose you get a sudden flood of requests coming in. Your webserver might be able to handle the number of incoming connections, but since all of the non-DB processing is happening on the same machine, you become CPU-bound: even though the web server can accept more connections, it can't get enough of the CPU to service all of the requests it already has.

Now, instead, think of a three-tier system. The web server handles the request and passes it off to another server (load balanced on the back end) to do the business processing (we're assuming here that the business processing is substantial compared to the display processing, which is usually the case in an enterprise context). Now the web server can continue accepting requests up to its networking capacity, but it can offload its business processing onto *multiple* other machines if need be, so that each request completes faster, which is what the end user perceives as "better performance".
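The hand-off described above can be sketched in miniature. This is an illustrative Python sketch, not PHP or a real server; every name in it (handle_request, business_logic, BACKENDS) is hypothetical:

```python
# Minimal sketch of the three-tier hand-off: the web tier only picks a
# backend and renders the result; the expensive work happens "elsewhere".
# itertools.cycle stands in for a load balancer over middle-tier machines.
import itertools

def business_logic(payload):
    # Placeholder for the substantial processing that would run on a
    # dedicated middle-tier machine.
    return payload.upper()

# Pretend each entry is a separate middle-tier machine.
BACKENDS = itertools.cycle([business_logic, business_logic, business_logic])

def handle_request(payload):
    backend = next(BACKENDS)         # pick a middle-tier machine
    result = backend(payload)        # business processing offloaded
    return f"<html>{result}</html>"  # cheap display processing stays local
```

Adding more entries to the backend pool is the scaling move the post describes: the web tier's code never changes.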

*That's* scalability. And that's what you won't achieve with PHP, or any two-tier based language. Now, if you want to talk about adding hooks to do remote procedure calls to PHP, or some other scripting language so you *can* do three-tier architecture, then we can talk about scalability (at least, scalability for an application that has non-trivial business logic). Otherwise, they are, and will continue to be, inadequate for complex business applications, because they aren't scalable. Period.

Jack Lund
STA Group, LLC


Showing messages 1 through 6 of 6.

  • Two-tier vs. three-tier
    2003-12-12 09:11:13  anonymous2

    Once a web server accepts an HTTP connection, that connection cannot be "passed off" to another system: it must handle that connection to completion.

    Suppose an HTTP connection has been established between the client (e.g., browser) and the web server. That connection must remain open until the same web server generates a response. Also, even though information from the initial request may be passed to another tier, the result of that processing must be returned _back_ through the same web server for final processing before being returned to the client. During that time the web server must maintain all data structures necessary for the
    - HTTP connection, and
    - connection to the next tier (e.g., database connection).
    This is where any scalability limitations apply.

    It is better to maintain a redirector _ahead_ of the web servers. The purpose of this component is to redirect the request to the least-loaded web server.
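    The point about held state can be made concrete. This is a toy Python sketch, not a real server; the classes and names (ClientConnection, BackendConnection, serve) are hypothetical stand-ins:

```python
# While a request is delegated to the next tier, the web server must keep
# BOTH connection states alive at once: the client's HTTP connection and
# the connection to the next tier. That is where the scaling limit bites.
class ClientConnection:
    def __init__(self):
        self.open = True
    def send(self, data):
        return data  # the response goes back over the ORIGINAL connection

class BackendConnection:
    def __init__(self):
        self.open = True
    def call(self, request):
        return request[::-1]  # stand-in for middle-tier/database work

def serve(request):
    client = ClientConnection()    # HTTP connection: must stay open
    backend = BackendConnection()  # next-tier connection: also held
    result = backend.call(request) # delegation; both structures still live
    assert client.open and backend.open
    return client.send(result)     # final processing on the SAME server
```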
  • Two-tier vs. three-tier
    2003-10-17 12:39:26  anonymous2

    Physically separating the tiers is more hurtful than helpful. If your web server and application are both sharing the same hardware, and your application needs more power but your web server doesn't, how will it hurt anything to simply run the combination on more servers? It won't. And it will cut down on the overhead of communicating with remote servers, which is very significant.

    The only thing that matters is being able to run the CPU-bound section (or whatever the bottleneck is) on more hardware. The fact that the web server is now getting more power than it needs is not a problem.
    • Two-tier vs. three-tier
      2003-10-17 12:53:13  anonymous2

      It hurts things in several ways:

      1) It means you now have to load balance across more web servers, which means you need more IP addresses mapped to your DNS entry (assuming you don't have a hardware-based web load balancer, which is expensive), which means changing your DNS entry every time you need more horsepower, which gets to be a pain to maintain.

      2) You now have many more webservers to maintain, rather than a fixed number N of webservers plus some "generic" application machines. If you've ever tried to maintain more than 10 webservers at a time, you'll realize this is a royal pain.

      3) If you have a DMZ (which you probably do if you're in an enterprise situation), you don't really want your machines running your business logic in the DMZ (after all, we all know a DMZ is untrusted, right?). You especially don't want to multiply the number of DMZ machines because you're not getting the performance you need.

      4) The assumption here is that the business logic is fairly non-trivial, i.e., that the hit you take from the network hop is trivial compared to the performance increase you get from being able to amortize the load across N machines.

      5) It's MUCH easier to load balance back-end machines than webservers, simply because the discovery process for the webserver is DNS, whereas you can use something like a location service to find machines in the backend that can service your request. Even better, this means that you can add machines ON THE FLY in the backend without your front-end presence having to change at all. It's truly plug-and-play.

      Folks, this is the reason for three-tiered architecture. Believe it or not, people didn't just make it up to sell more machines. It's a real solution to a real problem: how do you scale an application where the display processing is relatively trivial, but the business logic is where most of the processing occurs?
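      The "location service" idea from point 5 can be sketched in a few lines. This is an illustrative Python toy, not a real discovery protocol; the names (Registry, register, dispatch) are hypothetical:

```python
# Toy location service: backend machines register themselves at runtime,
# and the dispatcher always picks the least-loaded one. A machine can be
# added ON THE FLY without the front-end presence (DNS) changing at all.
class Registry:
    def __init__(self):
        self.backends = {}  # machine name -> current load

    def register(self, name):
        # A new middle-tier machine joins with zero load.
        self.backends[name] = 0

    def dispatch(self):
        # Pick the least-loaded backend and account for the new request.
        name = min(self.backends, key=self.backends.get)
        self.backends[name] += 1
        return name
```

Contrast this with the web tier, where adding a machine means touching DNS or the load balancer configuration.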
      • Two-tier vs. three-tier
        2003-10-17 20:09:46  tarrant

        Sorry, but I couldn't imagine anyone using DNS-based load-balancing when you can do it with Squid sitting in front of the webservers, or better yet set up a hardware load balancer like Big/IP. With DNS round-robin, you get the round-robin, but you don't get any kind of protection against machine failure.

        Your assumption on (4) isn't valid either, in my experience. How long does a page normally take to respond, in your experience? If it takes more than 0.5 seconds, then you're going to start getting antsy; more than 2 seconds and you're really going to get agitated. Meanwhile, for that 2 seconds of calculation, you've still got 8-30 seconds of image and css downloading, depending on your modem speed. Now, it's true that most of the CPU time in total is either biz logic or database, but on a 0.5s page you can't afford the latency involved in marshaling a request, sending it over the LAN, unmarshaling the request, marshaling the response, sending that over the LAN, unmarshaling the response, and repeating all of that for each remote method call. It's too much overhead.
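        The latency arithmetic here is easy to check on the back of an envelope. The figures below are assumptions for illustration (roughly 0.5 ms of marshal/unmarshal work plus 0.5 ms of LAN time per hop, two hops per remote call), not measurements:

```python
# Back-of-envelope cost of chatty remote method calls against a page budget.
def remote_call_overhead_ms(calls, per_hop_ms=0.5, lan_ms=0.5, hops=2):
    # marshal + send + unmarshal on the request, then again on the response
    return calls * hops * (per_hop_ms + lan_ms)

budget_ms = 500                                # the 0.5 s page budget above
overhead = remote_call_overhead_ms(calls=50)   # a chatty page: 50 remote calls
# 50 calls * 2 hops * 1 ms = 100 ms, a fifth of the whole budget gone to
# marshaling and the LAN before any business logic has run.
```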

        I don't know about you, but I've got 6 years of solid experience specifically architecting, building and maintaining high-traffic e-com sites. Sites with 1K-20K visitors per hour. And my experience is that presentation should be separated logically from business logic, but should not be physically separated. I believe there should be a "3-tier architecture", but tier-1 is a caching/proxy layer, tier-2 is presentation+bizlogic (i.e. the webserver), and tier-3 is the db:

        loadbalancer <--> cache/squid <--> apache/tomcat/whatever <--> database

        It's the architecture that's always worked best for me, and no matter what the starting point is on a new project, we always seem to gravitate to this solution. The cache/squid layer caches stuff like images, but it also buffers connections, allowing the webserver to get freed up faster when some guy's downloading a big page on a 9600bps modem.
      • Two-tier vs. three-tier
        2003-10-17 17:28:59  anonymous2

        "which means changing your DNS entry every time you need more horsepower, which gets to be a pain to maintain."

        Yeah, Yahoo adds DNS entries every time they add a webserver? This is nonsense.
  • Two-tier vs. three-tier
    2003-10-17 12:18:50  anonymous2

    A lot of the 'heat' being generated here appears to stem from the absence
    of a good definition for "scalability" in the article. This is a common problem.

    Scalability is an abstract notion. So are terms like "reliability", and the
    word "performance" itself. One person's meat is another's poison. With
    regard to computer system performance, you might desire to optimize such
    metrics as throughput, response time, or resource utilization, to name a
    few. The goal depends on the context. If you're printing reports at the end
    of the financial year, MAXIMUM throughput is likely to be your goal. In
    contrast, many web applications focus on MINIMIZING user response time.

    Simply put, 'scalability' is a relation among variables (performance
    metrics) that characterizes the rate of diminishing returns as the
    dimensions of the system are increased. This means that scalability can
    actually be expressed in a mathematical form. I've tried to illuminate this
    point elsewhere.
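    One concrete way to write scalability as such a relation is Gunther's Universal Scalability Law; the coefficient values in this Python sketch are purely illustrative assumptions:

```python
# Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1)).
# alpha models contention (serialization), beta models coherency
# (crosstalk); together they produce the diminishing returns as the
# dimensions of the system (here, N servers) are increased.
def usl_capacity(n, alpha=0.05, beta=0.001):
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))
```

With these example coefficients, capacity still grows from 4 to 8 servers, but the capacity delivered *per server* falls, which is exactly the diminishing-returns behavior described above.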

    --Neil Gunther