Over the summer, I came across a capacity calculation in the manual for a Cisco Voice over IP phone detailing the number of simultaneous calls that can be supported on an access point. Intrigued, I extended the analysis. Voice and data on wireless LANs require opposing preconditions for good performance. High-quality voice requires that frames containing voice data can be transmitted very quickly after arrival, and they need to be transmitted on a very regular schedule with tight timing requirements. Good data throughput comes from stuffing the transmission queue as full as possible. Individual frames might suffer long delays, but the overall capacity is high. Voice quality is often very sensitive to network load.
This article develops a simple model to determine the maximum theoretical capacity of an access point to carry voice calls. (Unlike the previously cited Cisco calculation, it does not attempt to model the effects of medium contention.) Most 802.11-based phones use 802.11b, which has a maximum capacity of 23 telephone calls. 802.11a and 802.11g have somewhat higher capacities, even at comparable data rates, due to a more efficient physical layer design. As with data, 802.11g must reduce its effective data rates in the presence of older devices. Protection overhead reduces the voice capacity by roughly a quarter.
A codec is the component of any voice system that translates between analog speech and the bits used to transmit it. Every codec transmits a burst of data in a packet that can be reconstructed into voice. Each burst covers a short period of time, typically 20 milliseconds, though some codecs have adaptive periods. Within the codec, there are many different ways of encoding the analog data. Even though two codecs may have the same payload size, they may have different performance characteristics due to their coding methods. Several different codecs are commonly used, and many of them have found application in VoIP systems.
Common codecs include G.711, G.729, G.723.1, GSM Full Rate (FR), GSM Enhanced Full Rate (EFR), and iLBC.
To hear the difference between codecs, this Analog Zone article has example captures for each of the common codecs.
To move a phone call across an IP network, the codec data is encapsulated within the Real-time Transport Protocol (RTP). RTP is a UDP-based protocol, which leads to a three-level encapsulation of the codec data. Each layer adds its own header information. The data is carried within an RTP packet (with a 12-byte header), which is carried in UDP (with an 8-byte header), which is carried in IP (with a 20-byte header). The overhead for each level of encapsulation can require significant additional capacity. With a codec operating every 20ms, there will be 50 sets of packet headers per second. The three headers total 40 bytes per packet, so the RTP encapsulation adds 16kbps. Table 1 shows the characteristics of each of the common codecs. Codecs are typically described for a one-way stream. A telephone call needs two voice streams, so the total data rate must be doubled.
| Codec | Period (ms) | Payload size (bytes) | Packet size (bytes) | Payload data rate (kbps)* | Total data rate (kbps)* |
|---|---|---|---|---|---|
| G.711 | 20 | 160 | 200 | 64 | 80 |
| G.729 | 20 | 20 | 60 | 8 | 24 |
| G.723.1 | 30 | 24 | 64 | 6.4 | 17 |
| GSM FR | 20 | 33 | 73 | 13.2 | 29.2 |
| GSM EFR | 20 | 31 | 71 | 12.4 | 28.4 |
| iLBC 20 ms | 20 | 38 | 78 | 15.2 | 31.2 |
| iLBC 30 ms | 30 | 50 | 90 | 13.3 | 24 |

*In telephony, a kilobit per second is 1,000 bits per second, not the 1,024 bits per second that would be more common in computing. Thus, a 64kbps codec uses 64,000 bits per second, not 65,536 bits per second.
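The figures in Table 1 follow from simple arithmetic: every packet carries the codec payload plus the RTP, UDP, and IP headers, and one packet is sent per codec period. The following is a minimal Python sketch of that arithmetic. The function and variable names are chosen here for illustration and are not part of any library.

```python
# Header sizes from the text: RTP 12 bytes, UDP 8 bytes, IP 20 bytes.
RTP_HEADER = 12
UDP_HEADER = 8
IP_HEADER = 20
HEADER_BYTES = RTP_HEADER + UDP_HEADER + IP_HEADER  # 40 bytes per packet


def stream_rates(payload_bytes, period_ms):
    """Return (packet bytes, payload kbps, total kbps) for a one-way stream.

    Uses the telephony convention of 1 kbps = 1,000 bits per second.
    """
    packet_bytes = payload_bytes + HEADER_BYTES
    # Bits per millisecond is numerically the same as kilobits per second.
    payload_kbps = payload_bytes * 8 / period_ms
    total_kbps = packet_bytes * 8 / period_ms
    return packet_bytes, payload_kbps, total_kbps


# (payload bytes, period in ms) for a few rows of Table 1
CODECS = {
    "G.711": (160, 20),
    "G.729": (20, 20),
    "G.723.1": (24, 30),
    "iLBC 30 ms": (50, 30),
}

for name, (payload, period) in CODECS.items():
    packet, payload_kbps, total_kbps = stream_rates(payload, period)
    print(f"{name}: {packet}-byte packets, {payload_kbps:.1f} kbps payload, "
          f"{total_kbps:.1f} kbps one way, {2 * total_kbps:.1f} kbps per call")
```

The last figure on each line doubles the one-way rate, since a telephone call consists of two voice streams.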
The simplest analysis to do is a calculation of the absolute maximum capacity of a network. Networks have a variety of effects that are hard to model. In 802.11, one of the hardest network effects to model is the loss of capacity due to contention for the network medium. A maximum-capacity analysis assumes that there is a medium coordinator with God-like powers to eliminate contention. As soon as one network transmission finishes, there is another one ready to start without delay. There is never contention for the medium, and no management traffic gets in the way of moving data. Although the resulting model is simplistic, it offers valuable insight into the maximum capacity of different types of access points. With time, the simple maximum capacity analysis is likely to better reflect real-world capabilities as improved wireless-LAN quality-of-service standards are developed. Cisco's calculation assumes that each frame must wait half the contention window before transmission, but many VoIP systems use various tricks to shorten that delay. As improved QoS reduces medium contention, the real-world performance will begin to more closely approach the theoretical maximum.
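802.11 data transmission consists of a data frame plus an acknowledgement. Each IP packet must be put into an 802.11 data frame, and that data frame must be positively acknowledged. Sending the data frame consists of waiting for the distributed interframe space (DIFS), then transmitting the PLCP preamble, the PLCP header, and the MAC frame. The MAC frame in turn carries the MAC header, a SNAP header, the IP packet, the frame check sequence (FCS), and any security overhead.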
Each frame must receive a positive acknowledgement. 802.11 is adding features for block acknowledgements, but the delay-sensitive nature of VoIP may limit their use. The acknowledgement follows the data frame after a short interframe space (SIFS) and consists of its own PLCP preamble and header plus a 14-byte MAC frame.
Take the case of a G.711 codec operating on an 802.11b network at 11Mbps. Breaking down each component of the encapsulation gives the results in Table 2.
| Component | Time (µs) | Bytes |
|---|---|---|
| Data frame | | |
| DIFS | 50 | |
| PLCP preamble (short) | 72 | |
| PLCP header (short) | 24 | |
| MAC frame | | |
| Header | | 24 |
| SNAP | | 6 |
| IP packet | | 200 |
| FCS | | 4 |
| Security (WEP) | | 8 |
| Total size | | 242 |
| MAC time at 11 Mbps | 176 | |
| TOTAL: Data frame | 322 | |
| ACK | | |
| SIFS | 10 | |
| PLCP preamble (short) | 72 | |
| PLCP header (short) | 24 | |
| MAC frame | | 14 |
| MAC time at 11 Mbps | 11 | |
| TOTAL: ACK | 117 | |
| TOTAL for sequence | 439 | |
Each data frame transmitted by the codec requires 439 microseconds. Therefore, 2,277 codec packets can be transmitted per second. G.711 operates at a 20-millisecond period, and uses 50 frames per second for each voice stream. A phone call is bidirectional, and will therefore require 100 frames per second. Therefore, the maximum theoretical capacity of an 802.11b AP is 22 WEP-encrypted telephone calls.
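The arithmetic behind Table 2 and the 22-call result can be checked with a few lines of Python. This is a sketch under two assumptions that are inferred from the table rather than stated in the text: MAC transmission time is rounded up to a whole microsecond, and the ACK is sent at the same 11 Mbps rate as the data frame. The variable names are illustrative only.

```python
import math

RATE_MBPS = 11           # 1 Mbps is one bit per microsecond
DIFS_US = 50
SIFS_US = 10
PLCP_US = 72 + 24        # short preamble plus short PLCP header

# MAC frame sizes in bytes, taken from Table 2
data_bytes = 24 + 6 + 200 + 4 + 8   # header + SNAP + IP packet + FCS + WEP
ack_bytes = 14


def mac_time_us(n_bytes, rate_mbps):
    """Airtime of the MAC portion, rounded up to a whole microsecond."""
    return math.ceil(n_bytes * 8 / rate_mbps)


data_us = DIFS_US + PLCP_US + mac_time_us(data_bytes, RATE_MBPS)
ack_us = SIFS_US + PLCP_US + mac_time_us(ack_bytes, RATE_MBPS)
exchange_us = data_us + ack_us

# One exchange per codec packet, with the medium never idle.
packets_per_second = 1_000_000 // exchange_us

# G.711 sends one frame every 20 ms in each direction.
frames_per_call = 2 * (1000 // 20)
max_calls = packets_per_second // frames_per_call

print(f"data frame {data_us} us, ACK {ack_us} us, exchange {exchange_us} us")
print(f"{packets_per_second} packets/s, {max_calls} calls")
```

Setting the WEP term in `data_bytes` to zero shortens the exchange to 434 microseconds, which is enough to fit a 23rd call.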
Table 3 shows the calculated number of telephone calls with no security overhead. Including WEP encapsulation overhead drops most of the numbers by one. Using CCMP or TKIP instead drops many by two. Table 3 assumes the use of the short PLCP headers in 802.11b, which are now commonplace. Figure 1 shows the same results graphically.
| Codec / Transmission rate | 11 Mbps | 5.5 Mbps | 2 Mbps | 1 Mbps |
|---|---|---|---|---|
| G.711 | 23 | 16 | 8 | 4 |
| G.729 | 30 | 24 | 14 | 8 |
| GSM-FR | 29 | 23 | 13 | 8 |
| GSM-EFR | 29 | 23 | 13 | 8 |
| G.723.1 | 44 | 36 | 21 | 13 |
| iLBC 20 ms (Skype) | 28 | 22 | 13 | 7 |
| iLBC 30 ms (Skype) | 42 | 33 | 18 | 11 |

Figure 1. Maximum theoretical number of voice calls per 802.11b AP (longer bar is better)
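The same calculation generalizes to any codec, data rate, and security overhead. The sketch below is one way to write it in Python, not the exact procedure used to build Table 3. It assumes the ACK is sent at the same rate as the data frame and that MAC airtime is rounded up to a whole microsecond. The 8-byte WEP overhead comes from Table 2; the 16-byte CCMP and 20-byte TKIP figures are the per-frame expansions defined by the 802.11 standard and do not appear in the article. All names are illustrative.

```python
import math

DIFS_US, SIFS_US = 50, 10
PLCP_SHORT_US = 72 + 24           # short preamble + short PLCP header
MAC_OVERHEAD_BYTES = 24 + 6 + 4   # MAC header + SNAP + FCS, as in Table 2
ACK_BYTES = 14
# WEP from Table 2; CCMP and TKIP values are assumptions from the standard.
SECURITY_BYTES = {"none": 0, "WEP": 8, "CCMP": 16, "TKIP": 20}


def exchange_us(packet_bytes, rate_mbps, security="none"):
    """Airtime in microseconds for one data frame plus its ACK."""
    frame_bytes = MAC_OVERHEAD_BYTES + packet_bytes + SECURITY_BYTES[security]
    data = DIFS_US + PLCP_SHORT_US + math.ceil(frame_bytes * 8 / rate_mbps)
    ack = SIFS_US + PLCP_SHORT_US + math.ceil(ACK_BYTES * 8 / rate_mbps)
    return data + ack


def max_calls(packet_bytes, period_ms, rate_mbps, security="none"):
    """Contention-free upper bound on simultaneous two-way calls.

    Each call needs two exchanges (one per direction) per codec period.
    """
    airtime_per_call = 2 * exchange_us(packet_bytes, rate_mbps, security)
    return (period_ms * 1000) // airtime_per_call


# (IP packet bytes, period in ms) from Table 1
CODECS = {
    "G.711": (200, 20),
    "G.729": (60, 20),
    "GSM-FR": (73, 20),
    "GSM-EFR": (71, 20),
    "G.723.1": (64, 30),
    "iLBC 20 ms": (78, 20),
    "iLBC 30 ms": (90, 30),
}

for name, (packet, period) in CODECS.items():
    row = [max_calls(packet, period, rate) for rate in (11, 5.5, 2, 1)]
    print(name, row, "with WEP at 11 Mbps:", max_calls(packet, period, 11, "WEP"))
```

Passing a different `security` value shows how a few extra bytes per frame can push the total just past the point where one more call would fit.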
Most 802.11 telephones on the market today are based on 802.11b, but that will not be the case forever. 802.11b has low power consumption, but limited capacity. Many of the newer dual-mode (802.11/cellular) telephones use 802.11a. If the same maximum capacity analysis is repeated for 802.11a or 802.11g (without protection), it leads to the much higher capacities shown in Table 4.
| Codec / Transmission rate | 54 Mbps | 36 Mbps | 18 Mbps | 6 Mbps |
|---|---|---|---|---|
| G.711 | 78 | 69 | 51 | 24 |
| G.729 | 92 | 86 | 73 | 45 |
| GSM-FR | 92 | 86 | 71 | 42 |
| GSM-EFR | 92 | 86 | 71 | 43 |
| G.723.1 | 138 | 129 | 110 | 66 |
| iLBC 20 ms (Skype) | 89 | 83 | 69 | 40 |
| iLBC 30 ms (Skype) | 133 | 124 | 101 | 57 |
However, 802.11g protection imposes a significant limit on throughput. Protection is triggered by any 802.11b frame in the area, so it is a practical reality of 802.11g that it almost always operates in protected mode. Using the most common and lowest-overhead form of protection, CTS-to-self, the results are quite striking. Depending on the codec and data transmission rate, about a quarter of the voice capacity is lost. The loss is worse at higher data rates, and the percentage loss is slightly higher with the more efficient codecs. Table 5 shows the number of telephone calls lost to protection overhead.
| Codec / Transmission rate | 54 Mbps | 36 Mbps | 18 Mbps | 6 Mbps |
|---|---|---|---|---|
| G.711 | 22 | 20 | 18 | 14 |
| G.729 | 29 | 29 | 26 | 24 |
| GSM-FR | 29 | 27 | 26 | 23 |
| GSM-EFR | 29 | 27 | 26 | 23 |
| G.723.1 | 43 | 40 | 39 | 35 |
| iLBC 20 ms (Skype) | 27 | 27 | 24 | 21 |
| iLBC 30 ms (Skype) | 22 | 20 | 18 | 14 |
Figure 2 shows the loss due to protection as a fraction of the capacity for representative data rates. High-efficiency codecs suffer greater losses because they already devote a large part of the network time to traffic service. G.711 does not have as big a proportional loss because it is inefficient. As a rule of thumb, expect to lose between one quarter and one third of the network capacity when protection is in use. Put another way, that is likely to be the difference between using 802.11a and 802.11g.

Figure 2. Percentage loss of capacity due to 802.11g protection (shorter is less loss)
As a final comparison, Figure 3 shows the capacity of 802.11b, 802.11g with protection, and 802.11a at roughly comparable data rates. Even though the data rates of 802.11a and 802.11g are only slightly higher, their improved physical layer efficiency gives them much greater call-handling capacity.

Figure 3. Comparison of the capacity of 802.11a, b, and g (longer is higher)
Enabling encryption protocols has only minor effects on the call capacity. In 802.11b, enabling WEP reduces the capacity by a call for most data rates and codecs, with TKIP or CCMP reducing the capacity by one or two calls. In 802.11g, the capacity loss is numerically higher (one or two calls for WEP, three or four for TKIP/CCMP), but roughly equivalent on a percentage basis. As a rule of thumb, expect WEP to cost 1 to 3 percent of the call capacity, and TKIP or CCMP to cost 3 to 5 percent of the call capacity.
Matthew Gast is the author of 802.11 Wireless Networks: The Definitive Guide, Network Printing, and T1: A Survival Guide.
Copyright © 2007 O'Reilly Media, Inc.