I respond to a lot of questions about Windows 2000 disk performance. In this article I've provided answers to the most frequently asked questions I've received from experienced computer performance professionals. What I've found is that when these professionals first start to look seriously at the data available on disk performance on a Windows NT/2000/XP machine, they usually ask one or more of the six questions raised here. These questions arise because something doesn't seem right when they look closely at the data. I have tried to answer the questions succinctly, and in such a way that a person who already knows their way around disk performance issues can make immediate sense of the Windows 2000 environment.
1. The Physical Disk % Disk Time counters look wrong. What gives?
Often when you add the
% Disk Read Time and
% Disk Write Time counters together, they do not add up to
% Disk Time. The
% Disk Time counters are capped in the System Monitor at 100 percent because it would be confusing to report disk utilization greater than 100 percent. This occurs because the
% Disk Time counters do not actually measure disk utilization. The Explain text that implies that they do represent disk utilization is very misleading.
% Disk Time counters actually do measure is a little complicated to explain.
The % Disk Time counter is not measured directly. It is a value derived by the
diskperf filter driver that provides disk performance statistics.
diskperf is a layer of software sitting
in the disk driver stack. As I/O Request packets (IRPs) pass through this layer,
diskperf keeps track of the time I/Os start and the time they finish. On the way to the device,
diskperf records a timestamp for the IRP. On the way back from the device, the
completion time is recorded. The difference is the duration of the I/O request. Averaged over the collection interval, this becomes the
Avg. Disk sec/Transfer, a direct measure of disk response time from the point of view of the device driver.
diskperf also maintains byte
counts and separate counters for reads and writes, at both the Logical and Physical Disk level. (This allows
Avg. Disk sec/Transfer to be broken out into reads and writes.)
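To make the derivation concrete, here is a minimal Python sketch of how the two directly measured counters fall out of per-IRP timestamps. The function and the sample timestamps are my own illustration, not diskperf's actual code:

```python
def disk_transfer_stats(irps, interval_secs):
    """Derive the two directly measured counters from per-IRP timestamps:
    Avg. Disk sec/Transfer (mean roundtrip time: completion minus start)
    and Disk Transfers/sec (count over the collection interval)."""
    durations = [done - start for start, done in irps]
    avg_sec_per_transfer = sum(durations) / len(durations)
    transfers_per_sec = len(irps) / interval_secs
    return avg_sec_per_transfer, transfers_per_sec

# Four I/Os, (start, completion) timestamps in seconds, over a 1-second interval:
irps = [(0.00, 0.01), (0.10, 0.12), (0.50, 0.51), (0.90, 0.92)]
avg, rate = disk_transfer_stats(irps, 1.0)
print(round(avg, 3), rate)   # 0.015 4.0
```

Note that each duration is a full roundtrip through the driver stack, which is why queue time is baked into the average.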
The Avg. Disk sec/Transfer measurement reported is based on the complete roundtrip time of a request. Strictly speaking, it is a direct measure of disk response time, which means it includes queue time. Queue time is the time spent waiting for the device because it is busy with
another request or waiting for the SCSI bus to the device because it is busy.
% Disk Time is a value derived by
diskperf from the sum of all IRP roundtrip times (
Avg. Disk sec/Transfer) multiplied by
Disk Transfers/sec, and divided by duration, or essentially:
% Disk Time = Avg. Disk sec/Transfer * Disk Transfers/sec * 100
which is a calculation (subject to capping when it exceeds 100 percent) that you can verify easily enough for yourself.
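You can do that verification with a few lines of Python. The sample counter values below are invented, and the capping mimics what System Monitor does:

```python
def pct_disk_time(avg_disk_sec_per_transfer, disk_transfers_per_sec):
    """Reproduce the derived % Disk Time counter: average roundtrip time
    times I/O rate, expressed as a percentage and capped at 100."""
    raw = avg_disk_sec_per_transfer * disk_transfers_per_sec * 100
    return min(raw, 100.0)

# A lightly loaded disk: 10 ms average roundtrip at 50 I/Os per second.
print(pct_disk_time(0.010, 50))    # 50.0

# A queued-up disk: 25 ms roundtrips at 80 I/Os per second would yield
# 200 percent; the capping hides the excess.
print(pct_disk_time(0.025, 80))    # 100.0
```

The second case is exactly the situation the article describes: significant queuing pushes the uncapped product well past 100 percent.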
Because the Avg. Disk sec/Transfer that
diskperf measures includes disk queuing,
% Disk Time can grow greater than 100 percent if there is significant disk queuing (at either the Physical or Logical Disk level). The Explain text in the official documentation suggests that this product of
Avg. Disk sec/Transfer and
Disk Transfers/sec measures % Disk busy. If (and this is a big "if") the IRP roundtrip time represented only service time, then the
% Disk Time calculation would correspond to disk utilization. But
Avg. Disk sec/Transfer includes queue time, so the formula used actually calculates something entirely different.
The formula used in the calculation to derive
% Disk Time corresponds to Little's Law, a well-known equivalence relation that gives the number of requests in the system as a function of the arrival rate and response time. According to Little's Law,
Avg. Disk sec/Transfer times
Disk transfers/sec properly yields the average number of requests in the system, more formally known as the average Queue length. The average Queue length value calculated in this fashion
includes both IRPs queued for service and those actually in service.
A direct measure of disk response time such as
Avg. Disk sec/Transfer is a useful metric. Since people tend to buy disk hardware based on a service-time expectation, it is unfortunate that there is no way to break out the disk service time and the queue time separately in NT 4.0. (The situation is greatly improved in Windows 2000, however.) Given the way
diskperf hooks into the I/O driver stack, the software RAID functions associated with
Ftdisk, and the SCSI disks that support command tag queuing, one could argue this is the only feasible way to do things in the Windows 2000 architecture. The problem of interpretation arises because of the misleading Explain text and the arbitrary and surprising use of capping.
Microsoft's fix to the problem beginning in NT 4.0 is a different version of the counter that is not capped. This is
Avg. Disk Queue Length. Basically, this is the same field as
% Disk Time without capping and without being printed as a percentage.
For example, if
% Disk Time is 78.3 percent,
Avg. Disk Queue Length is 0.783. When
% Disk Time is equal to 100 percent, then
Avg. Disk Queue Length shows the actual value before capping.
We recently had a customer reporting values like 2.63 in this field. That's a busy disk! The interpretation of this counter is the average number of disk requests that are active and queued, in other words, the average queue length.
2. I see a value of 2.63 in the
Avg. Disk Queue Length counter field. How should I interpret this value?
The Avg. Disk Queue Length counter is derived from the product of
Avg. Disk sec/Transfer multiplied by
Disk Transfers/sec, which is the average response of the device times the I/O rate. Again, this corresponds to a well-known theorem of Queuing Theory called Little's Law, which states:
N = A * Sr
where N is the number of outstanding requests in the system, A is the arrival rate of requests, and Sr is the response time. So the
Avg. Disk Queue Length counter is an estimate of the
number of outstanding requests to the (Logical or Physical) disk. This includes any requests that are currently in service at the device, plus any requests that are waiting for service. If requests are currently waiting for the device inside the SCSI device driver layer of software below the
diskperf filter driver, the
Current Disk Queue Length counter will have a
value greater than 0. If requests are queued in the hardware, which is usual for SCSI disks and RAID controllers, the
Current Disk Queue Length counter will show a value of 0, even though requests are queued.
Because the Avg. Disk Queue Length counter value is derived and not a direct measurement, you do need to be careful how you interpret it. Little's Law is a very general result that is often used in the field of computer measurement to derive a third result when the other two values are measured directly. However, Little's Law does require an equilibrium
assumption in order for it to be valid. The equilibrium assumption is that the arrival rate equals the completion rate over the measurement interval. Otherwise, the calculation is meaningless. In practice, this means you should ignore the
Avg. Disk Queue Length counter value for any
interval where the
Current Disk Queue Length counter is not equal to the value of
Current Disk Queue Length for the previous measurement interval.
Suppose, for example, the
Avg. Disk Queue Length counter reads 10.3, and the
Current Disk Queue Length counter shows four requests in the disk queue at the end of the measurement interval. If the previous value of
Current Disk Queue Length was 0, the equilibrium assumption necessary for Little's Law does not hold. Since the number of arrivals is
evidently greater than the number of completions during the interval, there is no valid interpretation for the value in the
Avg. Disk Queue Length counter, and you should ignore the
counter value. However, if both the present measurement of the
Current Disk Queue Length
counter and the previous value are equal, then it is safe to interpret the
Avg. Disk Queue
Length counter as the average number of outstanding I/O requests to the disk over the
interval, including both requests currently in service and requests queued for service.
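That sanity check is easy to automate. Here is a sketch, assuming each sample is an (Avg. Disk Queue Length, Current Disk Queue Length) pair recorded at the end of a measurement interval; the helper name is my own:

```python
def valid_avg_queue_length(samples):
    """Yield (interval_index, avg_disk_queue_length) only for intervals where
    Little's Law's equilibrium assumption holds: the Current Disk Queue Length
    at the end of the interval equals its value at the end of the prior one."""
    for i in range(1, len(samples)):
        avg_qlen, current_qlen = samples[i]
        _, prev_current_qlen = samples[i - 1]
        if current_qlen == prev_current_qlen:
            yield i, avg_qlen

samples = [(0.0, 0),
           (10.3, 4),   # queue grew from 0 to 4: ignore the 10.3
           (2.1, 4)]    # queue steady at 4: the 2.1 is interpretable
print(list(valid_avg_queue_length(samples)))   # [(2, 2.1)]
```

The first reading is discarded for exactly the reason given above: arrivals evidently exceeded completions during that interval, so the derived average has no valid interpretation.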
You also need to understand the ramifications of having a total disk roundtrip time measurement instead of a simple disk service time measure. Assuming M/M/1, a disk at 50 percent busy holds one request on average (in service or waiting), and disk response time is 2 times service time. This means that at 50 percent busy, assuming M/M/1 holds, an
Avg. Disk Queue Length value of 1.00 is
expected. That means that any disk with an
Avg. Disk Queue Length value greater than 0.70 probably has a substantial amount of queue time associated with it. The exception, of course, is
when M/M/1 does not hold, such as during a back-up operation when there is only a single user of the disk. A single user of the disk can drive a disk to nearly 100 percent utilization without a queue!
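For reference, the M/M/1 numbers quoted above follow directly from the standard formulas. A quick sketch, where utilization is a fraction rather than a percentage:

```python
def mm1_avg_in_system(utilization):
    """Average number of requests in an M/M/1 system (in service plus
    queued): N = u / (1 - u). This is what Avg. Disk Queue Length
    estimates when the model holds."""
    return utilization / (1.0 - utilization)

def mm1_response_time(service_time, utilization):
    """M/M/1 response time: R = S / (1 - u)."""
    return service_time / (1.0 - utilization)

# At 50 percent busy, one request is in the system on average...
print(mm1_avg_in_system(0.5))            # 1.0
# ...and response time is double the service time (8 ms for a 4 ms disk).
print(mm1_response_time(0.004, 0.5))     # 0.008
```

Plugging in 0.7 for utilization gives roughly 2.3 requests in the system, which is why an Avg. Disk Queue Length above 0.70 usually signals substantial queue time.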
3. How was the problem with the
% Disk Time counter fixed in Windows 2000?
It may not be fixed exactly, but ultimately, this problem is addressed quite nicely in Windows 2000 (although it would arguably be better had the older, now obsolete
% Disk Time counters not been retained).
Windows 2000 adds a new counter to the Logical and Physical Disk objects called
% Idle Time. Disk idle time accumulates in
diskperf when there are no outstanding requests for a volume.
Having a measure of disk idle time permits you to calculate
% Disk Busy equals 100 minus
% Idle Time, which is a valid measure of disk utilization.
Then you can calculate Disk Service Time equals % Disk Busy divided by
Disk Transfers/sec. This is an application of the Utilization Law, namely:
u = service time * arrival rate
Finally, calculate Disk Queue Time equals
Avg. Disk sec/Transfer minus Disk Service Time, which
follows from the definition of response time equals service time plus queue time.
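The three steps chain together. Here is a Python sketch with invented sample values; the percent counter is converted to a fraction before applying the Utilization Law:

```python
def disk_metrics(pct_idle_time, disk_transfers_per_sec, avg_disk_sec_per_transfer):
    """Derive utilization, service time, and queue time from the Windows 2000
    counters, per the Utilization Law and response = service + queue time."""
    utilization = (100.0 - pct_idle_time) / 100.0        # % Disk Busy as a fraction
    service_time = utilization / disk_transfers_per_sec  # Utilization Law
    queue_time = avg_disk_sec_per_transfer - service_time
    return utilization, service_time, queue_time

# 60% idle, 80 transfers/sec, 8 ms average roundtrip:
u, s, q = disk_metrics(60.0, 80, 0.008)
print(round(u, 6))   # 0.4   -> disk is 40 percent busy
print(round(s, 6))   # 0.005 -> 5 ms service time
print(round(q, 6))   # 0.003 -> 3 ms queue time
```

In this example, nearly 40 percent of the average roundtrip is queue time, a distinction the NT 4.0 counters cannot make.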
So, measuring Logical and Physical Disk
% Idle Time solves a lot of problems. It allows us to calculate disk utilization and derive both disk-service time and queue-time measurements for disks in Windows 2000.
4. Why are the Logical Disk counters zero?
The answer is because you never issued the
diskperf –yv command to enable the Logical Disk measurements. When
diskperf is not active, the corresponding counters in the System Monitor are zero. In Windows 2000, only the Physical Disk counters are enabled by default (this is equivalent to issuing the
diskperf –yd command).
In Windows NT, neither Logical nor Physical Disk counters are enabled by default. To enable both sets of Disk counters, issue the
diskperf –y command in NT 4.0. You must reboot
the system in both Windows 2000 and NT 4.0 in order to activate the new counters.
5. In Windows NT 4.0, when is it appropriate to issue the
diskperf –ye command?
Almost never. I recommend that you use the
diskperf –ye option only if you are using the software RAID functions (these include creating extendable volume sets and establishing disk striping, disk mirroring, and RAID 5 Logical volumes) in the Disk Administrator. Setting
diskperf –ye allows you to collect accurate Physical Disk statistics when you are using software RAID functions in NT 4.
The diskperf –ye command loads the
diskperf.sys filter driver beneath the optional fault tolerant
ftdisk.sys disk driver that provides software RAID functions in Windows NT 4.0. When striped, mirrored, or RAID 5 Logical Disks are defined using Disk
Administrator functions, the
ftdisk.sys module that is responsible for remapping Logical Disk I/O requests to the appropriate Physical Disk is loaded in the I/O driver stack below the NTFS file system driver and before the SCSI Physical Disk driver. When the normal
diskperf –y command is issued,
diskperf.sys is loaded in front of
ftdisk.sys. This allows
diskperf to capture information about Logical Disk requests accurately. But because
Logical Disk requests are transformed by the
ftdisk.sys layer immediately below it, the
Physical Disk statistics reported are inaccurate. To see accurate Physical Disk statistics, issue the
diskperf –ye command to load diskperf.sys below ftdisk.sys instead.
Creating extendable volume sets is by far the most common use of the software RAID functions
in the NT 4.0 Disk Administrator. You may prefer loading
diskperf.sys in front of
ftdisk.sys (using the normal diskperf –y command) to obtain accurate Logical Disk statistics for a volume set.
This problem is addressed in Windows 2000 by allowing
diskperf to be loaded twice, once above
ftdisk.sys to collect Logical Disk statistics and once below it to collect Physical Disk stats. In Windows 2000,
diskperf is loaded below
ftdisk.sys by default. To load it a second time, issue the
diskperf –yv command to activate the Logical Disk measurements.
6. I am concerned about the overhead of the
diskperf measurements. What does this feature cost?
Not much. I strongly recommend that you enable all disk performance data collection on any system where you care about performance.
Even if you don't care that much about performance, you should turn on Logical Disk reporting at a minimum. The Logical Disk Object contains two counters,
Free Megabytes and
% Free Space, which will alert you in advance to potential out-of-disk-space conditions.
The diskperf measurement layer does add some code to the I/O Manager stack, so there
is added latency associated with each I/O request that accesses a Physical Disk when measurement is turned on. However, the overhead of running the
diskperf layer, even twice, on Windows 2000 machines, is trivial. In a benchmark environment where a
550MHz, four-way Windows 2000 Server was handling 40,000 I/Os per second, enabling the
diskperf measurements reduced its I/O capacity by about 5 percent to 38,000 I/Os per second.
In that environment, we estimated that the
diskperf measurement layer added about 3 to 4
microseconds to the I/O Manager path length for each I/O operation. (On a faster processor,
the delay is proportionally less.) For a disk I/O request that you would normally expect to
require a minimum of 3 to 5 milliseconds, this additional latency is hardly noticeable.
Besides, if you do not have disk-performance statistics enabled and a performance problem
occurs that happens to be disk-related (and many are), you won't be able to gather data about
the problem because loading the
diskperf measurement layer requires a reboot.
In my view, you can only justify turning off the disk performance stats in a benchmark
environment where you are attempting to wring out the absolute highest performance level from
your hardware configuration. Of course, you will need to have the disk performance counters
enabled initially to determine how to optimize the configuration in the first place. It is standard
practice to disable disk performance monitoring prior to making your final measurement runs.
O'Reilly & Associates recently released (January 2002) Windows 2000 Performance Guide.
Sample Chapter 5, Multiprocessing, is available free online.
Copyright © 2009 O'Reilly Media, Inc.