How to Monitor a Server

One of the most important elements of performance tuning is maintaining Performance Monitor logs. If you don't record this information, you will have no data to work with. A sampling interval of two to five minutes is sufficient for most performance tuning work and doesn't require much storage space.
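To show why a two- to five-minute interval keeps logs small, the sketch below counts how many samples one day of logging produces at a given interval. The per-sample size is a hypothetical parameter, not a Performance Monitor figure; substitute the size you observe in your own log files.

```python
# Sketch: samples collected per day at a given logging interval.
# bytes_per_sample is a hypothetical figure you measure from your own logs.
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int) -> int:
    """Number of samples logged in one day at the given interval (in seconds)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY // interval_seconds


def estimated_bytes_per_day(interval_seconds: int, bytes_per_sample: int) -> int:
    """Rough daily log growth, given a per-sample size measured from your own logs."""
    return samples_per_day(interval_seconds) * bytes_per_sample


if __name__ == "__main__":
    for interval in (5, 120, 300):
        print(f"{interval:>3} s interval: {samples_per_day(interval):>5} samples/day")
```

A two-minute interval records 720 samples a day and a five-minute interval records 288, compared with 17,280 at a five-second interval.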

There are three major areas to watch when you are monitoring a server with live users.

Trends in submitted loads

Observe whether your users are sending more messages now than they did a few months ago, or whether the average message size is increasing. These factors will change the load on the server, and if you are not aware of them, the load can slowly increase until you have a response-time problem. Microsoft Exchange Server provides counters that give an indication of the overall workload.
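One simple way to spot such a trend is to compare a counter's average over an earlier logging period with its average over a recent one. The sketch below assumes both periods were logged at the same interval; the sample values are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def percent_change(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Percentage change in a counter's average between two logging periods.

    baseline: samples from an earlier period (for example, a few months ago).
    current:  samples from the recent period, logged at the same interval.
    """
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline average is zero; percent change is undefined")
    return (mean(current) - base) * 100.0 / base


# Illustrative (made-up) Messages Submitted/min samples.
earlier = [40.0, 50.0, 60.0]
recent = [55.0, 60.0, 65.0]
print(f"Submission rate changed by {percent_change(earlier, recent):.1f}%")
```

Applied to the same counters every month, a steadily positive result is the slow load increase described above.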

Service times

If you monitor a server alone, it is nearly impossible to calculate the overall response times that users will experience. However, you can observe the server components of the response time, and get a general idea of whether slow server response times are becoming an issue. Microsoft Exchange Server provides counters that indicate server response times.

Resource use

By observing the utilization of various resources, you can see where the bottleneck in a system is, and even anticipate where the next bottleneck is likely to occur once you relieve the current one.

The following sections describe the counters of most interest to an administrator, performance analyst, or capacity planner when performing system tuning.

Watching for Trends in the Load

These counters don't provide a complete picture of the load on your Microsoft Exchange Server computer, but they will indicate trends over time if you track them.

MSExchangeIS
  User Count: The number of connected client sessions.
  Active User Count: The number of users who have been active in the last 10 minutes.

MSExchangeIS Private
  Messages Submitted/min: The rate at which messages are submitted to the private information store.
  Message Recipients Delivered/min: The rate at which the private information store delivers messages to recipients. This is typically higher than the submission rate because many messages have multiple recipients.

MSExchangeIS Public
  Messages Submitted/min: The rate at which messages are submitted to the public information store.
  Message Recipients Delivered/min: The rate at which the public information store delivers messages to recipients. This is typically higher than the submission rate because many messages have multiple recipients.

MSExchangeMTA
  Messages/sec: The rate at which the MTA is processing messages.
  Message Bytes/sec: The rate, in bytes per second, at which the MTA processes message data. Divide this value by Messages/sec to determine the average message size.
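The average-message-size calculation can be sketched as follows. For a whole log, summing both rates before dividing weights busy samples more heavily than averaging per-sample ratios would; this assumes the samples were taken at equal intervals.

```python
from typing import Sequence


def average_message_size(message_bytes_per_sec: float, messages_per_sec: float) -> float:
    """Average MTA message size in bytes for one sample: Message Bytes/sec / Messages/sec."""
    if messages_per_sec == 0:
        return 0.0  # idle sample: no messages, so no meaningful average
    return message_bytes_per_sec / messages_per_sec


def average_message_size_over_log(byte_rates: Sequence[float], message_rates: Sequence[float]) -> float:
    """Traffic-weighted average message size over equally spaced samples."""
    if len(byte_rates) != len(message_rates):
        raise ValueError("sample lists must be the same length")
    total_messages = sum(message_rates)
    if total_messages == 0:
        return 0.0
    return sum(byte_rates) / total_messages


print(average_message_size(20_000.0, 4.0))  # 5000.0 bytes per message
```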


Watching Service Times

These counters don't provide a complete picture of the responsiveness of your Microsoft Exchange Server computer, but they will indicate trends over time if you track them.

MSExchangeIS Private
  Send Queue Size: Indicates whether the information store is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay non-zero long after the peak has passed.
  Average Time for Delivery: Indicates how long it takes the information store to deliver messages.

MSExchangeIS Public
  Send Queue Size: Indicates whether the information store is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay non-zero long after the peak has passed.
  Average Time for Delivery: Indicates how long it takes the information store to deliver messages.

MSExchangeMTA
  Work Queue Length: Indicates whether the MTA is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay non-zero long after the peak has passed.
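The "non-zero at peak, but not for long afterward" rule can be checked against a log of Send Queue Size or Work Queue Length values. In this sketch, the grace period is a hypothetical threshold you choose for your site; it is not an Exchange setting.

```python
from typing import Sequence


def queue_drains_after_peak(queue_lengths: Sequence[int], peak_end: int, grace_samples: int) -> bool:
    """True if the queue returns to zero within grace_samples samples after the peak.

    queue_lengths: logged values of Send Queue Size or Work Queue Length.
    peak_end:      index of the last sample in the peak traffic period.
    grace_samples: how long a backlog may persist (a site-specific choice).
    """
    window = queue_lengths[peak_end + 1:peak_end + 1 + grace_samples]
    return any(length == 0 for length in window)


# Made-up samples: the queue builds during the peak (indexes 1-2), then drains.
print(queue_drains_after_peak([0, 5, 12, 8, 3, 0, 0], peak_end=2, grace_samples=3))  # True
```

A result of False on several days in a row suggests the server is not keeping up with the submitted load.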


Processor Utilization

Windows NT provides additional counters that can help you analyze processor usage, but many are more useful to a developer than to an administrator. The following counters are relevant to bottleneck analysis.

System
  % Total Processor Time

Process
  % Processor Time


If you observe processor utilization at a fine granularity, for example every one to five seconds, the counters fluctuate rapidly and frequently hit 100 percent for short periods. For this reason, processor usage is more useful when averaged over a longer period. If you monitor over longer periods and find that processor usage reaches 100 percent and stays there for minutes or hours, your users are probably becoming impatient with response times. Consider sizing your system for around 60 to 70 percent processor utilization during peak times, so that there is headroom for unexpected demands and for growth.
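The averaging and sizing advice above can be sketched as three small checks on logged % Processor Time values. The 99 percent saturation level and the three-window run length are assumptions chosen for illustration; the 70 percent default reflects the 60 to 70 percent target in the text.

```python
from statistics import mean
from typing import List, Sequence


def window_averages(samples: Sequence[float], window: int) -> List[float]:
    """Average fine-grained % Processor Time samples over consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


def sustained_saturation(averages: Sequence[float], threshold: float = 99.0, min_windows: int = 3) -> bool:
    """True if averaged utilization stays at or above threshold for min_windows windows in a row."""
    run = 0
    for value in averages:
        run = run + 1 if value >= threshold else 0
        if run >= min_windows:
            return True
    return False


def within_sizing_target(peak_average: float, target: float = 70.0) -> bool:
    """Compare peak-period average utilization with a sizing target (about 60 to 70 percent)."""
    return peak_average <= target


# Short 100 percent spikes disappear once samples are averaged.
print(window_averages([100.0, 20.0, 100.0, 40.0], 2))  # [60.0, 70.0]
```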

When you run other services in addition to Microsoft Exchange Server on the same computer, analyze processor usage per process. This shows which services use most of the CPU time and helps you decide how to balance the load appropriately.
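A per-process analysis amounts to ranking the Process: % Processor Time instances by their share of the total. This sketch skips the "_Total" and "Idle" instances, which do not represent services. The process names and values in the example are illustrative only.

```python
from typing import Dict, List, Tuple

# "_Total" sums all processes and "Idle" represents unused processor time,
# so neither should be ranked alongside real services.
EXCLUDED_INSTANCES = {"_Total", "Idle"}


def cpu_shares(avg_cpu_by_process: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank processes by their percentage share of the summed per-process % Processor Time."""
    busy = {name: value for name, value in avg_cpu_by_process.items()
            if name not in EXCLUDED_INSTANCES}
    total = sum(busy.values())
    if total == 0:
        return [(name, 0.0) for name in sorted(busy)]
    shares = [(name, value * 100.0 / total) for name, value in busy.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Made-up averages taken from a Performance Monitor log.
print(cpu_shares({"store": 30.0, "emsmta": 10.0, "_Total": 40.0, "Idle": 60.0}))
# [('store', 75.0), ('emsmta', 25.0)]
```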

If the processor is your bottleneck, consider taking the following actions:

Recommendations

Disk Counters

There are two sets of disk counters: LogicalDisk and PhysicalDisk. Either is fine to use, but LogicalDisk makes it easier to track usage by drive. In either case, you must first enable the disk counters; they are turned off by default because collecting them imposes a small performance cost.

To enable Performance Monitor disk counters

At the command prompt, type diskperf -y, and then restart the computer.

Following are important disk counters:

LogicalDisk
  Disk Bytes Written/sec
  Disk Bytes Read/sec
  Disk Reads/sec
  Disk Writes/sec
  Avg. Disk Queue Length
  % Disk Time (general indicator only; not a reliable indicator of disk saturation)


Compare the disk operations per second (Disk Reads/sec plus Disk Writes/sec) with the sustained-operations rating provided by your vendor. If your disk operations per second are getting close to that rating, you're nearing capacity. Note that the % Disk Time counter is not a reliable indicator of disk saturation. A disk that is busy 100 percent of the time may actually be capable of much more work, because intelligent disk controllers and scheduling methods such as elevator algorithms reorder requests to use the disk more efficiently.
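The comparison with the vendor's rating is simple arithmetic, sketched below. The 80 percent warning level is an assumption for illustration, not a vendor or Microsoft figure.

```python
def fraction_of_rated_ops(reads_per_sec: float, writes_per_sec: float,
                          vendor_sustained_ops_per_sec: float) -> float:
    """Disk Reads/sec + Disk Writes/sec as a fraction of the vendor's sustained rating."""
    if vendor_sustained_ops_per_sec <= 0:
        raise ValueError("vendor rating must be positive")
    return (reads_per_sec + writes_per_sec) / vendor_sustained_ops_per_sec


def nearing_capacity(reads_per_sec: float, writes_per_sec: float,
                     vendor_sustained_ops_per_sec: float, warn_fraction: float = 0.8) -> bool:
    """True once observed operations reach warn_fraction of the rating (0.8 is an assumed level)."""
    return fraction_of_rated_ops(reads_per_sec, writes_per_sec,
                                 vendor_sustained_ops_per_sec) >= warn_fraction


# Example: 70 reads/sec + 40 writes/sec on a drive rated for 125 sustained ops/sec.
print(f"{fraction_of_rated_ops(70.0, 40.0, 125.0):.0%} of rated capacity")  # 88%
```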

Recommendations

If the disk subsystem is your bottleneck, consider taking the following actions:

Information Store Disk Demands

MSExchangeDB
  Buffer Asynchronous Reads/sec
  Buffer Asynchronous Writes/sec
  Buffer Synchronous Reads/sec
  Buffer Synchronous Writes/sec


Check the amount of disk activity generated by the information store. If you add up the four read and write counters shown above and compare the total with the disk counters for your information store database drive, you can estimate how much of that drive's activity is due to the information store and how much is generated by other services that share the same drive.
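The estimate described above can be sketched as the sum of the four MSExchangeDB buffer counters divided by the LogicalDisk reads and writes for the same drive. This assumes each buffer read or write corresponds roughly to one disk operation, which need not hold exactly, so the result is capped at 1.0 and should be read as an approximation.

```python
def information_store_share(async_reads: float, async_writes: float,
                            sync_reads: float, sync_writes: float,
                            disk_reads_per_sec: float, disk_writes_per_sec: float) -> float:
    """Approximate fraction of a drive's operations generated by the information store."""
    disk_ops = disk_reads_per_sec + disk_writes_per_sec
    if disk_ops == 0:
        return 0.0
    store_ops = async_reads + async_writes + sync_reads + sync_writes
    # Cap at 1.0: the two sets of counters are sampled separately and need not match exactly.
    return min(store_ops / disk_ops, 1.0)


# Made-up rates: 40 buffer operations/sec against 80 disk operations/sec on the drive.
print(information_store_share(10.0, 20.0, 5.0, 5.0, 40.0, 40.0))  # 0.5
```

A share well below 1.0 means other services on the same drive are contributing a significant part of the load.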

Memory

Following are important memory counters:

Memory
  Pages/sec
  Page Faults/sec
  Available Bytes
  Committed Bytes

Process
  Page Faults/sec
  Working Set


The Pages/sec counter indicates the rate at which pages are physically read or written on the paging drive. This indicates the contribution that paging makes to the demand for the disk.

The Page Faults/sec counter indicates the rate at which pages are faulted into the working sets of processes. Because many page faults are resolved from pages that are still in memory, the number of pages actually read from and written to disk is much lower than the number of page faults. Page faults are of interest if you run services in addition to Microsoft Exchange Server. In that case, you can examine the page fault rate for each process to determine where the faults are occurring, and use that information to make application-tuning changes. For example, you might (carefully) adjust the tradeoff between the information store buffer pool and the system memory pool. Also, by checking the per-process working sets, you can identify the major memory allocations.

Recommendation

The Available Bytes counter indicates how much physical memory is available at any given time. The system adjusts working sets of processes to keep this above a certain threshold, generally 4 MB. If this level is approached, you should see higher paging and page fault rates. The Committed Bytes counter indicates the amount of virtual address space that the system has committed to applications. This must be backed by the paging file on the disk, so make sure that there is space in the paging file.
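Both checks can be sketched as simple comparisons. This sketch uses Memory: Commit Limit, a counter not listed above, as the measure of how much memory can be committed without extending the paging file. The 8 MB warning floor and the 90 percent commit ratio are hypothetical margins chosen for illustration.

```python
from typing import List

MB = 1024 * 1024


def memory_warnings(available_bytes: int, committed_bytes: int, commit_limit: int,
                    available_floor: int = 8 * MB, commit_ratio: float = 0.9) -> List[str]:
    """Flag low free memory and a nearly exhausted commit limit.

    available_floor: hypothetical warning level, set above the roughly 4 MB
                     threshold that the system tries to maintain.
    commit_ratio:    hypothetical fraction of Commit Limit treated as nearly full.
    """
    warnings = []
    if available_bytes <= available_floor:
        warnings.append("Available Bytes is near the trimming threshold; expect more paging")
    if commit_limit > 0 and committed_bytes >= commit_ratio * commit_limit:
        warnings.append("Committed Bytes is close to Commit Limit; check paging file space")
    return warnings


print(memory_warnings(5 * MB, 950 * MB, 1000 * MB))  # both warnings
```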

Recommendation

Buffers

Following are important buffer counters.

MSExchangeDB
  % Buffer Cache Hit
  Buffer Asynchronous Reads/sec
  Buffer Asynchronous Writes/sec
  Buffer Synchronous Reads/sec
  Buffer Synchronous Writes/sec


Depending on your usage patterns, you may be able to optimize the use of server memory by adjusting the number of information store buffers. Monitor the % Buffer Cache Hit counter. If it is consistently very close to 100 percent, try decreasing the number of buffers, and then monitor the information store disk activity: if that activity doesn't increase, the new setting is correct. If the cache hit rate is less than 95 percent, try increasing the number of buffers: as long as paging activity does not increase, the new setting is correct. Be careful when making these changes! Make small adjustments and monitor the results until you're confident in the changes.
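The tuning loop above can be sketched as two decisions: which direction to adjust, and whether the adjustment was acceptable. The 99 percent level standing in for "very close to 100 percent" and the 5 percent noise tolerance are assumptions; the 95 percent level comes from the text.

```python
def buffer_adjustment(hit_rate_percent: float, high: float = 99.0, low: float = 95.0) -> str:
    """Suggest a direction for the information store buffer count from % Buffer Cache Hit."""
    if hit_rate_percent >= high:
        return "decrease"  # consistently near 100 percent: buffers may be oversized
    if hit_rate_percent < low:
        return "increase"  # below 95 percent: the cache may be too small
    return "keep"


def change_is_acceptable(before: float, after: float, tolerance: float = 0.05) -> bool:
    """True if the watched metric did not rise by more than tolerance (a fraction).

    After decreasing buffers, pass information store disk activity.
    After increasing buffers, pass paging activity (Memory: Pages/sec).
    """
    return after <= before * (1.0 + tolerance)


print(buffer_adjustment(99.6), change_is_acceptable(100.0, 103.0))  # decrease True
```

Repeating these two steps with small adjustments matches the cautious, incremental approach recommended above.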

Recommendations

To change the number of information store buffers

At the command prompt, type PerfWiz -V.

Change the number of information store buffers. Make small adjustments and monitor the results until you are confident in the changes.

Network

The Network Interface object can be obtained by installing the Windows NT Resource Kit. If your network connection is a point-to-point link, the counters below show all traffic on the link. If the connection is a LAN, these counters show the traffic to and from the server being monitored.

Network Interface
  Bytes Received/sec
  Bytes Sent/sec
  Packets Received/sec
  Packets Sent/sec


If you know the capacity ratings of your network and network interface card (NIC), you can compare these ratings to the values for the counters shown above and determine how close to capacity you are operating. For a more complete overview of network traffic, you can also use the Network Monitor tool available with Microsoft Systems Management Server (SMS). Note that Network Monitor is a stand-alone tool. You do not need to have other SMS components installed to run it.
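Because link speeds are rated in bits per second and these counters report bytes per second, the comparison requires multiplying by 8. The sketch below assumes that on a shared (half-duplex) link both directions compete for the same capacity, while on a full-duplex link each direction has its own. The result is approximate, since protocol overhead and the link's practical throughput are not modeled.

```python
def link_utilization(bytes_received_per_sec: float, bytes_sent_per_sec: float,
                     link_bits_per_sec: float, full_duplex: bool = False) -> float:
    """Approximate link utilization as a fraction of rated capacity."""
    if link_bits_per_sec <= 0:
        raise ValueError("link capacity must be positive")
    if full_duplex:
        # Each direction has its own capacity, so the busier direction matters.
        bytes_per_sec = max(bytes_received_per_sec, bytes_sent_per_sec)
    else:
        # Both directions share one channel, so the rates add.
        bytes_per_sec = bytes_received_per_sec + bytes_sent_per_sec
    return bytes_per_sec * 8 / link_bits_per_sec  # 8 bits per byte


ETHERNET_10_BITS_PER_SEC = 10_000_000
T1_BITS_PER_SEC = 1_544_000

print(link_utilization(400_000, 225_000, ETHERNET_10_BITS_PER_SEC))  # 0.5
```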

If you find that the server is operating at or near network capacity, you can upgrade the network speed, for example by moving from 10-Mbps Ethernet to 100-Mbps Ethernet, or from a 64-Kbps line to a T1 line (1.544 Mbps). You may also want to consider using multiple Ethernet connections or multiple 64-Kbps lines for the server, rather than one faster connection or line.

Recommendation

System Bus

If you suspect that bus saturation affects your server, you can monitor bus activity on Pentium and Pentium Pro servers by using the p5ctrs counters from the Windows NT Resource Kit. To view the Pentium counters, you must run Pperf and then change the configuration.

Following are important bus counters:

Pentium
  Bus Utilization (clks)/sec
  % Code Cache Misses
  % Data Cache Misses
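To turn the bus counter into a saturation estimate, one approach is to divide it by the bus clock frequency. This sketch assumes that Bus Utilization (clks)/sec reports the number of bus clock cycles per second during which the bus was busy; check the Resource Kit documentation for the exact definition before relying on it. The bus frequency (for example, 66 MHz) and the 80 percent warning level are example values.

```python
def bus_utilization(busy_bus_clocks_per_sec: float, bus_clock_hz: float) -> float:
    """Fraction of bus clock cycles that were busy, capped at 1.0.

    Assumes the counter reports busy bus clocks per second (verify for your system).
    """
    if bus_clock_hz <= 0:
        raise ValueError("bus clock frequency must be positive")
    return min(busy_bus_clocks_per_sec / bus_clock_hz, 1.0)


def near_saturation(busy_bus_clocks_per_sec: float, bus_clock_hz: float,
                    warn_fraction: float = 0.8) -> bool:
    """True if bus utilization reaches warn_fraction (0.8 is an assumed level)."""
    return bus_utilization(busy_bus_clocks_per_sec, bus_clock_hz) >= warn_fraction


print(bus_utilization(33_000_000, 66_000_000))  # 0.5 on an assumed 66-MHz bus
```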


If your system bus is near saturation, you have two options:

Recommendations