How to Monitor a Server

One of the most important elements of performance tuning is maintaining Performance Monitor logs. If you don't record this information, you will have nothing to work with. You can set these logs to record as infrequently as every two or five minutes; this is sufficient for most performance tuning work and doesn't require too much storage space.

There are three major areas to watch when you are monitoring a server with live users.

Trends in submitted loads

Observe whether your users are sending more messages now than they did a few months ago, or whether the average message size is increasing. These factors will change the load on the server, and if you are not aware of them, the load can slowly increase until you have a response-time problem. Microsoft Exchange Server provides counters that give an indication of the overall workload.

Service times

If you monitor a server alone, it is nearly impossible to calculate the overall response times that users will experience. However, you can observe the server components of the response time, and get a general idea of whether slow server response times are becoming an issue. Microsoft Exchange Server provides counters that indicate server response times.

Resource use

By observing the utilization of various resources, you can see where the bottleneck in a system is and even get an idea of where the next bottleneck might occur once you relieve the current one.

The following sections describe the counters of most interest to an administrator, performance analyst, or capacity planner when performing system tuning.

Watching for Trends in the Load

These counters don't provide a complete picture of the load on your Microsoft Exchange Server computer, but they will indicate trends over time if you track them.

Object	Counter	Description

MSExchangeIS	User Count	The number of connected client sessions.
	Active User Count	The number of users who have been active in the last 10 minutes.
MSExchangeIS Private	Messages Submitted/min	The rate of messages being submitted to the private information store.
	Message Recipients Delivered/min	The rate of messages being delivered by the private information store. This will be higher than the submission rate because many messages have multiple recipients.
MSExchangeIS Public	Messages Submitted/min	The rate of messages being submitted to the public information store.
	Message Recipients Delivered/min	The rate of messages being delivered by the public information store. This will be higher than the submission rate because many messages have multiple recipients.
MSExchangeMTA	Messages/sec	The rate at which the MTA is processing messages.
	Messages Bytes/sec	The rate, in bytes per second, at which the MTA is processing message content. Divide this by the Messages/sec counter to determine the average message size.
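
The average message size calculation described for the MTA counters can be sketched as follows. This is a minimal illustration, not part of the product: the function name and the sample values are hypothetical, and the inputs are assumed to be readings of Messages Bytes/sec and Messages/sec taken over the same interval.

```python
def average_message_size(bytes_per_sec: float, messages_per_sec: float) -> float:
    """Return the average message size in bytes for one sampling interval."""
    if messages_per_sec <= 0:
        return 0.0  # no traffic in this interval, so there is no meaningful average
    return bytes_per_sec / messages_per_sec


# Hypothetical (Messages Bytes/sec, Messages/sec) pairs, for illustration only.
samples = [(40960.0, 4.0), (61440.0, 5.0), (0.0, 0.0)]
sizes = [average_message_size(b, m) for b, m in samples]
print(sizes)
```

Tracking this value across weeks or months shows whether message sizes are growing, which is one of the load trends discussed earlier.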

Watching Service Times

These counters don't provide a complete picture of the responsiveness of your Microsoft Exchange Server computer, but they will indicate trends over time if you track them.

Object	Counter	Description

MSExchangeIS Private	Send Queue Size	Indicates whether the information store is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay there long after the peak has passed.
	Average Time for Delivery	Indicates how long it takes the information store to deliver messages.
MSExchangeIS Public	Send Queue Size	Indicates whether the information store is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay there long after the peak has passed.
	Average Time for Delivery	Indicates how long it takes the information store to deliver messages.
MSExchangeMTA	Work Queue Length	Indicates whether the MTA is keeping up with the submitted load. The queue can be non-zero at peak traffic times, but it shouldn't stay there long after the peak has passed.
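
One way to check whether a queue drains after a peak is to measure the longest stretch during which it stayed non-zero in the logged data. The sketch below assumes evenly spaced samples of Send Queue Size or Work Queue Length; the function name and sample values are hypothetical, and what counts as "too long" is left for you to decide for your site.

```python
def longest_backlog_minutes(queue_lengths, interval_minutes):
    """Return the longest continuous time, in minutes, that the queue stayed non-zero."""
    longest = current = 0
    for length in queue_lengths:
        # Extend the current run while the queue is non-zero; reset it otherwise.
        current = current + 1 if length > 0 else 0
        longest = max(longest, current)
    return longest * interval_minutes


# Hypothetical queue samples logged every 5 minutes around a traffic peak.
queue = [0, 3, 12, 7, 0, 0, 1, 0]
print(longest_backlog_minutes(queue, interval_minutes=5))
```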

Processor Utilization

Windows NT provides additional counters that can help you analyze processor usage, but many are more useful to a developer than to an administrator. The following counters are relevant to bottleneck analysis.

Object	Counter

System	% Total Processor Time
Process	% Processor Time

If you observe processor utilization at a fine granularity, for example every one or five seconds, note that the counters fluctuate rapidly and will frequently hit 100 percent for short periods of time. For this reason, monitoring processor usage is more useful when averaged over a longer period of time. If you are monitoring for longer periods of time and you find that the processor usage reaches 100 percent and stays there for minutes or hours, your users are probably becoming impatient with response times. You may want to size your system for around 60 percent or 70 percent processor utilization during peak times, so that there is extra room for unexpected demands and for growth.
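
The effect of averaging fine-grained samples can be illustrated with a short sketch. It assumes a list of % Total Processor Time readings and uses the upper end of the 60 to 70 percent sizing range as the target; the function names and sample values are hypothetical.

```python
def block_averages(samples, window):
    """Average each consecutive, non-overlapping group of `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def over_target(averages, target_percent=70.0):
    """Return the averaged values that exceed the sizing target."""
    return [avg for avg in averages if avg > target_percent]


# Hypothetical 5-second samples; individual readings hit 100 percent briefly.
cpu = [20, 100, 30, 50, 100, 100, 90, 95]
averages = block_averages(cpu, window=4)
print(averages, over_target(averages))
```

In this example the first group contains a momentary 100 percent reading but averages well below the target, while the second group shows sustained high usage.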

When you are running other services in addition to Microsoft Exchange Server on the server computer, it is recommended that you analyze per-process processor usage. This enables you to determine which services are using most of the CPU time, and how to appropriately balance the load.

Recommendations

If the processor is your bottleneck, consider taking the following actions:

Use a faster processor or multiple processors.
Use a larger L2 cache. This can improve processor efficiency.

Disk Counters

There are two sets of disk counters: LogicalDisk and PhysicalDisk. Either is fine to use, but LogicalDisk makes it easier to track drive usage. In either case, you must enable the disk counters. They are turned off by default because of the small performance hit they create.

To enable Performance Monitor disk counters

At the command prompt, type diskperf -y. The counters take effect after the next reboot.

Following are important disk counters:

Object	Counter

LogicalDisk	Disk Bytes Written/sec
	Disk Bytes Read/sec
	Disk Reads/sec
	Disk Writes/sec
	Avg. Disk Queue Length
	% Disk Time (general indicator only; not a reliable indicator of disk saturation).

Compare the disk operations per second with the specifications for sustained operations provided by your vendor. If your disk operations per second are getting close to the vendor's specifications, you're nearing capacity. Note that the % Disk Time counter is not a fair indication of disk saturation. A disk that is busy 100 percent of the time may actually be capable of doing much more work, due to smart disk controllers and scheduling methods such as elevator algorithms.
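
The comparison against the vendor's specification can be sketched as a simple ratio. The function name and the figures below are hypothetical; the sustained operations rating must come from your own vendor's documentation.

```python
def disk_load_fraction(reads_per_sec, writes_per_sec, vendor_sustained_ops_per_sec):
    """Return observed disk operations as a fraction of the vendor's sustained rating."""
    if vendor_sustained_ops_per_sec <= 0:
        raise ValueError("vendor rating must be positive")
    return (reads_per_sec + writes_per_sec) / vendor_sustained_ops_per_sec


# Hypothetical readings of Disk Reads/sec and Disk Writes/sec against an
# assumed vendor rating of 100 sustained operations per second.
fraction = disk_load_fraction(45.0, 30.0, 100.0)
print(f"{fraction:.0%} of rated sustained operations")
```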

Recommendations

If the disk subsystem is your bottleneck, consider taking the following actions:

Use the fastest possible disks for the information store log and database.
Use dedicated disks for the information store log.
Stripe the information store database disks.
Mirror the information store log; use RAID 5 on the information store database.
Use high-performance controllers, and make sure that there are enough of them for your disks.
Use a separate disk for MTA queues, or place these queues in the stripe set with the information store database.
Use a separate disk for Internet Mail Service queues, or place these queues in the stripe set with the information store database.

Information Store Disk Demands

Object	Counter

MSExchangeDB	Buffer Asynchronous Reads/sec
	Buffer Asynchronous Writes/sec
	Buffer Synchronous Reads/sec
	Buffer Synchronous Writes/sec

Check the amount of disk activity generated by the information store. If you add up the read/write counters shown above, you can determine how much of the disk's activity on your information store database drive is due to the information store, and how much is generated by other services that might share the same drive.
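
A rough sketch of this calculation follows. It assumes, as a simplification, that each buffer read or write counted by MSExchangeDB corresponds to one disk operation on the database drive, which may not hold exactly on a real system. The function name and sample values are hypothetical.

```python
def information_store_share(async_reads, async_writes, sync_reads, sync_writes,
                            disk_reads, disk_writes):
    """Estimate the fraction of a drive's operations generated by the information store."""
    store_ops = async_reads + async_writes + sync_reads + sync_writes
    disk_ops = disk_reads + disk_writes
    if disk_ops <= 0:
        return 0.0  # the drive was idle during this interval
    # Cap at 1.0 because the two sets of counters are sampled independently.
    return min(store_ops / disk_ops, 1.0)


# Hypothetical per-second readings taken over the same interval.
share = information_store_share(10, 20, 5, 5, disk_reads=30, disk_writes=20)
print(f"Information store share of disk activity: {share:.0%}")
```

The remainder (here about one fifth of the activity) would be attributable to other services sharing the drive.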

Memory

Following are important memory counters:

Object	Counter

Memory	Pages/sec
	Page Faults/sec
	Available Bytes
	Committed Bytes
Process	Page Faults/sec
	Working Set

The Pages/sec counter indicates the rate at which pages are physically read or written on the paging drive. This indicates the contribution that paging makes to the demand for the disk.

The Page Faults/sec counter indicates the rate at which pages are faulted into the working sets of processes. Due to the page pool in the virtual memory system, the number of pages actually being read and written to disk is much less than the number of page faults. The page faults are of interest if you have services in addition to Microsoft Exchange Server running. In such a case, you can examine the rate of page faults on a per-process basis and determine where they are occurring. With this information, you can make application-tuning changes. For example, you might consider (carefully) adjusting the tradeoff between the information store buffer pool and the system memory pool. Also, by checking the per-process working sets, you can identify the major memory allocations.

Recommendation

If your paging rate exceeds a modest threshold, for example 100 pages/sec, consider adding memory.

The Available Bytes counter indicates how much physical memory is available at any given time. The system adjusts working sets of processes to keep this above a certain threshold, generally 4 MB. If this level is approached, you should see higher paging and page fault rates. The Committed Bytes counter indicates the amount of virtual address space that the system has committed to applications. This must be backed by the paging file on the disk, so make sure that there is space in the paging file.

Recommendation

Anticipate memory issues by tracking trends in available and committed memory.
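
Trend tracking of this kind can be sketched with a least-squares line fitted to logged values. The example below assumes evenly spaced readings of Available Bytes (expressed in MB) and uses the 4 MB threshold mentioned above; the function names and sample values are hypothetical, and a straight-line projection is only a rough guide.

```python
def linear_trend(values):
    """Return the least-squares slope of values against sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def samples_until_threshold(values, threshold):
    """Project how many more samples pass before the trend reaches the threshold."""
    slope = linear_trend(values)
    if slope >= 0:
        return None  # not declining, so no projected crossing
    if values[-1] <= threshold:
        return 0.0  # already at or below the threshold
    return (threshold - values[-1]) / slope


# Hypothetical weekly averages of Available Bytes, in MB.
available_mb = [64, 56, 48, 40]
print(linear_trend(available_mb), samples_until_threshold(available_mb, 4))
```

The same approach applies to Committed Bytes, with the paging file size as the limit and a rising rather than falling trend.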

Buffers

Following are important buffer counters.

Object	Counter

MSExchangeDB	% Buffer Cache Hit
	Buffer Asynchronous Reads/sec
	Buffer Asynchronous Writes/sec
	Buffer Synchronous Reads/sec
	Buffer Synchronous Writes/sec

Depending on your usage patterns, you may be able to optimize the use of server memory by adjusting the number of information store buffers. Monitor the % Buffer Cache Hit counter. If it is consistently very close to 100 percent, try decreasing the number of buffers. You should also monitor the information store disk activity. If the activity doesn't increase, your setting is correct. However, if you notice that the cache hit rate is less than 95 percent, try increasing the number of buffers. As long as the paging activity does not increase, the setting is correct. Be careful when making these changes! Make small adjustments and monitor the results until you're confident in the changes made.
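
The tuning rule above can be summarized as a small decision sketch. The 95 percent lower bound comes from the text; the 99.5 percent upper bound and the 5 percent tolerance are assumptions chosen only to make "very close to 100 percent" and "doesn't increase" concrete. The function names are hypothetical.

```python
def buffer_adjustment(cache_hit_percent, low=95.0, high=99.5):
    """Suggest a direction for the next small change to the buffer count."""
    if cache_hit_percent < low:
        return "increase"
    if cache_hit_percent >= high:  # assumed cutoff for "very close to 100 percent"
        return "decrease"
    return "keep"


def change_is_acceptable(direction, disk_ops_before, disk_ops_after,
                         pages_before, pages_after, tolerance=0.05):
    """Check the follow-up condition for a change that was just made."""
    if direction == "decrease":
        # Fewer buffers are fine only if information store disk activity did not rise.
        return disk_ops_after <= disk_ops_before * (1 + tolerance)
    if direction == "increase":
        # More buffers are fine only if paging activity did not rise.
        return pages_after <= pages_before * (1 + tolerance)
    return True


print(buffer_adjustment(99.8), buffer_adjustment(92.0), buffer_adjustment(97.0))
```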

Recommendations

Consider adjusting your information store buffer usage if the cache hit rate is very high or low.
If you cannot find a happy medium for this setting, you probably need to add more memory.

To change the number of information store buffers

At the command prompt, type PerfWiz -V.

Change the setting number for the information store buffers, but be careful when doing this. Make small adjustments and monitor the results until you are confident with the changes made.

Network

The Network Interface object can be obtained by installing the Windows NT Resource Kit. If your network connection is a point-to-point link, the counters below will show all traffic on the link. If the connection is a LAN, these counters will show the traffic to and from the server being monitored.

Object	Counter

Network Interface	Bytes Received/sec
	Bytes Sent/sec
	Packets Received/sec
	Packets Sent/sec

If you know the capacity ratings of your network and network interface card (NIC), you can compare these ratings to the values for the counters shown above and determine how close to capacity you are operating. For a more complete overview of network traffic, you can also use the Network Monitor tool available with Microsoft Systems Management Server (SMS). Note that Network Monitor is a stand-alone tool. You do not need to have other SMS components installed to run it.
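
The comparison of counter values with the rated capacity can be sketched as follows. The example assumes a shared, half-duplex link on which sent and received traffic compete for the same capacity, and converts the byte counters to bits to match the usual link ratings. The function name and sample values are hypothetical.

```python
def link_utilization(bytes_received_per_sec, bytes_sent_per_sec, link_bits_per_sec):
    """Return this server's traffic as a fraction of the rated link capacity."""
    if link_bits_per_sec <= 0:
        raise ValueError("link capacity must be positive")
    total_bits_per_sec = (bytes_received_per_sec + bytes_sent_per_sec) * 8
    return total_bits_per_sec / link_bits_per_sec


# Hypothetical readings on a link rated at 10 million bits per second.
utilization = link_utilization(400_000, 350_000, 10_000_000)
print(f"{utilization:.0%} of rated capacity")
```

On a LAN this figure reflects only the monitored server's own traffic, so other stations on the same segment add to the total load.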

If you find that the server is operating at or near network capacity, you can upgrade the network speed, for example, by moving from 10-Mbps Ethernet to 100-Mbps Ethernet, or from a 64-Kbps line to a T1 line. You may also want to consider using multiple Ethernet connections for the server, or multiple 64-Kbps lines, rather than one faster connection or line.

Recommendation

Consider faster links, multiple interfaces, and multiple links.

System Bus

If you suspect that bus saturation is an issue that affects your server, you can monitor its activity on Pentium and Pentium Pro servers. Use the p5ctrs from the Windows NT Resource Kit. To view Pentium counters, you must run Pperf and then change the configuration.

Following are important bus counters:

Object	Counter

Pentium	Bus Utilization (clks)/sec
	% Code Cache Misses
	% Data Cache Misses

If your system bus is near saturation, you have two options:

If there are many cache misses, you can use larger CPU caches; or
You can split the load onto multiple servers. Start by offloading any non-Microsoft Exchange Server services, if they are being used.

Recommendations

Use larger L2 caches.
Consider multiple servers.