Thursday, July 30, 2009

Performance Monitoring

Performance Monitoring on Windows Server 2003
Introduction
Server performance is an important issue in a mission-critical business environment. Poor performance can have a huge negative impact on the ability of workers to do their jobs, and thus on productivity and the company’s bottom line. Monitoring and optimizing performance of network servers is one of the administrator’s most important tasks, and it is important to continually collect and analyze performance data to ensure that any problems can be taken care of before they impact end users. Security events are another important area that the administrator must stay on top of, to protect the integrity of the organization’s network and data.
Windows Server 2003 provides administrators with built-in tools for monitoring performance issues and detecting security breaches (or attempted breaches). These include simple monitoring tools such as Task Manager, more powerful monitoring tools such as the System Monitor, and a set of useful command-line utilities. For auditing security events, the security log provides vital information for tracking successful and failed attempts to breach security.
Using the Performance Utility to Monitor Performance

Let’s look at the utilities that monitor performance. The main ones are the System Monitor and Performance Logs and Alerts. These tools provide a graphical user interface for analyzing performance data. We will also investigate the command-line tools available in Windows Server 2003. Let’s start with the System Monitor.
Using the System Monitor
The System Monitor is the primary tool for monitoring system performance. In Windows NT, it was called the Performance Monitor; in Windows 2000, Microsoft changed the name to System Monitor and placed it within the Performance MMC.
In keeping with its old name, the System Monitor can be opened by clicking Start > Run and typing perfmon, or by clicking Start > Administrative Tools > Performance and selecting System Monitor. The System Monitor runs as an ActiveX control inside the Performance console. Because it is built as an ActiveX control, you can embed it in a web page or a web form application. You can also monitor remote computer activity from your local System Monitor console. A screen shot of the System Monitor is displayed in Figure 1.
Figure 1 System Monitor.

The System Monitor can be displayed in three formats. Figure 1 shows it as a graph. We can also display it as a histogram or as a text report. You can switch views by clicking one of the three buttons in the button bar directly above the graph. (The first of these is the fifth button from the left of the button bar, next to the database sign.) If you hover your cursor over these buttons, you will see that they are labeled View Graph, View Histogram and View Report.
Three performance counters are activated and monitored by default. They are displayed in Figure 1 and are the following:
· Memory object: Pages/sec counter
· PhysicalDisk object: Avg. Disk Queue Length counter
· Processor object: % Processor Time counter
You can right click on any performance counter in the lower pane and select Save As to save the log information as an HTML file (.htm) or a tab-delimited file (.tsv).
Adding Performance Counters
You can add performance counters by doing one of the following:
· Right click the counter pane and select Add Counters.
· Select the Data tab from the Properties dialog box of the System Monitor, as shown in Figure 2. To open the Properties dialog box, right click within the graph area or on an item in the counter pane and click Properties, or press CTRL+L.
· Click the Add button on the button bar, which appears as a plus sign (+).
· Press CTRL+I.
Figure 2 Properties for System Monitor

You will see the existing counters in the Counters list. When you click the Add button or press CTRL+I, you should see the Add Counters dialog box as shown in Figure 3.
Figure 3 Add Counter screen
In the Add Counters dialog box, first select the machine you wish to monitor. You can monitor counters on the local computer by selecting Use local computer counters, or you can monitor counters on a remote machine by selecting Select counters from computer: and typing the UNC path to the remote system or choosing it from the dropdown box if you’ve monitored it from this computer previously.
Next, select the performance object. A performance object is a specialized object that exposes performance counter information for a particular application, service or hardware device. (For example, SQL Server installs specialized performance objects that enable the System Monitor to monitor its activity.) There are a large number of objects from which to choose. Some of the most commonly monitored objects include:

· Processor
· Memory
· Logical Disk
· Physical Disk
· DNS
· DHCP Server
· Network interface
· Web service
Note
Some applications and services add performance objects and counters to the System Monitor when you install them. Thus, you might not see all of the listed objects/counters if you don’t have the related applications or services installed on the computer you’re monitoring. For example, if you don’t have SQL Server installed, you will not see the SQLServer:Databases object.
Then select the counters you are interested in for the selected object, or select All counters to track every counter that pertains to that object. (The counters differ from one performance object to another, and some objects have a large number of counters.)
Finally, select the instance to which the counters apply if there is more than one instance of the object on the machine. For example, if you have dual processors installed, there will be two instances for the Processor object. If you have two logical disks (C: and D:), both will show up as separate instances and can be monitored individually, or you can select All instances to monitor them all.
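The four choices made in the Add Counters dialog box (computer, object, instance and counter) are commonly written as a single counter path, which is how counters tend to appear in log files and reports. The following Python sketch shows how such a path is put together. The helper name counter_path and the machine name SERVER01 are hypothetical and used only for illustration.

```python
def counter_path(obj, counter, instance=None, computer=None):
    """Join the dialog's four choices into a single counter path string."""
    # A remote machine is written with two leading backslashes; local paths omit it.
    prefix = "\\\\" + computer if computer else ""
    # Only objects with several instances (processors, disks) carry an instance part.
    inst = "(" + instance + ")" if instance is not None else ""
    return prefix + "\\" + obj + inst + "\\" + counter


# Local computer, object without instances.
print(counter_path("Memory", "Pages/sec"))
# One instance of a multi-instance object on a remote machine (hypothetical name).
print(counter_path("Processor", "% Processor Time", instance="0", computer="SERVER01"))
# The wildcard stands for "All instances".
print(counter_path("LogicalDisk", "% Free Space", instance="*"))
```

Writing the path out this way makes it easier to keep a list of the counters you monitor on each server.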
Tip
You can select a counter and click the Explain button to get help information about it. A window will pop up beneath the Add Counters dialog box with an explanation of the counter. You can remove a counter by selecting it and clicking Remove.
It is important to be familiar with the functions of the major performance counters and their thresholds. The performance counters we will discuss are memory, disk and processor related. Table 1 discusses some of these counters and gives recommended threshold values that should trigger action on your part. A threshold can be reached for a myriad of reasons, but reaching it is an indication that the system is not responding correctly, so it is important to know when this is occurring (or about to occur) and take action. System administrators should investigate the cause any time a performance threshold is reached. You can also configure the Performance utility to notify you when a threshold is met.

We have investigated the Data tab of the System Monitor properties. Let’s look at the other tabs now.

General tab of the System Monitor

The General tab lets you configure the System Monitor view. Figure 4 displays the General tab of the System Monitor’s properties. We can view the System Monitor as a graph, histogram or report by selecting the option from the View group box. We can customize the display by selecting options from the Display elements group box. The Report and histogram data group box determines which value is shown for each counter in those views; for example, Maximum displays the maximum values of the counters and Minimum displays the minimum values. We can render the System Monitor as 3D or flat by using the Appearance select box, and apply a border using the Border option. The Sample automatically every X seconds box lets you configure the refresh interval of the System Monitor. We can also allow duplicate counters by selecting the Allow duplicate counter instances option box.
Figure 4 : General tab of System Monitor

Source tab of the System Monitor

The Source tab describes the data source for the System Monitor. There are three possible sources. The first is the current activity of the system, which is selected by enabling the Current activity option. The second is a log file, which is enabled by selecting the Log files option. We then have to point to the correct log files by adding them with the Add button; unwanted log files can be removed with the Remove button. The third option is a database source. Here we need to enter the Data Source Name (DSN) and select the correct log set by using the Log set option. We can also filter the data according to time ranges by using the Time Range option. Please refer to Figure 5 for details.
Figure 5 : Source tab of System Monitor
Graph tab of the System Monitor

The Graph tab lets you configure the display format of the System Monitor graph. You can add a title and a vertical axis label for the graph using this tab. We can also overlay a grid of vertical and horizontal lines on the graph. Finally, we can configure the scale of the graph. Figure 6 displays the Graph tab of the System Monitor.
Figure 6 : Graph tab of System Monitor
Appearance tab of the System Monitor

The final tab is the Appearance tab. This controls the visual appearance of the System Monitor graph. We can change the background and foreground colors and font sizes using this tab. The Appearance tab is shown in Figure 7.
Figure 7 : Appearance tab of System Monitor

Using Performance Logs and Alerts
This section of the Performance utility is used to configure logging of performance related information and set up the system to alert you when thresholds are reached. Let’s look closely at the Performance Logs and Alerts section.
In the left pane of the Performance MMC, expand the Performance Logs and Alerts node, and you will see that this section has three child nodes. These are:
· Counter Logs
· Trace Logs
· Alerts
All these logs and alerts can be configured, started or stopped using the Performance utility. Let’s investigate the Counter logs first.
Counter Logs
Counter logs store performance counter information. We can use these logs to analyze the data later. Let’s learn how to create a counter log.

1. Click Start > Run and type perfmon.exe.
2. Expand Performance Logs and Alerts in the left pane of the Performance console.
3. Right click on Counter Logs and select New Log Settings.
4. A dialog box will appear asking for the counter log name. We will enter Test_Memory_Log for demonstration purposes. You will then be presented with a Properties screen for the newly created log. It should look similar to Figure 8.
Figure 8 : General tab of Counter Log
The log file name will be automatically assigned by the system. We can then configure what to monitor by using the Counters section. We can add whole objects by using the Add Objects button, or select individual counters for an object by clicking the Add Counters button. (We will select the memory counters to monitor memory activity for our demonstration purposes.) We can also configure the frequency of the log file entries by using the Interval and Units option boxes. We can configure more settings by using the Log Files and Schedule tabs. The Log Files tab is shown in Figure 9.
Figure 9 : Log Files tab of Counter Logs
You can configure the log file type using the Log file type option box. Valid types include binary format, comma-separated text, tab-delimited text and database. You can configure the options for the selected type by clicking the Configure button. The End file names with option box lets us append a time stamp to the log file name. We have selected the month-day-year format in Figure 9. We can also describe the log by using the Comment field, and instruct the system to overwrite the existing log file by selecting the option box at the bottom. Let’s investigate the Schedule tab now. (Please refer to Figure 10.)
Figure 10 : Schedule tab for Counter Logs
You can configure the start and stop times by using this tab. You can either start the log manually or assign a start time; this is done with the controls in the Start log group box. The Stop log group box lets you configure when logging ends and what happens afterwards. You can stop the log manually, after a set number of days or at an exact time. You can then use the Start a new log file or Run this command option boxes to configure the follow-up action.
5. Click the OK or Apply button to apply the changes.
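Once a counter log has been saved in the comma-separated format, it can be analyzed outside the Performance utility. The Python sketch below summarizes such a log. The layout of SAMPLE_LOG (a timestamp column followed by one column per counter path, with a blank first sample for a rate counter) is an assumption modelled on that format, and the machine name and values are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; a real log would be read from the saved .csv file.
SAMPLE_LOG = r'''"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\SERVER01\Memory\Available Bytes","\\SERVER01\Memory\Pages/sec"
"07/30/2009 10:00:00.000","104857600"," "
"07/30/2009 10:00:15.000","94371840","22.0"
"07/30/2009 10:00:30.000","83886080","27.5"
'''


def summarize_counter_log(text):
    """Return {counter_path: (minimum, average, maximum)} for a CSV counter log."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    values = {name: [] for name in header[1:]}  # column 0 holds the timestamp
    for row in rows:
        for name, cell in zip(header[1:], row[1:]):
            cell = cell.strip()
            if cell:  # skip blank samples
                values[name].append(float(cell))
    return {
        name: (min(v), sum(v) / len(v), max(v))
        for name, v in values.items()
        if v
    }


for name, (low, avg, high) in summarize_counter_log(SAMPLE_LOG).items():
    print(f"{name}: min={low:.1f} avg={avg:.1f} max={high:.1f}")
```

The same function can be pointed at the contents of a real log file by reading the file into a string first.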

Optimizing Servers for Application Performance
In a production environment, you need to optimize your servers to get the maximum throughput for your mission-critical applications. In the following sections, we will address the specifics of monitoring and optimizing memory objects, network objects, process objects and disk objects to provide the best performance for your servers. The source data for the optimization is obtained by analyzing the performance counters related to each object. You can use the System Monitor, discussed earlier, to monitor these counters. In each of the following sections, we will discuss which counters should be monitored and the actions you can take to address the problems you detect. Please refer to Table 1 to learn about the thresholds for each counter.
Common optimization tips
A lack of memory is one of the most common performance issues on client workstations. You should investigate memory issues first when you have workstation performance problems. Servers, on the other hand, are more prone to disk and network problems. Here are some guidelines to help you with optimization:
· Make one optimization change at a time. Make the change and test the system to observe the outcome. You will not be able to tell which change produced the outcome if you make multiple changes simultaneously.
· Observe the Event Log closely when you are making modifications to the system. The Event Log will display errors when applications are unstable.
· Try to run the application locally on your system (as opposed to running it on a network server). This can give you an indication of whether a network problem is present.
Monitoring memory objects
Memory issues often contribute to performance problems. You can use the System Monitor to monitor various counters related to the memory object. The most important performance counters that can be monitored to detect memory problems include the following:
· Memory:Available Bytes
· Memory:Pages/sec

Memory:Available Bytes indicates the amount of physical memory currently available. We recommend that you have at least 4 MB of memory available to run the server effectively. You should take immediate action if available memory falls below 4 MB.
Memory:Pages/sec indicates the rate at which pages are written to or read from disk, in pages per second. The recommended threshold for the Memory:Pages/sec counter is 20. If this counter exceeds 20, you should take action. (Alerts can be used to notify the system administrator of these events. Refer to Alerts under Performance Logs and Alerts.) The most common memory problem is a memory leak due to incorrect application code. Following are some recommendations to remedy memory issues:
· Investigate the minimum memory requirement for your applications to run. This can be easily done by using the Task Manager. (Read the memory values before and after the application is loaded into memory.) Make sure the available memory exceeds this value. Add more physical RAM to the machine if it is not sufficient.
· Create multiple paging files on multiple disks. This spreads paging activity across the disks and allows faster access.
· Reevaluate the paging file size. It is recommended that the paging file size be 1.5 times the physical RAM installed. If the paging file or virtual memory in use exceeds this limit, add extra physical memory or decrease the page file size.
· Run your most memory intensive applications on your highest performing computers. You can also reschedule such applications to run when the system work load is light.
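The two memory thresholds given above (at least 4 MB available, and no more than 20 pages per second) can be written down as a simple check. This is a minimal Python sketch; the function name memory_warnings is hypothetical, and the sample values passed in are made up.

```python
MIN_AVAILABLE_BYTES = 4 * 1024 * 1024  # 4 MB floor for Memory:Available Bytes
MAX_PAGES_PER_SEC = 20                 # ceiling for Memory:Pages/sec


def memory_warnings(available_bytes, pages_per_sec):
    """Return a list of warnings for one sample of the two memory counters."""
    warnings = []
    if available_bytes < MIN_AVAILABLE_BYTES:
        warnings.append("Memory:Available Bytes is below 4 MB")
    if pages_per_sec > MAX_PAGES_PER_SEC:
        warnings.append("Memory:Pages/sec is above 20")
    return warnings


# A healthy sample, then a sample that breaks both thresholds (values are made up).
print(memory_warnings(64 * 1024 * 1024, 5))
print(memory_warnings(2 * 1024 * 1024, 35))
```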
Note
The first step in detecting a memory leak is to observe memory usage by using the Memory:Available Bytes and Memory:Committed Bytes performance counters. You should suspect a memory leak when the available memory figure declines by more than 4 MB. You need to isolate the applications and run them against these counters to determine which application is causing the memory leak. You might need to monitor the Process:Private Bytes, Process:Working Set and Process:Handle Count counters on the suspected process to confirm the memory leak. A kernel-mode component can also be leaking memory. In that case, you need to use the Memory:Pool Nonpaged Bytes, Memory:Pool Nonpaged Allocs and Process(process_name):Pool Nonpaged Bytes counters. Kernel-mode components often allocate memory that cannot be paged out; therefore you should use the nonpaged counters.
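The leak check described in this note can be sketched in Python as two tests on logged samples: has Memory:Available Bytes dropped by more than 4 MB, and has Process:Private Bytes for the suspected process grown steadily over the same period? The function names and the sample series are hypothetical; treating "never decreases and ends higher" as steady growth is an assumption made for this sketch.

```python
MB = 1024 * 1024
LEAK_DROP_BYTES = 4 * MB  # decline in available memory that should raise suspicion


def available_memory_drop(samples):
    """Bytes by which Available Bytes fell between the first and last sample."""
    return samples[0] - samples[-1]


def grows_steadily(samples):
    """True if no sample is lower than the one before and the series ends higher."""
    never_falls = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_falls and samples[-1] > samples[0]


def leak_suspected(available_bytes, private_bytes):
    """Combine both signals for one suspected process."""
    return (available_memory_drop(available_bytes) > LEAK_DROP_BYTES
            and grows_steadily(private_bytes))


# Made-up samples taken at regular intervals.
available = [100 * MB, 98 * MB, 95 * MB, 93 * MB]
private = [20 * MB, 22 * MB, 25 * MB, 27 * MB]
print(leak_suspected(available, private))
```

A positive result is only a reason to look closer; the Working Set and Handle Count counters should still be checked before blaming the process.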
Monitoring network objects
Monitoring network objects involves tracking the overall network traffic. You also need to track the server’s processor and memory data in conjunction with the network traffic. Server memory problems can be initiated by malfunctions of the network architecture.

You should monitor network counters in conjunction with Processor:% Processor Time, PhysicalDisk:% Disk Time and Memory:Pages/sec. Most network resources (network adapters and protocol software) use nonpaged memory. If the computer is paging excessively, this might be because networking activities are consuming memory and applications are being swapped to disk. This is indicated by an increase in Memory:Pages/sec and a decrease in Server:Bytes Total/sec. In this case, check the Event Viewer to confirm whether you are running out of paged or nonpaged memory.
Note
The paging file of a system should be approximately 1.5 times the amount of installed RAM. This is automatically set by the operating system. The system will be unstable if you exceed the 1.5 limit. (A common cause is a network issue that causes excessive swapping of applications.)
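As a worked example of the 1.5 rule, the following Python sketch computes the recommended paging file size from the installed RAM and checks a current size against it. Both function names are hypothetical and the figures are made up.

```python
def recommended_paging_file_mb(installed_ram_mb, factor=1.5):
    """Paging file size suggested by the 1.5 x RAM rule of thumb."""
    return installed_ram_mb * factor


def exceeds_paging_limit(paging_file_mb, installed_ram_mb):
    """True if the paging file in use is larger than the recommended size."""
    return paging_file_mb > recommended_paging_file_mb(installed_ram_mb)


# A server with 2048 MB of RAM (made-up figure).
print(recommended_paging_file_mb(2048))
print(exceeds_paging_limit(4096, 2048))
```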
There are specialized performance counters that can be used to optimize network usability. The following are important network related performance counters:
· Network Interface: Bytes Total/sec, Bytes Sent/sec, and Bytes Received/sec. These counters describe how the network adapters are coping with the network traffic. You should investigate any abnormalities in bytes received or sent indicated by these counters. (The recommended threshold depends on the network adapters and network topology.)
· Protocol_layer_object: Segments Received/sec, Segments Sent/sec, Frames Sent/sec, and Frames Received/sec. The Protocol_layer_object will be TCPv4, TCPv6, IPv6 and so on; each object covers a single protocol. These counters show how the protocols are performing relative to the available network capacity. A frame is a unit of data sent to a machine over the network. You should be concerned if the frames received or sent do not correspond to the values you expect for your organization.
· Server: Bytes Total/sec, Bytes Received/sec, and Bytes Sent/sec. These counters indicate how the server is using the network to receive and send data. This data is closely coupled to the protocol layer and Network Interface data: protocol and network activity should be high when these counters are high. We should investigate if the protocol and network activity do not follow the server trends. (For example, a hardware malfunction could be consuming the server’s resources, so network and protocol activity will be slow in the face of a high server utilization rate.)
You need to constantly monitor network traffic and make sure it does not exceed your Local Area Network (LAN) capacity. You can use the Network Monitor tool to manage large network traffic situations. (This is not installed by default in the Windows Server 2003 installation. You might need to install it via Add/Remove Programs in Control Panel in order to use it.) Here are some recommendations to optimize your network performance:

· Unbind unwanted and infrequently used network adapters. They will put an extra burden on the system that has to manage them.
· Try to place all domain users in one subnet to prevent unwanted replication traffic on the network.
· The order in which network/transport protocols are bound makes a difference if you are using multiple protocols for network communications. For example, if you have both TCP/IP and IPX/SPX installed and bound to your NIC, put the most used protocol at the top of the protocol list. Some protocols are optimized for specific network topologies, so you should spend some time identifying the protocols you need and configuring the protocols for maximum throughput.
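To judge whether traffic is approaching LAN capacity, the Bytes Total/sec figure has to be compared with the link speed, which is normally quoted in bits per second. The Python sketch below does that conversion. The function names are hypothetical, the sample figures are made up, and the alert level is left as a parameter because the text gives no fixed threshold.

```python
def network_utilization_percent(bytes_total_per_sec, link_bits_per_sec):
    """Share of the link capacity used, given bytes/sec and link speed in bits/sec."""
    return bytes_total_per_sec * 8 / link_bits_per_sec * 100


def over_limit(bytes_total_per_sec, link_bits_per_sec, limit_percent=100):
    """True if utilization is above limit_percent (the limit is a site-specific choice)."""
    return network_utilization_percent(bytes_total_per_sec, link_bits_per_sec) > limit_percent


# 6,250,000 bytes/sec on a 100 Mbit/s adapter (made-up sample).
print(network_utilization_percent(6_250_000, 100_000_000))
print(over_limit(6_250_000, 100_000_000, limit_percent=40))
```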
Monitoring process objects
Monitoring processor and system counters will give you a good indication of how the processors are utilized in your Windows Server 2003 server. The most important performance counters to monitor in this regard include the following:
· Processor: % Processor Time and Process(process): % Processor Time. These counters show how active the processor is. The Process(process) counter displays the statistics for a single process. A high percentage means the server is handling a lot of requests; a low percentage means the server is idle most of the time. It is common practice to add processors if the counter exceeds 80%. (This threshold will change depending on what the server is dedicated to doing.)
· System: Processor Queue Length. This is the number of requests waiting in line to be processed. This value shouldn’t be greater than 1. If it is, requests are waiting in the queue to be processed. If this happens often, you should add more processors or upgrade to a faster processor to handle the extra load.
· Processor: Interrupts/sec. This counter indicates the number of interrupts per second the processor receives from devices (disks, network adapters, etc.). If the number of interrupts is high, you should upgrade the device drivers or assign other processors to service these devices. (The interrupt threshold can differ from processor to processor. A common benchmark is 1000 interrupts per second per processor. We should investigate if the interrupts are higher than 1000 per second.)
· Server Work Queues: Queue Length. This counter indicates the length of the server work queue at a given time. The recommended threshold for this queue is 4. More than 4 items in the queue is an indication of processor congestion. You should add processing power, redirect requests or install another processor to eradicate this problem.
You can observe these counters to monitor the process objects, and you will be able to tell if the processor(s) is creating a bottleneck on the system that needs to be addressed. After memory, the processor is the most common system bottleneck.
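The processor thresholds listed above lend themselves to a small lookup table. The Python sketch below checks one set of readings against that table and reports which counters are over their limit. The dictionary and function names are hypothetical, the readings are made up, and the Interrupts/sec reading is assumed to be per processor already.

```python
# Limits taken from the counter descriptions above.
PROCESSOR_LIMITS = {
    "Processor: % Processor Time": 80,
    "System: Processor Queue Length": 1,
    "Processor: Interrupts/sec": 1000,  # per processor
    "Server Work Queues: Queue Length": 4,
}


def counters_over_limit(readings, limits=PROCESSOR_LIMITS):
    """Return the names of counters whose reading is above the listed limit."""
    return [name for name, limit in limits.items() if readings.get(name, 0) > limit]


# Made-up readings from one sampling interval.
readings = {
    "Processor: % Processor Time": 92.0,
    "System: Processor Queue Length": 1,
    "Processor: Interrupts/sec": 450,
    "Server Work Queues: Queue Length": 6,
}
for name in counters_over_limit(readings):
    print("Investigate:", name)
```

Keeping the limits in one table makes it easy to adjust them for servers with a different role.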
Monitoring disk objects
Disk activity is another component that is important to monitor in order to optimize your server’s performance. The hard disk is often a system bottleneck in today’s computers with fast processors and plenty of memory. One way to increase disk performance is to spread the workload over multiple disks. Disk activity can be monitored using the following performance counters.
· PhysicalDisk: % Disk Time and % Idle Time. These two counters indicate the percentage of time the disk was in use and the percentage of time the disk was idle. If the disk usage time is high, you should consider moving some applications to other servers. The threshold for % Disk Time is 90%. We should investigate if it exceeds 90%.
· PhysicalDisk: Disk Reads/sec and Disk Writes/sec. These counters show the number of read and write operations the disk performs per second. A long delay might indicate a disk hardware problem or a long queue of requests. The thresholds for these counters vary from disk manufacturer to manufacturer. (For example, an Ultra Wide SCSI disk drive can handle 50 to 70 input/output operations per second.) We should upgrade the disk or try to reduce the queue length if the disk threshold is met.
· PhysicalDisk: Avg. Disk Queue Length. This indicates the length of the queue of read and write requests for the disk, counted as the number of requests waiting when the counter is measured, including requests in service. The threshold is the number of spindles plus two. Disk transactions will be slower if we exceed this queue length. In that case we should spread the workload over more disks to accommodate the extra requests.
· LogicalDisk: % Free Space This counter indicates the amount of free space available on the disk, as a percentage of the total disk capacity. Paging problems can occur if you have little disk space to which the system can swap data out of memory, and operating system errors can occur if the partition on which the OS is installed becomes too full.
Note
Log the performance data onto another drive when you are testing the disk speed of a particular logical disk. Otherwise the logging process will interfere with the statistics.
· LogicalDisk: Avg. Disk sec/Transfer. This counter describes how long the disk takes to fulfill requests. The more time it spends fulfilling a request, the slower the disk controller is. It is recommended that this value be less than 0.3 seconds for most disk controllers.
· PhysicalDisk: Disk Bytes/sec. This gives you the throughput of the disk activity.
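The disk thresholds above include one that has to be computed (queue length compared with the number of spindles plus two). The Python sketch below applies the three numeric disk thresholds from this list to one set of readings. The function names and the sample readings are hypothetical.

```python
def disk_queue_limit(spindles):
    """Queue length threshold: number of spindles plus two."""
    return spindles + 2


def disk_warnings(pct_disk_time, avg_queue_length, sec_per_transfer, spindles=1):
    """Return warnings for one sample of the disk counters."""
    warnings = []
    if pct_disk_time > 90:
        warnings.append("% Disk Time is above 90%")
    if avg_queue_length > disk_queue_limit(spindles):
        warnings.append("Avg. Disk Queue Length is above spindles + 2")
    if sec_per_transfer >= 0.3:
        warnings.append("sec/Transfer is not below 0.3 seconds")
    return warnings


# A busy two-spindle array, then a lightly loaded one (made-up readings).
print(disk_warnings(95, 5, 0.1, spindles=2))
print(disk_warnings(40, 3, 0.05, spindles=2))
```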

Note
The following are recommendations for optimizing disk activity on the server:
· When you upgrade a disk, upgrade the disk controller and bus associated with it. It does no good to install a fast disk if the controller and bus don’t support the faster speed.
· Try to distribute applications across multiple disks. That is, place different applications on different disks. However, you should also ensure that each individual application does not reference multiple disks, so as to minimize disk activity.
· Use Disk Defragmenter on a regular basis (especially after deleting large amounts of data) to rearrange the data on each partition so that data belonging to a specific file is contiguous on the disk; this minimizes disk access time.