Design a Windows NT Service to Exploit Special Operating System Facilities

Have you ever wanted to write an application that could perform work for various clients, both local and remote? Suppose this app needs the ability to set the permission necessary to handle that work based on the client’s authorization level. And say that this app has to execute whether or not a user is physically touching the computer on which it is running. What you need to create, then, is a Windows NT® service.

In Windows NT, a service is a type of executable that gets special treatment from the operating system. This article describes what Windows NT service applications are, how to design them, and what additional facilities the operating system offers to services.

First and foremost, a Windows NT service is a Win32® executable. If you want to write a service and you are already familiar with DLLs, structured exception handling, memory-mapped files, virtual memory, device I/O, thread-local storage, thread synchronization, Unicode, and all the other staple facilities offered by the Win32 API, you will not have to learn much more. All of these facilities are available to Windows NT services, and it should be relatively easy and straightforward for you to convert an existing application into a service. (The Windows NT Resource Kit contains a utility called SRVANY.EXE that allows you to run any Win32 application as a service, but it will not be able to take advantage of the special operating system features available to services.)

The second thing you need to know when writing a service is that it should have absolutely no user interface. Most services will be running on a Windows NT Server machine locked away in a closet somewhere. If your service presents any user interface elements like message boxes, it’s unlikely that a user will be in front of the machine to see the message box and dismiss it. Since you shouldn’t have a user interface, it doesn’t matter whether you choose to implement your service as a GUI application (with WinMain as its entry point) or as a console application (with main as its entry point).

If a service doesn’t present a user interface, how do you configure it? How do you start and stop a service? How does the service issue warnings or error messages, or report statistical data about its performance? The answer to all of these questions is that services can be remotely administered. Windows NT offers a number of administrative tools that allow a service to be managed from other machines connected on the network so you don’t have to physically check the computer running the service. You are probably already familiar with many of these tools: the Services Control Panel applet, the Registry Editor (RegEdit.exe), the Event Viewer (EventVwr.exe), and the Performance Monitor (PerfMon.exe).

Windows NT includes a number of services right out of the box. Figure 1 shows all of the services installed on my Windows NT Workstation machine and the name of the corresponding Win32 executable file that contains the code for the service.

Figure 1: Windows NT Workstation Services

Service Name                      Executable Name
Alerter                           Services.exe
ClipBook Server                   ClipSrv.exe
Computer Browser                  Services.exe
DHCP Client                       Services.exe
Directory Replicator              Lmrepl.exe
Event Log                         Services.exe
Messenger                         Services.exe
Net Logon                         Lsass.exe
Network DDE                       Netdde.exe
Network DDE DSDM                  Netdde.exe
NT LM Security Support Provider   Services.exe
Plug and Play                     Services.exe
Remote Access Autodial Manager    RASMAN.exe
Remote Access Connection Manager  RASMAN.exe
Remote Access Server              Rassrv.exe
RPC Locator                       Locator.exe
RPC Service                       RpcSs.exe
Schedule                          Atsvc.exe
Server                            Services.exe
Spooler                           Spoolss.exe
TCP/IP NetBIOS Helper             Services.exe
Telephony Service                 TAPISrv.exe
UPS                               Ups.exe
Workstation                       Services.exe

A Word about Security

Depending on the service you are writing, you may have to be familiar with the Windows NT security architecture. I’ll just go over the basics. On Windows NT, all security is user-based. In other words, all objects—processes, threads, files, registry keys, mutexes, semaphores, events, and so on—have a user as an owner. When a process is invoked, that process is usually executing in the context of a user who has an account on the machine/network or in the context of a special account called the System Account.

If the process is executing under a user account, the threads in the process are allowed to touch any resources that the user has been granted access to. For example, users can read and write to a file on the local machine if they have been granted access to that file and they can also access a file on a networked machine if they are validated by the domain.

The System Account identifies the operating system itself, and any process running under this account has full access to anything on the machine. For example, threads running in a System Account process can read from and write to any file on the machine. However, the System Account is never validated by the domain, so it has no access to network resources.

There is an application called WinLogon.exe that the operating system starts as part of the boot process. WinLogon.exe runs under the System Account and is responsible for displaying the familiar Windows NT Logon dialog box. When a user enters her username and password, WinLogon.exe passes this information to the security database to validate the user. If the user is validated, WinLogon.exe runs the Explorer.exe application under the context of the logged-on user account. Any applications spawned from Explorer.exe will also run under the logged-on user account. A user who logs onto the system using the WinLogon dialog box is called an interactive user because she is physically accessing the machine using its keyboard and mouse.

It is also possible for users to access a machine via the network. When a request comes to a machine via the network, the request can indirectly contain a username and password that can be checked against the security database to validate the user. If the user is validated, the thread that executed the security check can now impersonate the user. Impersonation means that the thread acts as if it is executing in a process that was run by the networked user. This thread is allowed to touch any resources that the networked user has been granted access to. A user that connects to a machine this way is called noninteractive because she isn’t physically touching the computer’s keyboard or mouse.

Requests most often come from client software (usually running on Windows® 95 or Windows NT Workstation) to the server machine, and each user logged onto one of these client machines becomes one of the server’s many noninteractive users.

The Three Service Components

There are three types of components involved in making Windows NT services work. The first component is called the Service Control Manager (SCM, pronounced “scum”). Each Windows NT system ships with a SCM that lives in the Services.exe file. It is automatically invoked when the operating system boots, and terminates when the system is shut down. This process runs with system privileges and provides a unified and secure means of controlling Win32 services. The SCM is responsible for communicating with the various services, telling them to start, stop, pause, continue, and so on.

The second component is the service itself. A service is simply a Win32 executable that contains the additional code necessary to receive information and commands from the SCM. A service also calls special functions that communicate its status back to the SCM.

The third and last component is a service control program (SCP), a Win32 application that presents a user interface allowing the user to start, stop, pause, continue, and otherwise control all the services installed on a machine. The SCP calls special Win32 functions that let it talk to the SCM. The Services Control Panel applet (see Figure 2) is an SCP that Microsoft ships with Windows NT.

Figure 2: The Services Control Panel applet

In Figure 2, the Service column identifies the service’s name, the Status column indicates whether the service is started, paused, or stopped (blank), and the Startup column indicates when the service is to be invoked. Using this window, you can command the SCM to start a stopped service, stop a started service, pause a started service, or continue a paused service. If you are manually starting a service, you can pass command-line arguments to the service by filling in the Startup Parameters edit box (but most services should not require command-line arguments).

Of these three components, you are most likely to write the services themselves. It is also likely that you will have to write a companion client-side application that will talk to the SCM as an SCP. The bulk of this article describes how to design and implement a service. A future article will describe how to implement an SCP.

You will never write the SCM itself; Microsoft has done this for you. Because the SCM is an RPC server, an SCP can communicate with it over a network to administer services remotely. An administrator using an SCP on machine A can have the SCM on machine B start, stop, and control a Win32 service running on machine B.

In addition to the Services Control Panel applet, Windows NT also ships with a command-line SCP tool called NET.EXE. This tool is limited to controlling services residing on the local machine. Using NET.EXE, you can start, pause, continue, and stop services using the following syntax:

NET START    servicename
NET PAUSE    servicename
NET CONTINUE servicename
NET STOP     servicename

You can also use NET.EXE to display a list of services running on the local machine by simply typing

NET START

without specifying a servicename.

If you want an SCP that allows you to administer services remotely, you can use the Server Manager (SrvMgr.exe) that ships with both Windows NT Server and the Windows NT Workstation Resource Kit. When you invoke the Server Manager, it displays a list of networked computers (see Figure 3). Selecting a machine and choosing the Services option from the Computer menu item causes SrvMgr.exe to display a window that looks just like the Services Control Panel applet’s window. However, the content of the window reflects the status of the services installed on the selected machine. If you press the Start button, you will cause a service to start on the remote machine.

Figure 3: Windows NT Server Manager

Microsoft also offers a command-line SCP (SC.EXE) that comes with all versions of Windows NT Resource Kits. Running this tool without passing it any parameters displays its usage syntax, as shown in Figure 4.

Figure 4: SC.EXE Usage Syntax

DESCRIPTION:
        SC is a command line program used for communicating with the
        NT Service Controller and services.

USAGE:
        sc <server> [command] [service name] <option1> <option2>...

        The option <server> has the form \\ServerName
        Further help on commands can be obtained by typing: "sc [command]"
        Commands:
          query-----------Queries the status for a service, or
                          enumerates the status for types of services.
          start-----------Starts a service.
          pause-----------Sends a PAUSE control request to a service.
          interrogate-----Sends an INTERROGATE control request to a service.
          continue--------Sends a CONTINUE control request to a service.
          stop------------Sends a STOP request to a service.
          config----------Changes the configuration of a service (persistant).
          qc--------------Queries the configuration information for a service.
          delete----------Deletes a service (from the registry).
          create----------Creates a service. (adds it to the registry).
          control---------Sends a control to a service.
          GetDisplayName--Gets the DisplayName for a service.
          GetKeyName------Gets the ServiceKeyName for a service.
          EnumDepend------Enumerates Service Dependencies.

        The following commands don't require a service name:
        sc <server> <command> <option>
          boot------------(ok | bad) Indicates whether the last boot should
                          be saved as the last-known-good boot configuration
          Lock------------Locks the Service Database
          QueryLock-------Queries the LockStatus for the SCManager Database

EXAMPLE:

        sc start MyService

How a Service Starts

In order for the SCM to start a service, information about the service must be added to the SCM’s database. The SCM’s database lives inside the registry under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services subkey. An application should not manipulate this registry subkey directly using the various registry functions. Instead, an SCP should call Win32 functions that tell the SCM to manipulate this database. When you purchase and install a product that includes a Win32 service, the Setup program for that product is an SCP that tells the SCM to add the information for that service to the SCM’s database.

Once the service is in the SCM’s database, an SCP like the Services Control Panel applet will be able to enumerate the installed services and show you each service’s current status. The Services Control Panel applet also lets you select when the service should start. Pressing the applet’s Startup button displays a dialog box (shown in Figure 5) that allows you to configure how the service should start. This dialog box shows the name of the selected service at the top and allows you to adjust the startup type of the service.

Figure 5: Service Startup

One of the benefits of writing a service versus a regular Win32 application is that the operating system can automatically start a service for you. In the SCM database, a service can be marked as an automatic service, which means that the operating system should automatically try to start the service when it boots. It’s important to note that automatic services run before any user logs onto the machine interactively. In fact, many machines running Windows NT Server are set up to run services only, and no one ever logs onto the machine interactively. The Server service allows a client to access subdirectories, files, and printers on a networked machine. When Windows NT boots, the Server service starts automatically, and can therefore service requests from networked clients without anyone having to log onto the machine interactively.

A service can also be marked as manual, which means that the operating system will not automatically start it, but a user can explicitly start the service. A manual service will also start if another service is started that depends on the manual service. (I’ll go into more detail about service dependencies later.) Finally, a service can be marked as disabled, which means that it cannot start at all. For example, you would disable the DHCP Client service if you manually assigned an IP address to your machine rather than having it dynamically obtain an IP address from a machine running the DHCP Server service.

In addition to the startup type, you are able to specify under which user account the service should execute. Most services run under the System Account (sometimes referred to as the Local System Account), which basically means that the service can do just about anything on the computer. If the service is running under the System Account, you can optionally check the Allow Service to Interact with Desktop option. This option is disabled for most services.

Remember when I said that a service should not have a user interface because no one will be in front of the machine? A few special services do display a user interface. On my system, the Remote Access Connection Manager service and the Spooler service are the only two services that interact with the desktop. These two services are probably running on a machine controlled by a user. The Spooler service is responsible for sending data out of the machine to a printer. When the print job completes or if the printer is out of paper, the Spooler service notifies the user with a message box. For the user to see this message box, the Allow Service to Interact with Desktop option must be checked.

When a service is running under the System Account, there is no user running the process, and therefore the service will have limited access to network resources such as shared directories and pipes. A service running under the System Account may connect to resources using a NULL session. You can tell Windows NT what shares and pipes to make available to NULL session clients by modifying the NullSessionPipes and NullSessionShares data values that exist under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters registry subkey. You can also enable all pipes and shares on the machine to be accessed by all NULL session connections by setting the RestrictNullSessionAccess data value in the subkey to 0. One more note: a service running under the System Account will not be able to open the HKEY_CURRENT_USER registry key, but the service will be able to open the HKEY_LOCAL_MACHINE\Security registry key.

Instead of running a service under the System Account, you can select the This Account option in the dialog box and then enter a username and password. Whenever this service starts, it will run using the security context of the specified user account. If the user account and password are valid, the service process will have access to the network resources.

Some services, such as the DHCP Client, Messenger, and Alerter services, are fairly simple in their implementation. It would be inefficient if each of these services had to be implemented in a separate Win32 executable, each with its own 4GB virtual address space. Because of this inefficiency, Microsoft allows a single Win32 executable to contain several services. The Services.exe file actually contains about ten different services, including the three just mentioned. While combining the service implementations in a single executable is good for efficiency, it does come with one drawback—the SCM will only allow this executable to run using the System Account. The options for setting a specific user account will be disabled in the dialog box.

Although the Services Control Panel applet lets you change a service’s startup settings, you will have to do so very infrequently. It is expected that the Setup program will add the service to the SCM’s database with the appropriate settings for the service. I have only altered these settings when debugging my services.

Designing a Win32 Service

In this section, I’ll explain how a service application must be designed to take advantage of all the free features that Windows NT offers. There are three important functions to concern yourself with when writing a service. The first function is the entry point function of the process: WinMain or main. The primary thread in the process executes this function, which is responsible for performing initialization for the whole process. This means anything that applies to all services in the executable. Remember that a single executable can contain several services to keep things more efficient. The primary thread of the process also calls a Win32 function that tells the SCM how many services are contained inside the executable and gives the address of each service’s ServiceMain callback function. Once all services inside the executable have stopped running, the primary thread performs cleanup for the process as a whole before the process terminates.

The second important function when you’re writing a service is a ServiceMain function that must have the following prototype:

VOID WINAPI ServiceMain(DWORD dwArgc, 
                        LPTSTR *lpszArgv);

This function is called by the operating system and executes the code that implements the service itself. You do not have to call this function ServiceMain; you can name it anything you like because your WinMain/main function passes the address of this function when it tells the SCM how many services it contains. If your executable contains four services, you must implement four different ServiceMain functions and the addresses of all these functions get passed to the SCM.

A dedicated thread executes each service’s ServiceMain function. When the primary thread calls the Win32 function, StartServiceCtrlDispatcher, the SCM spawns a thread for each service in the process. Each of these threads begins its execution with the corresponding service’s ServiceMain function. This is why services are always multithreaded—an executable with just one service will have a primary thread and another thread that executes the service itself.

The third and last important function is a Handler function that must have the following prototype:

VOID WINAPI Handler(DWORD fdwControl);

Like ServiceMain, the Handler function is a callback function, and you must write a separate Handler function for each service contained inside the executable. So, if you have an executable containing two services, you must have five different functions: a single WinMain or main entry point function, ServiceMain and Handler functions for the first service, and another set of ServiceMain and Handler functions for the second service.

The SCM calls a service’s Handler function to change the state of the service. For example, when someone using the Services Control Panel applet tries to stop your service, your service’s Handler function receives a SERVICE_CONTROL_STOP notification. The Handler function is responsible for performing whatever actions are necessary to stop the service. The primary thread of the process executes all Handler functions. You want to design your Handler functions to execute as quickly as possible so that other services’ Handler functions in the same process can receive their notifications in a reasonable amount of time.

Since the primary thread executes the Handler function but the service is executed by another thread, you must have the Handler code communicate the desired state change to the service thread. There is no standard way to perform this communication; it’s really dependent on what your service does. You can queue an asynchronous procedure call (APC), post an I/O completion status, post a window message, and so on.

Diving Deeper

OK, that’s it for the broad strokes. Now let’s get into the details. Inside WinMain, an array of SERVICE_TABLE_ENTRY structures is initialized. Each SERVICE_TABLE_ENTRY structure looks like this:

typedef struct _SERVICE_TABLE_ENTRY {
   LPTSTR lpServiceName;
   LPSERVICE_MAIN_FUNCTION lpServiceProc;
} SERVICE_TABLE_ENTRY, *LPSERVICE_TABLE_ENTRY;

The first member indicates the name of the service and the second member is the address of the service’s ServiceMain callback function. Since this template process contains one service, there are two SERVICE_TABLE_ENTRY elements in the array: one for the service and a NULL entry to indicate the end of the array.

The address of this array is then passed to StartServiceCtrlDispatcher:

BOOL StartServiceCtrlDispatcher(LPSERVICE_TABLE_ENTRY
                                lpServiceStartTable);

This Win32 function is how the executable process notifies the SCM of the services contained within the process. StartServiceCtrlDispatcher spawns a new thread for each non-NULL element in the array passed to it. Each thread begins execution at the ServiceMain function indicated by the lpServiceProc member in the array element.
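As a rough illustration of the table setup described above, here is a minimal sketch for a process hosting a single service. The names MyService, MyServiceMain, g_ServiceTable, CountServices, and RunServiceProcess are hypothetical and chosen only for this example. The non-Windows branch contains stand-in definitions so the sketch compiles elsewhere; on Windows the real ones come from windows.h.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch compiles off Windows (assumption: ANSI build). */
typedef unsigned int DWORD;
typedef char *LPTSTR;
#define VOID   void
#define WINAPI
#define TEXT(s) s
typedef VOID (WINAPI *LPSERVICE_MAIN_FUNCTION)(DWORD, LPTSTR *);
typedef struct _SERVICE_TABLE_ENTRY {
    LPTSTR lpServiceName;
    LPSERVICE_MAIN_FUNCTION lpServiceProc;
} SERVICE_TABLE_ENTRY;
#endif

/* Placeholder ServiceMain (hypothetical). A real one would register its
   Handler and then run the service. */
VOID WINAPI MyServiceMain(DWORD dwArgc, LPTSTR *lpszArgv)
{
    (void)dwArgc;
    (void)lpszArgv;
}

/* One element per service, followed by a NULL entry that ends the array. */
SERVICE_TABLE_ENTRY g_ServiceTable[] = {
    { TEXT("MyService"), MyServiceMain },
    { NULL, NULL }
};

/* Count the services in a table, not counting the NULL terminator. */
int CountServices(const SERVICE_TABLE_ENTRY *table)
{
    int n = 0;
    assert(table != NULL);
    while (table[n].lpServiceName != NULL)
        n++;
    return n;
}

/* Intended to be called from main/WinMain after process-wide
   initialization. On Windows the dispatcher call blocks until every
   service in the process has stopped. Returns 0 on success, otherwise
   the Win32 error code. */
int RunServiceProcess(void)
{
#ifdef _WIN32
    if (!StartServiceCtrlDispatcher(g_ServiceTable))
        return (int)GetLastError();
#endif
    return 0;
}
```

In a real executable, main or WinMain would do its process-wide initialization, call something like RunServiceProcess, and then perform process-wide cleanup once that call returns.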

The SCM likes to keep close tabs on how a service is doing. For example, when the SCM invokes a service executable, the SCM waits for the primary thread in the executable to call StartServiceCtrlDispatcher. If StartServiceCtrlDispatcher is not called within two minutes, the SCM thinks that the service is malfunctioning and calls TerminateProcess to forcibly kill the process. For this reason, if your process will require more than two minutes to initialize, you must spawn another thread to do the initialization so that the primary thread can quickly get to its call to StartServiceCtrlDispatcher.

The StartServiceCtrlDispatcher function does not return immediately to your WinMain/main function. Instead, it sits inside a loop (see Figure 6). While inside this loop, StartServiceCtrlDispatcher suspends itself, waiting for one of two things to happen. First, the thread wakes up if the SCM wants to send a control notification to one of the services running inside the process. When a control notification comes in, the thread wakes up and calls the corresponding service’s Handler function. The Handler function processes the service control notification (probably communicating with the service’s thread) and returns to StartServiceCtrlDispatcher, which then loops back around, suspending itself again.

Second, the thread will also wake up if one of the service threads terminates. In this case, StartServiceCtrlDispatcher wakes up and decrements a count of running services in the process. If this count is zero (all services have stopped running), StartServiceCtrlDispatcher returns to your WinMain/main function so that you can perform any process-related cleanup and terminate the process. As long as there is at least one service running, StartServiceCtrlDispatcher loops back around and continues to wait for more control notifications or for another service thread to terminate.

Figure 6: StartServiceCtrlDispatcher Loop

// Don’t count the NULL entry
int nNumRunningServices = NumElementsInServiceStartTable - 1;
while (nNumRunningServices > 0) {
   WaitForAServiceControlCodeOrAServiceThreadToTerminate();
   if (AServiceControlCode) {
      RemoveServiceControlCodeFromQueue();
      CallServiceHandler(fdwControlCode);
   } else {
      nNumRunningServices--;
   }
}
return(TRUE);   // StartServiceCtrlDispatcher returns to WinMain/main

OK, that’s it for your WinMain/main function. Now, let’s look at your service’s ServiceMain function. Again, a ServiceMain function must have the following prototype:

VOID WINAPI ServiceMain(DWORD dwArgc, 
                        LPTSTR *lpszArgv);

Usually, a ServiceMain function ignores the two parameters passed to it because services are not frequently passed any parameters at all. It’s best for a service to configure itself by grabbing configuration settings out of the registry. A service should use the registry functions to look for its configuration settings under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ServiceName\Parameters subkey, where ServiceName is the name of your service. In fact, you may want to write a client application (with a user interface) that allows an administrator to configure your service’s settings. The client application would then save these settings in the registry so that the service could retrieve them. A running service could use the RegNotifyChangeKeyValue function to receive a notification when an external application has changed its configuration data. This allows a service to reconfigure itself on-the-fly.
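To make the registry lookup concrete, here is a small sketch of how a service might read one DWORD setting from its Parameters subkey. The helper names BuildParametersKeyPath and ReadDwordSetting are hypothetical, and the sketch assumes the setting is stored as a REG_DWORD value. The registry calls are compiled only on Windows; the path-building helper is plain C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Build the Parameters subkey path, relative to HKEY_LOCAL_MACHINE.
   Returns 0 on success, -1 if the buffer is too small. */
int BuildParametersKeyPath(char *buf, size_t cb, const char *serviceName)
{
    int n;
    assert(buf != NULL && serviceName != NULL);
    n = snprintf(buf, cb,
                 "SYSTEM\\CurrentControlSet\\Services\\%s\\Parameters",
                 serviceName);
    return (n < 0 || (size_t)n >= cb) ? -1 : 0;
}

#ifdef _WIN32
/* Read one DWORD setting; fall back to dwDefault if the key or value is
   missing or has a different type. */
DWORD ReadDwordSetting(const char *serviceName, const char *valueName,
                       DWORD dwDefault)
{
    char  path[256];
    HKEY  hKey;
    DWORD dwType = 0, dwData = 0, cbData = sizeof(dwData);

    if (BuildParametersKeyPath(path, sizeof(path), serviceName) != 0)
        return dwDefault;
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, path, 0, KEY_QUERY_VALUE,
                      &hKey) != ERROR_SUCCESS)
        return dwDefault;
    if (RegQueryValueExA(hKey, valueName, NULL, &dwType,
                         (LPBYTE)&dwData, &cbData) != ERROR_SUCCESS
        || dwType != REG_DWORD)
        dwData = dwDefault;
    RegCloseKey(hKey);
    return dwData;
}
#endif
```

A ServiceMain function could then call, say, ReadDwordSetting("MyService", "PollInterval", 5000) during initialization; both the service name and the value name here are made up for illustration.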

The first thing a ServiceMain function must do is tell the SCM the address of its Handler callback function by calling RegisterServiceCtrlHandler:

SERVICE_STATUS_HANDLE RegisterServiceCtrlHandler(
      LPCTSTR lpServiceName,
      LPHANDLER_FUNCTION lpHandlerProc);

The first parameter indicates which service you are setting a Handler function for and the second parameter is the address of the Handler function. The lpServiceName parameter must match the name used when the array of SERVICE_TABLE_ENTRYs was initialized and passed to StartServiceCtrlDispatcher. I’ll talk more about the Handler function later. For now, let’s continue discussing what your ServiceMain function does next.

RegisterServiceCtrlHandler returns a SERVICE_STATUS_HANDLE, which is simply a 32-bit value that the SCM uses to uniquely identify this service. When the service needs to report its current status back to the SCM, you will have to pass this handle to the desired Win32 function. Note that unlike most other handles in the system, you never close the handle returned from RegisterServiceCtrlHandler.

The SCM requires that your ServiceMain function’s thread call the RegisterServiceCtrlHandler function within one second. Otherwise the SCM thinks that your service has failed. In this case, the SCM does not terminate your service; it continues to run fine. However, whatever tool you used to start the service (most likely the Services Control Panel applet) may display a message box reporting the assumed failure. If you close the Services Control Panel applet and reopen it, it will refresh its contents and display the correct information. Microsoft should have added a Refresh button in the applet’s window so that users don’t have to close and reopen it to see the correct data.

After RegisterServiceCtrlHandler returns, the ServiceMain thread should immediately tell the SCM that the service is continuing to initialize. It does this by calling the SetServiceStatus function:

BOOL SetServiceStatus(SERVICE_STATUS_HANDLE hService, 
                     LPSERVICE_STATUS lpServiceStatus);

This function requires that you pass it the handle identifying your service (returned from the call to RegisterServiceCtrlHandler) and the address of an initialized SERVICE_STATUS structure:

typedef struct _SERVICE_STATUS {
   DWORD dwServiceType;
   DWORD dwCurrentState; 
   DWORD dwControlsAccepted; 
   DWORD dwWin32ExitCode; 
   DWORD dwServiceSpecificExitCode; 
   DWORD dwCheckPoint; 
   DWORD dwWaitHint; 
} SERVICE_STATUS, *LPSERVICE_STATUS;

The SERVICE_STATUS structure contains seven members that reflect the current status of the service. All of these members must be set correctly before passing the structure to SetServiceStatus.

The dwServiceType member indicates what type of service executable you have written. Set this member to SERVICE_WIN32_OWN_PROCESS if your executable contains a single service, or SERVICE_WIN32_SHARE_PROCESS if your executable contains more than one service. In addition to these two flags, you can OR in the SERVICE_INTERACTIVE_PROCESS flag if your service needs to interact with the desktop (but you should avoid interactive services as much as possible). The value of this member should never change during the lifetime of your service.

The dwCurrentState member is the most important member of this structure. It tells the SCM the current state of your service. To report that your service is still initializing, you should set this member to SERVICE_START_PENDING. I’ll explain the other possible values when I talk about the Handler function.

The dwControlsAccepted member indicates what control notifications the service is willing to accept. If you will allow an SCP to pause/continue your service, specify SERVICE_ACCEPT_PAUSE_CONTINUE. Many services do not support pausing or continuing; you have to decide for yourself if this makes sense for your service. If you allow an SCP to stop your service, specify SERVICE_ACCEPT_STOP. If you want your service to be notified when the operating system is being shut down, specify SERVICE_ACCEPT_SHUTDOWN. Use the OR operator to combine the desired set of flags.

The dwWin32ExitCode and dwServiceSpecificExitCode members allow the service to report error codes. If a service wishes to report a Win32 error code (as defined in WinError.h), it sets the dwWin32ExitCode member to the desired code. A service can also report errors specific to the service that do not map to a predefined Win32 error code. To do this, you must set the dwWin32ExitCode member to ERROR_SERVICE_SPECIFIC_ERROR and then set the dwServiceSpecificExitCode member to the service-specific error code. Set the dwWin32ExitCode member to NO_ERROR when the service is running normally and has no error to report.

A service reports its ongoing progress using the last two members, dwCheckPoint and dwWaitHint. When you set dwCurrentState to SERVICE_START_PENDING, you should set dwCheckPoint to 0 and set dwWaitHint to the number of milliseconds required in order for the service to be fully up and running. Once the service is fully initialized, you should reinitialize the SERVICE_STATUS structure’s members so that dwCurrentState is SERVICE_RUNNING, and then set both dwCheckPoint and dwWaitHint to 0.

The dwCheckPoint member exists for your benefit. It allows a service to report how far it has gotten in its processing. Each time you call SetServiceStatus, you can increment dwCheckPoint to a number that indicates what “step” your service has executed. It is totally up to you to decide how frequently you want to report your service’s progress. If you do decide to report each step of your service’s initialization, the dwWaitHint member should be set to indicate how many milliseconds you think you need to reach the next step—not the number of milliseconds required for the service to complete its processing.
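The checkpoint and wait-hint bookkeeping can be sketched as follows. This is only an illustration under stated assumptions: ReportStartStep and ReportRunning are hypothetical helpers, the definitions are stand-ins mirroring the Win32 headers, and the actual call to SetServiceStatus is left as a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions mirroring <windows.h> and <winsvc.h>; a real
   service would include <windows.h> instead. */
typedef unsigned long DWORD;
typedef struct {
    DWORD dwServiceType;
    DWORD dwCurrentState;
    DWORD dwControlsAccepted;
    DWORD dwWin32ExitCode;
    DWORD dwServiceSpecificExitCode;
    DWORD dwCheckPoint;
    DWORD dwWaitHint;
} SERVICE_STATUS;

#define SERVICE_START_PENDING 0x00000002UL
#define SERVICE_RUNNING       0x00000004UL

/* Hypothetical helper: report one initialization step. The first report
   uses checkpoint 0; each later report increments it. msToNextStep is the
   estimated time to the NEXT step, not to the end of initialization. */
void ReportStartStep(SERVICE_STATUS *ss, DWORD msToNextStep)
{
    assert(ss != NULL);
    if (ss->dwCurrentState != SERVICE_START_PENDING) {
        ss->dwCurrentState = SERVICE_START_PENDING;
        ss->dwCheckPoint = 0;
    } else {
        ss->dwCheckPoint++;
    }
    ss->dwWaitHint = msToNextStep;
    /* A real service would now call SetServiceStatus with this structure. */
}

/* Hypothetical helper: initialization is done, so both progress members
   go back to 0. */
void ReportRunning(SERVICE_STATUS *ss)
{
    assert(ss != NULL);
    ss->dwCurrentState = SERVICE_RUNNING;
    ss->dwCheckPoint = 0;
    ss->dwWaitHint = 0;
    /* A real service would now call SetServiceStatus with this structure. */
}
```

A service that initializes in three steps would call ReportStartStep three times, each time with its estimate for the following step, and then call ReportRunning once.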

After all of your service’s initialization is complete, your service calls SetServiceStatus indicating SERVICE_RUNNING—and at that point your service is running. Usually a service runs by placing itself in a loop. Inside the loop, the service thread suspends itself waiting for a network request or for a notification indicating that the service should pause, continue, stop, shut down, and so on. If a network request comes in, the service thread wakes up, processes the request, and loops back around to wait for the next request/notification.

If the service wakes up due to a notification, it processes the notification and then loops back around to wait for the next request/notification, unless the notification is a stop or shutdown. If either of these notifications comes in, the service thread should exit the loop, perform any necessary cleanup, and then return from the thread. When the ServiceMain thread returns, it terminates, causing the thread sleeping inside StartServiceCtrlDispatcher to wake up and decrement its count of running services, as explained earlier.

We have just one more function to discuss in detail: the service’s Handler function.

VOID WINAPI Handler(DWORD fdwControl);

The SCM gets and saves the address of this callback function when the ServiceMain function calls RegisterServiceCtrlHandler. An SCP calls a Win32 function that tells the SCM how to control a service. Microsoft currently defines five standard control codes (see Figure 7). In addition to these codes, a service can accept user-defined codes that are in the range of 128 to 255, inclusive. The Handler function is responsible for processing the notification codes appropriately. The action that the Handler takes differs dramatically depending on the control code received.

Figure 7: Service Control Codes

Control Notification Code      Meaning
SERVICE_CONTROL_STOP           Requests the service to stop.
SERVICE_CONTROL_PAUSE          Requests the service to pause.
SERVICE_CONTROL_CONTINUE       Requests the paused service to resume.
SERVICE_CONTROL_INTERROGATE    Requests the service to immediately update its current status information to the SCM. All services must respond to this control code.
SERVICE_CONTROL_SHUTDOWN       Requests the service to perform cleanup tasks because the system is shutting down.

When the Handler function receives a SERVICE_CONTROL_STOP, SERVICE_CONTROL_PAUSE, or SERVICE_CONTROL_CONTINUE control code, SetServiceStatus must be called to acknowledge receipt of the code and to specify how long the service thinks it will take to process the state change. You acknowledge receipt of the control code by setting the SERVICE_STATUS structure’s dwCurrentState member to SERVICE_STOP_PENDING, SERVICE_PAUSE_PENDING, or SERVICE_CONTINUE_PENDING, respectively.

While a stop, pause, or continue operation is pending, you must also specify how long you think the operation will take to complete. This is useful because a service may not be able to change its state immediately; it may have to wait for a network request to complete or for data to be flushed to a drive. You indicate how long it will take to complete the state change using the dwCheckPoint and dwWaitHint members, just as I did when I reported that the service was first starting. If you want, you can report periodic progress by incrementing the dwCheckPoint member and setting the dwWaitHint member to indicate how long you expect the service to take getting to the next step.

After all of the actions are completed to stop, pause, or continue the service, call SetServiceStatus again. This time, set the dwCurrentState member to SERVICE_STOPPED, SERVICE_PAUSED, or SERVICE_RUNNING. When you report any of these status codes, both the dwCheckPoint and dwWaitHint members must be 0 since the service has completed its state change. (The Win32 documentation states that you should only set the dwCheckPoint and dwWaitHint members for start, stop, and continue operations. It does not state that you should set these members for a pause operation. I recommend you handle a pause operation just like any of the other operations.)
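The two-phase acknowledgment described above boils down to a pair of lookups, sketched here in portable C. PendingStateFor and FinalStateFor are hypothetical helper names, and the constants are stand-ins mirroring winsvc.h. Note that winsvc.h defines SERVICE_CONTINUE_PENDING as the pending state for a continue request.

```c
#include <assert.h>

/* Stand-in definitions mirroring <winsvc.h>; a real service would
   include <windows.h> instead. */
typedef unsigned long DWORD;

#define SERVICE_STOPPED          0x00000001UL
#define SERVICE_STOP_PENDING     0x00000003UL
#define SERVICE_RUNNING          0x00000004UL
#define SERVICE_CONTINUE_PENDING 0x00000005UL
#define SERVICE_PAUSE_PENDING    0x00000006UL
#define SERVICE_PAUSED           0x00000007UL

#define SERVICE_CONTROL_STOP     0x00000001UL
#define SERVICE_CONTROL_PAUSE    0x00000002UL
#define SERVICE_CONTROL_CONTINUE 0x00000003UL

/* Hypothetical helper: the state to report when acknowledging a control
   code. Returns 0 for codes that do not start a state change. */
DWORD PendingStateFor(DWORD control)
{
    switch (control) {
    case SERVICE_CONTROL_STOP:     return SERVICE_STOP_PENDING;
    case SERVICE_CONTROL_PAUSE:    return SERVICE_PAUSE_PENDING;
    case SERVICE_CONTROL_CONTINUE: return SERVICE_CONTINUE_PENDING;
    default:                       return 0;
    }
}

/* Hypothetical helper: the state to report once the change is complete,
   at which point dwCheckPoint and dwWaitHint must both be 0. */
DWORD FinalStateFor(DWORD control)
{
    switch (control) {
    case SERVICE_CONTROL_STOP:     return SERVICE_STOPPED;
    case SERVICE_CONTROL_PAUSE:    return SERVICE_PAUSED;
    case SERVICE_CONTROL_CONTINUE: return SERVICE_RUNNING;
    default:                       return 0;
    }
}
```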

When the Handler function receives a SERVICE_CONTROL_INTERROGATE control code, the service should simply acknowledge receipt of this code by setting dwCurrentState to the service’s current state before calling SetServiceStatus. Set both dwCheckPoint and dwWaitHint to 0 before making this call.

When the operating system is shutting down, the Handler function receives a SERVICE_CONTROL_SHUTDOWN control code. A service does not have to acknowledge receipt of this code at all. The service should perform the minimal set of actions necessary to save any data. To ensure that a machine shuts down in a timely fashion, a service should only process this control if it absolutely has to. By default, the system gives just 20 seconds for all services to shut down. After this time, the system calls TerminateProcess to forcibly kill all services and completes the system shutdown. This 20-second period is set by the WaitToKillServiceTimeout value contained in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control registry subkey.

When the Handler function receives any of the user-defined codes (128 through 255), your handler should execute the desired user-defined action. Do not call the SetServiceStatus function unless the user-defined action forces the service to pause, continue, or stop. If the user-defined action does force a change in the service’s state, then SetServiceStatus should be called setting the dwCurrentState, dwCheckPoint, and dwWaitHint members as described for the control codes I just explained.
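A minimal sketch of user-defined code handling might look like the following. The code SERVICE_CONTROL_FLUSH_CACHE (value 128) and both helper functions are invented for illustration; only the 128 through 255 range comes from the text above. Because flushing a cache does not change the service’s state, the sketch never touches the status structure.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long DWORD;   /* stand-in for the <windows.h> typedef */

#define USER_CONTROL_FIRST 128UL
#define USER_CONTROL_LAST  255UL

/* Hypothetical user-defined control code, chosen for this example only. */
#define SERVICE_CONTROL_FLUSH_CACHE 128UL

/* Returns nonzero if the code lies in the user-defined range. */
int IsUserDefinedControl(DWORD control)
{
    return control >= USER_CONTROL_FIRST && control <= USER_CONTROL_LAST;
}

/* Hypothetical dispatcher: executes the user-defined action and returns
   1 if the code was recognized, 0 otherwise. It deliberately reports no
   status, since the action leaves the service's state unchanged. */
int HandleUserControl(DWORD control, int *cacheFlushes)
{
    assert(cacheFlushes != NULL);
    if (!IsUserDefinedControl(control))
        return 0;
    switch (control) {
    case SERVICE_CONTROL_FLUSH_CACHE:
        (*cacheFlushes)++;     /* placeholder for the real work */
        return 1;
    default:
        return 0;              /* in range, but not a code this service uses */
    }
}
```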

A service’s primary thread executing the Handler function receives the notification, but the ServiceMain thread needs to do the actual work to process the notification. For example, you might be writing a service that processes client requests that come in over a named pipe. Your service’s thread suspends itself while waiting for a client to connect. Now, if your Handler thread gets a SERVICE_CONTROL_STOP code, how do you stop the service? I’ve seen many developers simply call TerminateThread from the Handler function to kill the service thread forcibly. By now, everyone should know that TerminateThread is one of the worst functions you can possibly call because the thread doesn’t get a chance to do cleanup. The thread’s stack is not destroyed, the thread can’t release any kernel objects that it may have waited on, DLLs are not notified that the thread has been destroyed, and so on.

The proper way for the service to stop is for it to somehow wake up, see that it is supposed to stop, clean up properly, and then return. This means that you must implement some form of interthread communication between your Handler code and your ServiceMain code. My preferred interthread communication mechanism is the I/O completion port, but you can use any mechanism you like, including asynchronous procedure call (APC) queues, sockets, and window messages.

Another challenge when developing a service is all the status reporting from calling SetServiceStatus. There is often debate among service implementors about where to place the calls to SetServiceStatus. Many sample programs I’ve seen have the Handler function make the initial call to SetServiceStatus to report SERVICE_STOP_PENDING and then pass the control code off to the service thread. Just before the service thread stops, it calls SetServiceStatus to report SERVICE_STOPPED.

This sounds like a good idea for two reasons. First, your service is acknowledging receipt of the control code right away and processing it as it has time. Second, the primary thread is responsible for calling all the Handler functions inside your executable. If you have the primary thread do all the work, you may be preventing other services inside your executable from processing their control notifications in a timely fashion. However, race conditions can easily occur with this method, and few services I’ve seen deal with these conditions properly.

Here is an example of a race condition: say that your service receives a SERVICE_CONTROL_PAUSE code and your Handler thread responds with a SERVICE_PAUSE_PENDING, and then passes the code off to the ServiceMain thread. The ServiceMain thread starts processing the code when, all of a sudden, your Handler thread preempts the ServiceMain thread and receives a SERVICE_CONTROL_STOP code. Your Handler function then responds with a SERVICE_STOP_PENDING code and queues the new code to the ServiceMain thread. When the ServiceMain thread gets CPU time again, it will complete its processing of the SERVICE_CONTROL_PAUSE code and report SERVICE_PAUSED. Then, it will see the queued SERVICE_CONTROL_STOP code, stop the service, and report SERVICE_STOPPED. After all of this, the SCM has received the following state updates:

SERVICE_PAUSE_PENDING
SERVICE_STOP_PENDING
SERVICE_PAUSED
SERVICE_STOPPED

As you can see, this is gibberish and the results are undefined! You’d be surprised how many services I’ve seen that can actually report this sequence. The reason these services work at all is that it is unlikely your service will be stopped while in the process of being paused—but it can happen.
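To make the interleaving concrete, here is a small deterministic simulation of the split design. It involves no real threads and no SCM: the ScmLog type and the three functions are hypothetical, and the calls are simply issued in the order the race would produce them.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions mirroring <winsvc.h>. */
typedef unsigned long DWORD;

#define SERVICE_STOPPED       0x00000001UL
#define SERVICE_STOP_PENDING  0x00000003UL
#define SERVICE_PAUSE_PENDING 0x00000006UL
#define SERVICE_PAUSED        0x00000007UL

#define SERVICE_CONTROL_STOP  0x00000001UL
#define SERVICE_CONTROL_PAUSE 0x00000002UL

#define MAX_LOG 8

/* Hypothetical record of what the SCM would see, in order. */
typedef struct {
    DWORD reported[MAX_LOG];
    int   count;
} ScmLog;

static void Report(ScmLog *log, DWORD state)
{
    assert(log != NULL && log->count < MAX_LOG);
    log->reported[log->count++] = state;
}

/* Handler side of the split design: acknowledge right away. */
void HandlerAcknowledge(ScmLog *log, DWORD control)
{
    Report(log, control == SERVICE_CONTROL_STOP ? SERVICE_STOP_PENDING
                                                : SERVICE_PAUSE_PENDING);
}

/* Service-thread side: report the final state when the work is done. */
void ServiceThreadComplete(ScmLog *log, DWORD control)
{
    Report(log, control == SERVICE_CONTROL_STOP ? SERVICE_STOPPED
                                                : SERVICE_PAUSED);
}

/* Replays the problematic schedule: the Handler acknowledges the stop
   before the service thread has finished pausing. */
void SimulateRace(ScmLog *log)
{
    assert(log != NULL);
    log->count = 0;
    HandlerAcknowledge(log, SERVICE_CONTROL_PAUSE);
    HandlerAcknowledge(log, SERVICE_CONTROL_STOP);    /* preempts the pause */
    ServiceThreadComplete(log, SERVICE_CONTROL_PAUSE);
    ServiceThreadComplete(log, SERVICE_CONTROL_STOP);
}
```

After SimulateRace runs, the log holds SERVICE_PAUSE_PENDING, SERVICE_STOP_PENDING, SERVICE_PAUSED, and SERVICE_STOPPED, in that order.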

When I first started working with services, I thought that the SCM would be responsible for preventing race conditions from occurring. But my experiments show that the SCM does absolutely nothing to time the sending of control codes. In fact, the SCM does nothing to ensure that a service receives control codes properly. Here’s what I mean: while a service is already paused, send the service a SERVICE_CONTROL_PAUSE code. You won’t be able to do this with the Services Control Panel applet because it will see that the service is paused and will disable the Pause button. But if you use the SC.EXE command-line utility, there is nothing stopping you from sending a pause code to a service that is already paused. I would have expected the SCM to report failure to the SC.EXE utility, but the SCM simply calls the service’s Handler, passing it the SERVICE_CONTROL_PAUSE code.

I have seen many services that don’t deal with the possibility of the same control code coming into the service multiple times in a row. For example, I saw a service that closed the handle to a named pipe when the service was paused. The service then proceeded to create another kernel object that coincidentally got the same handle value as the handle of the original named pipe. Then, the service received another pause control code and called CloseHandle, passing the handle value of the old pipe. Since this value happened to be the same as another kernel object’s handle, the new kernel object was destroyed and the rest of the service started failing in strange and mysterious ways. I can’t tell you how much of a pleasure this was to debug.

To fix this problem, when you get a stop, pause, or continue code, check first to see if your service is already in the desired state. If it is, don’t call SetServiceStatus and don’t execute your code to change states—just return.
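One way to express this check is a small predicate that the service consults before acting on a code. IsRedundantControl is a hypothetical name, and treating the matching pending state as "already there" is an assumption of this sketch rather than something the text prescribes.

```c
#include <assert.h>

/* Stand-in definitions mirroring <winsvc.h>. */
typedef unsigned long DWORD;

#define SERVICE_STOPPED          0x00000001UL
#define SERVICE_STOP_PENDING     0x00000003UL
#define SERVICE_RUNNING          0x00000004UL
#define SERVICE_CONTINUE_PENDING 0x00000005UL
#define SERVICE_PAUSE_PENDING    0x00000006UL
#define SERVICE_PAUSED           0x00000007UL

#define SERVICE_CONTROL_STOP     0x00000001UL
#define SERVICE_CONTROL_PAUSE    0x00000002UL
#define SERVICE_CONTROL_CONTINUE 0x00000003UL

/* Hypothetical predicate: nonzero when the service is already in (or
   already moving toward) the state that the control code asks for, so
   the caller should just return without calling SetServiceStatus. */
int IsRedundantControl(DWORD currentState, DWORD control)
{
    switch (control) {
    case SERVICE_CONTROL_STOP:
        return currentState == SERVICE_STOPPED ||
               currentState == SERVICE_STOP_PENDING;
    case SERVICE_CONTROL_PAUSE:
        return currentState == SERVICE_PAUSED ||
               currentState == SERVICE_PAUSE_PENDING;
    case SERVICE_CONTROL_CONTINUE:
        return currentState == SERVICE_RUNNING ||
               currentState == SERVICE_CONTINUE_PENDING;
    default:
        return 0;   /* other codes never count as redundant here */
    }
}
```

With a guard like this in place, a second pause code sent to a paused service is ignored, so the state-change code (and any CloseHandle call inside it) runs only once.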

Here is another thing that I see done in services all the time: when the Handler function receives a SERVICE_CONTROL_PAUSE code, the Handler calls SetServiceStatus to report SERVICE_PAUSE_PENDING. Then, the Handler calls SuspendThread to pause the service’s thread, and the Handler calls SetServiceStatus again to report SERVICE_PAUSED. This avoids the race conditions because all of the work is being done by one thread, but does suspending the service thread pause the service? Yes, but what does it mean to pause a service? Well, it depends on the service.

If I’m writing a service that processes client requests over the network, to me pause means that I’ll stop accepting any new requests. What about the request that I may be in the middle of processing right now? Maybe I should finish this one so my client doesn’t hang indefinitely. If my Handler function simply calls SuspendThread, the service thread may be in the middle of who knows what; maybe it’s inside a call to malloc trying to allocate some memory. If another service running inside the same process also calls malloc, this other service gets suspended too. This is certainly not what I want to happen!

Here’s another one: do you think you should be allowed to stop a service that is paused? I do, and apparently Microsoft thinks so too because the Services Control Panel applet allows me to select the Stop button even when a paused service is selected. But how can I stop a service that is paused because its thread has been suspended? Please don’t say TerminateThread.

From my experience, the best way to deal with this whole mess is to have one thread handle everything, and that thread should be the service thread—not the Handler thread. When the Handler function gets the control code, it should immediately use an interthread communication mechanism to queue the control code to the service thread, and then the Handler function should just return. The Handler function should never ever call SetServiceStatus. This way, the service stays in control of everything. There are no race conditions, the service decides what it means to be paused, the service can allow itself to be stopped when paused, the service decides what interthread communication mechanism is best for it, and the Handler code must simply conform to that mechanism.
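The shape of this design can be sketched with a single-threaded simulation. Everything here is hypothetical (the Service type, QueueControl, ServiceThreadDrain), the constants are stand-ins mirroring winsvc.h, and the plain array queue is not thread-safe; a real service would replace it with a proper interthread mechanism such as an I/O completion port and would call SetServiceStatus where the sketch records a state.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long DWORD;   /* stand-in for the <windows.h> typedef */

#define SERVICE_STOPPED          0x00000001UL
#define SERVICE_STOP_PENDING     0x00000003UL
#define SERVICE_RUNNING          0x00000004UL
#define SERVICE_CONTINUE_PENDING 0x00000005UL
#define SERVICE_PAUSE_PENDING    0x00000006UL
#define SERVICE_PAUSED           0x00000007UL

#define SERVICE_CONTROL_STOP     0x00000001UL
#define SERVICE_CONTROL_PAUSE    0x00000002UL
#define SERVICE_CONTROL_CONTINUE 0x00000003UL

#define QUEUE_CAP 8
#define LOG_CAP   16

typedef struct {
    DWORD queue[QUEUE_CAP];    /* control codes waiting for the service thread */
    int   head, tail;
    DWORD log[LOG_CAP];        /* states reported to the SCM, in order */
    int   logCount;
    DWORD state;               /* the service's current state */
} Service;

void ServiceInit(Service *s)
{
    assert(s != NULL);
    s->head = s->tail = 0;
    s->logCount = 0;
    s->state = SERVICE_RUNNING;
}

/* What the Handler does: queue the code and return. No status report. */
void QueueControl(Service *s, DWORD control)
{
    assert(s != NULL && s->tail - s->head < QUEUE_CAP);
    s->queue[s->tail % QUEUE_CAP] = control;
    s->tail++;
}

static void Report(Service *s, DWORD state)
{
    assert(s->logCount < LOG_CAP);
    s->state = state;
    s->log[s->logCount++] = state;
}

/* What the service thread does: handle each queued code to completion,
   ignoring codes that do not apply to the current state. */
void ServiceThreadDrain(Service *s)
{
    assert(s != NULL);
    while (s->head != s->tail) {
        DWORD control = s->queue[s->head % QUEUE_CAP];
        s->head++;
        switch (control) {
        case SERVICE_CONTROL_PAUSE:
            if (s->state != SERVICE_RUNNING) break;
            Report(s, SERVICE_PAUSE_PENDING);
            Report(s, SERVICE_PAUSED);      /* after in-flight work finishes */
            break;
        case SERVICE_CONTROL_CONTINUE:
            if (s->state != SERVICE_PAUSED) break;
            Report(s, SERVICE_CONTINUE_PENDING);
            Report(s, SERVICE_RUNNING);
            break;
        case SERVICE_CONTROL_STOP:
            if (s->state == SERVICE_STOPPED) break;
            Report(s, SERVICE_STOP_PENDING);
            Report(s, SERVICE_STOPPED);     /* after cleanup */
            break;
        default:
            break;
        }
    }
}
```

If a pause and then a stop are queued before the service thread wakes up, the reported sequence is SERVICE_PAUSE_PENDING, SERVICE_PAUSED, SERVICE_STOP_PENDING, SERVICE_STOPPED: each change completes before the next begins, and a paused service can still be stopped.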

The only downside to this technique is that a service is supposed to acknowledge the receipt of some control codes shortly after receiving the code. If the service thread is busy handling a client’s request, the control code may sit in the queue and SetServiceStatus may not be called in time. If you don’t call SetServiceStatus in time, the SCP that sent the notification may think your service has failed and report a message box to the user. However, the service has not failed and will not be terminated. The service just processes the control code when it gets around to it. Really, the SCP is indicating the wrong status to the user.

Obviously, this cannot be overlooked since the user will blame you, the service writer, not the SCP writer. So what can you do in the service to prevent this problem? Simple: make your service run efficiently and quickly and always try to keep a thread waiting to handle control codes.

By the way, placing the initial call to SetServiceStatus inside the Handler function doesn’t really solve this problem. Say that inside the Handler function you set the service’s state to SERVICE_STOP_PENDING, and you specify a wait hint of 5000 milliseconds before calling SetServiceStatus. There is absolutely no guarantee that the service thread will wake up and process the rest of the control code within 5000 milliseconds. If the service thread doesn’t catch the code in time, again the SCP will think that your service has failed.

Conclusion

I trust that I have left you with an introductory overview of services and an understanding of their interaction with the operating system, the user, and control programs. Building on the service design and development background from this article, I will expand on this understanding in a future article, in which I’ll examine code for a service, client, and control program in a working sample. Further details on services and security can be found in the Platform SDK and the MSDN Library Online (www.microsoft.com/msdn/).