Minimizing Buffer Copies

Under most UNIX implementations, every sockets I/O request results in a buffer copy to or from a system buffer. This generally simplifies things for applications, but at the cost of extra CPU overhead to perform the buffer copy.

Under the Windows NT and Windows 95 operating systems, Windows Sockets attempts to avoid buffer copies whenever possible. However, how often copies must actually be performed depends largely on what the application does.

On the sending side, the rules for buffer copies are simple. By default, Windows Sockets always copies the user buffer into an intermediate buffer before passing it on to the transport protocol. The copy exists because the transport protocol must hold on to the data until the remote end acknowledges receiving it, while most applications want their send() call to complete immediately so that the thread can proceed with other work. Copying into an intermediate buffer satisfies both needs: the transport protocol keeps the copy, and the application is free to reuse its own buffer as soon as send() returns.

Applications and services that require extremely high performance can, at the cost of some complexity, avoid the send-side buffer copy. To do so, set the SO_SNDBUF socket option to 0, which tells Windows Sockets not to buffer sends, and then send data with overlapped WriteFile() calls. Such a WriteFile() call does not complete until the transport protocol is done with the data, so the application must not reuse the buffer before then, and it is important to keep data available to the transport protocol by keeping several I/O requests pending at any one time. For example, the application calls WriteFile(), which returns FALSE with GetLastError() reporting ERROR_IO_PENDING. If there is more data to send, the application should call WriteFile() a second time before the first completion is signaled. As soon as the first WriteFile() completes, the application calls WriteFile() again with yet more data. This "double-buffering" ensures that the transport protocol always has data available, so the pipe never empties.
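The benefit of keeping more than one write pending can be illustrated with a small discrete-time model. This is a toy simulation in C, not Winsock code: it does not call setsockopt() or WriteFile(), and the function name transport_idle_ticks and its parameters (depth, send_ticks, app_latency) are hypothetical. It assumes the transport works on one write at a time and holds each for a fixed number of ticks (standing in for "until the data is acknowledged"), and it counts the ticks during which the transport has no user data to send.

```c
#define MAX_DEPTH 8

/* Toy model, not the Winsock API; all names are hypothetical.
   The application keeps up to `depth` overlapped writes pending. The
   transport works on the oldest pending write for `send_ticks` ticks and
   then completes it. After a completion, the application needs
   `app_latency` ticks before it posts the next write. Returns the number
   of ticks in which the transport had no user data to work on. */
int transport_idle_ticks(int depth, int total_writes,
                         int send_ticks, int app_latency)
{
    int issue_at[MAX_DEPTH];  /* ticks at which scheduled writes get posted */
    int nsched = 0;           /* writes scheduled but not yet posted */
    int posted = 0;           /* writes handed to the transport so far */
    int inflight = 0;         /* posted writes not yet completed */
    int done = 0, left = 0, idle = 0;

    if (depth < 1 || total_writes < 1 || send_ticks < 1)
        return 0;
    if (depth > MAX_DEPTH)
        depth = MAX_DEPTH;
    for (int i = 0; i < depth && i < total_writes; i++)
        issue_at[nsched++] = 0;           /* post `depth` writes up front */

    for (int t = 0; done < total_writes; t++) {
        for (int i = 0; i < nsched; ) {   /* post writes that are now due */
            if (issue_at[i] <= t) {
                issue_at[i] = issue_at[--nsched];
                inflight++;
                posted++;
            } else {
                i++;
            }
        }
        if (inflight == 0) {              /* the pipe is empty this tick */
            idle++;
            continue;
        }
        if (left == 0)
            left = send_ticks;            /* start on the oldest write */
        if (--left == 0) {                /* transport is done with it */
            inflight--;
            done++;
            if (posted + nsched < total_writes)
                issue_at[nsched++] = t + 1 + app_latency;
        }
    }
    return idle;
}
```

For example, transport_idle_ticks(1, 4, 3, 2) returns 6 (the transport sits idle for two ticks after each of the first three completions), while transport_idle_ticks(2, 4, 3, 2) returns 0. In this model, two pending writes keep the pipe full whenever the application reposts a write at least as fast as the transport finishes one (app_latency no greater than send_ticks).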

On the receiving side, the rules are a little different. Windows Sockets copies incoming data directly into a user buffer if one is available. If no user buffer is available, Windows Sockets must copy the data into a system buffer, and then copy it again into a user buffer once the application eventually supplies one.

A user buffer is available whenever a recv() or ReadFile() is outstanding on the socket. Therefore, if an application calls recv() before data arrives, the data is placed directly into the user buffer. This is another reason to avoid the select() call: an application that calls select() typically waits for select() to report the socket readable before it calls recv(), so it usually has no user buffer outstanding when data arrives, and Windows Sockets is forced to perform a buffer copy whenever data comes in.

Double buffering can also be used on the receive side to avoid buffer copies. Here the application should keep two overlapped ReadFile() calls pending so that two user buffers are available to the system. When data arrives, it goes directly into one of the buffers. While the application is processing that data, the other buffer is still available for incoming data. This closes the window in which two chunks of data arrive back-to-back and, with only one buffer pending, the first goes into the user buffer while the second must be copied into a system buffer.
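The receive-side reasoning, including the select() case, can be checked with a similar toy model. Again this is a simulation in C rather than Winsock code, and staged_chunks and its parameters are hypothetical names. It assumes each chunk fits in one user buffer and that the application spends a fixed process_ticks on a filled buffer before posting it again; it counts the chunks that arrive when no user buffer is pending and therefore pass through a system buffer.

```c
#define MAX_BUFS 8

/* Toy model, not the Winsock API; all names are hypothetical.
   arrivals[k] is the tick at which chunk k reaches the socket, in
   nondecreasing order. The application keeps `posted` receive buffers
   pending; once a buffer is filled it spends `process_ticks` on the data
   and then posts the buffer again. Returns the number of chunks that had
   to be staged in a system buffer because no user buffer was pending. */
int staged_chunks(const int *arrivals, int n, int posted, int process_ticks)
{
    int ready_at[MAX_BUFS];   /* tick at which each user buffer is pending */
    int copies = 0;

    if (posted <= 0)          /* select()-style: nothing is ever pending, */
        return n;             /* so every chunk lands in a system buffer  */
    if (posted > MAX_BUFS)
        posted = MAX_BUFS;
    for (int i = 0; i < posted; i++)
        ready_at[i] = 0;      /* all buffers posted before data arrives */

    for (int k = 0; k < n; k++) {
        int best = 0;         /* the buffer that is pending soonest */
        for (int i = 1; i < posted; i++)
            if (ready_at[i] < ready_at[best])
                best = i;

        int filled_at = arrivals[k];
        if (ready_at[best] > arrivals[k]) {
            copies++;                    /* staged in a system buffer ... */
            filled_at = ready_at[best];  /* ... until a user buffer is posted */
        }
        ready_at[best] = filled_at + process_ticks;  /* busy while processed */
    }
    return copies;
}
```

With chunks arriving at ticks 0, 1, 10, and 11 (two back-to-back pairs) and process_ticks set to 5, the function returns 4 with no buffer pending (the select() pattern), 2 with one pending buffer (the second chunk of each pair is staged), and 0 with two pending buffers.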