Cache Coherency

In the previous case, several processes were competing for the same resource. But resource contention also occurs among the threads of a single process. Threads within a process share the same address space and frequently write to the same memory locations. Although this is a minor problem for single-processor configurations, it can become a bottleneck in multiprocessor systems.

Unfortunately, you can't see cache and memory contention directly with Performance Monitor because these conflicts occur at the hardware level, where no counters exist. You can, however, get indirect evidence from response time and total throughput: the processors simply appear to be busy while response time and throughput fail to improve.

In multiprocessor systems, shared memory must be kept consistent: that is, every cached copy of a memory cell must hold the same value, no matter which processor's cache holds it. This is known as cache coherency. The responsibility for maintaining cache coherency in multiprocessor systems falls to the cache controller hardware. When a memory cell is written, if the cache controller finds that the memory cell is in use in the cache of any other processor, it invalidates or overwrites those cells with the new data and then updates main memory.
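The invalidation step described above can be sketched in a few lines of Python. This is a toy model rather than a description of any real cache controller: the class name ToyCoherentSystem, its methods, and the choice to update main memory on every write are assumptions made only for illustration.

```python
class ToyCoherentSystem:
    """Toy model: one private cache (address -> value) per processor."""

    def __init__(self, num_processors):
        self.memory = {}                                   # main memory
        self.caches = [dict() for _ in range(num_processors)]
        self.invalidations = 0                             # stale copies discarded

    def read(self, cpu, address):
        cache = self.caches[cpu]
        if address not in cache:
            # Cache miss: fetch the cell from main memory (0 if never written).
            cache[address] = self.memory.get(address, 0)
        return cache[address]

    def write(self, cpu, address, value):
        # Invalidate the cell in every other cache that holds a copy.
        for other, cache in enumerate(self.caches):
            if other != cpu and address in cache:
                del cache[address]
                self.invalidations += 1
        self.caches[cpu][address] = value
        # Assumption: main memory is updated immediately, for simplicity.
        self.memory[address] = value


system = ToyCoherentSystem(num_processors=2)
system.write(0, 0x10, 1)   # processor 0 writes the cell
system.read(1, 0x10)       # processor 1 caches the value 1
system.write(0, 0x10, 2)   # processor 1's copy is now stale and is invalidated
print(system.read(1, 0x10), system.invalidations)
```

Because the stale copy is discarded, the next read by processor 1 misses in its cache and fetches the new value. Every such invalidation and refetch is work that a single-processor system would not have to do.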

Two frequently used update strategies are known as write-through caching and write-back caching:

In write-through caching, the cache controller updates main memory immediately so that other caches can get the updated data from memory.

In write-back caching, the cache controller doesn't update main memory until it needs to reuse the memory cell. If another cache needs the data before it is written to main memory (which is more likely with more threads), the cache controller must obtain the data from the cache of the other processor. That processor's cache must listen in on bus requests and respond before main memory recognizes the request.

Write-back caching usually causes fewer writes to main memory and reduces contention on the memory bus. But as the number of threads grows, and with it the likelihood that they will need shared data, write-back caching actually causes more traffic and resource contention.
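The trade-off between the two strategies can be illustrated with a small counting model in Python. It is a sketch under stated assumptions, not a model of real hardware: the function simulate and its counters are hypothetical, only a single memory cell is tracked, cache evictions are ignored, and a modified cell is assumed to be written back to main memory whenever another processor reads it.

```python
def simulate(policy, accesses):
    """Count bus events for one shared memory cell.

    accesses is a sequence of (cpu, op) pairs, where op is "r" or "w".
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    dirty_owner = None  # processor holding a modified copy not yet in memory
    stats = {"memory_writes": 0, "cache_to_cache": 0}
    for cpu, op in accesses:
        if dirty_owner is not None and dirty_owner != cpu:
            # Only another cache has current data, so it must supply it.
            stats["cache_to_cache"] += 1
            if op == "r":
                # Assumption: the owner also writes the cell back to memory.
                stats["memory_writes"] += 1
                dirty_owner = None
        if op == "w":
            if policy == "write-through":
                stats["memory_writes"] += 1   # memory is updated immediately
            else:
                dirty_owner = cpu             # memory update is deferred
    return stats


private = [(0, "w")] * 8              # one processor writes repeatedly
shared = [(0, "w"), (1, "r")] * 4     # a writer and a reader alternate

for name, pattern in (("private", private), ("shared", shared)):
    for policy in ("write-through", "write-back"):
        print(name, policy, simulate(policy, pattern))
```

In this toy model, the private pattern costs write-back caching no memory writes, compared with eight for write-through caching. The shared pattern costs write-back caching four memory writes plus four cache-to-cache transfers, compared with four memory writes for write-through caching, so the advantage disappears once the data is shared.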

Resource sharing and contention are much more common than isolated processing. Even when ample processors exist for the workload, they must share the single pool of virtual memory and contend for disk access. There is no easy solution to this problem, but it demonstrates the limits of even the most sophisticated hardware. In this situation, the traditional solutions to a bottleneck (adding more processors, disk space, or memory) cannot overcome the limitations imposed by an application's dependence on a single subsystem.