How Intel Nehalem processors and Windows 7 work together

The result is better performance, lower power consumption, or both, depending on the number of threads in your application.

Intel's Nehalem processor has several features aimed at better management of the processor cores. The Nehalem has four cores, each of which is capable of running two threads simultaneously, using a technology known as simultaneous multithreading (SMT). Depending on whether SMT is enabled, a processor can therefore run either four or eight threads at once.

SMT is generally configured in the BIOS and can be changed at boot time. Because SMT on the Nehalem processor shares some of each core's resources between its two threads, turning on SMT does not double processor performance or throughput. The only way to know whether your applications run better with SMT on or off is to test those applications in both configurations.

[ See the results of InfoWorld's multithreading tests of Windows 7, Vista, and Windows XP in "New multithreading in Windows 7: How much faster?" ]

For this discussion of how threads work, I'll limit myself to four cores running one thread each -- that is, with SMT disabled. Note that this is the configuration with which Nehalem PCs are generally shipped by vendors.

Intel's Nehalem architecture pays close attention to which cores are actively running code. When a core remains inactive for a set period of time, the processor turns off the transistors that drive that core. This feature is designed primarily as an energy-saving measure. The processor can also raise the clock frequency of the active cores to increase performance. This option is known as Turbo Mode; on some systems, Turbo Mode needs to be specifically enabled.

For all this magic to happen, though, operating systems need to cooperate. The key is that a core must remain inactive long enough for the processor to shut off its transistors. This was difficult to achieve prior to Windows 7. Although applications create the threads, it's the operating system's task to schedule them for execution and to assign them to an execution pipeline, such as a core. Applications have no control over the scheduling, and they have limited control over which pipeline is used. Prior to Windows 7, the Windows kernel would schedule a thread to run on any available core without regard to where the thread had executed previously. (If no cores were available, the thread scheduler in Windows would choose one running thread to preempt and swap in the waiting thread. The decision about which thread to preempt is controlled by numerous factors.)

Developers could exert some level of control over the selection of the execution core by using a technique called processor affinity. This capability allows the developer to specify a core for the thread to run on. Windows offers it in two strengths: an affinity mask, which restricts a thread to the specified cores, and an "ideal processor" setting, which the scheduler treats as a request rather than a command and honors only if it fits within its scheduling constraints (in practice, it mostly fulfills the request). However, this programming approach is generally discouraged, as it tends to make the scheduler work less efficiently. In almost all cases, the scheduler has far better algorithms for deciding what to run when and where than does the programmer at code-writing time.
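To make the idea of processor affinity concrete, here is a minimal sketch in Python. It rests on some assumptions: Python's standard library exposes affinity calls only on Linux (os.sched_getaffinity and os.sched_setaffinity), so the sketch uses those as a stand-in, and the helper name pin_to_one_cpu is made up for illustration. On Windows, the corresponding Win32 calls are SetThreadAffinityMask and SetThreadIdealProcessor, which Python does not wrap directly.

```python
import os
from typing import Optional, Set


def pin_to_one_cpu() -> Optional[Set[int]]:
    """Pin the calling process to a single CPU and return the new affinity set.

    Hypothetical helper for illustration. Returns None on platforms where
    Python does not provide os.sched_setaffinity (for example, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    target = min(allowed)               # arbitrary choice for the sketch
    os.sched_setaffinity(0, {target})   # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("Affinity after pinning:", pin_to_one_cpu())
```

Note that the sketch picks a core blindly, which is exactly the kind of decision the scheduler is usually better placed to make.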

Because previous Windows schedulers were indiscriminate about where they scheduled threads, an application using three threads might see those threads constantly rotated through all four Nehalem cores. As a result, the power-saving feature and Turbo Mode suffered, because no core remained inactive for very long. Windows 7, however, tends to schedule the threads to run on the same cores, rather than having them hop about; a three-thread application typically uses only three cores and lets Nehalem turn off the fourth core.
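The difference between the two scheduling styles can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of the real Windows scheduler: the rotating policy, the sticky policy, the time-slice loop, and the IDLE_THRESHOLD value are all invented for illustration. It runs three threads on four cores and reports which cores ever stay idle long enough to be powered down.

```python
from typing import Callable, List

NUM_CORES = 4
NUM_THREADS = 3
IDLE_THRESHOLD = 5  # hypothetical: idle slices needed before a core can power down


def rotating_placement(time_slice: int) -> List[int]:
    """Toy 'indiscriminate' policy: every thread hops one core over each slice."""
    return [(t + time_slice) % NUM_CORES for t in range(NUM_THREADS)]


def sticky_placement(time_slice: int) -> List[int]:
    """Toy 'sticky' policy: each thread stays on the core it used before."""
    return list(range(NUM_THREADS))


def cores_that_can_power_down(
    placement: Callable[[int], List[int]], slices: int = 100
) -> List[int]:
    """Return the cores whose longest idle streak reaches IDLE_THRESHOLD."""
    idle_streak = [0] * NUM_CORES
    longest = [0] * NUM_CORES
    for s in range(slices):
        busy = set(placement(s))
        for core in range(NUM_CORES):
            if core in busy:
                idle_streak[core] = 0
            else:
                idle_streak[core] += 1
                longest[core] = max(longest[core], idle_streak[core])
    return [c for c in range(NUM_CORES) if longest[c] >= IDLE_THRESHOLD]


if __name__ == "__main__":
    print("rotating:", cores_that_can_power_down(rotating_placement))
    print("sticky:  ", cores_that_can_power_down(sticky_placement))
```

In this toy model, the rotating policy never leaves any core idle for more than one slice at a time, so no core qualifies for power-down, while the sticky policy leaves the fourth core idle for the whole run.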

This behavior clearly reduces power needs, and it can improve performance in two ways: via Turbo Mode, as described earlier, and via marginally better cache usage, because a thread that stays on one core is more likely to find its data still in that core's cache. As the benchmarks show, however, the difference in performance is marginal when four or more threads are running. The real win is the power savings. On desktop systems, the power savings might not appear terribly important, but they can be very significant on servers and on mobile devices. Mobile users, in particular, will enjoy longer battery life when the Nehalem mobile processors begin appearing in consumer devices.

This story, "How Intel Nehalem processors and Windows 7 work together," was originally published at InfoWorld.com.

Copyright © 2009 IDG Communications, Inc.