Another factor is the transition away from the Front Side Bus architecture that has been a staple of Intel-based PC and workstation designs for years. In its place, QuickPath Interconnect (QPI) -- Intel's answer to AMD's HyperTransport -- links processors point to point, while a memory controller now sits on the same die as the CPU, allowing the latter to access physical memory directly. The net result is much faster access to memory that is local to a particular CPU and, when combined with a Level 3 cache, improved performance for juggling workloads across multiple CPUs.
Together, the NUMA and QPI advancements have served to drive the Intel architecture forward. However, both would be for naught without support from the OS. That is why the extensive multicore tuning that went into the Windows 7 kernel is so important -- without it, Windows users would be unable to take advantage of the performance-enhancing features of the latest Intel (and AMD) CPUs. In other words, to make the most of today's smarter CPUs, you need a smarter OS. (See also: "How Intel Nehalem processors and Windows 7 work together.")
Windows XP is a great operating system. It has proven itself over nearly a decade of continuous use. However, when compared against Windows 7's sophisticated multicore support, XP is a bit of a dim bulb. The XP kernel still adheres to the Symmetric Multiprocessing (SMP) worldview of pre-millennial Windows NT, and this hamstrings the OS when dealing with modern, NUMA-based hardware. It's like the old joke about the man with only a hammer on his tool belt: To XP, every multiprocessing problem looks like a nail.
Windows 7, by contrast, has a more nuanced worldview. For example, it understands the difference between a discrete CPU and multiple cores within a single CPU. Windows 7 also has a basic grasp of NUMA design principles -- specifically, how groups of cores within a single processor should be treated as a functional node and how processor affinity can directly affect application performance in a multiprocessor environment. Together, these newfound processor smarts allow the Windows 7 kernel to better manage the underlying hardware fabric, taking into account logical and physical CPU layout as it schedules threads and allocates memory.
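To make the contrast concrete, here is a minimal sketch of the two worldviews. It is a toy model, not the actual Windows scheduler: the names `LogicalCpu`, `build_topology`, `pick_cpu_smp`, and `pick_cpu_numa` are hypothetical, and the `REMOTE_PENALTY` value is an arbitrary assumption chosen only to make remote nodes less attractive than local ones.

```python
from dataclasses import dataclass
from typing import Dict, List

REMOTE_PENALTY = 2  # assumed cost of running away from the thread's home node


@dataclass(frozen=True)
class LogicalCpu:
    cpu_id: int   # logical processor number
    core_id: int  # physical core that hosts this logical processor
    node_id: int  # NUMA node (one per processor package in this model)


def build_topology(nodes: int, cores_per_node: int, threads_per_core: int) -> List[LogicalCpu]:
    """Enumerate logical CPUs node by node, core by core."""
    cpus: List[LogicalCpu] = []
    for node in range(nodes):
        for core in range(cores_per_node):
            for _ in range(threads_per_core):
                cpus.append(LogicalCpu(len(cpus), node * cores_per_node + core, node))
    return cpus


def pick_cpu_smp(topology: List[LogicalCpu], load: Dict[int, int]) -> LogicalCpu:
    """Flat SMP view: every logical CPU is interchangeable; take the least busy."""
    return min(topology, key=lambda cpu: (load.get(cpu.cpu_id, 0), cpu.cpu_id))


def pick_cpu_numa(topology: List[LogicalCpu], load: Dict[int, int], home_node: int) -> LogicalCpu:
    """Topology-aware view: avoid busy physical cores and remote nodes."""
    def cost(cpu: LogicalCpu):
        # Work running on any sibling logical CPU competes for the same core.
        core_load = sum(load.get(c.cpu_id, 0) for c in topology if c.core_id == cpu.core_id)
        penalty = 0 if cpu.node_id == home_node else REMOTE_PENALTY
        return (core_load + penalty, cpu.cpu_id)
    return min(topology, key=cost)


# Two nodes, two cores per node, two logical CPUs per core = 8 logical CPUs.
topology = build_topology(nodes=2, cores_per_node=2, threads_per_core=2)
load = {0: 1}  # one thread is already running on logical CPU 0

print(pick_cpu_smp(topology, load).cpu_id)                # 1
print(pick_cpu_numa(topology, load, home_node=0).cpu_id)  # 2
```

In this model the flat SMP picker lands on logical CPU 1, which shares a physical core with the busy CPU 0. The topology-aware picker skips to CPU 2, an idle core on the same node, so the new thread gets a core to itself and stays close to its memory.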
Windows on multicore: OfficeBench results (dual core)

| Operating system | | |
|---|---|---|
| Windows XP | 1.57 seconds | 5.94 seconds |
| Windows Vista | 3.02 seconds | 11.77 seconds |
| Windows 7 | 3.42 seconds | 8.18 seconds |

Windows on multicore: OfficeBench results (quad core)

| Operating system | | |
|---|---|---|
| Windows XP | 0.43 second | 4.49 seconds |
| Windows Vista | 0.51 second | 7.45 seconds |
| Windows 7 | 0.51 second | 7.14 seconds |

Windows on multicore: OfficeBench results (eight cores)

| Operating system | | |
|---|---|---|
| Windows XP | 0.25 second | 4.33 seconds |
| Windows Vista | 0.21 second | 2.03 seconds |
| Windows 7 | 0.17 second | 1.56 seconds |
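One way to read the three tables together is to divide each dual-core time by the matching eight-core time. The short sketch below does that arithmetic with the figures copied from the tables above. The `RESULTS` layout and the `speedup` helper are illustrative names, and the two timing columns are referred to simply as column 0 and column 1. The tables may come from different test systems, so treat the ratios as a rough comparison rather than a clean scaling measurement.

```python
# Times in seconds, copied from the OfficeBench tables above.
# Layout: core count -> operating system -> (column 0, column 1).
RESULTS = {
    2: {
        "Windows XP": (1.57, 5.94),
        "Windows Vista": (3.02, 11.77),
        "Windows 7": (3.42, 8.18),
    },
    4: {
        "Windows XP": (0.43, 4.49),
        "Windows Vista": (0.51, 7.45),
        "Windows 7": (0.51, 7.14),
    },
    8: {
        "Windows XP": (0.25, 4.33),
        "Windows Vista": (0.21, 2.03),
        "Windows 7": (0.17, 1.56),
    },
}


def speedup(os_name: str, column: int, from_cores: int = 2, to_cores: int = 8) -> float:
    """Ratio of the time at from_cores to the time at to_cores (higher is better)."""
    return RESULTS[from_cores][os_name][column] / RESULTS[to_cores][os_name][column]


for os_name in RESULTS[2]:
    print(f"{os_name}: {speedup(os_name, 0):.2f}x (column 0), "
          f"{speedup(os_name, 1):.2f}x (column 1)")
# Windows XP: 6.28x (column 0), 1.37x (column 1)
# Windows Vista: 14.38x (column 0), 5.80x (column 1)
# Windows 7: 20.12x (column 0), 5.24x (column 1)
```

On these figures, XP's second-column time improves by only about 1.4x between the dual-core and eight-core results, while Vista's and Windows 7's improve more than fivefold.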