- Review Price: £650.00
Last April, I was in Beijing, attending Intel’s first Intel Developer Forum outside of the US. There, CEO Paul Otellini announced that Intel was on schedule to move from a 0.065 micron process down to a 0.045 micron process. Intel has ramped up its 45 nanometre production in two factories – Fab D1D in Oregon and Fab 32 in Arizona. A third, Fab 28 in Kiryat Gat, Israel, will come online in the first half of 2008 and a fourth, Fab 11X in New Mexico, is expected to come online sometime in the second half of 2008.
On November 12th you’ll be able to buy the new flagship consumer Intel processor, the Core 2 Extreme QX9650 – a 3GHz quad-core processor with 12MB of Level 2 cache, on a 1,333MHz FSB. If you’re in the flagship processor buying kind of mood then expect to pay around £650 – £700 for it.
What’s significant about the launch of the QX9650 for Intel is that it is the first processor to be based on the new 0.045 micron process. While moving to a new micron process is normally par for the course, the move to 45 nanometres (nm) is big news from an engineering perspective, so much so that Gordon Moore, co-founder of Intel, referred to it as the biggest breakthrough in process technology for a generation. This is a bold claim indeed and we’ll go into more detail a little later. For the consumer though, it’s business as usual, with the next gen offering more performance for the same or even less money than the previous gen.
The first 45nm part was announced at IDF in Beijing in April 2007 as Penryn. This is a refinement of the current Core architecture behind the Core 2 Duo, which has so dominated the market since its release. It is not an entirely new architecture, as Intel would never combine moving to a new architecture with moving to a new micron process – doing both at once would create too many opportunities for things to go wrong.
Instead, moving forward it intends to only move to a new micron process with a familiar architecture, introducing relatively minor enhancements and improvements, and to only introduce brand new microprocessor architectures on a tried and tested micron process. Intel refers to this as ‘Sustained Technology Cadence’, or informally, as Tick/Tock.
Hence, we have a very clear picture of what the future holds for Intel. The current Merom and Conroe Core 2 Duo processors are being replaced by the improved Penryn, based on a 45 nanometre (nm) process, while a radical all-new architecture called Nehalem will come along in 2008, again based on 45nm. Then in 2009 you’ll get the improved version called Westmere (32nm) and then another new architecture in 2010 called Sandy Bridge, again on 32nm.
But what is the actual significance of moving to a new micron process? The key advantage is that it produces smaller chips, so more of them can fit on a wafer, which reduces Intel’s costs – a saving that is then usually passed on to the consumer. It also reduces heat and makes it possible to fit more transistors onto the die, increasing performance. Specifically, with the move to 45nm we are told we can expect a 20 per cent increase in performance, a tenfold reduction in leakage and a 2x improvement in transistor density.
The breakthrough that has enabled Intel to become the first manufacturer to build a 45nm part is the development of High-K Metal Gate transistors. This was necessary to keep pace with Moore’s law, which states that transistor counts on mainstream CPUs double roughly every two years.
As Intel’s own presentation slide has it, “High-k + metal gate transistors are the biggest advancement in transistor technology since the introduction of polysilicon gate MOS transistors in the late 1960s”.
The problem is one of current leakage. Up to and including its 65nm technology, Intel used a silicon dioxide (SiO2) gate dielectric inside its transistors. What is a gate dielectric? A dielectric is an insulator, a material that does not conduct electricity. In a transistor it is what makes it possible to turn the current on and off, enabling it to function as an electronic switch.
Silicon dioxide has traditionally been used inside processors, but as the manufacturing process hit 65nm, the gate dielectric was a mere five atoms thick. Any thinner and it would start causing significant current leakage. This would mean that the transistors would not be able to achieve a fully off state, so there would be little difference between idle state and full power state, making for a very inefficient design. It’s also pretty stonking difficult to make a wall less than five atoms thick. (Trust me, I’ve tried).
To get round these problems, Intel has replaced the silicon dioxide used in the gate dielectric with a thicker hafnium-based high-k material. ‘High-k’ refers to a dielectric material with a high dielectric constant, i.e. one that insulates consistently and effectively. Intel claims that this reduces the current leakage by more than ten times over the silicon dioxide that was used in the past. This means that you can start increasing the transistor count and keep the processor small and power efficient.
Intel hasn’t revealed exactly what materials it is using to make this high-k material, presumably in a bid to delay its competitors in catching up. Just to keep things moving along at a pace, Intel already has very early SRAM 32nm parts up and running, though as already mentioned, we won’t actually see a processor based on this until 2009.
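To see why a higher dielectric constant lets the layer be physically thicker, here is a rough sketch using the standard ‘equivalent oxide thickness’ relation. The dielectric constant of 20 and the 3nm thickness are illustrative assumptions on my part, not figures from Intel, which has not published them.

```python
# Rough sketch: how thick a high-k layer can be while behaving, electrically,
# like a much thinner silicon dioxide layer.
# ASSUMPTIONS (not from Intel): k = 20 for the hafnium-based material and a
# 3.0nm physical thickness are illustrative values only.

K_SIO2 = 3.9  # relative dielectric constant of silicon dioxide


def equivalent_oxide_thickness(physical_nm, k_material):
    """Thickness of SiO2 (in nm) that would give the same gate capacitance."""
    return physical_nm * K_SIO2 / k_material


assumed_k = 20.0          # hypothetical high-k value
assumed_thickness = 3.0   # hypothetical physical thickness in nm

eot = equivalent_oxide_thickness(assumed_thickness, assumed_k)
print(f"A {assumed_thickness}nm high-k layer acts like {eot:.3f}nm of SiO2")
```

Under these assumed numbers the gate gets the electrical behaviour of a sub-nanometre oxide while the physical barrier stays several times thicker, which is where the leakage saving comes from.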
Intel has created two new processors on the new 45nm process. Penryn, as I said earlier, is an enhancement and refinement of the excellent Core architecture that powers current Core 2 Duos, while the other is Silverthorne, which is a low power CPU being aimed at the new Ultra Mobile PCs that should appear next year.
The desktop version of Penryn will be a native dual-core processor, known as Wolfdale. The quad-core variant, essentially two Wolfdales bolted together, is called Yorkfield. The Extreme model, a Yorkfield XE, with an unlocked multiplier, is being released on November 12th and is the processor we were sent for review. This is based on a 1,333MHz front-side bus, but we can expect faster clocked versions, based on a 1,600MHz front-side bus, to appear later.
The 45nm die for the dual-core Wolfdale now sports 410 million transistors on a 107mm² die, a 40 per cent increase over Conroe, which was 291 million transistors on a 143mm² die. That’s the benefit of die shrinks – more in less.
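For anyone who wants to check the sums, the short sketch below works out transistor density from the die figures quoted above. The function and variable names are just for illustration.

```python
# Back-of-envelope check of the die figures quoted in the review.
# Transistor counts are in millions, die areas in square millimetres.


def density(transistors_millions, die_area_mm2):
    """Millions of transistors per square millimetre."""
    return transistors_millions / die_area_mm2


wolfdale_density = density(410, 107)   # 45nm dual-core
conroe_density = density(291, 143)     # 65nm dual-core

density_ratio = wolfdale_density / conroe_density
count_increase_pct = (410 / 291 - 1) * 100
area_reduction_pct = (1 - 107 / 143) * 100

print(f"Wolfdale: {wolfdale_density:.2f}M/mm2, Conroe: {conroe_density:.2f}M/mm2")
print(f"Density ratio: {density_ratio:.2f}x")
print(f"Transistor count up {count_increase_pct:.0f}%, die area down {area_reduction_pct:.0f}%")
```

On these numbers the density improvement comes out at a little under the 2x headline figure, with roughly 40 per cent more transistors on a die about a quarter smaller.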
While it may not be a new architecture, there are several key areas where Penryn improves on Conroe and we’ll now take a look at them in turn.
The first improvement we’re getting to is the one that sounds like a washing powder: the Radix-16 Divider. Simply put, Dynamic Execution refers to dealing with instructions in as efficient a way as possible. There are a number of techniques to do this – data flow analysis, speculative execution, out of order execution, and superscalar architecture, all of which were first introduced with the great Pentium Pro, back in 1995. The Core architecture improved on this with Wide Dynamic Execution, which increased the number of instructions per clock it could deal with.
One aspect of Wide Dynamic Execution that’s been improved is how it deals with division and square root computations – operations that are commonly used, especially in gaming. Previously this was handled by a Radix-4 divider, working on two bits of data per iteration. Penryn, though, has a Radix-16 divider, enabling it to work on four bits of data per iteration, roughly doubling performance of this function. The best bit is that it requires no extra coding or new software – it just works.
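To illustrate why more bits per iteration means fewer iterations, here is a minimal sketch of a digit-by-digit divider in software. It is a simplified restoring-style model for unsigned integers, written purely for illustration; it is not a description of Intel’s actual hardware divider, and the function name and 32-bit width are assumptions.

```python
# Simplified model of a digit-recurrence divider. Each iteration produces
# `bits_per_step` bits of the quotient: 2 bits models Radix-4, 4 bits Radix-16.
# Illustration only; real hardware dividers are considerably more involved.


def radix_divide(dividend, divisor, bits_per_step, width=32):
    """Divide two unsigned integers, returning (quotient, remainder, iterations)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    steps = -(-width // bits_per_step)        # ceiling division
    total_bits = steps * bits_per_step
    mask = (1 << bits_per_step) - 1

    quotient = 0
    remainder = 0
    for step in range(steps):
        # Bring down the next group of dividend bits, most significant first.
        shift = total_bits - (step + 1) * bits_per_step
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & mask)
        # Pick the next quotient digit and subtract its contribution.
        digit = remainder // divisor
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
    return quotient, remainder, steps


q4, r4, radix4_steps = radix_divide(1000003, 97, bits_per_step=2)
q16, r16, radix16_steps = radix_divide(1000003, 97, bits_per_step=4)
print(radix4_steps, radix16_steps)  # 16 iterations versus 8 for a 32-bit value
```

Both calls give the same quotient and remainder; the Radix-16 version simply gets there in half as many trips round the loop.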
'''Super Shuffle Engine'''
The SSE instructions, which have been present in Intel processors since the Pentium III, require a lot of shuffle operations on the data. These can now be done in a single clock cycle as the shuffle engine is now 128 bits wide, matching the width of SSE operands – double that of Conroe.
'''Advanced Smart Cache'''
While the two cores of Wolfdale will share a 6MB cache, the Yorkfield shares a phenomenal 12MB cache. The increased performance of the larger caches comes from 24-way associativity, up from the 16-way of Conroe. Associativity essentially means there are more ways in and out of the cache, so data can be exchanged quicker. Think more entrances and exits into a large car park, only with data instead of cars. And fewer ticket machines.
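A quick bit of arithmetic shows how the bigger cache and the higher associativity fit together. The sketch below assumes a 64-byte cache line, which is not stated anywhere above, and the function name is purely illustrative.

```python
# How many sets a set-associative cache has: size / (ways * line size).
# ASSUMPTION: 64-byte cache lines, a figure not given in the review.

LINE_BYTES = 64
MB = 1024 * 1024


def cache_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


conroe_sets = cache_sets(4 * MB, ways=16)     # 4MB, 16-way
wolfdale_sets = cache_sets(6 * MB, ways=24)   # 6MB, 24-way

print(conroe_sets, wolfdale_sets)
```

Under that assumption both caches have the same number of sets, so the extra 2MB arrives entirely as additional ways per set, meaning each memory address has 24 possible parking spaces rather than 16.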
'''Smart Memory Access'''
There’s still an inefficient memory controller hub looking after things on Yorkfield – Intel isn’t moving to an integrated controller à la AMD until Nehalem. To help hide the effects of this, Intel uses a Memory Order Buffer that employs smart prefetching algorithms, enabling the chip to intelligently re-order instructions so that less time is spent waiting on loads – Intel calls this Memory Disambiguation. The algorithm makes an educated guess about whether or not the chip can move on to the next set of instructions.
'''Advanced Digital Media Boost – SSE4.1'''
This is the big one for Penryn – the introduction of SSE4.1 (Streaming SIMD Extensions), which adds 47 new instructions, the most added to SSE since 144 were added with the Pentium 4. These are focused on improving video encoding and decoding, and gaming. The best thing is that applications such as DivX 6.7 and the Japanese version of TMPGEnc are already geared up to take advantage of these, and we can expect other codecs to follow.
'''Deep Power Down'''
Deep Power Down technology will only be applicable to the mobile CPUs, which will simply be called Penryn, and is designed to really increase battery life through smart power saving techniques. This is especially important as quad-core comes to the mobile space.
Realistically, all four cores are unlikely to be used at the same time. Therefore it’s necessary to be able to put the parts of the CPU that aren’t being used into a deep low power state, so that they are not a burden on the battery, while also being able to wake up quickly when they are needed. Intel claims that Deep Power Down enables the unused core to idle with nearly zero power draw, which is quite an achievement.
There are five different power states ranging from full tilt to boring PowerPoint presentation mode, where the core is virtually asleep. We have covered these before, so I won’t recap the whole thing again.
The deepest power state is C6 and when this occurs the micro-architectural state is put into storage and the voltage is dropped. The only power goes to the SRAM where the state is stored and the IO ring that communicates with the chipset so it knows when to wake up.
The C6 state only occurs on a die basis – so in a dual-core mobile CPU it would only drop into C6 when there was no activity whatsoever. However, as Yorkfield is still two Wolfdales stuck together, it would mean that when only two cores were in use the other two could fall into the C6 state – really saving power. Your quad-core would then only be more of a drain than a dual-core when it really needed to be.
In our testing we put the CPU up against a whole host of others, both Intel and AMD, to see how it would fare.
For our test system we placed the Intel CPUs in an Asus P5E3 Deluxe Wi-Fi AP motherboard, with a Seagate 7200.11 160GB drive, 2GB of Corsair 1,800MHz DDR3 and a GeForce 8800 Ultra. We used Windows Vista Ultimate (64-bit) with nVidia 163.69 WHQL drivers.
For the AMD CPUs we used the same setup except the motherboard was an Asus M2N32-E SLI Deluxe.
We tested with the following CPUs.
• Intel Core 2 Extreme QX9650 – 4×3.00GHz, 2×6MB L2 cache, 1,333MHz FSB, 45nm, Yorkfield
• Intel Core 2 Extreme QX6850 – 4×3.00GHz, 2×4MB L2 cache, 1,333MHz FSB, 65nm, Kentsfield
• Intel Core 2 Quad Q6700 – 4×2.67GHz, 2×4MB L2 cache, 1,066MHz FSB, 65nm, Kentsfield
• Intel Core 2 Quad Q6600 – 4×2.40GHz, 2×4MB L2 cache, 1,066MHz FSB, 65nm, Kentsfield
• Intel Core 2 Duo E6850 – 2×3.00GHz, 4MB L2 cache, 1,333MHz FSB, 65nm, Conroe
• Intel Core 2 Duo E6750 – 2×2.67GHz, 4MB L2 cache, 1,333MHz FSB, 65nm, Conroe
• AMD Athlon 64 X2 6400+ – 2×3.20GHz, 2×1MB L2 cache, 2GHz HTT, 90nm, Windsor
• AMD Athlon 64 X2 6000+ – 2×3.00GHz, 2×1MB L2 cache, 2GHz HTT, 90nm, Windsor
With these we ran through a number of tests, both real-world games and synthetic benchmarks, to see if we could pick up on any weaknesses.
Starting with Half Life 2: Episode 2 tests and the new and highly entertaining Team Fortress 2, both based on the Source engine, the QX9650 sits happily at the top of the pile. Team Fortress 2 is clearly more CPU intensive than Episode 2 and the faster CPU clearly pays more dividends in terms of frame rate, especially once the resolution gets cranked up.
In World in Conflict the QX9650 is sitting pretty at the top again. It’s the minimum frame rate that is of most interest – note the difference between today’s mainstream quad-core champion – the Q6600 – and the QX9650. On the Q6600, the frame rates don’t change between 1,024 x 768 and 1,920 x 1,200 – it’s clearly totally CPU limited, so big fans of this game will get a boost from a CPU such as the QX9650.
In Enemy Territory, we can see that the top four CPUs are all ones with a 1,333MHz front-side bus speed.
PCMark Vantage is the new overall test from Futuremark and has a range of application and graphics tests inside it. It’s a good one-stop shop for testing. No surprises here – the QX9650 is on top of the pile.
WinRAR is one of those utility apps that you’ll be using all the time, so the faster the better, and the QX9650 chalks up another victory.
DivX 6.7 is a real number cruncher. We enabled SSE4 where relevant and the test involved a 2-pass encode of a 276MB MPEG-2 digital TV recording. Here the QX9650 really flew, even over the previous Extreme, the QX6850. If you do a lot of encoding you should certainly be looking to move to Penryn.
Finally, we have the LAME MP3 encoding using either the Intel compiler with SSE3 enhancements or the more AMD friendly Microsoft compiler. Either way, our new best friend is still top.
As for overclocking, if you’re into that kind of thing, we found that with a good Zalman CNPS9700 heatsink and upping the voltage to 1.435V we actually hit a stable 4.35GHz! This is admittedly on an engineering sample so it might not be quite that good at retail, but it’s a good indicator that there’s headroom and of how efficient the new micron process is.
There’s no doubt that the QX9650 is a very impressive CPU that is convincingly faster than anything that’s out there right now. Clearly the price places it out of the range of all but the most dedicated and well-heeled enthusiasts but it serves its purpose as a truly impressive flagship processor for Intel. And perhaps even more importantly it bodes well for next year’s more affordable variants. The only question now is whether AMD can muster a suitable response with Phenom.