Intel Larrabee: An Introduction
| Author | Hugo Jobling |
| Published | 4th Aug 2008 |
The problem that Intel sees itself trying to solve with Larrabee is that of combining the programmability of a traditional CPU with the throughput performance of a traditional GPU in the same package. When first considering this, rather than create an entirely new architecture, Intel instead looked at how much it would have to change its current chips to improve throughput performance.
As a purely theoretical exercise, Intel’s engineers considered what kind of performance could be achieved using the same die size and power as a current dual-core CPU, but with an architecture optimised for throughput performance. In this particular case Intel decided to focus specifically on vector processing, as it is a heavy component of graphics applications.
What Intel came up with was a chip with 10 in-order cores, each capable of issuing two instructions per clock, as opposed to a desktop CPU's two out-of-order cores issuing four instructions per clock, and a 16-lane-wide Vector Processing Unit (VPU) in each core. The result was a peak vector throughput of 160 per clock, versus the eight of a Core 2 Duo; a twenty-fold increase within the same die area and TDP.
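The arithmetic behind those figures can be sketched in a few lines. This is only an illustration, assuming peak vector throughput is simply cores multiplied by VPU lanes per core, with one operation per lane per clock. The function name is made up for this example, and the four lanes per core for the dual-core chip is inferred from the quoted total of eight rather than stated in the article.

```python
def peak_vector_throughput(cores: int, vpu_lanes_per_core: int) -> int:
    """Peak vector operations per clock, assuming every lane of every
    core completes one operation each clock (an idealised upper bound)."""
    return cores * vpu_lanes_per_core


# Throughput-oriented test design: 10 in-order cores, 16 VPU lanes each.
throughput_design = peak_vector_throughput(cores=10, vpu_lanes_per_core=16)

# Dual-core desktop CPU: 2 cores; 4 lanes per core is an inference
# from the article's total of eight per clock (8 / 2 = 4).
dual_core_design = peak_vector_throughput(cores=2, vpu_lanes_per_core=4)

ratio = throughput_design / dual_core_design

print(f"Throughput design: {throughput_design} per clock")
print(f"Dual-core design:  {dual_core_design} per clock")
print(f"Ratio:             {ratio:.0f}x")
```

On these assumptions the ratio of peak vector throughput is 160 / 8 = 20. It is a peak figure only and says nothing about single-threaded performance, where the out-of-order cores issue twice as many instructions per clock.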
As suggested already, it isn’t necessary to create an entirely new architecture in order to create the kind of core best suited for graphics processing. To that end, in creating Larrabee Intel decided to start with a CPU core it already had and tweak it to get the GPU core it wanted. Larrabee is, at heart, derived from the original Pentium processor design, although you could hardly tell by looking at it.
As the block diagram above shows, Larrabee still looks more like a multi-core CPU than a GPU such as the nVidia GeForce GTX 280, although there are some similarities too, as you would expect. Notable features of the Larrabee design include a 1,024-bit-wide ring bus (512 bits in each direction) over which all the components can communicate, dedicated texture units and a fixed-function unit.
Larrabee uses a shared L2 cache hierarchy whereby each core has an allocated area to which it has read/write access and to which every other core has read access, allowing data to be shared among multiple cores. Coupled with the wide, high-speed ring bus, this provides both high bandwidth and fast access to fixed-function blocks.
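The access rule described here can be pictured with a toy model. This is a rough sketch of the rule as the article states it, not of how the hardware implements it: the class and method names are invented for illustration, and a real cache enforces sharing through its coherence logic rather than through permission checks like these.

```python
class SharedL2:
    """Toy model of a partitioned L2: one region per core.

    The owning core may read and write its region; every other
    core may only read it. Purely illustrative.
    """

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        # One address -> value mapping per core-owned region.
        self.regions = [{} for _ in range(num_cores)]

    def _check_core(self, core: int) -> None:
        if not 0 <= core < self.num_cores:
            raise ValueError(f"no such core: {core}")

    def write(self, core: int, region: int, address: int, value: int) -> None:
        """Only the owner of a region may write to it."""
        self._check_core(core)
        self._check_core(region)
        if core != region:
            raise PermissionError(
                f"core {core} has read-only access to region {region}"
            )
        self.regions[region][address] = value

    def read(self, core: int, region: int, address: int):
        """Any core may read any region; returns None if nothing is stored."""
        self._check_core(core)
        self._check_core(region)
        return self.regions[region].get(address)


l2 = SharedL2(num_cores=4)
l2.write(core=0, region=0, address=0x10, value=42)  # owner writes
shared_value = l2.read(core=3, region=0, address=0x10)  # another core reads

try:
    l2.write(core=1, region=0, address=0x10, value=7)  # non-owner write
    write_rejected = False
except PermissionError:
    write_rejected = True
```

Here core 3 sees the value written by core 0, while core 1's attempt to write into core 0's area is refused, which is the sharing behaviour the paragraph above describes.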