
Intel Larrabee: An Introduction

Author Hugo Jobling
Published 4th Aug 2008

The problem Intel sees itself trying to solve with Larrabee is that of offering both the programmability of a traditional CPU and the throughput performance of a traditional GPU in the same package. Rather than start by designing an entirely new architecture, Intel first looked at how much it would have to change its current chips to improve throughput performance.

As a purely theoretical exercise, Intel’s engineers considered what kind of performance could be achieved using the same die size and power as a current dual-core CPU, but with an architecture optimised for throughput. In this case Intel decided to focus specifically on vector processing, as it is a heavy component of graphics applications.


What Intel came up with was a chip with 10 in-order cores, each capable of issuing two instructions per clock and each with a 16-lane-wide Vector Processing Unit (VPU), as opposed to a desktop CPU’s two out-of-order cores issuing four instructions per clock. The result was a peak vector throughput of 160 operations per clock, versus eight on a Core 2 Duo: a twenty-fold increase in peak vector throughput within the same die area and TDP.
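The peak figures quoted here are simply the number of cores multiplied by the vector lanes in each core. A minimal sketch of that arithmetic in Python follows. The function name is made up for illustration, and the four-lane figure for the Core 2 Duo is an assumption based on its 128-bit SSE units handling four single-precision values at once; the article itself only gives the total of eight.

```python
def peak_vector_throughput(cores: int, lanes_per_core: int) -> int:
    """Peak vector operations per clock, assuming every lane of every core is busy."""
    return cores * lanes_per_core


# Figures from the article: 10 in-order cores, each with a 16-lane VPU.
throughput_design = peak_vector_throughput(cores=10, lanes_per_core=16)

# Assumption: Core 2 Duo modelled as 2 cores with 4-wide (128-bit) SSE each.
core2_duo = peak_vector_throughput(cores=2, lanes_per_core=4)

# Ratio of the two theoretical peaks.
speedup = throughput_design // core2_duo

print(throughput_design, core2_duo, speedup)
```

These are theoretical peaks only; real workloads rarely keep every vector lane occupied on every clock.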

As suggested already, it isn’t necessary to create an entirely new architecture to get the kind of core best suited to graphics processing. To that end, in creating Larrabee Intel decided to start with a CPU core it already had and tweak it to get the GPU core it wanted. Larrabee is, at heart, derived from the original Pentium processor design, although you could hardly tell by looking at it.

As Intel’s block diagram shows, Larrabee still looks more like a multi-core CPU than a GPU such as the nVidia GeForce GTX 280, although there are some similarities too, as you would expect. Notable features of the Larrabee design include a 1,024-bit wide ring bus (512 bits in each direction) over which all the components can communicate, dedicated texture units and a fixed-function unit.

Larrabee uses a shared L2 cache in which each core has an allocated area to which it has read/write access, while every other core has read access to that area, allowing data to be shared among multiple cores. Coupled with the wide, high-speed ring bus, this provides both high bandwidth and fast access to the fixed-function blocks.
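The access rule described here (the owning core reads and writes its own area, every other core may only read it) can be illustrated with a toy model. This is only a sketch of the rule as the article states it, not Intel's actual cache or coherence logic; the class name, method names and the address value are all hypothetical.

```python
class SharedL2:
    """Toy model of a partitioned shared L2 (not Intel's actual coherence logic)."""

    def __init__(self, num_cores: int) -> None:
        # One region per core, indexed by the owning core's id.
        self.regions = [dict() for _ in range(num_cores)]

    def write(self, core: int, region: int, address: int, value: int) -> None:
        # Only the owning core may write to its own region.
        if core != region:
            raise PermissionError(f"core {core} cannot write to region {region}")
        self.regions[region][address] = value

    def read(self, core: int, region: int, address: int):
        # Any core may read any region; returns None if nothing was stored there.
        return self.regions[region].get(address)


l2 = SharedL2(num_cores=10)
l2.write(core=0, region=0, address=0x40, value=7)        # owner writes its own region
shared_value = l2.read(core=3, region=0, address=0x40)   # another core reads it
```

In this model a call such as `l2.write(core=3, region=0, ...)` raises `PermissionError`, which mirrors the read-only access other cores have to an area they do not own.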

 
