Intel Larrabee: An Introduction
| Author | Hugo Jobling |
| Published | 4th Aug 2008 |
Delving deeper into the cores themselves, we see one important difference from any GPU currently on the market: Larrabee is fully x86 capable. In real terms, whereas nVidia and AMD’s cards can only run the APIs they have been designed for (OpenGL and DirectX), Larrabee can run any code that can run on any other x86 processor. Each core boasts separate scalar and vector units, each with its own dedicated registers, coupled in turn to a shared L1 cache, which finally backs onto the core’s own 256KB local subset of the L2 cache.
Each core supports four separate execution threads with separate registers per thread, allowing for a short, efficient in-order pipeline without sacrificing the latency-hiding benefit of a more complex out-of-order pipeline. The vector unit itself has a 16-lane-wide vector ALU controlled by mask registers, which determine which lanes each instruction writes to. This handles data flow control per lane and in turn enables a separate instance of an execution kernel to be mapped to each of the VPU lanes. In short, this makes the vector unit extremely efficient at crunching through maths operations.
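To make the role of the mask registers concrete, here is a rough sketch in Python that emulates a 16-lane vector unit in plain lists. It is not Larrabee code: the function names (`masked_op`, `compare_lt`, `abs_or_double`) are invented for illustration, and the only detail taken from the article is the 16-lane width. The idea it shows is that a per-lane "if/else" becomes two masked vector operations rather than sixteen branches.

```python
import operator

LANES = 16  # vector width given in the article


def masked_op(op, dst, a, b, mask):
    """Apply op(a[i], b[i]) in lanes whose mask bit is set; other lanes keep dst[i]."""
    return [op(x, y) if (mask >> i) & 1 else d
            for i, (d, x, y) in enumerate(zip(dst, a, b))]


def compare_lt(a, b):
    """Vector compare: returns an integer mask with bit i set where a[i] < b[i]."""
    return sum(1 << i for i, (x, y) in enumerate(zip(a, b)) if x < y)


def abs_or_double(values):
    """Per lane: negate negative values, double the rest, with no per-lane branch."""
    assert len(values) == LANES
    zero = [0.0] * LANES
    two = [2.0] * LANES
    all_lanes = (1 << LANES) - 1
    neg = compare_lt(values, zero)  # one mask bit per lane
    out = list(values)
    out = masked_op(operator.sub, out, zero, values, neg)              # "if" side
    out = masked_op(operator.mul, out, values, two, ~neg & all_lanes)  # "else" side
    return out
```

Both sides of the branch are issued for the whole vector, and the mask decides which lanes actually take each result. That is what lets sixteen instances of the same kernel run side by side even when they disagree about which path to follow.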
The VPU instruction set performs numeric type conversion, ordinarily a common cause of slowdowns, as part of the cache read and write cycles, and also allows data replication and lane rearrangement on the register read cycle. Notably, Larrabee also supports both 32-bit (single precision) and 64-bit (double precision) floating point data.
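As a loose illustration of those three operand features, the Python sketch below models each one as a small function over a 16-element list. The specifics are assumptions rather than anything Intel has detailed here: the choice of 8-bit unsigned to normalised float as the conversion, the four-lane grouping of the rearrangement, and all of the function names are hypothetical.

```python
LANES = 16  # vector width given in the article


def load_u8_as_float(memory, offset):
    """Read 16 bytes and convert each to a float in [0, 1] as part of the load.

    The uint8 -> normalised float format is an assumed example.
    """
    return [memory[offset + i] / 255.0 for i in range(LANES)]


def broadcast(value):
    """Data replication: copy one scalar into every lane."""
    return [float(value)] * LANES


def swizzle(reg, pattern):
    """Lane rearrangement: apply a 4-entry pattern within each group of four lanes."""
    return [reg[(i // 4) * 4 + pattern[i % 4]] for i in range(LANES)]
```

On hardware that folds these steps into the load or register read, the separate convert, splat and shuffle instructions a conventional SIMD unit would need simply disappear from the inner loop.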
Unlike a typical GPU, Larrabee’s cores offer such features as context switching and pre-emptive multitasking, helping each core make the most efficient use of each available clock cycle. Other differences Intel highlights include the addition of a ring bus for inter-block communication, low-latency L1 and L2 caches and the removal of most fixed-function logic: Larrabee features no hardware setup or rasterisation units, for example.
That said, in one area important to graphics, Larrabee does still defer to fixed-function logic. For texture sampling, Intel concluded that a software implementation was simply far too inefficient: 12 times slower for filtering and 40 times slower if decompression was necessary. Larrabee’s texture logic unit, then, performs standard texture operations such as anisotropic filtering and decompression, communicating with the cores via the chip’s L2 cache using 2x2-pixel tiles and 8-bit colour values.
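For a sense of the arithmetic involved, here is a minimal Python sketch of a single bilinear filter sample over one 2x2 tile of 8-bit RGBA values. It is a textbook formulation, not Intel's implementation, and the `bilinear` name and tile layout are assumptions made for the example.

```python
def bilinear(tile, fx, fy):
    """Blend a 2x2 tile of 8-bit RGBA texels.

    tile is ((c00, c10), (c01, c11)); fx and fy are the fractional
    sample position inside the tile, each in the range [0, 1].
    """
    (c00, c10), (c01, c11) = tile
    out = []
    for ch in range(4):  # R, G, B, A
        top = c00[ch] * (1 - fx) + c10[ch] * fx
        bottom = c01[ch] * (1 - fx) + c11[ch] * fx
        # Round to nearest and return to an 8-bit integer value.
        out.append(int(top * (1 - fy) + bottom * fy + 0.5))
    return tuple(out)
```

Every filtered texel costs three interpolations per channel plus the address calculation and four memory reads, and anisotropic filtering takes several such samples per pixel. Repeated for every pixel of every frame, it is easy to see why this is work Intel would rather hand to dedicated logic.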
The second fixed-function unit Larrabee sports hasn’t had its use specified yet but, just like the texture unit, it will handle anything Intel decides isn’t efficient enough to perform in code. As an example, given the 2009-2010 launch schedule for retail GPUs, it seems likely that Intel will add a hardware tessellation engine to Larrabee, as that feature is being introduced with the DirectX 11 spec.
So, just how does Larrabee deal with current graphics implementations?
Hugo said on 4th August 2008
Triple said on 4th August 2008
Hmm, we'll see. I hope not, but it's far harder to push a software change than a hardware one. I'm crossing my fingers for raytracing here, hoping it won't be T...
Ed said on 5th August 2008
Ray tracing in games is still a long way off, of that there can be no doubt. I'm conscious of the risk of sticking my neck out too much but I'd also say that the first ge...
Ardjuna said on 5th August 2008
Hey Triple, would be interested in taking a look at your source (I'm fluent in Dutch, so that's not a problem).
I'm not disputing that Larrabee's architecture could be utilised for ray-tracing, rather with the suggestion that that's what its purpose was.
R...