In designing its graphics card, Intel looked at some of the problems with traditional rendering pipelines. As nVidia and ATI discovered when creating their unified shader model cards, the fixed pipelines of earlier GPUs often ended up being extremely inefficient. This problem arises from the fact that not every graphics application is created equal. For argument’s sake, we’ll look at games because, let’s face it, that’s what most of us buy a graphics card for.
When rendering frames, different games end up stressing different areas of the graphics card. F.E.A.R., back when it was wowing us with its visual splendour, had a graphics engine that was very heavy on rasterisation, whereas a more recent game, such as Gears of War, is almost entirely pixel-shader biased. Even within the same game, different frames can vary massively in which part of the graphics card’s pipeline is being taxed.
What Intel decided, then, was that rather than use the partially fixed-function, partially programmable pipeline seen on today’s GPUs, Larrabee would do things differently. In theory, a developer could go back to where we started and offload any software rendering process they choose onto Larrabee. In practice, however, it makes more sense to use a standard interface, especially if you want to maintain cross-compatibility or, to put that another way, if you want to compete with rival products.
Therefore, Intel set Larrabee up so that it replicates each fixed-function stage of DirectX and OpenGL, the two most commonly used graphics APIs. The upshot of this is that, in theory, Larrabee shouldn’t suffer from bottlenecking at any stage of the pipeline.
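To see why a fixed split of hardware between stages can bottleneck, here is a rough back-of-the-envelope sketch in Python. Every number and name in it is made up for illustration (none is a Larrabee or GPU specification), and the flexible case is an idealised lower bound that ignores any overhead from doing the work in software.

```python
# Hypothetical model: a frame needs some amount of work per stage, and the
# hardware has some number of units that can do that work.

def fixed_pipeline_time(work, units):
    """Frame time when each stage owns a fixed number of units.

    The slowest stage sets the pace, so units in other stages sit idle.
    """
    return max(work[stage] / units[stage] for stage in work)


def flexible_pipeline_time(work, total_units):
    """Idealised frame time when any unit can run any stage's work."""
    return sum(work.values()) / total_units


# Illustrative figures only: 4 units wired for rasterisation, 12 for shading.
units = {"raster": 4, "pixel": 12}
raster_heavy_frame = {"raster": 80, "pixel": 96}

print(fixed_pipeline_time(raster_heavy_frame, units))                    # 20.0
print(flexible_pipeline_time(raster_heavy_frame, sum(units.values())))   # 11.0
```

In this toy frame the rasteriser is the bottleneck on the fixed design, while the shader units finish early and wait; pooling the same 16 units nearly halves the frame time.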
In terms of actually rendering frames, then, Larrabee uses what is referred to as a binning renderer, a bin being, basically, a set of data which is to be worked on in some way. In this particular context, each bin corresponds to a tile, which is a grid of pixels on your screen, and contains a list of rendering operations. This method of rendering allows the card to render these tiles in parallel, as each is independent of its neighbours.
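The binning idea can be sketched in a few lines of Python. This is a minimal illustration rather than Intel's implementation: the 64-pixel tile size, the bounding-box overlap test and the function name `bin_triangles` are all assumptions made for the example.

```python
from collections import defaultdict

TILE_SIZE = 64  # assumed tile edge in pixels; illustrative, not a Larrabee figure


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Uses a conservative bounding-box test, so a bin can list a triangle
    that turns out not to cover any pixel of that tile.
    """
    bins = defaultdict(list)
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile_size)
        tx1 = min(max_tx, int(max(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        ty1 = min(max_ty, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    [(10, 10), (20, 10), (10, 20)],  # sits entirely inside tile (0, 0)
    [(60, 60), (70, 60), (60, 70)],  # straddles the corner of four tiles
]
bins = bin_triangles(triangles, width=128, height=128)
print(bins[(0, 0)])  # [0, 1]
print(bins[(1, 1)])  # [1]
```

Once the bins are filled, each tile's list can be handed to a different core, since no tile needs to read its neighbours' pixels.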
The binning renderer itself is split into a front end, which deals with vertices, and a back end, which deals with pixels. The front end features a vertex shader and a geometry shader, which can process 16 vertices and 16 primitives (such as points, lines or triangles) at a time respectively. The front end also deals with rasterisation before passing its results into a bin set for the back end to deal with.
Rendering in the back end uses the Larrabee core’s ability to handle four threads simultaneously, using one as a setup thread which tells the other three work threads what to do. These work threads deal with early and late Z-culling (deciding if a rendered object will be visible and, if it won’t, not rendering it), pixel shading (determining the pixel’s colour, which is more complicated than it sounds) and alpha blending (transparency effects). Each tile’s data is stored in the core’s local L2 cache, allowing the texture unit fast access to said data via the ring bus and decoupling its operation from pixel operations.
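As a rough illustration of what the work threads do for one tile, the Python sketch below runs a depth test before shading (the "early" in early Z-culling), shades the survivors and alpha-blends them into the tile. It is a simplified, single-threaded sketch under stated assumptions: smaller z means nearer, transparent fragments arrive after the opaque ones behind them, and the names `render_tile` and `PALETTE` are invented for the example.

```python
def render_tile(fragments, tile_size, shade, background=(0.0, 0.0, 0.0)):
    """Shade one tile from a list of (x, y, z, alpha, material) fragments.

    Returns the tile's colour grid and how many fragments were shaded.
    """
    depth = [[float("inf")] * tile_size for _ in range(tile_size)]
    colour = [[background] * tile_size for _ in range(tile_size)]
    shaded = 0
    for x, y, z, alpha, material in fragments:
        # Early Z: skip the (expensive) shader if something nearer is already there.
        if z >= depth[y][x]:
            continue
        src = shade(material)
        shaded += 1
        # Alpha blend: mix the new colour over whatever the tile already holds.
        dst = colour[y][x]
        colour[y][x] = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))
        # Only fully opaque fragments hide what lies behind them.
        if alpha >= 1.0:
            depth[y][x] = z
    return colour, shaded


PALETTE = {"red": (1.0, 0.0, 0.0), "blue": (0.0, 0.0, 1.0)}  # stand-in "shader"

fragments = [
    (0, 0, 0.5, 1.0, "red"),   # opaque red
    (0, 0, 0.9, 1.0, "blue"),  # hidden behind the red fragment, so culled
    (0, 0, 0.2, 0.5, "blue"),  # half-transparent blue in front of the red
]
colour, shaded = render_tile(fragments, tile_size=2, shade=PALETTE.get)
print(colour[0][0], shaded)  # (0.5, 0.0, 0.5) 2
```

The culled fragment never reaches the shader at all, which is where the saving comes from.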
Larrabee also has a couple of other tricks up its sleeve. One of these is context switching, that is, allowing data in the cache which is not currently being worked on, but which will be wanted later, to be saved to memory to free up cache space. The use of a software, rather than hardware, task scheduler also means that resource scheduling can be tailored to each workload’s specific needs.
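To give a feel for why the scheduling policy matters, the toy Python simulation below compares two policies a software scheduler could choose between. It is purely illustrative: the task costs, the worker count and both function names are invented here, and nothing in it describes Larrabee's actual scheduler.

```python
import heapq


def static_schedule(task_costs, workers):
    """Finish time when tasks are dealt out round-robin ahead of time."""
    totals = [0] * workers
    for i, cost in enumerate(task_costs):
        totals[i % workers] += cost
    return max(totals)


def dynamic_schedule(task_costs, workers):
    """Finish time when whichever worker frees up first takes the next task."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in task_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Made-up per-tile costs: most tiles are cheap, a couple are expensive.
tile_costs = [8, 1, 1, 1, 8, 1, 1, 1]
print(static_schedule(tile_costs, workers=2))   # 18
print(dynamic_schedule(tile_costs, workers=2))  # 11
```

With this particular mix the round-robin policy happens to hand both expensive tiles to the same worker, whereas the pull-based policy spreads them out. Because the policy lives in software, it can be swapped for whichever suits the workload.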