One of nVidia's big marketing campaigns this year has revolved around raising the profile of General Purpose computing on the Graphics Processing Unit (GPGPU) - that is, performing non-3D-graphics related computation on a GPU. So, with the launch of GT200, nVidia was keen to emphasise the superior GPGPU abilities of its latest chip.
GPUs in general are ideally suited to parallel computing tasks, like image manipulation and video conversion, because all those shaders can be put to work as mini CPUs. Individually they may pale in comparison to a proper CPU, but when you've got 240 of them, as you have in GT200, their sheer weight of numbers will easily outperform any CPU. The big problem at the moment is that writing software to take advantage of parallel processing, and particularly parallel processing on a GPU, is very difficult. This is what prompted nVidia to start working on its CUDA Software Development Kit (SDK), which Hugo recently talked about, and which makes GPGPU programming considerably easier for the coder.
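To give a flavour of why CUDA lowers the barrier, here's a rough sketch of my own (an illustration, not anything from nVidia's SDK samples) of the kind of trivially parallel kernel it lets you write. Each thread brightens a single pixel, and the hardware worries about spreading the work across all those shaders:

```cuda
#include <cuda_runtime.h>

// Each thread brightens one pixel - the sort of trivially parallel job
// that maps naturally onto hundreds of shader processors.
__global__ void brighten(unsigned char *pixels, int count, int amount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        int v = pixels[i] + amount;
        pixels[i] = v > 255 ? 255 : (unsigned char)v;
    }
}

int main(void)
{
    const int count = 1 << 20;                 // one million pixels (illustrative)
    unsigned char *d_pixels;
    cudaMalloc((void **)&d_pixels, count);
    cudaMemset(d_pixels, 100, count);          // dummy image data

    // Launch enough 256-thread blocks to cover every pixel; the GPU's
    // thread scheduler farms the blocks out across the shader clusters.
    int blocks = (count + 255) / 256;
    brighten<<<blocks, 256>>>(d_pixels, count, 40);
    cudaDeviceSynchronize();

    cudaFree(d_pixels);
    return 0;
}
```

The point is that the per-thread code looks like perfectly ordinary C; the parallelism comes from launching a million of those threads at once.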
As well as developing CUDA as a general programming platform, nVidia also recently bought Ageia, the company behind the PhysX physics processor, and integrated its technology into the CUDA SDK. This means that nVidia GPUs can now be put to work creating life-like physical effects as well as life-like visual effects.
Also, with CUDA supported by every nVidia GPU since the 8800 GTX, it now has a massive installed user base of some 70 million. This has prompted some pretty big name developers to sit up and pay attention to CUDA, including Adobe - it will be using GPU acceleration in upcoming versions of Photoshop and Premiere.
Of course, AMD has been developing its own competitor to CUDA in the form of its Close To Metal (CTM) SDK, though this has seen significantly less enthusiastic uptake. Even so, with AMD planning to support the Havok physics engine and not yet jumping on board with PhysX, the true state of play with regard to GPGPU is all very much up in the air, and personally I'd take the whole thing with a pinch of salt for the time being. That said, for those who are interested, GT200 does make some significant improvements over nVidia's previous efforts.
Looked at with GPGPU in mind, GT200 takes on a rather different complexion. The TPCs become mini 24-core processors, each with its own little store of on-chip memory, a Thread Scheduler distributes the massive compute load between the various TPCs, and the frame buffer memory acts like main system memory.
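As a sketch of how that arrangement is exposed to the programmer (again an illustration of my own, with the 256-thread block size simply chosen for convenience), each block of threads runs on one of those mini processors and does as much of its work as possible in the small on-chip store, only touching the frame buffer to read its input and write its result:

```cuda
#include <cuda_runtime.h>

// Intended to be launched with 256 threads per block. Each block sums its
// slice of the input in the SM's small on-chip shared memory, touching the
// frame buffer ("main memory") only to read the input and write one result.
__global__ void blockSum(const float *in, float *out, int count)
{
    __shared__ float scratch[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = (i < count) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block, carried out entirely on-chip.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = scratch[0];
}
```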
Now, in its briefings nVidia went into a lot of detail about why GT200 is better for GPGPU than every nVidia GPU that came before it. However, much of the improvement comes simply from the increase in the number of processing units rather than any grand new design, the result being a jump from 518 GigaFLOPs of peak processing power on G80 to 933 GigaFLOPs on GT200.
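For what it's worth, those headline figures are just the shader count multiplied by the shader clock and by the three floating-point operations (a multiply-add plus a multiply) each SP can in theory issue per cycle: 128 x 1,350MHz x 3 gives roughly 518 GigaFLOPs for G80, while 240 x 1,296MHz x 3 gives roughly 933 GigaFLOPs for GT200 - taking the stock shader clocks of each chip.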
That said, there are a few architectural improvements as well. First, thread scheduling has been improved so that dual-issue MAD+MUL operations are performed more efficiently. Double precision (64-bit) calculations are now supported too, though these rely on 30 dedicated double-precision processors (one per SM) rather than using the SPs themselves, resulting in double-precision performance that is roughly one twelfth that of single-precision (32-bit). Four 'Atomic' units have also been added; these are designed to handle particular atomic read-modify-write commands with direct access to memory, rather than going through the chip's own caches.
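To illustrate what those atomic units are there for, here's one more sketch of my own (not nVidia sample code): a histogram kernel in which thousands of threads all increment the same handful of counters, each increment being an atomic read-modify-write straight to memory.

```cuda
#include <cuda_runtime.h>

// Builds a 256-bin histogram of byte values. Every thread performs an
// atomic read-modify-write directly on its bin in memory - precisely the
// kind of operation GT200's dedicated atomic units are there to service.
__global__ void histogram(const unsigned char *data, int count,
                          unsigned int *bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        atomicAdd(&bins[data[i]], 1u);
}
```

Without the atomic operation, two threads could read the same bin, both add one, and both write back the same value, losing a count along the way.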
All told, though, it's a lot of stuff that involves very complicated mathematics, and optimisations thereof, and right now little of it is relevant to the general consumer. When mainstream GPGPU applications begin to hit the shelves we'll come back to these issues and see what difference they really do make.
So, with all that theory out of the way, let's look at the first consumer card based on GT200, the GTX 280.