Taking a further step back we a greeted with the following diagram.
Here we see there are ten TPCs making up the shader-power of the chip in a section that nVidia calls the Streaming Processor Array (SPA). In G80 and G92, the SPA consisted of only eight TPCs, which resulted in the total of 128 SPs. With the increase in SPs per TPC and the increase in TPCs per SPA in GT200, we end up with a total 240 SPs - quite some increase, I'm sure you'd agree. This increase in the number of TPCs also accounts for the improvement in texturing power mentioned earlier. With two extra TPCs you get two extra blocks of texturing units, making for a total of 32 texture address units and 32 texture filtering units, up from 24 of each on G92 and 12 addressing units and 24 filtering units on G80.
Above the SPA is the shader-thread dispatch logic, which manages the task of splitting up the huge number of calculations into TPC sized chunks, as well as the raster / setup engine.
Below the SPA are eight ROP partitions, which handle per pixel tasks like anti-aliasing and alpha-blending. Each ROP partition can process four pixels per clock making for a total of 32 pixels per clock for the whole chip. Also, the new ROPs have been tweaked to enable full speed blending (i.e. 32 pixels per clock) - G80 could only blend 12 and output 24 pixels per clock cycle - so antialiasing, particle effects, shadows, and such like should all see a performance increase with GT200. Each ROP has its own little store of L2 cache memory as well as a dedicated 64-bit connection to the frame buffer making for a total memory interface that's 512-bits wide. Or in other words, colossal!
Elsewhere there have been a number of tweaks applied that include improved geometry shading and Z-occlusion culling performance. Communication between hardware and driver has also been improved reducing potential bottlenecks that could impact performance.
All told, these changes result in some pretty impressive raw performance figures. Over G80, shader processing power has increased by 87.5 per cent, texturing capabilities by 25 per cent, and pixel throughput by 33.3 per cent. Compared to the dual-chip cards that ATI and nVidia both recently introduced, some of the numbers seem less impressive but there are two things to consider here. First, figures for the dual card solutions assume a perfect doubling up of performance from the two chips involved, which rarely is the case in real life. Secondly, these dual-chip cards only offer performance increases for the games with which they work properly whereas single chip solutions like GT200 will give you a guaranteed level of performance.
So, that's the logical picture but how does it all relate to that ginormous chip we saw earlier? Well, take a look below.
This is a picture of the GT200 with the various compute sections highlighted. The unmarked section in the middle performs a variety of roles but primarily it is concerned with managing the rest of the chip so includes things like the thread scheduler and raster setup.
Finally, one note about DirectX10.1. Put simply, GT200 doesn't support it, which is a shame. Although 10.1 is only a small tweak that doesn't bring any new features to Microsoft's gaming API it does improve efficiency, and thus performance, in certain situations. The only thing in nVidia's favour here is that few developers are yet utilising these new tweaks. However, this won't be the case forever. We will just have to wait and see how this one pans out.