Much as nVidia did with GT200, ATI has taken the basics of its previous generation architecture, added a bit more here, taken away a bit there, and generally optimised the whole lot to better suit modern, real-world performance needs. And, just as was the case with GT200, explaining the differences requires delving fairly deep into how the whole chip works. So, let's start from the bottom and work our way up.
The core grunt work of RV770 is performed by its Stream Processing Units (SPU). These are largely similar to the Stream Processors used in nVidia's latest architecture - they each contain three basic Arithmetic Logic Units that can power through simple floating point mathematical calculations like add and multiply. Even here there are differences in exactly how the two manufacturers perform these calculations but they're similar enough to compare. It's as we zoom further out that the two architectures really start to diverge.
The next step up for the RV770 is what I'm calling the Multi Stream Processing Unit (MSPU). This contains four of the basic SPUs along with a fifth unit, the Special Function Unit (SFU), which can do everything the other SPUs can do and can also perform transcendental calculations like logarithms and trigonometric functions.
Although the MSPU incorporates a number of processing units, it is in many ways only the equivalent of a single Stream Processor from nVidia's architecture. This is because, while an MSPU can perform up to five calculations per cycle, all those calculations have to come from the same thread. So unless the thread can be efficiently broken down into independent operations, the whole MSPU might only be as fast as one of nVidia's SPs. Conversely, this is why nVidia runs its shaders at a much higher speed than ATI - to counter those situations where ATI's many processors can perform calculations faster.
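To make the same-thread restriction concrete, here is a minimal Python sketch of how one thread's operations might be packed into five-wide groups. It is a toy model rather than ATI's actual compiler or scheduler: the function name `pack_bundles`, the `deps` dictionary and the `width` parameter are all made up for illustration, and it ignores the fact that only the SFU can handle transcendentals.

```python
def pack_bundles(deps, width=5):
    """Greedily pack one thread's operations into groups of up to `width`.

    `deps` maps each operation index to the set of earlier operations whose
    results it needs. Each returned group stands for one cycle of a
    hypothetical five-wide unit; a scalar processor would instead need one
    cycle per operation.
    """
    done = set()               # operations finished in earlier cycles
    remaining = sorted(deps)   # operations still waiting to be issued
    bundles = []
    while remaining:
        # An operation can issue only if all of its inputs are already done.
        ready = [op for op in remaining if deps[op] <= done][:width]
        if not ready:
            raise ValueError("circular dependency between operations")
        bundles.append(ready)
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return bundles


# Five independent operations fit into a single cycle.
independent = {i: set() for i in range(5)}
print(len(pack_bundles(independent)))  # 1

# Five operations that each need the previous result take five cycles,
# which is no better than a scalar processor issuing one per cycle.
chain = {0: set(), 1: {0}, 2: {1}, 3: {2}, 4: {3}}
print(len(pack_bundles(chain)))  # 5
```

In this simplified picture, the second case is the one where a whole MSPU does no more work per cycle than a single scalar processor, which is where a higher shader clock pays off.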
Zooming out another step we see a Single Instruction Multiple Data (SIMD) core that from the outside looks similar to nVidia's Streaming Multiprocessor (SM). Both contain a cluster of processing units, a thread sequencer and a few data stores. However, there are a number of places where the two differ.
For a start, ATI includes four texture processing units within the SIMD, whereas nVidia does texturing further up the ladder. Each texture unit can perform one address lookup and one filtering operation per clock cycle. While this is essentially the same as RV670 (indeed, the whole chip is largely identical up to this point), a few hidden improvements have been made. ATI claims performance per mm^2 has increased by 70 per cent through some untold tweaking. Additional L1 texture cache memory has also been dedicated to each SIMD, making for a significant increase in texture cache bandwidth.
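As a rough back-of-the-envelope illustration of what four texture units per SIMD at one filtering operation per clock adds up to, the arithmetic can be sketched in Python. The function name `peak_texel_rate` is made up for this example, and the SIMD count and core clock in the usage line are assumed inputs rather than figures given in this section, so substitute whatever values apply to the card in question.

```python
def peak_texel_rate(simd_count, core_clock_hz, units_per_simd=4, ops_per_clock=1):
    """Theoretical peak filtered texels per second.

    Assumes every texture unit completes `ops_per_clock` filtering operations
    on every clock cycle, which real workloads will rarely sustain.
    """
    return simd_count * units_per_simd * ops_per_clock * core_clock_hz


# Assumed example inputs: 10 SIMDs running at 750MHz.
rate = peak_texel_rate(simd_count=10, core_clock_hz=750_000_000)
print(rate / 1e9, "gigatexels per second")  # 30.0 gigatexels per second
```

This is a theoretical ceiling only; it says nothing about the cache and efficiency tweaks ATI mentions, which affect how close the chip gets to that figure in practice.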