Asus EN 8800 GTX - nVidia's G80 Performance Revealed
Introducing G80 architecture and DX 10
Published: 21st November 2006 | Source: Asus
The last generation of cards saw nVidia slip well behind ATI in terms of image quality, although with their dual-PCB, dual-GPU 7950GX2 they clawed the performance crown back. This was a place that nVidia was not used to being, especially after the success of their 6 series.
DirectX 10 is a technology that we are all waiting for and both nVidia and ATI are producing cards (hopefully) in time for the release. nVidia are the first onto the market with their 8800GTX and GTS. With an architecture almost totally re-worked from the old generation, nVidia have gone all out on a unified architecture.
Asus kindly sent us their 8800GTX for a study on how the card does in real life, but first let us explore G80's features.
Outlining the technology
There's a lot of information out there on nVidia's latest gen of card so I thought I'd try to keep the explanation part simple and concise.
In the 8800GTX nVidia have implemented a parallel, unified shader design consisting of 128 individual stream processors running at 1.35 GHz. As described in my article on ATI's unified shader architecture, nVidia have made a pipeline that processes vertex, pixel, geometry, or physics operations: giving it flexibility and efficiency.

One noticeable difference is that nVidia have implemented ZCull before the data enters the stream processors. ZCull is a process that strips out of the rendering engine the data that you will not see. This means that the GPU does not waste time rendering stuff you will never see on screen. Previously this was implemented in post processing, meaning that vital processing power was used to render the unnecessary pixels, which were then culled afterwards.
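To make the difference concrete, here is a minimal sketch of culling before versus after shading. It is only an illustration of the ordering described above, not nVidia's actual ZCull hardware: the fragment tuples, the `shade` stand-in and both function names are made up for the example.

```python
# Hypothetical model: a fragment is (x, y, depth, colour); smaller depth = closer.

def shade(colour):
    """Stand-in for expensive shader work; counts how often it runs."""
    shade.calls += 1
    return colour

shade.calls = 0

def render_late_z(fragments):
    """Shade every fragment first, then throw away the hidden ones."""
    depth, image = {}, {}
    for x, y, z, colour in fragments:
        shaded = shade(colour)  # work is done even if the pixel ends up hidden
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            image[(x, y)] = shaded
    return image

def render_early_z(fragments):
    """Test depth first and shade only fragments that are currently visible."""
    depth, image = {}, {}
    for x, y, z, colour in fragments:
        if z >= depth.get((x, y), float("inf")):
            continue  # culled before any shading happens
        depth[(x, y)] = z
        image[(x, y)] = shade(colour)
    return image
```

Both functions produce the same image, but the early version calls `shade` fewer times whenever a fragment lands behind something already drawn. How much is saved depends on draw order: in this toy model, front-to-back submission gives the biggest saving.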
Let's see why both nVidia and ATI think that a unified architecture was needed to increase the performance of DX10 cards:
DirectX 10 Unified Shaders:

Let's move on to the Unified example. Here in both geometry and pixel workloads the unified architecture excels (in theory) as the unified shader pipelines use their flexibility to render any of the information sent their way. Couple this with dynamic load balancing and you have a mightily efficient architecture that can handle anything thrown at it.
This means that you have a GPU with 128 shader processors each capable of processing pixel, vertex, geometry and physics data.
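A rough back-of-the-envelope model shows why this matters. The sketch below assumes every job takes one cycle on one unit and that a unified unit can take any job. The 128 units come from the 8800GTX; the 32/96 vertex/pixel split for the fixed design is purely an assumed figure for comparison, not a real card.

```python
import math

def fixed_cycles(vertex_jobs, pixel_jobs, vertex_units, pixel_units):
    """Dedicated units: each pool only handles its own job type,
    so the frame waits for whichever pool finishes last."""
    return max(math.ceil(vertex_jobs / vertex_units),
               math.ceil(pixel_jobs / pixel_units))

def unified_cycles(vertex_jobs, pixel_jobs, total_units):
    """Unified units: any unit takes any job, so work is spread evenly."""
    return math.ceil((vertex_jobs + pixel_jobs) / total_units)

# Geometry-heavy frame: the 32 vertex units are swamped, the pixel units sit idle.
geometry_heavy = (1024, 256)
# Pixel-heavy frame: the opposite problem.
pixel_heavy = (64, 1216)

for name, (v, p) in (("geometry-heavy", geometry_heavy), ("pixel-heavy", pixel_heavy)):
    print(name, "fixed:", fixed_cycles(v, p, 32, 96), "unified:", unified_cycles(v, p, 128))
```

In this toy model the unified design finishes both frames in the same number of cycles, while the fixed split is held up by whichever pool is overloaded. Real scheduling is far messier, so treat the numbers as illustrative only.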
nVidia are also hoping that the flexible and efficient (and of course hugely parallel) processor in their G80 will mean that other data can also be processed.
DirectX 10
I don't want to go into too much detail with DirectX 10, as this has been covered in one of our previous articles - see here, but I'll just go over why DX10 will also add to the performance increase.
DirectX 10 reduces CPU overhead by cutting down how much the CPU gets involved in the rendering process. By taking the CPU out of the most basic API processes, DirectX 10 hugely reduces the time it takes to render each object. Let's look at it like this (ATI slide):
So Geometry Shaders can manipulate data...how do the developers use this?
Well basically I'm hoping that developers will use this to do things like stop "pop-up" (of trees/objects etc in the distance). I can see that there would be a huge advantage in using these units to change things like water over distance and adding far superior rendering to characters that are in the periphery of games: such as excellent crowd animation in racing/sports games. This is all my own speculation, but it would certainly be nice to see.
Memory interface mid-process
Also added into nVidia's "Stream" processors is the ability to move data to memory and back again in one pass. This means that you should no longer require data to have two or more passes before it can be outputted. Once again this adds to the picture of added efficiency that nVidia are building up.
Instancing
Shader Model 3 brought far better instancing than we had seen before. Instancing means that you can render one object and replicate it many times over, creating a fuller scene. This is very useful for trees and grass, where you need to replicate basically the same thing again and again.
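The idea can be sketched in a few lines. This is a CPU-side illustration only, with a made-up `draw_instanced` helper and a three-vertex "tree"; on real hardware the shared mesh and the per-instance data are handed to the GPU, which does the replication itself.

```python
def draw_instanced(mesh, offsets):
    """Return the world-space vertices for every instance of one shared mesh.

    mesh    -- list of (x, y, z) vertices, stored once
    offsets -- one (dx, dy, dz) position per instance
    """
    return [[(x + dx, y + dy, z + dz) for (x, y, z) in mesh]
            for (dx, dy, dz) in offsets]

# One tiny "tree" mesh, placed at three positions in the scene.
tree = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
forest = draw_instanced(tree, [(0, 0, 0), (10, 0, 5), (20, 0, -3)])
```

The point is that the mesh is defined once and only a small amount of per-instance data changes, which is what makes large fields of grass or trees affordable.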

nVidia have implemented 32-bit floating-point precision per colour component, for a total of 128-bit high-dynamic-range rendering. They claim this level of accuracy outperforms film renders.
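The 128-bit figure is easiest to see as arithmetic. The sketch below assumes it refers to four colour components (red, green, blue, alpha) at 32 bits each, and compares that with a traditional 8-bit-per-component pixel. Both helper names are made up for the example.

```python
import struct

def pack_float_pixel(r, g, b, a):
    """Four 32-bit floats: 4 x 32 = 128 bits per pixel."""
    return struct.pack("<4f", r, g, b, a)

def pack_8bit_pixel(r, g, b, a):
    """Four 8-bit integers: 4 x 8 = 32 bits per pixel, clamped to 0..255."""
    def clamp(v):
        return max(0, min(255, round(v * 255)))
    return struct.pack("<4B", *(clamp(v) for v in (r, g, b, a)))

# A very bright highlight: red is 1000 times "full" brightness.
bright = (1000.0, 0.5, 0.25, 1.0)

hdr = pack_float_pixel(*bright)
ldr = pack_8bit_pixel(*bright)

print(len(hdr) * 8, "bits, red =", struct.unpack("<4f", hdr)[0])
print(len(ldr) * 8, "bits, red =", struct.unpack("<4B", ldr)[0])
```

The floating-point pixel keeps the over-bright value intact, while the 8-bit pixel has to clamp it to its maximum. That retained range is what the "dynamic range" part of the claim is about.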
This is enough detail for this article at the moment, but I may do a fuller article on this in the future.
Most Recent Comments
All corrected :)
And yep, it did indeed take a long time :)
[IMG]http://upload.overclock3d.net/dwn/261919555140/cimg3966.jpg[/IMG]
I thought I was getting one with the stock cooler, but hey. There don't seem to be any RAM sinks though, which worries me for overclocking (I think most RAM on 8800GTs is rated to run @ 1000):
[IMG]http://upload.overclock3d.net/dwn/261921444681/cimg3967.jpg[/IMG]
Just a heads up for people if they go with Asus.
Nice review Kemp, might have to oc this bad boy and get it closer to a 8800GTS 512.
So forget about the ATi completely?
My screen is under 1680x1050. :)
Deshman - Where d'you get the cards? The 8800GT Extreme is £180 delivered, which isn't too bad, but I can't find a 8800GTS Extreme for ~£210.
Got them from specialtech but they're out of stock now.
Can someone please be more specific about the noise on the Asus card?
I mean, is it really annoying when playing games like Crysis at full load?
I guess when not playing games it should be no problem, right?
It's only loud when it's at full speed, which only happens when the card gets a bit warm.. You can always change the temps at which the card switches fan speed.. That's what I do.
My card sits at 20% fan speed up to 55 degs, so you can't hear it at all when not gaming. At 57 degs it jumps to 40% and you still can't hear it. At 60 degs it jumps to 60%, and at 65 degs it would jump to 80%, which you can hear...
100% is a touch annoying but you can't hear it if using a headset obviously.
Although I don't intend to use a headset, I guess playing a game with 5.1 surround sound won't let you notice the fan noise of the GPU, so I guess I will be OK.




I think what surprised me the most was how close all 3 were.. I knew the GTX still beat the GT but I thought the gap was bigger... It's such a small gap to try and fit the new GTS into. They obviously didn't want it to come out and trash the more expensive GTX, but didn't want it to come out and get whipped by the cheaper GT either...
Oh yeah 1 more correction too :)
Conclusion
It's actually wrong to separate these two cards based simply on pure performance along.
Enjoyed reading that.. So again nice one!