071207 / G84

Just some quick summary notes on the NVidia G84 for my own reference. Tested with an 8600 GTS clocked to 730MHz core / 1460MHz SPU / 2.26 DDR.

32 SPUs at 1460MHz
16 TEX units at 730MHz
8 ROP units at 730MHz

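The peak rates quoted in the rest of these notes are just unit count times clock. A minimal sketch of that arithmetic follows; the function name `peak_rate_g` and the width scaling (half rate at 64bit, quarter rate at 128bit) are assumptions taken from the expectations stated in the notes, not from any NVidia documentation. The unrounded results are 11.68 and 5.84, which the notes truncate to 11.6 and 5.8.

```python
# Figures taken from the notes: 8600 GTS, 730 MHz core clock.
CORE_CLOCK_MHZ = 730
TEX_UNITS = 16
ROP_UNITS = 8


def peak_rate_g(units, clock_mhz, bits=32):
    """Peak rate in G-elements/sec (Gtex/sec or Gpix/sec).

    Assumes one element of up to 32 bits per unit per clock, and that the
    rate falls in proportion to element width above 32 bits (the 1/2 and
    1/4 assumption stated in the notes).
    """
    width_scale = 32 / max(bits, 32)
    return units * clock_mhz * width_scale / 1000.0


# Print the expected maxima for each element width.
for bits in (32, 64, 128):
    tex = peak_rate_g(TEX_UNITS, CORE_CLOCK_MHZ, bits)
    rop = peak_rate_g(ROP_UNITS, CORE_CLOCK_MHZ, bits)
    print(f"{bits:3d}bit: {tex:5.2f} Gtex/sec, {rop:5.2f} Gpix/sec")
```
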
Texture Performance

Nearest filtering has no advantage; bilinear is free. Trilinear is roughly double the cost of bilinear. 64bit texels should have 50% of the performance of a 32bit texel. 128bit texels should have 25% of the performance of a 32bit texel.

All formats were tested with a sequential access sum of 3x3 (9 total) texels, first using a 2x2 source texture and then, in a second batch of tests, a 2048x2048 source texture. Both tests output to a 2048x2048 FBO.

Bilinear results,
Max possible bilinear rate = 16 TEX units at 730MHz = 11.6 Gtex/sec.
<=32bit texels - L8,L16F,L32F,LA8,LA16F,RGBA8 : ~7.6-8.0 Gtex/sec, 65-69% of max (11.6 Gtex/sec).
64bit texels - LA32F,RGBA16F : ~5.8 Gtex/sec, 99% of max (5.8 Gtex/sec).
128bit texels - RGBA32F : ~2.7 Gtex/sec, 93% of max (2.9 Gtex/sec).

Random notes,
Strange reduction in performance for <=32bit texel types.
For bilinear filtering, 64bit and 128bit texel performance is ideal.
Trilinear textures with forced bilinear LOD levels run at bilinear speeds.
Trilinear performance may be bandwidth limited with the tested 9 texel sequential access.
Typical trilinear performance is 30-40% below the expected performance of the 2x2 texture case.

ROP Performance

Assuming that memory read/write rates of 64bit and 128bit pixels are 1/2 and 1/4 of the 32bit pixel rate. Also assuming that blend costs of 16bit and 32bit are 2x and 4x the 8bit cost. Guessing L32F is going to be blend limited but not write limited, RGBA32F is going to be both blend and memory limited (guessing blend and memory latency is additive), but that RGBA16F will be a fast path with memory latency fully masked by the blend latency.

Tested writing to a 2048x2048 FBO.
Max possible blend rate = 8 ROP units at 730MHz = 5.8 Gpix/sec.

Without blending - L8,L16F,LA8,LA16F,RGBA8 : ~5.1 Gpix/sec, 88% of max (5.8 Gpix/sec).
Without blending - L32F : ~3.3 Gpix/sec, 57% of max (5.8 Gpix/sec).
Without blending - RGBA16F : ~2.7 Gpix/sec, 93% of max (2.9 Gpix/sec).
Without blending - LA32F : ~2.1 Gpix/sec, 72% of max (2.9 Gpix/sec).
Without blending - RGBA32F : ~1.2 Gpix/sec, 82% of max (1.45 Gpix/sec).
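
The "% of max" figures so far can be re-derived from the measured rates. The sketch below does that for the bilinear and unblended write results. The measured rates and quoted percentages are copied from the notes, while the table layout and the helper name `percent_of_max` are only illustrative. It uses the unrounded peaks (11.68 and 5.84), so the recomputed values should land within about a point of the quoted ones rather than match exactly.

```python
# 32-bit peaks: unit count * 730 MHz, in G-elements/sec.
TEX_PEAK = 16 * 730 / 1000.0
ROP_PEAK = 8 * 730 / 1000.0

# (label, measured rate, 32-bit peak, bits per element, quoted % of max)
# Measured rates and quoted percentages are copied from the notes.
RESULTS = [
    ("bilinear LA32F,RGBA16F", 5.8, TEX_PEAK, 64, 99),
    ("bilinear RGBA32F", 2.7, TEX_PEAK, 128, 93),
    ("write L8..RGBA8", 5.1, ROP_PEAK, 32, 88),
    ("write L32F", 3.3, ROP_PEAK, 32, 57),
    ("write RGBA16F", 2.7, ROP_PEAK, 64, 93),
    ("write LA32F", 2.1, ROP_PEAK, 64, 72),
    ("write RGBA32F", 1.2, ROP_PEAK, 128, 82),
]


def percent_of_max(measured, peak_32bit, bits):
    """Measured rate as a percentage of the width-scaled peak.

    Assumes the peak halves at 64 bits and quarters at 128 bits,
    as the notes do.
    """
    expected = peak_32bit * 32 / max(bits, 32)
    return 100.0 * measured / expected


for label, measured, peak, bits, quoted in RESULTS:
    pct = percent_of_max(measured, peak, bits)
    print(f"{label:24s} {pct:5.1f}% (notes quote {quoted}%)")
```
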

With glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA),
L8, RGBA8 : ~2.9 Gpix/sec, 50% of max (5.8 Gpix/sec).
L16F : ~2.7 Gpix/sec, 93% of max (2.9 Gpix/sec).
L32F : ~1.4 Gpix/sec, 97% of max (1.45 Gpix/sec).
RGBA16F : ~2.7 Gpix/sec, 94% of max (2.9 Gpix/sec).
RGBA32F : ~0.364 Gpix/sec, 99% of max (0.365 Gpix/sec, see comments above).
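
One way to read the blended figures is as an implied number of ROP clocks per pixel, found by dividing the 32bit peak fill rate by the measured rate. This is an interpretation of the numbers above, not something the tests measured directly, and the helper name `clocks_per_pixel` is hypothetical.

```python
ROP_PEAK_GPIX = 8 * 730 / 1000.0  # 8 ROP units at 730 MHz

# Blended fill rates in Gpix/sec, copied from the notes.
BLENDED = {
    "L8, RGBA8": 2.9,
    "L16F": 2.7,
    "L32F": 1.4,
    "RGBA16F": 2.7,
    "RGBA32F": 0.364,
}


def clocks_per_pixel(measured_gpix, peak_gpix=ROP_PEAK_GPIX):
    """Implied clocks each ROP spends per pixel, assuming 1 clock at peak."""
    return peak_gpix / measured_gpix


for fmt, rate in BLENDED.items():
    print(f"{fmt:10s} {clocks_per_pixel(rate):5.2f} clocks/pixel")
```

Under this reading the 8bit formats come out near 2 clocks per pixel rather than 1, L32F near 4, and RGBA32F near 16.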

Random notes,
ROP blends must always use a 2 clock 16bit blend even with 8bit texels.
Write without blend rates are odd, especially L32F.

Other Random Notes

Using this as reference.
Apparently Z-Cull is much more effective on the G84 compared to the G80.
The G84 can do something like 510 Mtri/sec max.
The G84 can do something like 68 Mpoints/sec max at 1pix, 48 Mpoints/sec max at 4pix.

Atom ©2008/2007 Timothy Farrar