071015 / Drawing in Reverse II

©2008/2007 Timothy Farrar

Back from the photography trip. I finally got around to doing the reverse drawing with hierarchical Z buffer Z-Cull, and measuring the performance difference with a static VBO. Well, the results are in and it is a draw. The benefit was about 10%. The extra cost of drawing the x/4 by y/4 32-bit scalar float framebuffer using stencil, then turning each of those pixels into a 4x4 pixel quad to render Z into the full-size depth buffer, and finally doing the full-size drawing pass, is too much extra work. The overhead is something like 25% of the original drawing pass, for only a 35% gain.

However, after more testing, the performance gain found previously by drawing front first turned out to be simply a side effect of turning on the alpha test and throwing out ROP fragments with alpha under 0.0625. The stencil test wasn't even needed for the performance increase. I need 16x overdraw at a minimum to draw the frame anyway, and had 32x max in some areas, but these were minor. In the end I need front-first drawing for the physics scatter passes, so front first is here to stay, and as it turns out I got a 10% improvement simply by using the alpha test.

What is Next

Still working on the optimization of the engine. I'm taking a 1-2 week gamble on a full rewrite to a new combined overdraw culling and physics algorithm on the GPU. Current CPU time is roughly 16% tree prune, 32% overdraw cull, 16% particle to motion-blurred impostor, and 36% generating double-precision geometry (tree traversal, etc). With the new system I'm moving 50% to the GPU, and optimizing, through simplification, the rest of the CPU-bound code. Also switching my version of the "broad phase" collision detection pass from texture arrays to a mipmapped cubemap (rendering to the various mipmaps separately).

More later when I know if it works or fails!
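The Z-Cull numbers above can be sanity-checked with a little arithmetic. The sketch below is one possible reading of them, not a measurement from the engine. It assumes the 35% gain applies to the original drawing pass and the 25% overhead is added on top, which would land at roughly the 10% benefit reported. The function names and the 1280x720 resolution are hypothetical, chosen only for illustration.

```python
# Hypothetical cost model for the Z-Cull experiment described in the post.
# All names and the example resolution are assumptions, not engine code.

def net_frame_cost(base_cost, overhead_fraction, gain_fraction):
    """Relative frame cost once the extra Z-Cull passes are added.

    overhead_fraction: cost of the extra passes, as a fraction of the
        original drawing pass (low-res buffer, 4x4 quads, etc).
    gain_fraction: fraction of the original drawing pass saved by culling.
    """
    culled_pass = base_cost * (1.0 - gain_fraction)
    extra_passes = base_cost * overhead_fraction
    return culled_pass + extra_passes


def low_res_pixels(width, height, scale=4):
    """Pixel count of the x/scale by y/scale buffer (integer division)."""
    return (width // scale) * (height // scale)


base = 1.0  # original drawing pass, normalized
with_zcull = net_frame_cost(base, overhead_fraction=0.25, gain_fraction=0.35)
net_benefit = (base - with_zcull) / base

print(f"relative cost with Z-Cull: {with_zcull:.2f}")
print(f"net benefit: {net_benefit:.0%}")

# Assumed 1280x720 target: the quarter-resolution buffer has 1/16 the pixels,
# and each of its pixels expands to one 4x4 quad in the full-size depth pass.
full = 1280 * 720
small = low_res_pixels(1280, 720)
print(f"full-size pixels: {full}, low-res pixels: {small}, ratio: {full // small}")
```

Under these assumptions the extra passes eat most of what culling saves, which matches the conclusion that the approach was not worth the added work.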