070926 / Drawing in Reverse

I have been getting ready for one of our big photography trips this year, this time to the Sierra Nevada area starting in the first week of October to catch the fall color, so I will have to take a break from Atom for about two weeks.

On the Topic of Alpha Blending

I had a theory that only about 8 times overdraw per pixel would be necessary to render everything in Atom. I am currently using upwards of 32 times overdraw per pixel, so if I could skip 3/4 of the overdraw, this would be a tremendous performance win.

So I switched the rendering from back to front, to front to back, changed the alpha blending equation to match, and added a stencil test so only the first 8 front-most impostors per pixel get drawn.

The result mostly worked, with one problem. When the first 8 pixels are all low alpha, there is still some artifacting. Adding an alpha test to clip out really low alpha pixels, so they didn't count against my 8 pixel limit, helped but didn't fix the problem. A more innovative solution was needed!

If you think about it, when a pixel is generated by the overlap of many low alpha sprites, it usually represents some kind of fog or haze. And this fog or haze usually has a similar color to the surrounding pixels. So if the accumulated coverage of a pixel is very low after drawing 8 pixels, it is probably safe to assume the fog/haze case.

Now I had a solution to the problem: add one more pass, drawing a 1/2 down-sampled copy (using the GPU's automatic mipmap generation) of the previous frame as the last, back-most overdraw pass. The down-sampling blurs the pixels slightly (fog/haze), and fills in the areas of low alpha accumulation. Given a good 30 fps, the convergence of the algorithm is invisible to the eye. And it worked, really really well!

Final Step to a Huge Performance Win

Already the stencil test helps quite a lot by skipping the fragment shader (and thus 2 texture reads and 1 ROP blend).
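The front-to-back scheme described above can be sketched on the CPU for a single pixel. This is only an illustration under assumptions: the post does not give the actual blend equation, so the sketch uses the standard "under" compositing operator, and the names `composite_front_to_back`, `max_layers`, `alpha_cutoff`, and `fallback` are hypothetical. The layer counter stands in for the stencil test, the cutoff stands in for the alpha test, and `fallback` stands in for the blurred, down-sampled previous frame drawn as the back-most pass.

```python
from typing import Iterable, Tuple

Color = Tuple[float, float, float]
Sprite = Tuple[Color, float]  # (straight RGB color, alpha), ordered nearest first


def composite_front_to_back(
    sprites: Iterable[Sprite],
    fallback: Color,
    max_layers: int = 8,         # assumed overdraw limit (the stencil test)
    alpha_cutoff: float = 0.02,  # assumed alpha-test threshold
) -> Tuple[Color, float]:
    """Composite one pixel front to back with a capped number of layers.

    Returns the final color and the coverage accumulated before the
    fallback (previous-frame) pass was applied.
    """
    r = g = b = 0.0
    transmittance = 1.0  # fraction of light from behind that is still visible
    drawn = 0
    for (sr, sg, sb), alpha in sprites:
        if drawn >= max_layers:
            break        # stencil limit reached: remaining sprites are skipped
        if alpha < alpha_cutoff:
            continue     # alpha test: very faint sprites do not use up a layer
        weight = transmittance * alpha  # "under" operator contribution
        r += weight * sr
        g += weight * sg
        b += weight * sb
        transmittance *= 1.0 - alpha
        drawn += 1
    coverage = 1.0 - transmittance
    # Back-most pass: whatever is still uncovered shows the fallback color.
    fr, fg, fb = fallback
    r += transmittance * fr
    g += transmittance * fg
    b += transmittance * fb
    return (r, g, b), coverage
```

With twelve haze sprites of alpha 0.1, only the first 8 are drawn, so the accumulated coverage is 1 - 0.9^8, or about 0.57, and the remaining 43% of the pixel comes from the fallback color. A single opaque sprite at the front leaves nothing for the fallback to fill.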
But there is a faster way: eliminating large groups of pixels well before the stencil check. After some research, it looks as if only the newest AMD/ATI GPUs have a hierarchical stencil buffer, enabling the stencil pass to clip out groups of pixels (say 16 or 32) at a time. So the next best option is to use the hierarchical z-cull hardware, which I believe is similar in function in all DX10 type cards.

Filling the Z buffer is another subproblem. It looks like to use the z-cull, I'm going to have to draw polygons with alpha test off and no fragment shader depth write. So my idea is to draw a mini framebuffer (x/4 by y/4) first using the stencil idea, but only drawing Z into a texture instead of color, so the last z drawn is for the 8th pixel drawn into the mini framebuffer. Then I would use a vertex shader to generate two triangles per pixel of the mini framebuffer, and do a depth-only write of the resulting z values into the full size Z buffer. The z-cull hardware should then be primed to quickly chop groups of pixels which exceed the overdraw limit.

With the stencil optimization alone, I am again CPU bound. So I probably won't get to my z-cull test until I get the CPU side better optimized (need to finish my Atom4th stuff).

Atom ©2009-2007 Timothy Farrar