071026 / Transform Feedback

Transform feedback plus multiple drawing passes is proving to be an excellent solution to the problem of the Geometry Shader pipe being too slow to be useful!

Transform feedback is an SM4.0 feature that allows the output of a vertex shader to be written into one or more VBOs (vertex buffer objects). On my GeForce 8600 GTS, up to 4 VBOs can be written in one transform feedback pass, and up to 16 FP32 values can be written to each of those four VBOs. So a single transform feedback pass can output up to 16 new vec4 points per input point (4 VBOs x 16 floats = 64 floats). That makes data expansion easy, and it is a very quick way to turn a single particle into an output triangle.

With transform feedback solving the geometry expansion issues, I tried separate drawing passes for each face of the cubemap (instead of using gl_Layer in a Geometry Shader). This new method is almost 20 times faster than using the Geometry Shader!

Now, to avoid re-calculating flat varyings (per-primitive values instead of per-vertex values) in the vertex shaders, and to keep these values well cached between vertices of the same triangle, texture buffer objects should do just fine. The plan is one early point-to-point VS pass that generates an interleaved VBO of per-primitive values. That VBO is then mapped as a texture buffer object, and gl_PrimitiveID is used to build an index into the texture buffer object in later vertex shaders. I believe this is the absolute fastest path on the GPU for what I am doing.

Other Optimization Progress

I've managed to offload a large part of the CPU time by moving the motion card pass onto the GPU. For the sake of getting this done, I'm archiving my ray tracing ideas for some future project. So the drawing pipeline is going to be very similar to what I already have working, except that now I have extra environmental lighting from the cubemap.
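The point-to-triangle expansion described above can be sketched on the CPU. This is a hedged C illustration, not the author's shader: the limits of 4 buffers and 16 FP32 values per buffer are the figures quoted in the post for the 8600 GTS (a real program would query them from the driver), and the `Particle` layout and triangle shape are hypothetical. Each input point writes one 12-float record (three vec4 corners); because transform feedback appends records back to back, the resulting buffer can later be read as a plain array of vec4 positions, three vertices per particle, and drawn as a triangle list.

```c
#include <assert.h>
#include <stddef.h>

/* Limits quoted in the post for a GeForce 8600 GTS (not queried from a driver). */
enum { TF_MAX_BUFFERS = 4, TF_MAX_FLOATS_PER_BUFFER = 16 };

/* Maximum number of vec4 values one input point can emit in a single pass. */
static int tf_max_vec4_per_point(void) {
    return TF_MAX_BUFFERS * TF_MAX_FLOATS_PER_BUFFER / 4;
}

/* Hypothetical particle layout: position plus a size. */
typedef struct { float x, y, z, size; } Particle;

/* CPU stand-in for a point-in, record-out vertex shader: writes the three
   corners of a triangle centered on the particle (3 x vec4 = 12 floats). */
static void expand_particle(const Particle *p, float out[12]) {
    const float h = 0.5f * p->size;
    const float corners[3][2] = { { -h, -h }, { h, -h }, { 0.0f, h } };
    for (int i = 0; i < 3; ++i) {
        out[i * 4 + 0] = p->x + corners[i][0];
        out[i * 4 + 1] = p->y + corners[i][1];
        out[i * 4 + 2] = p->z;
        out[i * 4 + 3] = 1.0f;
    }
}

/* Runs the "pass" over n particles, appending one 12-float record per
   particle to a flat buffer, the way transform feedback fills a VBO. */
static void expand_all(const Particle *in, size_t n, float *vbo) {
    assert(12 <= TF_MAX_FLOATS_PER_BUFFER); /* one record must fit in one buffer */
    for (size_t i = 0; i < n; ++i)
        expand_particle(&in[i], vbo + i * 12);
}
```

Reading the output with a 16-byte (one vec4) vertex stride gives 3n vertices for n particles, so no Geometry Shader is involved in the expansion.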
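The per-primitive lookup through a texture buffer object boils down to index arithmetic, sketched here in C on the CPU. The interleaved layout (two RGBA32F texels per primitive, a normal and a flat color) and all function names are hypothetical. One assumption worth flagging: in GLSL, gl_PrimitiveID is exposed to geometry and fragment shaders rather than vertex shaders, so this sketch assumes a non-indexed triangle list and derives the primitive index in the vertex stage as gl_VertexID / 3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved layout written by the early point-to-point pass:
   two vec4 texels per primitive (a face normal and a flat color). */
enum { TEXELS_PER_PRIM = 2, FIELD_NORMAL = 0, FIELD_COLOR = 1 };

typedef struct { float r, g, b, a; } Texel; /* one RGBA32F texel of the buffer texture */

/* Texel index a later shader would pass to texelFetch on the buffer texture. */
static size_t prim_texel_index(size_t prim_id, int field) {
    assert(field >= 0 && field < TEXELS_PER_PRIM);
    return prim_id * TEXELS_PER_PRIM + (size_t)field;
}

/* For a non-indexed triangle list, the vertices of triangle k have
   vertex ids 3k, 3k+1 and 3k+2, so the primitive index is vertex_id / 3. */
static size_t prim_id_from_vertex_id(size_t vertex_id) {
    return vertex_id / 3;
}

/* CPU stand-in for the vertex-shader fetch: all three vertices of a triangle
   read the same texel, which is what keeps these values well cached. */
static Texel fetch_flat(const Texel *tbo, size_t vertex_id, int field) {
    return tbo[prim_texel_index(prim_id_from_vertex_id(vertex_id), field)];
}
```

Interleaving all per-primitive fields next to each other means one primitive's data sits in a single small, contiguous region of the buffer, rather than being spread across several separate arrays.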
I've also gone back through my stencil-based reverse drawing code and found that turning off the alpha test and using the stencil test actually works quite well, much better than turning off the stencil test and turning on the alpha test. This almost makes me wonder whether the GeForce 8 series has some block-based stencil hardware to reject blocks of fragments (the AMD/ATI HD cards have a hierarchical stencil buffer). Perhaps this hardware was disabled with the alpha test on. One bonus of the stencil test is that I now have another method for frame rate control: using the stencil to limit the amount of overdraw per pixel.

Tossing the HDR Code

That is right, I'm no longer using it. First off, the lower-end GeForce 8 cards take twice as long to fetch a bilinear filtered FP16 texel, and twice as long to blend FP16 output in the ROP, as they do for the same operations with 8-bit values. Not to mention the extra memory bandwidth and texture cache misses. With the amount of overdraw I use, this cost just wasn't worth it.

Second, I don't like overblown overexposure. From a fine art photography perspective, HDR-like extreme overexposure easily ruins an otherwise good photograph. Highlights should come close to clipping, or perhaps clip a little at a point light source such as the sun; otherwise highlights should keep their detail. It turns out that with my mix of atmospheric lighting (I render the atmospheric spaces in between surfaces), it is easy to keep lighting and lighting feedback below the clipping point. I can still get near-bloom, with the added bonus of keeping detail in the highlights, and colored lights still bleed onto surrounding objects.

Atom ©2009-2007 Timothy Farrar
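The stencil-based overdraw cap mentioned in the post can be sketched as a per-pixel simulation in C. The post does not say which stencil state is used; this sketch assumes one common setup, glStencilFunc(GL_GREATER, max_layers, 0xFF) with glStencilOp(GL_KEEP, GL_KEEP, GL_INCR) and no depth test, and the `Pixel` struct and function names are hypothetical.

```c
#include <assert.h>

/* One pixel's stencil value plus a counter of how many fragments were shaded. */
typedef struct { unsigned char stencil; int shaded; } Pixel;

/* Emulates glStencilFunc(GL_GREATER, max_layers, 0xFF) together with
   glStencilOp(GL_KEEP, GL_KEEP, GL_INCR) and no depth test: a fragment passes
   while max_layers > stencil, and every passing fragment increments the stencil. */
static int stencil_cap_test(Pixel *px, unsigned char max_layers) {
    if (max_layers > px->stencil) {
        px->stencil++;   /* cannot exceed 255 because stencil < max_layers <= 255 */
        px->shaded++;
        return 1;        /* fragment is shaded and blended */
    }
    return 0;            /* fragment rejected: overdraw cap reached */
}

/* Pushes `layers` overlapping fragments, in draw order, at one pixel and
   returns the total number of fragments shaded at that pixel so far. */
static int draw_layers(Pixel *px, int layers, unsigned char max_layers) {
    assert(layers >= 0);
    for (int i = 0; i < layers; ++i)
        stencil_cap_test(px, max_layers);
    return px->shaded;
}
```

Lowering max_layers bounds the worst-case shading work per pixel, which is what makes the stencil usable as a frame rate control knob: with a cap of 3, a pixel covered by 10 layers still shades only 3 fragments.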
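The extra memory bandwidth of FP16 blending can be estimated with simple arithmetic, sketched in C. This counts only framebuffer read-modify-write traffic for blended overdraw and ignores compression, caches, and texture fetches; the resolution and overdraw values in the usage note are hypothetical, not measurements from the post.

```c
#include <assert.h>

/* Bytes per pixel for the two render-target formats compared in the post. */
enum { BYTES_RGBA8 = 4, BYTES_RGBA16F = 8 };

/* Rough framebuffer traffic for one frame of alpha-blended overdraw: every
   blended fragment reads and then writes its destination pixel. Ignores
   framebuffer compression, caches, and texture fetches. */
static unsigned long long blend_bytes(unsigned long long width,
                                      unsigned long long height,
                                      unsigned long long avg_overdraw,
                                      unsigned long long bytes_per_pixel) {
    assert(bytes_per_pixel == BYTES_RGBA8 || bytes_per_pixel == BYTES_RGBA16F);
    return width * height * avg_overdraw * 2ULL * bytes_per_pixel;
}
```

For a hypothetical 1280x720 target with an average overdraw of 4, blend_bytes gives 29,491,200 bytes per frame for RGBA8 and exactly twice that for RGBA16F, so the traffic scales linearly with overdraw on top of the halved fetch and blend rates.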