080709 / Anti-Aliasing

Got anti-aliasing working in the engine without hardware AA, using temporal jitter, sub-pixel positioning, and frame feedback. The 720p shots below (running at over 60 fps on my NVidia 8600 GTS) were generated from only 256K pixels per frame. So I'm effectively only computing 25% of the screen pixels each frame. Which is great: built-in "keeping it low res". Click the image for the full 1280x720 resolution shot (updated, it actually works now)!

I'm very satisfied with the results thus far; moving to GPGPU from the CPU has given me a 64x speedup in the l-system traversal. There is still a lot of work to do (lighting, physics, content creation tools, etc.), and I've still got a lot of possibilities for optimization, so this thing will be awesome when finished...

A Little Profiling

You can only get so far with GL_EXT_timer_query, and I don't expect NVidia to release the newest PerfKit for Linux any time soon, but I have some rough estimates. First, I previously said that the 8600 GTS was somehow point-draw limited at around 68 Mpoints/sec. I cannot remember where I got that number from, but it is dead wrong. Early in the engine I was getting 287M points/sec when drawing 128 bits/point, texture fetching 128 bits/point, and doing a large amount of math (over 80 gpu_program4 shader instructions per point). At this point I'm doing 256 bits/point ROP (and 256 bits/point TEX) at 170M points/sec (on the 8600 GTS). Without porting over to Windows and using NPerf, I'm guessing I might be hitting a bandwidth limit, from texture cache over-fetch on cache misses, from ROP bandwidth limits due to scattering, or from a combination of the two. I should have something like 272 ALU ops/point best-case performance from the hardware, so I'm probably not ALU bound yet. The good news is that I have broken through the G80 1/8 performance barrier for point drawing (based on SIMD packing in the fragment shader) in terms of ROP performance.
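As a rough sanity check on the figures above, here is a small back-of-envelope sketch. The hardware constants (32 stream processors, 1.45 GHz shader clock, 32 GB/s peak memory bandwidth) are assumptions taken from the published 8600 GTS specs, not measurements from the engine, and the function names are made up for illustration. It only counts the minimum bytes moved per point, so real traffic with cache over-fetch and scattered writes would be higher.

```python
# Back-of-envelope check of the profiling numbers quoted in the post.
# All hardware constants below are ASSUMPTIONS (published 8600 GTS specs),
# not values measured in the engine.

SHADER_CORES = 32              # assumed stream processor count
SHADER_CLOCK_HZ = 1.45e9       # assumed shader clock
MEM_BANDWIDTH_BYTES = 32.0e9   # assumed peak memory bandwidth, bytes/sec

POINTS_PER_SEC = 170e6         # from the post
ROP_BITS_PER_POINT = 256       # from the post
TEX_BITS_PER_POINT = 256       # from the post


def alu_ops_per_point(cores, clock_hz, points_per_sec):
    """Best-case scalar ALU ops available per point (one op per core per clock)."""
    return cores * clock_hz / points_per_sec


def traffic_bytes_per_sec(points_per_sec, bits_per_point):
    """Minimum bytes/sec moved for a given per-point payload."""
    return points_per_sec * bits_per_point / 8


alu_budget = alu_ops_per_point(SHADER_CORES, SHADER_CLOCK_HZ, POINTS_PER_SEC)
rop_traffic = traffic_bytes_per_sec(POINTS_PER_SEC, ROP_BITS_PER_POINT)
tex_traffic = traffic_bytes_per_sec(POINTS_PER_SEC, TEX_BITS_PER_POINT)
bandwidth_fraction = (rop_traffic + tex_traffic) / MEM_BANDWIDTH_BYTES

print(f"ALU ops per point (best case): {alu_budget:.1f}")
print(f"ROP traffic: {rop_traffic / 1e9:.2f} GB/s")
print(f"TEX traffic: {tex_traffic / 1e9:.2f} GB/s")
print(f"Fraction of assumed peak bandwidth: {bandwidth_fraction:.2f}")
```

Under these assumptions the ALU budget lands close to the 272 ops/point figure, while the minimum ROP plus TEX traffic is already a sizable share of peak bandwidth before any over-fetch, which fits the guess that bandwidth rather than ALU is the limit.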
I'm about 1/4 of the way to the estimated peak ROP performance on this card (perhaps the G84 doesn't share the G80's limitations). Since I'm drawing points instead of 2x2 pixel quads, this makes sense. Not sure yet if this limit can be broken (the ROP is probably designed for 2x2 pixel quads), but I have a few ideas to try out when I get to the Windows port. Part of the performance I've managed to get, I'm sure, comes from a combination of doing all the math in the vertex shader, working with an FP32 RGBA framebuffer (which I'm guessing is much more ROP-bandwidth efficient for scatter), and designing my algorithm so that its point scatter has both great destination locality and good MP utilization (so it doesn't starve what I'm guessing are limits on the output 2x2 pixel quad queues in the hardware).

Newest NVidia GL Drivers

Just switched to the latest NVidia drivers (after not updating for perhaps the last 6 months), and my test program ran significantly faster on the newest drivers. Seems like they might even have an optimization to not fetch vertex attributes when you don't use them... (I'm computing vertex attributes from gl_VertexID). In any case, thanks to everyone at NVidia for single-handedly keeping OpenGL up to date!

Atom ©2009-2007 Timothy Farrar