070930 / Porting to SM3.0

Atom is a technology gamble: will enough people with GeForce 8x00 cards be interested in the game within the first year of release (some time in 2008)? If so, this will be a success; if not, well, I'd rather not go there. BTW, this is an OpenGL based game, so there is NO need to go to Vista (to get DirectX10) to run it. The NVidia drivers for Windows XP have SM4.0 support!

Porting to AMD/ATI HD

Getting Atom to run on the new AMD/ATI HD cards should be easy. I only have to replace my TextureArray usage with a large flat 2D texture and multiple drawing passes, using glScissor() to mask the regions. Is it worth the change? I will probably wait and see in 2008.

Porting to SM3.0

First, ATI never really had SM3.0 support, due to the lack of vertex texture fetch. So all older ATI cards are an instant no go. Vertex texture fetch is an absolute requirement; render to vertex array just isn't going to cut it for Atom.

Looking at the Valve Hardware Survey Summary, about 22% have a 7600 or better NVidia card, and only about 6% have an SM4.0 capable card. So could Atom be ported to SM3.0 to get about 3.6 times the possible user base?

The GPU Gems 2 book has a chapter which outlines the 6800 card's hardware. Key points from it: limited to 4 MRTs; only float FP32 and vec4 FP32 unfiltered texture lookups from the Vertex Shader; FP32 texture lookups from the Fragment Shader are unfiltered only; no TextureArrays; and no Geometry Shaders. From what I can gather, the 7x00 hardware is the same in this respect, just a faster series of cards.

Getting around the lack of a Geometry Shader is a serious problem, requiring either more work on the CPU or 4 times more work in the Vertex Shader, including a bunch of dependent texture reads (to generate my impostor billboards on the GPU). Either way performance would suffer, but it could be done.

However, the real deal breaker might be the lack of an FP32 filtered texture lookup.
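For reference, a linear filtered lookup blends the four texels nearest the sample point, so emulating it by hand means four unfiltered fetches plus the blend weights for every lookup. The Python below is only a rough sketch of that arithmetic, not Atom's shader code; `fetch`, `bilinear` and `manual_fetches` are hypothetical names, the texture is a plain list of rows, and 65536 is assumed as the exact value of "65K".

```python
import math

def fetch(tex, x, y):
    """Unfiltered (nearest) texel fetch with clamp-to-edge addressing."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def bilinear(tex, u, v):
    """Emulate a linear filtered lookup at texel-space coordinates (u, v).

    Texel centers are assumed to sit at integer + 0.5, as in OpenGL.
    Each call costs four unfiltered fetches plus three lerps.
    """
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0            # fractional blend weights
    t00 = fetch(tex, x0,     y0)
    t10 = fetch(tex, x0 + 1, y0)
    t01 = fetch(tex, x0,     y0 + 1)
    t11 = fetch(tex, x0 + 1, y0 + 1)
    top = t00 + (t10 - t00) * fx       # blend along x, upper row
    bottom = t01 + (t11 - t01) * fx    # blend along x, lower row
    return top + (bottom - top) * fy   # blend along y

def manual_fetches(particles, gathers_per_particle):
    """Unfiltered fetches needed when every gather is filtered by hand."""
    return particles * gathers_per_particle * 4

tex = [[0.0, 1.0],
       [2.0, 3.0]]
print(bilinear(tex, 1.0, 1.0))   # sample point shared by all four texels
print(manual_fetches(65536, 1))  # fetch count for one gather per particle
```

With n gathers per particle, `manual_fetches(65536, n)` grows to four times the filtered lookup count, before counting the extra shader math for the weights.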
A little secret: the physics gathering step uses linear filtered texture lookups to filter between particles. Without filtering, physics effects start grouping particles, because a buffer much smaller than screen size (in the x,y dimension) is used to accumulate the velocity vector field (and other physics properties). Manually filtering is not an option. I'm doing 65K mega particles with multiple gathers per particle. The performance just isn't there to manually filter each gather.

Going to FP16 was a possible fix, and would work for a 2D game. It would probably also greatly improve my performance, due to 2x better texture caching and filtered lookup performance. There is just not enough precision for Z. The X,Y position could be translated into projected screen space, then un-projected back to FP32 knowing the Z. So the first thought was to just use impostor drawing order as Z; 65K values fit nicely into an FP16 value. Then I could do a dependent texture lookup to find the actual Z given the FP16 impostor drawing order.

Might work, but there is one more problem: my physics scattering pass draws alpha blended motion billboards (of the physics properties per particle) into a layered 2D buffer. It is the alpha blending and the actual drawing of the path of the particle which enable the ultra smooth CFD physics. So if I had my XY velocity vectors in screen space, and my Z position as drawing order, the blending would be wrong. Storing a Z velocity vector would also be a mess. I could save Z velocity divided by particle radius, but would have to save particle radius in projected screen space as well. Doing an extra non-blended FP32 Z-only pass could be an option, but it would double the work at the Vertex Shader level and remove the Z blending, which is important.

Bottom line: it only just might be possible to port to SM3.0, but is it worth it? This is what I am going to ponder while I'm reading GPU Gems 3 on the plane and am off photographing this week in the Sierras.
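A footnote on the FP16 precision point above. The sketch below uses Python's standard `struct` half-float format (`'e'`) to show how coarse FP16 gets at larger magnitudes, and how many of the 65536 bit patterns are finite. It is only an illustration of the number format; `to_fp16` and `finite_fp16_patterns` are hypothetical helper names, and nothing here is taken from Atom's code.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest FP16 value and return it as a float."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

def finite_fp16_patterns():
    """Count 16-bit patterns that are not Inf or NaN (exponent bits not all ones)."""
    return sum(1 for bits in range(1 << 16) if (bits >> 10) & 0x1F != 0x1F)

# Consecutive integers are exact only up to 2048 (11 significant bits).
print(to_fp16(2048.0), to_fp16(2049.0))

# Near 1000 the step size is already 0.5, which is why Z suffers.
print(to_fp16(1000.3))

# Bit patterns usable as distinct finite codes (+0 and -0 still compare equal).
print(finite_fp16_patterns())
```

So a drawing-order-as-Z scheme would have to treat the 16 bits as a code rather than as an integer value, and a full 65536 entries would not quite fit in the finite range. That is a caveat on the idea, not a verdict on it.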
Atom ©2009-2007 Timothy Farrar