090121 / Killzone 2 Bean Spilling

Just to make my personal bias clear: I've never been tempted to pre-order a game before, yet I'm actively waiting to pick up my pre-ordered KZ2. I'd pay my $60 just to visually dissect the tech, regardless of whether the game is good. From what I've seen thus far (which is little more than internet vids), KZ2 sets the bar for game technology.

Killzone 2 Tech

View the 40-minute behind-the-scenes making-of KZ2 video at Game Kings!
For a refresher, see their earlier deferred rendering presentation.

I tossed out a guess on the Beyond3D Forums that their post processing would take between 15 and 30% of the GPU time depending on frame rate. This video interview seems to confirm it: moving GPU work (post processing, perhaps more) to the SPUs has resulted in an amazing 20 to 40% performance gain.
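As a sanity check on how those two ranges relate, here is a minimal sketch. It assumes the frame is purely GPU bound and that the work moved to the SPUs overlaps completely with the remaining GPU work; neither assumption comes from the video, and the function name is made up for illustration.

```python
def gain_from_offload(gpu_fraction_moved):
    """Relative frame-rate gain if this fraction of GPU frame time disappears.

    Assumes a GPU-bound frame and perfectly overlapped SPU work (both are
    assumptions of this sketch, not facts from the interview).
    """
    if not 0.0 <= gpu_fraction_moved < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - gpu_fraction_moved) - 1.0


for fraction in (0.15, 0.20, 0.30):
    print(f"{fraction:.0%} of GPU time moved -> {gain_from_offload(fraction):.1%} gain")
```

Under those assumptions, moving 15 to 30% of the GPU time works out to a gain of roughly 18 to 43%, which is in the same ballpark as the quoted 20 to 40%.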

Apparently the tech was still being polished late in development. The "4xMSAA trick" low resolution particle artifacts are clear in this set of August images (which appear to be direct framebuffer grabs). Look at the first muzzle flash image at 720p, then check out the rocket image (2x2 pixel blocks). It also seems as if the small high detail particles are blended directly (perhaps into the G-Buffer) at full resolution. R2 does something similar with high detail particles at full resolution.

Post August, I don't see any KZ2 shots with the 4xMSAA artifacts. Either the shots were doctored for public consumption, or they did indeed change the particle rendering. I see evidence in at least one shot that they might now be drawing to a downsampled buffer and later merging back. View this shot at 720p and look where the explosion is behind the sandbags: it looks more like the Battlefield Bad Company particle upsizing. Perhaps the change was to provide downsized particle rendering which adapts to overdraw by changing resolution.
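To illustrate why a half-resolution particle layer shows up as 2x2 pixel blocks once it is merged back, here is a toy sketch. It is not KZ2's actual pipeline: the nearest-neighbor upsample, the additive blend, and both function names are my own simplifications, and a real renderer would work on GPU render targets with smarter filtering.

```python
def upsample_nearest(low_res, factor=2):
    """Replicate every low-res texel into a factor x factor block of pixels."""
    full = []
    for row in low_res:
        wide_row = [texel for texel in row for _ in range(factor)]
        full.extend(list(wide_row) for _ in range(factor))
    return full


def add_blend(scene, layer):
    """Additively merge an upsampled particle layer into the scene, clamped to 1.0."""
    return [
        [min(1.0, s + p) for s, p in zip(scene_row, layer_row)]
        for scene_row, layer_row in zip(scene, layer)
    ]


# A 2x2 particle buffer standing in for a half-resolution render target.
particles_half_res = [
    [0.0, 0.8],
    [0.4, 0.0],
]
scene_full_res = [[0.1] * 4 for _ in range(4)]

merged = add_blend(scene_full_res, upsample_nearest(particles_half_res))
for row in merged:
    print(row)
```

Each particle texel lands on a 2x2 block of identical full-resolution pixels, which is the kind of blockiness visible in the rocket image. A filtered or depth-aware upsample would hide the blocks at the cost of extra work.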

How About Some Numbers

Notes below are my best guess of the debug info gathered from many frames of the above linked video. It took a bunch of frames to make out the numbers, so don't expect sums to add up properly.

CPU TIME
--------
Unknown .......... 1.24%
SPU Sync ......... 0.06%
AI Manager ....... 0.47%
Game Logic ....... 9.52%
Script ........... 0.80%
Physics .......... 1.57%
Representation ... 10.46%
Draw ............. 20.18%
HUD .............. 2.19%
Sound ............ 0.65%
Profile HUD ...... 25.17%
GPU Sync ......... 37.99%
----------
Total Time ....... 36.85%

SPU TIME
----------------------------
AI.Cover ................... ........ 0.00%
AI.LineOfFire .............. ........ 0.00%
Anim.EdgeAnim .............. 33 ..... 2.01%
Anim.Skinning .............. 152 .... 30.68%
Gfx.DecalUpdate ............ 9 ...... 0.78%
Gfx.LightProbes ............ 396 .... 9.00%
Gfx.PB.DeferredSchedule .... 1 ...... 0.60%
Gfx.PB.Forward ............. 2 ...... 1.69%
Gfx.PB.Geometry ............ 1 ...... 18.67%
Gfx.PB.Lights .............. 1 ...... 0.66%
Gfx.PB.ShadowMap ........... 1 ...... 4.20%
Gfx.Particles.ManagerJob ... 1 ...... 3.14%
Gfx.Particles.UpdateJob .... 130 .... 12.33%
Gfx.Particles.VertexJob .... 70 ..... 20.64%
Gfx.Post.BloomCapture ...... 12 ..... 2.80%
Gfx.Post.BloomIntegrate .... 8 ...... 1.52%
Gfx.Post.DepthOfField ...... 64 ..... 12.12%
Gfx.Post.DepthToFuzzy ...... 8 ...... 0.67%
Gfx.Post.Downsample ........ 29 ..... 0.61%
Gfx.Post.GrainWeight ....... 1 ...... 0.51%
Gfx.Post.HBlur ............. 45 ..... 3.02%
Gfx.Post.ILR ............... 1 ...... 0.63%
Gfx.Post.Modulate .......... 27 ..... 1.3?%
Gfx.Post.MotionBlur ........ 46 ..... 11.31%
Gfx.Post.Unlock? ........... 1 ...... 0.01%
Gfx.Post.Upsample .......... 108 .... 9.47%
Gfx.Post.VBlur ............. 46 ..... 3.73%
Gfx.Post.Vg??lle ........... 1 ...... 1.18%
Gfx.Post.Zero .............. 16 ..... 0.64%
Gfx.Scene.Portals .......... 3 ...... 30.72%
Mesh.Decompression ......... ........ 0.00%
Physics.Collide ............ 4 ...... 2.48%
Physics.Integrate .......... 4 ...... 2.11%
Physics.KdTree ............. 8 ...... 20.50%
Physics.Raycast ............ ........ 0.00%
Snd.MP3.Stereo ............. 2 ...... 2.60%
Snd.MP3.Surround ........... 2 ...... 7.51%
Snd.?Synth ................. 35 ..... 3.23%
Snd.Reverb ................. 14 ..... 4.02%
----------------------------
Total Time ................. 1232 ... 227.46%
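
A table like this is easier to reason about once the rows are grouped by subsystem. The sketch below parses a handful of the rows above and sums the percentages by name prefix. The middle column is treated as a plain count, since the debug display does not label it, and rows with unreadable digits (marked with `?`) are skipped rather than guessed. The function names are mine.

```python
from collections import defaultdict

# A few rows copied from the SPU table above, in the same dotted-leader layout.
SAMPLE = """\
Anim.EdgeAnim .............. 33 ..... 2.01%
Anim.Skinning .............. 152 .... 30.68%
Gfx.Post.DepthOfField ...... 64 ..... 12.12%
Gfx.Post.Modulate .......... 27 ..... 1.3?%
Gfx.Post.MotionBlur ........ 46 ..... 11.31%
Physics.KdTree ............. 8 ...... 20.50%
Physics.Raycast ............ ........ 0.00%
"""


def parse_row(line):
    """Return (name, count, percent), or None if the row is blank or unreadable."""
    tokens = line.split()
    if len(tokens) < 2:
        return None
    name, pct_text = tokens[0], tokens[-1].rstrip("%")
    try:
        percent = float(pct_text)
    except ValueError:
        return None  # an unreadable digit such as "1.3?" stays unknown
    # The count column is sometimes just dots; treat that as zero.
    count = next((int(t) for t in tokens[1:-1] if t.isdigit()), 0)
    return name, count, percent


def totals_by_subsystem(text):
    """Sum percentages by the part of the name before the first dot."""
    totals = defaultdict(float)
    for line in text.splitlines():
        row = parse_row(line)
        if row is None:
            continue
        name, _count, percent = row
        totals[name.split(".")[0]] += percent
    return dict(totals)


for subsystem, percent in sorted(totals_by_subsystem(SAMPLE).items()):
    print(f"{subsystem:<8} {percent:6.2f}%")
```

Feeding in the full table the same way gives a quick view of how the SPU time splits between animation, graphics, physics, and sound.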

GRAPHICS
--------
FPS ................. 30
GPU Stall by CPU .... 0.123 ?s
CPU stall by GPU .... 12.231 ?s

GPU TIME
--------------------------
Unknown ........ 0.2?/ ... 3.43%
Geometry ....... 1.8?/ ... 43.37%
Lighting ....... 1.7?/ ... 14.??%
Effects ........ 8.5?/ ... 8.4?%
Post process ............. 18.31%
--------------------------
Total Time ............... 81.??%
GPU Stall ................ 0.??%
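
To put the GPU percentages into milliseconds, here is a small sketch. It assumes the percentages are relative to one frame at the displayed 30 FPS (about 33.3 ms); the debug screen does not say what they are measured against, so treat the results as rough. Only the rows with fully legible values are included.

```python
def frame_budget_ms(fps):
    """Length of one frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


def pct_to_ms(percent, fps=30):
    """Convert a percent-of-frame figure to milliseconds (assumed interpretation)."""
    return percent / 100.0 * frame_budget_ms(fps)


# Fully legible rows from the GPU TIME table above.
GPU_PERCENT = {"Unknown": 3.43, "Geometry": 43.37, "Post process": 18.31}

for name, percent in GPU_PERCENT.items():
    print(f"{name:<13}{percent:6.2f}%  ~{pct_to_ms(percent):5.2f} ms")
```

Under that assumption, geometry comes out at roughly 14.5 ms and post processing at roughly 6.1 ms of GPU time per frame.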

PRIMS / TRI
-----------
Totals ..... 1431/ ... 344,634
Prime? ..... 0/ ...... 0
Geometry ... 619/ .... 161,231
Shadow ..... 683/ .... 170,???
Effects .... 121/ .... 14,3??

MEMORY STATS
------------------------
Pushbuffer ???? ........ 0.15 MB
Pushbuffer High ........ 0.15 MB
VRAM Free .............. 23.43 MB
Host Free .............. 80.?? MB
Heap Free .............. 134.?? MB
Render Mem ???? ........ 0
Render Mem Used ........ 12.00 MB
Render Mem Watermark ... 12.00 MB

MAIN RAM ....... 101.00 MB
----------------
Physics ........ 5.30 MB
Collision ...... 3.72 MB
Sound .......... 16.25 MB
Mesh ........... 21.20 MB
Graphics ....... 6.53 MB
Animation ...... 34.45 MB
Texture ........ 0.56 MB
Shader ......... 1.46 MB
AI Data ........ 2.75 MB
Various ........ 3.32 MB
Waste .......... 5.27 MB
----------------
Total .......... 97.17 MB
Main RAM ??? ... 97 / 101

VIDEO RAM .. 190.04 MB
------------
Mesh ....... 15.99 MB
Texture .... 156.87 MB
Waste ...... 1.20 MB
Total ...... 174.08 MB

Interesting

A few things can be gathered from this. SPUs are getting used for all sorts of stuff: AI, animation, skinning, push buffer, post processing, visibility, physics, and audio. SPUs are doing motion blur, depth of field, and more. The SPU list suggests that they have portal based visibility as well as a KDTree for physics and raycasts (very interesting). The primitive stats suggest that shadows take around 50% of triangles, that the number of triangles per frame is relatively low (thanks to SPU culling?), and that particles (and decals?) are perhaps upwards of 10-20K triangles. The streamed-in texture budget looks to be about 160-180MB. Lighting appears to be 1/3 as expensive as G-Buffer creation.
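
The two ratios quoted above can be checked directly against the transcribed numbers. Since some digits were unreadable, this sketch carries each uncertain value as a (low, high) range by filling the `?` digits with 0s and 9s. That bracketing is my own device, and treating the Geometry row as G-Buffer creation follows the reading in the paragraph above.

```python
TOTAL_TRIS = 344_634
SHADOW_TRIS = (170_000, 170_999)   # transcribed as 170,???
GEOMETRY_PCT = 43.37               # read here as G-Buffer creation
LIGHTING_PCT = (14.00, 14.99)      # transcribed as 14.??%


def ratio_range(numerator_range, denominator):
    """Divide both ends of a (low, high) range by a known denominator."""
    low, high = numerator_range
    return low / denominator, high / denominator


shadow_low, shadow_high = ratio_range(SHADOW_TRIS, TOTAL_TRIS)
light_low, light_high = ratio_range(LIGHTING_PCT, GEOMETRY_PCT)

print(f"shadow share of triangles: {shadow_low:.1%} to {shadow_high:.1%}")
print(f"lighting / geometry GPU time: {light_low:.2f} to {light_high:.2f}")
```

Whatever the missing digits were, shadows land just under half of the triangles and lighting at about a third of the geometry cost.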