090129 / GT3xx Speculation

A possible CUDA roadmap shows CUDA 2.2 and 2.3 in 2009 before the big CUDA 3.0 in Q4.

The GPU 2013 slides show C++, preemption, virtual pipeline, complete pointer support, adaptive workload partitioning, arbitrary data flow, general purpose programming model, special purpose hardware, hardware managed threading and pipeline.

Then there is Beyond3D Forum speculation that GT3xx will use a form of MIMD, with dynamic clusters, new buffers and a crossbar, and changes to power and memory management. Possible dynamic warp formation (DWF)?
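To make the DWF idea concrete, here is a toy Python model (my own sketch, not anything from the forum thread or NVIDIA). It counts how many times the SIMD unit has to issue for one divergent branch, first with fixed warps and then with an idealized regrouping that repacks threads by the path they took. The function names are made up for illustration, and the model ignores real constraints such as register file lane conflicts and reconvergence.

```python
from collections import Counter
from math import ceil

WARP_SIZE = 32  # threads per warp on current NVIDIA hardware


def static_issue_slots(warps):
    """Fixed warps: each warp issues once per distinct path its threads take."""
    return sum(len(set(paths)) for paths in warps)


def dwf_issue_slots(warps, warp_size=WARP_SIZE):
    """Idealized DWF: pool every thread, then repack full warps per path."""
    counts = Counter(path for paths in warps for path in paths)
    return sum(ceil(n / warp_size) for n in counts.values())


# Four warps where even threads take path "A" and odd threads take path "B".
warps = [["A" if t % 2 == 0 else "B" for t in range(WARP_SIZE)]
         for _ in range(4)]

print(static_issue_slots(warps))  # 8: every warp runs both paths half empty
print(dwf_issue_slots(warps))     # 4: two full warps per path
```

In this best case regrouping halves the issue slots. Real hardware would land somewhere in between.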

Seems to me that NVIDIA is iteratively making general purpose scalar computation more efficient with each new hardware generation. This started with the GT2xx improvements for divergent global memory access. GT3xx might improve performance under branch divergence, and perhaps even under bank conflicts (if we are lucky).
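For reference, this is roughly what a bank conflict costs today. The sketch below is a simplified model of my own, assuming the 16 shared memory banks and per-half-warp servicing of G80/GT200-class parts. It returns how many serialized passes a half-warp needs for a given access pattern. Threads hitting the same 32-bit word are counted once, which is a simplification of the hardware's broadcast rule.

```python
from collections import defaultdict

NUM_BANKS = 16  # shared memory banks on G80/GT200-class hardware
HALF_WARP = 16  # shared memory requests are served per half-warp there


def conflict_degree(word_indices, num_banks=NUM_BANKS):
    """Return the number of serialized passes for one half-warp.

    word_indices holds one 32-bit word index per thread. The result is the
    largest number of distinct words that map to any single bank.
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())


stride_1 = [t * 1 for t in range(HALF_WARP)]
stride_2 = [t * 2 for t in range(HALF_WARP)]
stride_16 = [t * 16 for t in range(HALF_WARP)]

print(conflict_degree(stride_1))   # 1: conflict free
print(conflict_degree(stride_2))   # 2: two-way conflict
print(conflict_degree(stride_16))  # 16: fully serialized
```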

This might be too pie in the sky for GT3xx, but I'm personally hoping for MIMD with dynamic work-item grouping that handles bank conflicts through a banked cache which is coherent only under atomic operations. The cache would provide the functionality of shared registers, shared memory, and constants, plus access to global memory. Effectively the hardware would auto-vectorize access to the cache to maintain bandwidth efficiency, using knowledge of work-groups both to maintain coherency and as a basis for grouping.
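A rough sketch of what I mean by grouping, in Python. This is purely hypothetical wishful thinking, not a description of any real hardware. The idea is to pool the pending accesses of a whole work-group and greedily pack them into batches that touch each bank at most once. The function name and the 16-bank figure are assumptions, and the sketch does not merge accesses to the same word.

```python
from collections import defaultdict

NUM_BANKS = 16  # assumed bank count, matching G80/GT200 shared memory


def conflict_free_batches(word_indices, num_banks=NUM_BANKS):
    """Greedily pack work-items into batches with at most one item per bank.

    word_indices holds one word index per work-item in the work-group.
    Returns a list of batches, each a list of work-item ids.
    """
    queues = defaultdict(list)  # bank -> work-items waiting on that bank
    for item, word in enumerate(word_indices):
        queues[word % num_banks].append(item)
    batches = []
    while any(queues.values()):
        # Take the oldest waiting work-item from every non-empty bank queue.
        batches.append([q.pop(0) for q in queues.values() if q])
    return batches


# Work-items 0..15 hit only even banks, 16..31 hit only odd banks. Served as
# two fixed half-warps, each would suffer a two-way conflict (four passes).
words = [2 * t for t in range(16)] + [2 * t + 1 for t in range(16)]
batches = conflict_free_batches(words)

print(len(batches))               # 2
print([len(b) for b in batches])  # [16, 16]
```

Here regrouping across the work-group fills all 16 banks on every pass, so the same 32 accesses finish in two passes instead of four.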

Pixeljunk Monsters

Was looking for a new game to play co-op on the PS3 and decided to try Pixeljunk Monsters (downloadable on PSN). Turned out to be quite an enjoyable game!

SEAforth 40C18

Speaking of awesome hardware, there is an interesting Dr. Dobb's article on Extreme Forth which describes the state of the art in ultra low power embedded parallel processing, i.e. the SEAforth 40C18: a 40 core chip drawing <9 mW/core at full speed (yes, mW) which can do 25 billion 18-bit Forth operations/sec.
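Some quick back-of-the-envelope arithmetic on those numbers, as a small Python snippet. The only inputs are the quoted figures. Since the power figure is an upper bound (<9 mW/core), the chip power and energy per operation below are upper bounds too, and the variable names are mine.

```python
CORES = 40
MAX_MW_PER_CORE = 9   # quoted as an upper bound (<9 mW/core)
OPS_PER_SEC = 25e9    # quoted aggregate 18-bit Forth operations/sec

# Upper bound on whole-chip power with every core at full speed.
max_chip_mw = CORES * MAX_MW_PER_CORE

# Average throughput per core.
ops_per_core = OPS_PER_SEC / CORES

# Upper bound on energy per operation: watts / (ops/sec), in picojoules.
max_pj_per_op = (max_chip_mw * 1e-3) / OPS_PER_SEC * 1e12

print(f"chip power   < {max_chip_mw} mW")                      # < 360 mW
print(f"per core     = {ops_per_core / 1e6:.0f} Mops/sec")     # 625 Mops/sec
print(f"energy per op < {max_pj_per_op:.1f} pJ")               # < 14.4 pJ
```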