081020 / Temporal Binned Ring Buffers

previous | next

Right vs Left Brain Programming

A majority of programmers I know are very left brained, solving problems mathematically. I however started as an artist, right brained, and solve problems visually. All the left brain skills developed from the desire to do right brained stuff. For example, my take on HDR, comes from a photographic perspective,

Recently there has been a resergance in right brained rendering (ie artistic model), such as the latest Prince of Persia. However, I would really like to see a fellow right brained graphics programmer take this to the next level, via a 3D rendering engine which generates in visuals that look exactly like great concept art. Perhaps an engine which uses a 3D painters algorithm, where a concept artist could speed paint in 3D, and build assets in real time. An engine which would actually composite 3D brush strokes, and the brush strokes would be "lit" by the local environment just in the way which an artist simplifies lighting mentally. Level of detail simplification could be automatic by pealing off the higher detail brush strokes. An engine which could paint the spaces between surfaces... would be a worthy challenge.

Dynamic Memory with Temporal Binned Ring Buffers

The idea is to take the simplest form of dynamic memory allocation, namely just allocate on a stack (like obstack but without freeing) and reduce some of its limitations.

1. Change the stack into a circular stack or ring buffer. The limitation of running out of memory has now been transformed into the condition where memory gets overwritten.

2. To enable the program to detect and handle cases where memory has been overridden, work with two offsets, virtual and physical. The virtual offset just keeps on allocating in order as if there was an infinite stack (ie infinite memory). In practice, even a 64bit virtual offset will eventually wrap around. The physical offset is simply the virtual offset masked when using power of two sized buffers, physical = virtual & (buffersize - 1). Allocation returns a virtual offset, and the program keeps this virtual offset instead of the physical offset. The allocator keeps track of range of virtual offsets which are still valid. This window slides as new allocations are made.

3. In order to limit memory being overridden, budget and setup multiple ring buffers. Divide up allocations on each ring buffer based on the estimated lifetime of that data structure. This is the "temporal binning" or mapping.

4. In terms of parallel programming, best case is to only use a ring buffer from one thread and thus avoid any sync issues. The range of virtual offsets need to be locked and only changed at a periodic sync point (such as when a frame is finished drawn). Since this window is locked, one must also handle cases where the allocation would fail (ie would overlap with the fixed window). It is important to also note that allocation with multiple threads can be done with lockless programming through atomic operations.

5. There are some smart ways to handle issues relating to memory eventually being overridden. For example, preempt the overwrite and copy over the old data before it is overridden. Often this copy is free, because the data is going to change anyway (just write it to the new destination). Or with a streaming architecture, often this becomes a chance to stream in the data at the new required level of detail or precision. Another way to look at this construct is that the ring buffers are temporal caches. When data is (or is going to be) evicted from this temporal cache, the program must recompute the data.

I've already gotten the, you'd be insane to do such a thing, response on this one, but I can see a few cases were such a construct is very useful. In the massively parallel future, the synchronization cost of dynamic memory allocation will be a barrier to performance. New tools and ideas will need to be employed to solve problems in ways which can continue to scale with hardware.

The core idea here is that often optimization is about intelligently choosing a constraint which enables both simplification and performance. Often choosing the constraint involves finding a good fixed mapping of the problem (or a large subset of the problem) which can be done in a very small constant (or near constant) amount of time.

Another way to view this idea, is in terms of the amount of communication (to memory, between nodes, etc) or synchronization required to solve a problem. Communication requires bandwidth, and bandwidth or synchronization is often the limiting factor in the scalability and performance of an algorithm. Fixed mappings (or binning) does not require communication or synchronization between objects to resolve the mapping. Non fixed mappings usually involve some sorting or searching which has a communication cost. When the number of things grows very large, even a good logarithmic reduction in communication can still not be fast enough.