How can you claim O(delta output) work per frame?


Andreas Ögren <andreas.ogren@st.se> asks:

> In your ROAM paper, you claim that ROAM achieves an
> execution time that is proportional to the number of triangle changes
> per frame. How is that possible? Don't you need to check all triangles
> each frame to find out whether they should be split, merged, or left as
> they are (although these checks are usually quite fast thanks to
> priority-computation deferral)?

Given current graphics hardware, it is *not* possible to do all the work on the software side in time proportional to the number of changes. You have to send the triangles, meaning each triangle gets touched by at least some DMA hardware to move it from system memory to the graphics hardware.  It is possible in principle to keep a hardware-local triangle display list that is updated incrementally in time proportional to the number of changes, and the OpenGL API can even specify this (awkwardly), but OpenGL hardware doesn't allow enough display lists to make this feasible.  It *is* possible (and I think relatively easy) to make hardware do what we need, but I haven't convinced SGI, Nvidia, or others to put this capability in their pipes yet (although I *have* been making that sales pitch).

The O(delta output) bound applies only to the processing up to, but not including, sending the triangles to the hardware.  Think of a "ROAM Optimization Server" sending a series of incremental changes to a rendering client, where both keep a copy of a set of triangle strips.  When a split or merge occurs, only a small, constant amount of data and "surgery" to the triangle strips must take place on either the client or the server.  The client is responsible for sending the server the current frustum, accepting the stream of resulting incremental update instructions to its strips, and then sending those strips off to be rendered.  The server is really what the ROAM paper is about, and it is this server that runs in time O(delta output) per frame.  (In fact, time can be treated continuously in this model, in essence a "frameless" point of view, with all the "work" strung out at a finite number of discrete "transition" points where the continuous frustum motion crosses a point at which a single split or merge becomes desirable/optimal.)
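To make the server/client picture concrete, here is a minimal sketch of the "strip surgery" idea. All names (StripPatch, Client, apply_patch) are illustrative, not from the ROAM implementation; the point is only that each split or merge becomes one constant-size instruction applied to a local copy of the strips.

```python
from dataclasses import dataclass, field

@dataclass
class StripPatch:
    """One constant-size instruction from the server: splice a short
    run of vertices in strip `strip_id` starting at `offset`."""
    strip_id: int
    offset: int
    remove: int      # number of vertices to splice out
    new_verts: list  # vertices to splice in (O(1) length)

@dataclass
class Client:
    strips: dict = field(default_factory=dict)  # strip_id -> vertex list

    def apply_patch(self, p: StripPatch):
        s = self.strips[p.strip_id]
        # Constant amount of data per split/merge (ignoring the cost of
        # Python list shifting, which a real vertex buffer would avoid).
        s[p.offset:p.offset + p.remove] = p.new_verts

client = Client(strips={0: ["v0", "v1", "v2", "v3"]})
# A single "split" on the server side arrives as one small patch:
client.apply_patch(StripPatch(strip_id=0, offset=1, remove=1,
                              new_verts=["v1a", "v1b"]))
print(client.strips[0])  # ['v0', 'v1a', 'v1b', 'v2', 'v3']
```

The patch stream is what makes the per-frame work proportional to the number of split/merge transitions rather than to the total strip length.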

The paper goes to great pains to make sure you only touch information related to those triangles that actually change, or within a small factor of only those.  Let me give you an outline of a proof.  First, I claim that all the work (with the provisos above) occurs in one of four phases: frustum labeling, priority computation, split/merge queue processing, and strip updating.  This is straightforward to verify.  The split/merge processing is O(delta output) by definition: the number of changes corresponds exactly to the split/merge operations.  Strip updates take a small, constant amount of work per split/merge, so this too is clearly O(delta output).  This leaves priority computation and frustum labeling to be proven.
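The split/merge queue phase can be sketched as follows. This is a simplified toy under my own assumptions (a fixed triangle budget as the stopping rule, stubbed split/merge callbacks); the real ROAM loop also compares the top priorities of the two queues, which is omitted here. Since Python's heapq is a min-heap, split priorities are negated to pop the largest first.

```python
import heapq

def optimize(split_q, merge_q, tri_count, budget, split, merge):
    """Run splits/merges until the triangle budget is met.
    split_q holds (-priority, node); merge_q holds (priority, node).
    split(node)/merge(node) return the change in triangle count."""
    ops = 0
    while tri_count > budget and merge_q:
        _, node = heapq.heappop(merge_q)   # lowest-priority merge first
        tri_count += merge(node)
        ops += 1
    while tri_count < budget and split_q:
        _, node = heapq.heappop(split_q)   # highest-priority split first
        tri_count += split(node)
        ops += 1
    return tri_count, ops  # ops is exactly the "delta output" work

# Toy run: each split adds 2 triangles, each merge removes 2.
splits = [(-9.0, "a"), (-5.0, "b"), (-1.0, "c")]
heapq.heapify(splits)
count, ops = optimize(splits, [], tri_count=100, budget=104,
                      split=lambda n: 2, merge=lambda n: -2)
print(count, ops)  # 104 2
```

Note that the loop's running time is counted in queue operations, not in mesh size: only the triangles that actually split or merge are touched.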

The priority deferral is basically trying to predict, in a continuous sense, exactly the moment when the frustum motion makes a split or merge optimal.  It does this using continuous models of frustum motion with physical limits on the rate at which the models can diverge from the actual frustum path.  Given this, finding the exact moment a split or merge should occur becomes a kind of interval-Newton iteration in continuous time.  Given finite sampling of time (i.e. frames), you approximate this Newton iteration with the deferral buckets described in the paper.  Obviously you could write a lengthy paper on this topic alone!  I hope this gives you some idea of where the ideas in the paper came from and how a proof might be made.
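A rough sketch of the deferral-bucket mechanism follows. The class and method names are mine, not the paper's, and the "safe frame count" is simply handed in; in ROAM it would be derived from the bounds on frustum motion mentioned above. The key property shown is that each frame touches only the triangles whose recomputation has come due.

```python
from collections import defaultdict

class DeferralBuckets:
    def __init__(self):
        self.frame = 0
        self.buckets = defaultdict(list)  # due frame -> [triangle]

    def defer(self, tri, safe_frames):
        """safe_frames: conservative bound (from frustum-motion limits)
        on how long tri's queue placement provably cannot change."""
        self.buckets[self.frame + max(1, safe_frames)].append(tri)

    def step(self, recompute):
        """Advance one frame; recompute only the due triangles."""
        self.frame += 1
        due = self.buckets.pop(self.frame, [])
        for tri in due:
            recompute(tri)   # would typically re-defer with a new bound
        return len(due)      # priority work done this frame

db = DeferralBuckets()
db.defer("t0", safe_frames=1)
db.defer("t1", safe_frames=3)
print(db.step(recompute=lambda t: None))  # 1  (only t0 is due)
print(db.step(recompute=lambda t: None))  # 0  (t1 not due until frame 3)
```

Each bucket pop approximates one step of the continuous-time prediction: triangles are revisited no earlier than their priorities could possibly have drifted enough to matter.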

Frustum culling touches only those triangles that are on the frustum boundary in either the previous frame or the current frame.  This is the only part that is not strictly O(delta output), but this set of triangles has lower dimensionality than the whole mesh, so it typically doesn't become the dominant time factor in practical applications.  In theory, you could apply logic similar to what was used to defer the priority computations (the logic is actually simpler for frustum culling) and get O(delta output).
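The boundary-only relabeling can be illustrated with a deliberately tiny stub. Here the "frustum" is a 1-D interval and triangles are points; the names (relabel, boundary_set) and the way the boundary set is chosen are my simplifications, not the paper's, but the shape of the argument is the same: only triangles near the moving boundary are re-tested, and everything else keeps its label from the previous frame.

```python
def relabel(labels, boundary_set, inside):
    """Re-test only the triangles in boundary_set; inside(tri) is the
    (stubbed) frustum test.  Returns how many tests were performed."""
    tested = 0
    for tri in boundary_set:
        labels[tri] = "IN" if inside(tri) else "OUT"
        tested += 1
    return tested

# Toy 1-D "frustum"; triangles are points carried over from last frame.
tris = {"a": 0.1, "b": 0.5, "c": 0.9}
labels = {"a": "IN", "b": "IN", "c": "IN"}
# Frustum shrinks to [0.0, 0.6]: only "c" sits near the moving boundary.
n = relabel(labels, boundary_set={"c"},
            inside=lambda t: 0.0 <= tris[t] <= 0.6)
print(n, labels["c"])  # 1 OUT
```

The work per frame is proportional to the boundary set, which is the lower-dimensional slice of the mesh described above.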


Contact Information

Mark A. Duchaineau -- duchaineau1@llnl.gov


Updated June 13, 2001