Overall you need to deal with four things each frame: (a) frustum culling, (b) priority updates (really, queue-index updates), (c) split/merge surgery, and (d) stripping updates. I *don't* recommend you try to make a completely optimized implementation from scratch all at once. Much better (for your sanity, if you are anything like me) to make a simple, unoptimized implementation, do detailed timing tests, and optimize one bottleneck after another. I would think of things more in terms of data structures and activities on those data structures than as a single "best" dataflow (in fact, the overall picture of the dataflow will evolve as you go along, as in the outline I've suggested). Part of the problem is that the dataflow does become very strange after all the optimizations. This is unfortunate, but I don't know a way around it.

Here is a potential staging of the evolution of the dataflow:
a)  split only, no frustum cull, no stripping, simple priority
    (project two points and take the distance between them).  This
    takes a single "split" queue and a very simple loop that stops
    when the triangle count is met (see the sketch after this list).
a1) add simple recursive frustum cull (IN/OUT bits as in the paper).
b)  split and merge queues, with a loop like in the paper (select
    split or merge depending on whether you are below/above the
    triangle count, keep going until the queues don't "overlap").
    Simple priority as before, recomputed for each triangle each
    frame.  Still no stripping.  Frustum cull computed as priorities
    are computed, no optimization for incremental changes.
b1) add the incremental optimizations from the paper, where subtrees
    need not be recomputed in most cases (e.g. was IN and still IN
    means the whole subtree is still IN and doesn't need to be
    touched).
c)  add incremental stripping.  As triangles are split or merged,
    break up all strips these are part of and then perform as many
    "gluing" operations as you can to re-form longish strips.  It can
    help to limit strip length to something like 8 (depending on
    hardware), since longer strips actually lose (or at least don't
    gain) performance because of internal queue lengths in graphics
    hardware.  Also, longer strips tend to cause more processing as
    you split and merge.  You might also put off the re-stripping
    process until after all the triangles have been split and merged
    for a frame (you have to keep a list of triangles needing work so
    you don't traverse all the triangles to find them later).
d)  add the deferred priority computation queueing (lists of
    triangles per frame that need to be updated).
e)  even faster frustum culling by deferring the frustum cull in the
    same way as you deferred the priority updates in (d).  This is
    NOT in the paper and I can help you figure out how to do this
    when you get this far.
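
To make stage (a) concrete, here is a rough C sketch of that single split queue and simple loop. Every name in it (BinTri, the SplitQueue helpers, Split(), ComputePriority()) is something I'm making up for illustration, not code from the paper, and a real implementation also has to handle the extra children created by force-splits:

    /* Rough sketch of stage (a): split-only optimization, no frustum
     * cull, no stripping.  All names here are placeholders. */

    typedef struct BinTri BinTri;
    struct BinTri {
        BinTri *left_child, *right_child;        /* NULL for a leaf          */
        BinTri *left_nbr, *right_nbr, *base_nbr; /* neighbor links           */
        float   priority;                        /* projected error estimate */
    };

    /* assumed helpers: a max-priority queue of triangles, and a Split()
     * that force-splits neighbors as needed to keep the mesh crack-free */
    void    SplitQueuePush(BinTri *t);
    BinTri *SplitQueuePopMax(void);
    int     SplitQueueEmpty(void);
    void    Split(BinTri *t);

    /* stage-(a) priority: project the triangle's two wedge endpoints and
     * take the distance between them (refine this later) */
    float   ComputePriority(const BinTri *t, const float eye[3]);

    void OptimizeSplitOnly(BinTri **base, int nbase,
                           const float eye[3], int target_tris)
    {
        int tri_count = nbase;

        for (int i = 0; i < nbase; i++) {
            base[i]->priority = ComputePriority(base[i], eye);
            SplitQueuePush(base[i]);
        }

        /* very simple loop: stop when the triangle count is met */
        while (tri_count < target_tris && !SplitQueueEmpty()) {
            BinTri *t = SplitQueuePopMax();
            if (t->left_child) continue;   /* already split by a force-split */
            Split(t);
            tri_count += 2;                /* approximate; force-splits add more,
                                              and the base neighbor's children
                                              should be queued as well          */
            t->left_child->priority  = ComputePriority(t->left_child,  eye);
            t->right_child->priority = ComputePriority(t->right_child, eye);
            SplitQueuePush(t->left_child);
            SplitQueuePush(t->right_child);
        }
    }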
The coherence-based methods can make or break whether you can fully use the new hardware with wildly high triangle counts per frame. Touching every triangle every frame with your optimizer will kill you soon, because the triangle counts are going up faster than your processor speeds. Also, the coherence methods pay off big time when the frame rates go up. If you double the frame rate ***while cutting the camera motion per frame in half***, the number of triangles you need to touch gets cut almost in half when you fully exploit coherence.
Another big assumption I made in ROAM was that all lighting is happening in the textures and pixels, and not at the vertices of the view-dependent mesh. Any opinions on whether this will be a good assumption given the latest graphics hardware (like >=nv10)? It is *not* a good assumption for specular lighting on old hardware...
One of the design goals of ROAM was to avoid feedback problems like this by using a fast priority queue to directly get either the triangle count or the accuracy you want. That is, *if* you know what you want! Because of a lot of variations in compute and graphics load from other parts of an application, these target numbers may need to vary. These "other" resource hogs must be taken into account, but I don't know of a clever universal way of dealing with it. Certainly feedback can be given a shot for some apps, but I think the success in general will be limited and will require a great deal of tuning and luck. It might be easiest to devote some fixed number of milliseconds per frame to the ROAM mesh optimization and drawing; then the fast and robust queue techniques will serve well.
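
As a tiny sketch of that time-budget idea (Milliseconds() and DoOneSplitOrMerge() are stand-ins for whatever timer and per-step optimizer entry point you have, not real APIs):

    /* Spend at most budget_ms per frame on mesh optimization. */
    void OptimizeWithBudget(double budget_ms)
    {
        double start = Milliseconds();
        while (Milliseconds() - start < budget_ms) {
            if (!DoOneSplitOrMerge())   /* returns 0 once the queues settle */
                break;
        }
    }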
Scott LeGrand wrote:
> OK, so I got the merge queue working. Thanks. However, here's the
> bad news. What I see happening is that mergeable diamonds tend to
> end up locked in the middle of the frame since they tend to be at
> the highest level of subdivision. What this means is that a bunch
> of polygons flow offscreen but they are frozen since none of them
> are mergeable diamonds. Then a mergeable diamond flows offscreen
> and causes a daisychain reaction, frees up a bunch of polygons, and
> blam up pops a mountain off in the distance. It's very
> disconcerting.
>
> I am leaning towards dividing every frame as a result.
>
> Scott
First I want to make sure whether what you are seeing is the expected behaviour or not. Expected is that you get the same (or nearly the same) output triangles as if you had only used splitting from scratch each frame (BTW, you should implement the split-only optimizer first--the split-plus-merge is a refinement you should attempt only after the split-only works well, albeit slowly). Minor differences may occur if some triangles have the same priority. For testing you can force all priorities to be distinct. This will be slow but only used for sanity checks. One way to do this is to add a horde of buckets to your queues and force each triangle to live in its own bucket, in effect doing a true sort. Triangles that otherwise have the same priority can be ordered by their unique tree indices (something that remains constant whether you are running the split-only or split-and-merge versions of the optimizer). If the same triangles happen in both cases then your merging is working right; otherwise you've got some debugging to do.
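
For that sanity-check mode, a comparator along these lines forces a total order; the priority and tree_index fields are my own naming, assuming each bintree triangle stores its unique tree index:

    /* Strict ordering for sanity checks only: sort by priority, break ties
     * with the unique tree index so the split-only and split-and-merge runs
     * order identical-priority triangles the same way. */
    int ComparePriorityStrict(const BinTri *a, const BinTri *b)
    {
        if (a->priority != b->priority)
            return (a->priority > b->priority) ? 1 : -1;
        if (a->tree_index != b->tree_index)
            return (a->tree_index > b->tree_index) ? 1 : -1;
        return 0;
    }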
For debugging, I can suggest a few possibilities without looking at your source code or getting into a detailed exchange that should probably be taken off-list (so as not to bore too many :).

-- a mergable diamond is two triangles in your active bintree that share a base edge, and whose four children are all leaves. A leaf is any active bintree triangle with no left child. Make sure that what you are determining to be mergable diamonds agrees with this definition. Put a sanity check in your code. This comes in two parts. Part (1) is to look at all the diamonds listed on the merge queue and make sure they all fit this definition. Part (2) is to go through all the active bintree triangles, test to see if they fit the definition, and if they do make sure they are on the merge queue. Note that finest-level triangles can only form the *children* of a mergable diamond, and will never form a mergable diamond themselves.
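
Roughly, the two-part sanity check could look like this, reusing the BinTri sketch from earlier and assuming you have some way of iterating the merge queue and the active bintree (the iterator names here are made up):

    #include <assert.h>

    /* assumed iterators/predicates over your own data structures */
    BinTri *MergeQueueFirst(void); BinTri *MergeQueueNext(BinTri *d);
    BinTri *ActiveFirst(void);     BinTri *ActiveNext(BinTri *t);
    int     OnMergeQueue(const BinTri *t);

    int IsLeaf(const BinTri *t) { return t->left_child == NULL; }

    /* Two active triangles form a mergable diamond when they share a base
     * edge and all four of their children are leaves. */
    int IsMergableDiamond(const BinTri *t)
    {
        const BinTri *b = t->base_nbr;
        if (!b || b->base_nbr != t) return 0;   /* must share the base edge */
        if (IsLeaf(t) || IsLeaf(b)) return 0;   /* both must have children  */
        return IsLeaf(t->left_child) && IsLeaf(t->right_child)
            && IsLeaf(b->left_child) && IsLeaf(b->right_child);
    }

    void SanityCheckMergables(void)
    {
        /* Part (1): everything on the merge queue fits the definition. */
        for (BinTri *d = MergeQueueFirst(); d; d = MergeQueueNext(d))
            assert(IsMergableDiamond(d));

        /* Part (2): every active triangle fitting the definition is on
         * the merge queue. */
        for (BinTri *t = ActiveFirst(); t; t = ActiveNext(t))
            if (IsMergableDiamond(t))
                assert(OnMergeQueue(t));
    }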
-- a mergable diamond should never be locked. The configuration I described as forming a mergable diamond can *always* be converted from four triangles to two, and should *always* be on the merge queue. The priority of the mergable diamond is the max of the two priorities associated with its two triangles. Note that an OUT triangle should have the lowest possible priority, and that a diamond with two OUT triangles has the lowest possible priority. This means that *no* mergable diamonds should exist outside the frustum when the optimizer is finished. You can add a sanity check for this by looking for any mergable diamonds with OUT labels on the two triangles, and indicating an error if you find one.
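
In code form, that rule is just a max with OUT triangles clamped to the lowest priority; the cull field and CULL_OUT label are my assumed names for your IN/OUT flag:

    /* Diamond priority = max of its two triangles' priorities, where an
     * OUT triangle contributes the lowest possible priority (0 here). */
    float DiamondPriority(const BinTri *t)
    {
        const BinTri *b = t->base_nbr;
        float pt = (t->cull == CULL_OUT) ? 0.0f : t->priority;
        float pb = (b->cull == CULL_OUT) ? 0.0f : b->priority;
        return (pt > pb) ? pt : pb;
    }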
-- if your triangle count is really low, you can get cascade effects as the frustum moves that would cause a mountain to pop up. This is normal! This will go away when your tri counts get to >3000 tris or so for most terrain (really rough may take 4000-5000).
-- make sure you *only* count (and draw) triangles in your active bintree that are leaves and are not labeled OUT. Do not count active bintree triangles that are OUT or that have a left child. The sanity check here is to count the hard way and compare to the count you maintain incrementally.
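
The brute-force side of that check is a simple loop (again with my made-up iterator and cull-flag names from the earlier sketches):

    /* Count the hard way and compare against the incrementally maintained
     * triangle count. */
    int CountDrawableTriangles(void)
    {
        int n = 0;
        for (BinTri *t = ActiveFirst(); t; t = ActiveNext(t))
            if (IsLeaf(t) && t->cull != CULL_OUT)
                n++;
        return n;
    }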
So if these general guidelines/suggestions are still not working out for you, I'd be happy to help in more specific detail off-list (for anyone who needs it). Adding merge--ONCE YOU GET SPLIT-ONLY WORKING--is a big win and I encourage you to keep working at it.
Tom Hubina wrote:
> >Ideally you want to use both techniques. You _need_ LOD if the view
> >distance is large and you want detail. But you want to take
> >advantage of vertex arrays/display lists. So, just set it up so that
> >only a few percent of your "blocks" are rebuilt every frame. This is
> >doable, since we assume the viewpoint changes slowly relative to the
> >block size.
>
> Each engine has different requirements. I don't want my engine to
> require slow moving views. This is one of the reasons why I've been
> avoiding methods that require frame to frame coherency methods (like
> the original ROAM paper). It should be possible for the camera to
> make a full 180 degree rotation in well under a second. As I
> understand it, that kind of stuff will destroy coherent methods.
If you "sit and spin" with a frustum that covers 45 degrees from
left
screenedge
to right screen edge (a fairly narrow view/modest zoom), then turning
180 degrees would take four steps with exactly no overlap, 8 with
50%
overlap, and 16 with 75% overlap. I would guess that the
frame-to-frame
coherence would be a win at around 10-12 steps, which translates
to
10-12 frames/sec for the 1sec turn time. Even without frame-to-frame
coherence you would probably still be better off using simple splits
than
throwing out dynamic adaptive meshing altogether, unless you really
only
have a tiny area of terrain. All this said, I completely
agree that ROAM
could be a lot better in these cases. So let's fix it!
Since we're working on a fix, let's toss in one other problem you didn't mention that kills coherence even more. Take the case where you have a tiny corner of your screen just touching the horizon, and let your view direction swing directly towards that corner. Initially you spend your entire triangle budget on a few pixels of the screen, and then the triangles rapidly spread out as the view swings, completely destroying coherence at the moment the horizon comes into view. Here the tiniest change in view orientation can cause frame-to-frame coherence to fail. (Note: in reality this isn't as bad as I've made out, since you typically hit some finite resolution limit and get only a handful of triangles in the corner initially.)
One answer is to ignore the frustum for purposes of splitting and merging, but still use it to cull and decide what to actually send to the hardware (culling is so fast using the hierarchy that frame-to-frame coherence isn't needed to avoid a bottleneck at the frame rates we're dealing with). We approximate the perspective divide-by-Z with an orientation-independent divide-by-R. We replace the near clipping plane with a near clipping sphere. If the triangles were distributed evenly over the sphere of view directions after optimization (they aren't, but let's use this as a first cut), then you multiply your old triangle-count target by the number of frustums it takes to cover a sphere, and everything works! (Yes, yes, the number of triangles you are optimizing is a lot more, and the optimization is still the bottleneck rather than the graphics hardware...but this will be fixed in a few months by a faster ROAM-like optimizer ;-).
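
A sketch of what the divide-by-R priority might look like, where e_t stands for the triangle's world-space error bound and near_r for the near-sphere radius (both names are my own):

    #include <math.h>

    /* Orientation-independent priority: divide the world-space error bound
     * by the distance R from the eye to the triangle, clamped to a near
     * clipping sphere of radius near_r instead of a near plane. */
    float PriorityDivideByR(float e_t, const float center[3],
                            const float eye[3], float near_r)
    {
        float dx = center[0] - eye[0];
        float dy = center[1] - eye[1];
        float dz = center[2] - eye[2];
        float R  = sqrtf(dx*dx + dy*dy + dz*dz);
        if (R < near_r) R = near_r;      /* inside the near sphere */
        return e_t / R;
    }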
Now to fix things up for the case that the optimized triangles aren't spread out evenly over the view-direction sphere. What we want to guarantee is that we can instantly move to any view direction and get no more than the target triangle count. Our optimization priority remains the same: minimize the maximum projected error (now divide-by-R projections), but now for all possible view directions, i.e. just ignore direction. So the only tricky part is that target-count guarantee.
The optimization outer loop becomes the following. If there is a frustum direction with too many triangles, find the mergable diamond with the lowest priority that is in such a frustum and merge it. If there is a triangle for which all the direction frustums have too few triangles, and which has higher priority than all other triangles in this state, then split it. Keep going until the split and merge queues don't overlap in more than one priority and there are no triangles for which all direction frustums have too few triangles.
To keep the frustum counts, keep a number of frustum-direction buckets. The number of buckets should be maybe 10 times the number of frustums it takes to cover the view-direction sphere. Each leaf triangle adds one to the count of each bucket that sees its centerpoint (there are about 10 of them).
Something like this should work. I've been seriously considering implementing this for a while, but other things have been higher on my priority queue ;-). It would be cool to hear if anyone gets something like this going though, so I offer it to the list.
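
In case it helps anyone get started, here is the kind of skeleton I have in mind. Every name in it (the bucket table, BucketsSeeingPoint(), the over/under-full queries) is hypothetical scaffolding you would have to build, not something that exists anywhere yet:

    #define NUM_BUCKETS 80          /* ~10x the frustums needed to cover the sphere */
    int bucket_count[NUM_BUCKETS];  /* leaf triangles seen per direction bucket     */

    /* assumed helpers */
    int     BucketsSeeingPoint(const float center[3], const float eye[3],
                               int *idx, int max_idx);
    BinTri *LowestPriorityDiamondInOverfullBucket(void);
    BinTri *HighestPriorityTriangleAllBucketsUnderfull(void);
    void    Merge(BinTri *diamond);
    void    Split(BinTri *t);

    /* each leaf triangle adds one to every bucket whose frustum sees its
     * centerpoint (about 10 of them) */
    void AddLeafToBuckets(const float center[3], const float eye[3])
    {
        int idx[16];
        int n = BucketsSeeingPoint(center, eye, idx, 16);
        for (int i = 0; i < n; i++)
            bucket_count[idx[i]]++;
    }

    /* outer loop: merge while some direction is over budget, split while
     * some triangle has every one of its directions under budget */
    void OptimizeAllDirections(void)
    {
        for (;;) {
            BinTri *d = LowestPriorityDiamondInOverfullBucket();
            if (d) { Merge(d); continue; }
            BinTri *t = HighestPriorityTriangleAllBucketsUnderfull();
            if (t) { Split(t); continue; }
            break;  /* queues no longer overlap in more than one priority */
        }
    }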
You asked specifically about walking through the bintree structures and outputting to OpenGL. Recall that the fastest way to send a triangulated surface to OpenGL graphics hardware is to group triangles into strips. A bintree triangle's data structure contains pointers to neighbors as well as the parent and children. This allows you to easily walk in the strip directions to construct strips. If you use the incremental stripping procedure, then walking is not needed, only merging of neighboring strips.
I'd recommend looking in the OpenGL programming guide for basic information on what strips are and how to send them to the graphics hardware. Note that you should separate the *construction* of the strips from the *display* of the strips. Construction is the process of collecting triangles into strips and storing the results as arrays. Display is the process of making the OpenGL calls for these arrays. The reason you separate strip construction and display is so the work of construction can be amortized over several display operations.
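
To make the construction/display split concrete: a strip can be stored as a plain vertex array and replayed each frame with ordinary immediate-mode calls (the Strip struct below is my own naming; vertex arrays or display lists would do just as well):

    #include <GL/gl.h>

    #define MAX_STRIP_VERTS 10   /* a strip limited to 8 triangles has 10 vertices */

    typedef struct {
        int   nverts;
        float xyz[MAX_STRIP_VERTS][3];   /* filled in by strip *construction* */
    } Strip;

    /* *Display* just replays the stored array; the construction cost is
     * amortized over however many frames the strip survives unchanged. */
    void DisplayStrip(const Strip *s)
    {
        glBegin(GL_TRIANGLE_STRIP);
        for (int i = 0; i < s->nverts; i++)
            glVertex3fv(s->xyz[i]);
        glEnd();
    }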
Leonardo,
> If _T_ is the root triangle, how can I traverse a bintri like this:
>
> +      +
> |\    /|
> | \  / |
> |  \/  |
> |LN/\RN|
> | /  \ |
> |/ T  \|
> +------+
> It produces an infinite loop because _LN_'s right neighbor is _T_
> that has already been rendered!
>
> I was thinking about doing something like this:
>
> render(t) {
> if (t->LC) render(t->LC);
> if (t->RC) render(t->RC);
>
> draw(t);
>
> if (t->LN) render(t->LN);
> if (t->RN) render(t->RN);
> if (t->BN) render(t->BN);
> }
>
> But I see it's not that simple.
If you just want to output all the leaf triangles without strips, just do a depth-first recursive descent from all the coarsest-level (base mesh) triangles:
RenderAll(M) {
    for all t in M, RenderRecurse(t);
}

RenderRecurse(t) {
    if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC); return; }
    draw(t);
}
If you want to actually make strips of triangles, then the neighbor links become useful. First add a "marker" field to the bintree triangle data structure. Initially clear all t->mark to 0, and set global int marker=1. Then you can make RenderRecurse as follows to generate strips:
RenderRecurse(t) {
    if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC); return; }
    if (t->mark!=0) return;       // the triangle has already been output
    t_state={t,base_clockwise};   // you can fiddle with this to optimize strip length
    while (StripWalkLeft(&t_state)) ;
    BgnStrip(t_state);            // OpenGL calls to start strip, and output first two vertices
    while (StripWalkRight(&t_state)) ;
    EndStrip(t_state);
    marker++;
}
The "t_state" variable contains the bintree triangle and label of
which edge
and direction for that edge (i.e. one of {base,left,right} with
one of
{_clockwise,_counterclockwise} appended).
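
One possible declaration of that t_state, with my own names for the edge/direction labels and reusing the BinTri type from the earlier sketches:

    typedef enum {
        base_clockwise,  base_counterclockwise,
        left_clockwise,  left_counterclockwise,
        right_clockwise, right_counterclockwise
    } EdgeLabel;

    typedef struct {
        BinTri   *t;     /* current triangle in the walk            */
        EdgeLabel edge;  /* which edge, and which winding direction */
    } StripState;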
The procedure StripWalkLeft(&t_state) marks the triangle (t->mark=marker), then looks at the neighbor associated with the edge label. If the neighbor does not exist (boundary case) or is already marked (t->mark!=0), then a 0 is returned. Otherwise the edges of the new triangle are examined to see which points to the current triangle, and t_state is updated appropriately to the new triangle and new edge label (so as to follow OpenGL stripping rules). This walks you to the "left" edge of a strip, leaving a trail of bread crumbs that you will follow in the next step going to the right.
Now the calls to StripWalkRight(&t_state) will output the triangle strip. The implementation is almost the same as StripWalkLeft(), but you now stop when the neighbor does not exist or has a mark that is not ==marker or ==0. At the beginning of StripWalkRight(), instead of setting t->mark=marker you set t->mark= -marker to avoid infinite loops. Also needed is one OpenGL call to output an additional strip vertex.
Hopefully this sketch of the stripping procedure should get you started.
-
Updated Feb 6, 2001