Overall you need to deal with four things each frame: (a) frustum culling, (b) priority updates (really, queue-index updates), (c) split/merge surgery, and (d) stripping updates. I *don't* recommend you try to make a completely optimized implementation from scratch all at once. Much better (for your sanity, if you are anything like me) to make a simple, unoptimized implementation, do detailed timing tests, and optimize one bottleneck after another. I would think of things more in terms of data structures and activities on those data structures than as a single "best" dataflow (in fact, the overall picture of the dataflow will evolve as you go along, as in the outline I've suggested). Part of the problem is that the dataflow does become very strange after all the optimizations. This is unfortunate, but I don't know a way around it.

Here is a potential staging of the evolution of the dataflow:
a)  split only, no frustum cull, no stripping, simple priority
    (project two points and take the distance between them).  This
    takes a single "split" queue and a very simple loop that stops
    when the triangle count is met (see the sketch after this list).
a1) add simple recursive frustum cull (IN/OUT bits as in the paper).
b)  split and merge queues, with a loop like in the paper (select
    split or merge depending on whether you are below/above the
    triangle count, keep going until the queues don't "overlap").
    Simple priority as before, recomputed for each triangle each
    frame.  Still no stripping.  Frustum cull computed as priorities
    are computed, no optimization for incremental changes.
b1) add the incremental optimizations from the paper, where subtrees
    need not be recomputed in most cases (e.g. was IN and still IN
    means the whole subtree is still IN and doesn't need to be
    touched).
c)  add incremental stripping.  As triangles are split or merged,
    break up all strips these are part of and then perform as many
    "gluing" operations as you can to re-form longish strips.  It can
    help to limit strip length to something like 8 (depending on
    hardware), since longer strips actually lose (or at least don't
    gain) performance because of internal queue lengths in graphics
    hardware.  Also, longer strips tend to cause more processing as
    you split and merge.  You might also put off the re-stripping
    process until after all the triangles have been split and merged
    for a frame (you have to keep a list of triangles needing work so
    you don't traverse all the triangles to find them later).
d)  add the deferred priority computation queueing (lists of
    triangles per frame that need to be updated).
e)  even faster frustum culling by deferring the frustum cull in the
    same way as you deferred the priority updates in (d).  This is
    NOT in the paper and I can help you figure out how to do this
    when you get this far.
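
To make stage (a) concrete, here is a rough C sketch of that single split queue and simple loop. Every name in it (BinTri, the SplitQueue helpers, Split(), ComputePriority()) is something I'm making up for illustration, not code from the paper, and a real implementation also has to handle the extra children created by force-splits:

    /* Rough sketch of stage (a): split-only optimization, no frustum
     * cull, no stripping.  All names here are placeholders. */

    typedef struct BinTri BinTri;
    struct BinTri {
        BinTri *left_child, *right_child;        /* NULL for a leaf          */
        BinTri *left_nbr, *right_nbr, *base_nbr; /* neighbor links           */
        float   priority;                        /* projected error estimate */
    };

    /* assumed helpers: a max-priority queue of triangles, and a Split()
     * that force-splits neighbors as needed to keep the mesh crack-free */
    void    SplitQueuePush(BinTri *t);
    BinTri *SplitQueuePopMax(void);
    int     SplitQueueEmpty(void);
    void    Split(BinTri *t);

    /* stage-(a) priority: project the triangle's two wedge endpoints and
     * take the distance between them (refine this later) */
    float   ComputePriority(const BinTri *t, const float eye[3]);

    void OptimizeSplitOnly(BinTri **base, int nbase,
                           const float eye[3], int target_tris)
    {
        int tri_count = nbase;

        for (int i = 0; i < nbase; i++) {
            base[i]->priority = ComputePriority(base[i], eye);
            SplitQueuePush(base[i]);
        }

        /* very simple loop: stop when the triangle count is met */
        while (tri_count < target_tris && !SplitQueueEmpty()) {
            BinTri *t = SplitQueuePopMax();
            if (t->left_child) continue;   /* already split by a force-split */
            Split(t);
            tri_count += 2;                /* approximate; force-splits add more,
                                              and the base neighbor's children
                                              should be queued as well          */
            t->left_child->priority  = ComputePriority(t->left_child,  eye);
            t->right_child->priority = ComputePriority(t->right_child, eye);
            SplitQueuePush(t->left_child);
            SplitQueuePush(t->right_child);
        }
    }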
The coherence-based methods can make or break whether you can fully use the new hardware with wildly high triangle counts per frame. Touching every triangle every frame with your optimizer will kill you soon, because the triangle counts are going up faster than your processor speeds. Also, the coherence methods pay off big time when the frame rates go up. If you double the frame rate ***while cutting the camera motion per frame in half***, the number of triangles you need to touch gets cut almost in half when you fully exploit coherence.
Another big assumption I made in ROAM was that all lighting is happening in the textures and pixels, and not at the vertices of the view-dependent mesh. Any opinions on whether this will be a good assumption given the latest graphics hardware (like >=nv10)? It is *not* a good assumption for specular lighting on old hardware...
One of the design goals of ROAM was to avoid feedback problems like this by using a fast priority queue to directly get either the triangle count or the accuracy you want. That is, *if* you know what you want! Because of a lot of variations in compute and graphics load from other parts of an application, these target numbers may need to vary. These "other" resource hogs must be taken into account, but I don't know of a clever universal way of dealing with it. Certainly feedback can be given a shot for some apps, but I think the success in general will be limited and will require a great deal of tuning and luck. It might be easiest to devote some fixed number of milliseconds per frame to the ROAM mesh optimization and drawing; then the fast and robust queue techniques will serve well.
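
As a tiny sketch of that time-budget idea (Milliseconds() and DoOneSplitOrMerge() are stand-ins for whatever timer and per-step optimizer entry point you have, not real APIs):

    /* Spend at most budget_ms per frame on mesh optimization. */
    void OptimizeWithBudget(double budget_ms)
    {
        double start = Milliseconds();
        while (Milliseconds() - start < budget_ms) {
            if (!DoOneSplitOrMerge())   /* returns 0 once the queues settle */
                break;
        }
    }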
Scott LeGrand wrote:
> OK, so I got the merge queue working. Thanks. However, here's the
> bad news. What I see happening is that mergeable diamonds tend to
> end up locked in the middle of the frame since they tend to be at
> the highest level of subdivision. What this means is that a bunch
> of polygons flow offscreen but they are frozen since none of them
> are mergeable diamonds. Then a mergeable diamond flows offscreen
> and causes a daisychain reaction, frees up a bunch of polygons, and
> blam up pops a mountain off in the distance. It's very
> disconcerting.
>
> I am leaning towards dividing every frame as a result.
>
> Scott
First I want to make sure whether what you are seeing is the expected behaviour or not. Expected is that you get the same (or nearly the same) output triangles as if you had only used splitting from scratch each frame (BTW, you should implement the split-only optimizer first--the split-plus-merge is a refinement you should attempt only after the split-only works well, albeit slowly). Minor differences may occur if some triangles have the same priority. For testing you can force all priorities to be distinct. This will be slow but only used for sanity checks. One way to do this is to add a horde of buckets to your queues and force each triangle to live in its own bucket, in effect doing a true sort. Triangles that otherwise have the same priority can be ordered by their unique tree indices (something that remains constant whether you are running the split-only or split-and-merge versions of the optimizer). If the same triangles happen in both cases then your merging is working right; otherwise you've got some debugging to do.
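
For that sanity-check mode, a comparator along these lines forces a total order; the priority and tree_index fields are my own naming, assuming each bintree triangle stores its unique tree index:

    /* Strict ordering for sanity checks only: sort by priority, break ties
     * with the unique tree index so the split-only and split-and-merge runs
     * order identical-priority triangles the same way. */
    int ComparePriorityStrict(const BinTri *a, const BinTri *b)
    {
        if (a->priority != b->priority)
            return (a->priority > b->priority) ? 1 : -1;
        if (a->tree_index != b->tree_index)
            return (a->tree_index > b->tree_index) ? 1 : -1;
        return 0;
    }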
For debugging, I can suggest a few possibilities without looking at your source code or getting into a detailed exchange that should probably be taken off-list (so as not to bore too many :).

-- a mergable diamond is two triangles in your active bintree that share a base edge, and whose four children are all leaves. A leaf is any active bintree triangle with no left child. Make sure that what you are determining to be mergable diamonds agrees with this definition. Put a sanity check in your code. This comes in two parts. Part (1) is to look at all the diamonds listed on the merge queue and make sure they all fit this definition. Part (2) is to go through all the active bintree triangles, test to see if they fit the definition, and if they do make sure they are on the merge queue. Note that finest-level triangles can only form the *children* of a mergable diamond, and will never form a mergable diamond themselves.
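
Roughly, the two-part sanity check could look like this, reusing the BinTri sketch from earlier and assuming you have some way of iterating the merge queue and the active bintree (the iterator names here are made up):

    #include <assert.h>

    /* assumed iterators/predicates over your own data structures */
    BinTri *MergeQueueFirst(void); BinTri *MergeQueueNext(BinTri *d);
    BinTri *ActiveFirst(void);     BinTri *ActiveNext(BinTri *t);
    int     OnMergeQueue(const BinTri *t);

    int IsLeaf(const BinTri *t) { return t->left_child == NULL; }

    /* Two active triangles form a mergable diamond when they share a base
     * edge and all four of their children are leaves. */
    int IsMergableDiamond(const BinTri *t)
    {
        const BinTri *b = t->base_nbr;
        if (!b || b->base_nbr != t) return 0;   /* must share the base edge */
        if (IsLeaf(t) || IsLeaf(b)) return 0;   /* both must have children  */
        return IsLeaf(t->left_child) && IsLeaf(t->right_child)
            && IsLeaf(b->left_child) && IsLeaf(b->right_child);
    }

    void SanityCheckMergables(void)
    {
        /* Part (1): everything on the merge queue fits the definition. */
        for (BinTri *d = MergeQueueFirst(); d; d = MergeQueueNext(d))
            assert(IsMergableDiamond(d));

        /* Part (2): every active triangle fitting the definition is on
         * the merge queue. */
        for (BinTri *t = ActiveFirst(); t; t = ActiveNext(t))
            if (IsMergableDiamond(t))
                assert(OnMergeQueue(t));
    }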
-- a mergable diamond should never be locked. The configuration I described as forming a mergable diamond can *always* be converted from four triangles to two, and should *always* be on the merge queue. The priority of the mergable diamond is the max of the two priorities associated with its two triangles. Note that an OUT triangle should have the lowest possible priority, and that a diamond with two OUT triangles has the lowest possible priority. This means that *no* mergable diamonds should exist outside the frustum when the optimizer is finished. You can add a sanity check for this by looking for any mergable diamonds with OUT labels on the two triangles, and indicating an error if you find one.
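
In code form, that rule is just a max with OUT triangles clamped to the lowest priority; the cull field and CULL_OUT label are my assumed names for your IN/OUT flag:

    /* Diamond priority = max of its two triangles' priorities, where an
     * OUT triangle contributes the lowest possible priority (0 here). */
    float DiamondPriority(const BinTri *t)
    {
        const BinTri *b = t->base_nbr;
        float pt = (t->cull == CULL_OUT) ? 0.0f : t->priority;
        float pb = (b->cull == CULL_OUT) ? 0.0f : b->priority;
        return (pt > pb) ? pt : pb;
    }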
-- if your triangle count is really low, you can get cascade effects as the frustum moves that would cause a mountain to pop up. This is normal! This will go away when your tri counts get to >3000 tris or so for most terrain (really rough may take 4000-5000).
-- make sure you *only* count (and draw) triangles in your active bintree that are leaves and are not labeled OUT. Do not count active bintree triangles that are OUT or that have a left child. The sanity check here is to count the hard way and compare to the count you maintain incrementally.
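
The brute-force side of that check is a simple loop (again with my made-up iterator and cull-flag names from the earlier sketches):

    /* Count the hard way and compare against the incrementally maintained
     * triangle count. */
    int CountDrawableTriangles(void)
    {
        int n = 0;
        for (BinTri *t = ActiveFirst(); t; t = ActiveNext(t))
            if (IsLeaf(t) && t->cull != CULL_OUT)
                n++;
        return n;
    }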
So if these general guidelines/suggestions are still not working out for you, I'd be happy to help in more specific detail off-list (for anyone who needs it). Adding merge--ONCE YOU GET SPLIT-ONLY WORKING--is a big win and I encourage you to keep working at it.
Tom Hubina wrote:
> >Ideally you want to use both techniques. You _need_ LOD if the view
> >distance is large and you want detail. But you want to take
> >advantage of vertex arrays/display lists. So, just set it up so that
> >only a few percent of your "blocks" are rebuilt every frame. This is
> >doable, since we assume the viewpoint changes slowly relative to the
> >block size.
>
> Each engine has different requirements. I don't want my engine to
> require slow moving views. This is one of the reasons why I've been
> avoiding methods that require frame to frame coherency methods (like
> the original ROAM paper). It should be possible for the camera to
> make a full 180 degree rotation in well under a second. As I
> understand it, that kind of stuff will destroy coherent methods.
If you "sit and spin" with a frustum that covers 45 degrees from
left
screenedge
to right screen edge (a fairly narrow view/modest zoom), then turning
180 degrees would take four steps with exactly no overlap, 8 with
50%
overlap, and 16 with 75% overlap. I would guess that the
frame-to-frame
coherence would be a win at around 10-12 steps, which translates
to
10-12 frames/sec for the 1sec turn time. Even without frame-to-frame
coherence you would probably still be better off using simple splits
than
throwing out dynamic adaptive meshing altogether, unless you really
only
have a tiny area of terrain. All this said, I completely
agree that ROAM
could be a lot better in these cases. So let's fix it!
Since we're working on a fix, let's toss in one other problem you didn't mention that kills coherence even more. Take the case where you have a tiny corner of your screen just touching the horizon, and let your view direction swing directly towards that corner. Initially you spend your entire triangle budget on a few pixels of the screen, and then the triangles rapidly spread out as the view swings, completely destroying coherence at the moment the horizon comes into view. Here the tiniest change in view orientation can cause frame-to-frame coherence to fail. (Note: in reality this isn't as bad as I've made out, since you typically hit some finite resolution limit and get only a handful of triangles in the corner initially.)
One answer is to ignore the frustum for purposes of splitting and merging, but still use it to cull and decide what to actually send to the hardware (culling is so fast using the hierarchy that frame-to-frame coherence isn't needed to avoid a bottleneck at the frame rates we're dealing with). We approximate the perspective divide-by-Z with an orientation-independent divide-by-R. We replace the near clipping plane with a near clipping sphere. If the triangles were distributed evenly over the sphere of view directions after optimization (they aren't, but let's use this as a first cut), then you multiply your old triangle-count target by the number of frustums it takes to cover a sphere, and everything works! (Yes, yes, the number of triangles you are optimizing is a lot more, and the optimization is still the bottleneck rather than the graphics hardware...but this will be fixed in a few months by a faster ROAM-like optimizer ;-).
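
A sketch of what the divide-by-R priority might look like, where e_t stands for the triangle's world-space error bound and near_r for the near-sphere radius (both names are my own):

    #include <math.h>

    /* Orientation-independent priority: divide the world-space error bound
     * by the distance R from the eye to the triangle, clamped to a near
     * clipping sphere of radius near_r instead of a near plane. */
    float PriorityDivideByR(float e_t, const float center[3],
                            const float eye[3], float near_r)
    {
        float dx = center[0] - eye[0];
        float dy = center[1] - eye[1];
        float dz = center[2] - eye[2];
        float R  = sqrtf(dx*dx + dy*dy + dz*dz);
        if (R < near_r) R = near_r;      /* inside the near sphere */
        return e_t / R;
    }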
Now to fix things up for the case that the optimized triangles aren't spread out evenly over the view-direction sphere. What we want to guarantee is that we can instantly move to any view direction and get no more than the target triangle count. Our optimization priority remains the same: minimize the maximum projected error (now divide-by-R projections), but now for all possible view directions, i.e. just ignore direction. So the only tricky part is that target-count guarantee.
The optimization outer loop becomes the following. If there is a frustum direction with too many triangles, find the mergable diamond with the lowest priority that is in such a frustum and merge it. If there is a triangle for which all the direction frustums have too few triangles, and which has higher priority than all other triangles in this state, then split it. Keep going until the split and merge queues don't overlap in more than one priority and there are no triangles for which all direction frustums have too few triangles.
To keep the frustum counts, keep a number of frustum-direction buckets. The number of buckets should be maybe 10 times the number of frustums it takes to cover the view-direction sphere. Each leaf triangle adds one to the count of each bucket that sees its centerpoint (there are about 10 of them).
Something like this should work. I've been seriously considering implementing this for a while, but other things have been higher on my priority queue ;-). It would be cool to hear if anyone gets something like this going though, so I offer it to the list.
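
In case it helps anyone get started, here is the kind of skeleton I have in mind. Every name in it (the bucket table, BucketsSeeingPoint(), the over/under-full queries) is hypothetical scaffolding you would have to build, not something that exists anywhere yet:

    #define NUM_BUCKETS 80          /* ~10x the frustums needed to cover the sphere */
    int bucket_count[NUM_BUCKETS];  /* leaf triangles seen per direction bucket     */

    /* assumed helpers */
    int     BucketsSeeingPoint(const float center[3], const float eye[3],
                               int *idx, int max_idx);
    BinTri *LowestPriorityDiamondInOverfullBucket(void);
    BinTri *HighestPriorityTriangleAllBucketsUnderfull(void);
    void    Merge(BinTri *diamond);
    void    Split(BinTri *t);

    /* each leaf triangle adds one to every bucket whose frustum sees its
     * centerpoint (about 10 of them) */
    void AddLeafToBuckets(const float center[3], const float eye[3])
    {
        int idx[16];
        int n = BucketsSeeingPoint(center, eye, idx, 16);
        for (int i = 0; i < n; i++)
            bucket_count[idx[i]]++;
    }

    /* outer loop: merge while some direction is over budget, split while
     * some triangle has every one of its directions under budget */
    void OptimizeAllDirections(void)
    {
        for (;;) {
            BinTri *d = LowestPriorityDiamondInOverfullBucket();
            if (d) { Merge(d); continue; }
            BinTri *t = HighestPriorityTriangleAllBucketsUnderfull();
            if (t) { Split(t); continue; }
            break;  /* queues no longer overlap in more than one priority */
        }
    }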
You asked specifically about walking through the bintree structures and outputting to OpenGL. Recall that the fastest way to send a triangulated surface to OpenGL graphics hardware is to group triangles into strips. A bintree triangle's data structure contains pointers to neighbors as well as the parent and children. This allows you to easily walk in the strip directions to construct strips. If you use the incremental stripping procedure, then walking is not needed, only merging of neighboring strips.
I'd recommend looking in the OpenGL programming guide for basic information on what strips are and how to send them to the graphics hardware. Note that you should separate the *construction* of the strips from the *display* of the strips. Construction is the process of collecting triangles into strips and storing the results as arrays. Display is the process of making the OpenGL calls for these arrays. The reason you separate strip construction and display is so the work of construction can be amortized over several display operations.
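
To make the construction/display split concrete: a strip can be stored as a plain vertex array and replayed each frame with ordinary immediate-mode calls (the Strip struct below is my own naming; vertex arrays or display lists would do just as well):

    #include <GL/gl.h>

    #define MAX_STRIP_VERTS 10   /* a strip limited to 8 triangles has 10 vertices */

    typedef struct {
        int   nverts;
        float xyz[MAX_STRIP_VERTS][3];   /* filled in by strip *construction* */
    } Strip;

    /* *Display* just replays the stored array; the construction cost is
     * amortized over however many frames the strip survives unchanged. */
    void DisplayStrip(const Strip *s)
    {
        glBegin(GL_TRIANGLE_STRIP);
        for (int i = 0; i < s->nverts; i++)
            glVertex3fv(s->xyz[i]);
        glEnd();
    }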
Leonardo,
> If _T_ is the root triangle, how can I traverse a bintri like this:
>
> +      +
> |\    /|
> | \  / |
> |  \/  |
> |LN/\RN|
> | /  \ |
> |/ T  \|
> +------+
> It produces an infinite loop because _LN_'s right neighbor is _T_
> that has already been rendered!
>
> I was thinking about doing something like this:
>
> render(t) {
> if (t->LC) render(t->LC);
> if (t->RC) render(t->RC);
>
> draw(t);
>
> if (t->LN) render(t->LN);
> if (t->RN) render(t->RN);
> if (t->BN) render(t->BN);
> }
>
> But I see it's not that simple.
If you just want to output all the leaf triangles without strips, just do a depth-first recursive descent from all the coarsest-level (base mesh) triangles:
RenderAll(M) {
    for all t in M, RenderRecurse(t);
}

RenderRecurse(t) {
    if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC); return; }
    draw(t);
}
If you want to actually make strips of triangles, then the neighbor links become useful. First add a "marker" field to the bintree triangle data structure. Initially clear all t->mark to 0, and set global int marker=1. Then you can make RenderRecurse as follows to generate strips:
RenderRecurse(t) {
    if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC); return; }
    if (t->mark!=0) return;       // the triangle has already been output
    t_state={t,base_clockwise};   // you can fiddle with this to optimize strip length
    while (StripWalkLeft(&t_state)) ;
    BgnStrip(t_state);            // OpenGL calls to start strip, and output first two vertices
    while (StripWalkRight(&t_state)) ;
    EndStrip(t_state);
    marker++;
}
The "t_state" variable contains the bintree triangle and label of
which edge
and direction for that edge (i.e. one of {base,left,right} with
one of
{_clockwise,_counterclockwise} appended).
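
One possible declaration of that t_state, with my own names for the edge/direction labels and reusing the BinTri type from the earlier sketches:

    typedef enum {
        base_clockwise,  base_counterclockwise,
        left_clockwise,  left_counterclockwise,
        right_clockwise, right_counterclockwise
    } EdgeLabel;

    typedef struct {
        BinTri   *t;     /* current triangle in the walk            */
        EdgeLabel edge;  /* which edge, and which winding direction */
    } StripState;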
The procedure StripWalkLeft(&t_state) marks the triangle (t->mark=marker), then looks at the neighbor associated with the edge label. If the neighbor does not exist (boundary case) or is already marked (t->mark!=0), then a 0 is returned. Otherwise the edges of the new triangle are examined to see which points to the current triangle, and t_state is updated appropriately to the new triangle and new edge label (so as to follow OpenGL stripping rules). This walks you to the "left" edge of a strip, leaving a trail of bread crumbs that you will follow in the next step going to the right.
Now the calls to StripWalkRight(&t_state) will output the triangle strip. The implementation is almost the same as StripWalkLeft(), but you now stop when the neighbor does not exist or has a mark that is not ==marker or ==0. At the beginning of StripWalkRight(), instead of setting t->mark=marker you set t->mark= -marker to avoid infinite loops. Also needed is one OpenGL call to output an additional strip vertex.
Hopefully this sketch of the stripping procedure should get you started.
-
Updated Feb 6, 2001