`Overall you need to deal with four things each frame, (a) frustum
culling,`
`(b) priority updates (really, queue-index updates), (c) split/merge`
`surgery, and (d) stripping updates. I *don't* recommend you
try to make`
`a completely optimized implementation from scratch all at once.
Much`
`better (for your sanity, if you are anything like me) to make a
simple,`
`unoptimized implementation, do detailed timing tests, and optimize
one`
`bottleneck after another. I would think of things more in
terms of data`
`structures and and activities on those data structures that as
a single`
`"best" dataflow (in fact the overall picture of the dataflow will`
`evolve as you go along, in the outline I've suggested). Part
of the`
`problem is that the dataflow does become very strange after all
the`
`optimizations. This is unfortunate but I don't know a way
around it.`
`Here is a potential staging of the evolution of the dataflow:``
`

` a) split only, no frustum cull, no stripping,
simple priority`
` (project two points and take
distance between). This takes`
` a single "split" queue and
a very simple loop that stops when`
` the triangle count is met.``
`

` a1) add simple recursive frustum cull (IN/OUT
bits as in paper)``
`

` b) split and merge queues, with a loop like in
the paper (select`
` split or merge depending on
whether below/above triangle count,`
` keep going until queues don't
"overlap"). Simple priority`
` as before recomputed for each
triangle each frame. Still no`
` stripping. Frustum cull
computed as priorities are computed,`
` no optimization for incremental
changes.``
`

` b1) add the incremental optimizations from the
paper, where`
` subtrees need not be
recomputed in most cases (e.g. was`
` IN and still IN means
whole subtree is still IN and doesn't`
` need to be touched).``
`

` c) add incremental stripping. As triangles
are split or merged,`
` break up all strips these
are part of and then perform as`
` many "gluing" operations as
you can to re-form longish strips.`
` It can help to limit strip
length to something like 8 (depending`
` on hardware) since longer
strips actually lose (or at least don't`
` gain) performance because
of internal queue lengths in graphics`
` hardware. Also longer
strips tend to cause more processing`
` as you split and merge.
You might also put off the re-stripping`
` process until after all the
triangles have been split and merged`
` for a frame (you have to keep
a list of triangles needing work`
` so you don't traverse all
the triangles to find them later).``
`

` d) add the deferred priority computation queueing
(lists of triangles`
` per frame that need to be
updated).``
`

` e) even faster frustum culling by deferring the
frustum cull in the`
` same way as you deferred the
priority updates in (d). This is`
` NOT in the paper and I can
help you figure out how to do this when`
` you get this far.`
`
``
``
`` ``
`

`The coherence-based methods can make or break whether you can fully
use`
`the new hardware with wildly high triangle counts per frame.
Touching`
`every triangle every frame with your optimizer will kill you soon`
`because the triangle counts are going up faster than your processor`
`speeds. Also the coherence methods pay off big time when
the frame`
`rates go up. If you double the frame rate ***while cutting
the camera`
`motion per frame in half***, the number of triangles you need to
touch`
`gets cut almost in half when you fully exploit coherence.`
`
`` ``
`

`...moreanother big assumption I made in ROAM was that`
`all lighting is happening in the textures and pixels, and`
`not at the vertices of the view-dependent mesh. Any`
`opinions on whether this will be a good assumption`
`given the latest graphics hardware (like >=nv10)?`
`It is *not* a good assumption for specular lighting`
`on old hardware...`
` ``
`

`One of the design goals of ROAM was to avoid feedback`
`problems like this by using a fast priority queue to`
`directly get either the triangle count or the accuracy`
`you want. That is *if* you know what you want! Because`
`of a lot of variations in compute and graphics load from`
`other parts of an application, these target numbers may need`
`to vary. These "other" resource hogs must be taken into`
`account, but I don't know of a clever universal way of`
`dealing with it. Certainly feedback can be given a shot`
`for some apps, but I think the success in general will be`
`limited and will require a great deal of tuning and luck.`
`It might be easiest to devote some fixed number of`
`milleseconds per frame to the ROAM mesh optimization`
`and drawing, then the fast and robust queue techniques will`
`serve well.`
`
`` ``
`

`Scott LeGrand wrote:``
`

`> OK, so I got the merge queue working. Thanks. However,
here's the`
`> bad news. What I see happening is that mergeable diamonds
tend to`
`> end up locked in the middle of the frame since they tend
to be at`
`> the highest level of subdivision. What this means is that
a bunch of`
`> polygons flow offscreen but they are frozen since none of them
are`
`> mergeable diamonds. Then a mergeable diamond flows offscreen
and`
`> causes a daisychain reaction, frees up a bunch of polygons, and`
`> blam up pops a mountain off in the distance. It's very
disconcerting.`
`>`
`> I am leaning towards dividing every frame as a result.`
`>`
`> Scott``
`

`First I want to make sure what you are seeing is the expected behaviour`
`or not. Expected is that you get the same (or nearly the
same) output`
`triangles as if you had only used splitting from scratch each frame
(BTW,`
`you should implement the split-only optimizer first--the split-plus-merge`
`is a refinement your should attempt only after the split-only works
well,`
`albeit slowly). Minor differences may occur if some triangles
have the`
`same priority. For testing you can force all priorities to
be distinct.`
`This will be slow but only used for sanity checks. One way
to do this is`
`to add a hoard of buckets to your queues and force each triangle
to live`
`in it's own bucket, in effect doing a true sort. Triangles
that`
`otherwise have the same priority can be ordered by their unique
tree`
`indices (something that remains constant whether you are running
the`
`split-only or split-and-merge versions of the optimizer).
If the same`
`triangles happen in both cases then your merging is working right,`
`otherwise you've got some debugging to do.``
`

`For debugging, I can suggest a few possibilities without looking
at your`
`source code or getting into a detailed exchange that should probably
be`
`taken off-list (so as not to bore too many :).``
`

` -- a mergable diamond is two triangles in your active bintree
that`
`share a base edge, and whose four children are all leaves.
A leaf is any`
`active bintree triangle with no left child. Make sure that
what you are`
`determining to be mergable diamonds agrees with this definition.
Put a`
`sanity check in your code. This comes in two parts.
Part (1) is to look`
`all the diamonds listed on the merge queue and make sure they all
fit`
`this definition. Part (2) is to go through all the active
bintree`
`triangles, test to see if they fit the definition, and if they
do make`
`sure they are on the merge queue. Note that finest-level
triangles can`
`only form the *children* of a mergable diamond, and will never
form a`
`mergable diamond themselves.``
`

` -- a mergable diamond should never be locked. The configuration
I`
`described as forming a mergable diamond can *always* be converted
from`
`four triangles to two, and should *always* be on the merge queue.
The`
`priority of the mergable diamond is the max of the two priorities`
`associated with its two triangles. Note that and OUT triangle
should`
`have the lowest possible priority, and that a diamond with two
OUT`
`triangles has the lowest possible priority. This means that
*no*`
`mergable diamonds should exist outside the frustum when the optimizer
is`
`finished. You can add a sanity check for this by looking
for any`
`mergable diamonds with OUT labels on the two triangles, and indicating
an`
`error if you find one.``
`

`-- if your triangle count is really low, you can get cascade effects
as`
`the frustum moves that would cause a mountain to pop up.
This is`
`normal! This will go away when your tri counts get to >3000
tris or so`
`for most terrain (really rough may take 4000-5000).``
`

` -- make sure you *only* count (and draw) triangles in your
active`
`bintree that are leaves and are not labeled OUT. Do not count
active`
`bintree triangles that are OUT or that have a left child.
Sanity check`
`here is to count the hard way and compare to the count you maintain`
`incrementally.``
`

`So if these general guidelines/suggestions are still not working
out for`
`you, I'd be happy to help in more specific detail off-list (for
anyone`
`who needs it). Adding merge--ONCE YOU GET SPLIT-ONLY WORKING--is
a big`
`win and I encourage you to keep working at it.`
`
``
``
`` ``
`

`Tom Hubina wrote:``
`

`> >Ideally you want to use both techniques. You _need_ LOD if the
view distance`
`> >is large and you want detail. But you want to take advantage
of vertex`
`> >arrays/display lists. So, just set it up so that only a few
percent of your`
`> >"blocks" are rebuilt every frame. This is doable, since we assume
the`
`> >viewpoint changes slowly relative to the block size.`
`>`
`> Each engine has different requirements. I don't want my engine
to require`
`> slow moving views. This is one of the reasons why I've been avoiding`
`> methods that require frame to frame coherency methods (like the
original`
`> ROAM paper). It should be possible for the camera to make a full
180 degree`
`> rotation in well under a second. As I understand it, that kind
of stuff`
`> will destroy coherent methods.``
`

`If you "sit and spin" with a frustum that covers 45 degrees from
left`
`screenedge`
`to right screen edge (a fairly narrow view/modest zoom), then turning`
`180 degrees would take four steps with exactly no overlap, 8 with
50%`
`overlap, and 16 with 75% overlap. I would guess that the
frame-to-frame`
`coherence would be a win at around 10-12 steps, which translates
to`
`10-12 frames/sec for the 1sec turn time. Even without frame-to-frame`
`coherence you would probably still be better off using simple splits
than`
`throwing out dynamic adaptive meshing altogether, unless you really
only`
`have a tiny area of terrain. All this said, I completely
agree that ROAM`
`could be a lot better in these cases. So let's fix it!``
`

`Since we're working on a fix, let's toss in one other problem you
didn't`
`mention that kills coherence even more. Take the case where
you have a tiny`
`corner of your screen just touching the horizon, and let your view
direction`
`swing directly towards that corner. Initially you spend your
entire triangle`
`budget on a few pixels of the screen, and then the triangles rapidly
spread out`
`as the view swings, completely destroying coherence at the moment
the`
`horizon comes into view. Here the tiniest change in view
orientation`
`can cause frame-to-frame coherence to fail. (Note: in reality
this isn't`
`as bad as I've made out since you typically hit some finite resolution
limit`
`and get only a handful of triangles in the corner initially).``
`

`One answer is to ignore the frustum for purposes of splitting and
merging,`
`but still use it to cull and decide what to actually send to the
hardware`
`(culling is so fast using the hierarchy that frame-to-frame coherence`
`isn't needed to avoid a bottleneck at the frame rates we're dealing
with).`
`We approximate the perspective divide-by-Z with an orientation-independent`
`divide-by-R. We replace the near clipping plane with a near
clipping sphere.`
`If the triangles were distributed evenly over the sphere of view
directions`
`after optimization (they aren't, but let's use this as a first
cut), then you`
`multiply your old triangle-count target by the number of frustums
it takes`
`to cover a sphere, and everything works! (yes, yes, the number
of triangles`
`you are optimizing is a lot more, and the optimization is still
the bottleneck`
`rather than the graphics hardware...but this will be fixed in a
few months by`
`a faster ROAM-like optimizer ;-).``
`

`Now to fix things up for the case that the optimized triangles aren't
spread`
`out evenly over the view-direction sphere. What we want to
guarantee`
`is that we can instantly move to any view direction and get no
more than`
`the target triangle count. Our optimization priority remains
the same:`
`minimize`
`the maximum projected error (now divide-by-R projections), but
now`
`for all possible view directions, i.e. just ignore direction.
So the only`
`tricky part is that target-count guarantee.``
`

`The optimization outer loop becomes the following. If there
is a frustum`
`direction with too many triangles, find the mergable diamond with
the lowest`
`priority that is in such a frustum and merge it. If there
is a triangle for`
`which all the`
`direction frustums have too few triangles, and which has higher
priority than`
`all other triangles in this state, then split it. Keep going
until the split`
`and`
`merge`
`queues don't overlap in more than one priority and there are no
triangles`
`for which all direction frustums have too few triangles.``
`

`Too keep the frustum counts, keep a number of frustum-direction
buckets.`
`The number of buckets should be maybe 10 times the number of frustums
it`
`takes to cover the view-direction sphere. Each leaf triangle
adds one to the`
`count of each bucket that sees its centerpoint (there are about
10 of them).``
`

`Something like this should work. I've been seriously considering
implementing`
`this for a while, but other things have been higher on my priority
queue ;-).`
`It``
`

`would be cool to hear if anyone gets something like this going though,
so I`
`offer``
`

`it to the list.`
` ``
`

`You asked specifically about walking through the bintree structures
and`
`outputing to OpenGL. Recall that the fastest way to send
a triangulated`
`surface to OpenGL graphics hardware is to group triangles into
strips.`
`A bintree triangle's data structure contains pointers to neighbors
as well`
`as the parent and children. This allows you to easily walk
in the strip`
`directions to construct strips. If you use the incremental
stripping`
`procedure, then walking is not needed, only merging neighboring
strips.`
`I'd recommend looking in the OpenGL programming guide for basic`
`information on what strips are and how to send them to the graphics`
`hardware. Note that you should separate the *construction*
of the`
`strips from the *display* of the strips. Construction is
the process`
`of collecting triangles into strips and storing the results as
arrays.`
`Display is the process of making the OpenGL calls for these arrays.`
`The reason you separate strip construction and display is so the
work`
`of construction can be amortized over several display operations.`
` ``
`

`Leonardo,`
` ``
`

`> If _T_ is the root triangle, how can I traverse a
bintri like this:`
`>`
`> + +`
`> |\ /|`
`> | \ / |`
`> | \/ |`
`> |LN/\RN|`
`> | / \ |`
`> |/ T \|`
`> +------+`
`
`` ``
`

`> It produces an infinite loop because _LN_'s right neighbor
is _T_ that has`
`already been rendered!``
`

`>``
`

`> I was thinking about doing something like this:`
`>`
`> render(t) {`
`> if (t->LC) render(t->LC);`
`> if (t->RC) render(t->RC);`
`>`
`> draw(t);`
`>`
`> if (t->LN) render(t->LN);`
`> if (t->RN) render(t->RN);`
`> if (t->BN) render(t->BN);`
`> }`
`>`
`> But I see it's not that simple.``
`

`If you just want to output all the leaf triangles without strips,
just do a`
`depth-first recursive`
`descent from all the coarsest-level (base mesh) triangles.``
`

`RenderAll(M) {`
` for all t in M, RenderRecurse(t);`
`}``
`

`RenderRecurse(t) {`
` if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC);
return; )`
` draw(t);`
`}``
`

`If you want to actually make strips of triangles, then the neighbor
links`
`become useful. First add a "marker" field to the bintree
triangle data`
`structure. Initially clear all t->mark to 0, and set global
int marker=1.`
`Then you can make RenderRecurse as follows to generate strips:``
`

`RenderRecurse(t) {`
` if (t->LC exists) { RenderRecurse(t->LC); RenderRecurse(t->RC);
return; )`
` if (t->mark!=0) return; // the triangle has already
been output`
` t_state={t,base_clockwise}; // you can fiddle with this
to optimize strip`
`length`
` while (StripLeftWalk(&t_state)) ;`
` BgnStrip(t_state); // OpenGL calls to start strip, and output
first two`
`vertices`
` while (StripRightWalk(&t_state)) ;`
` EndStrip(t_state);`
` marker++;`
`}``
`

`The "t_state" variable contains the bintree triangle and label of
which edge`
`and direction for that edge (i.e. one of {base,left,right} with
one of`
`{_clockwise,_counterclockwise} appended).``
`

`The procedure StripWalkLeft(&t_state) marks the triangle (t->mark=marker),`
`then looks at the neighbor associated with the edge label.
If`
`the neighbor does not exist (boundary case) or is already marked
(t->mark!=0),`
`then a 0 is returned. Otherwise the edges of the new triangle
are examined`
`to see which points to the current triangle, and t_state is updated`
`appropriately to the new triangle and new edge label (so as to
follow`
`OpenGL stripping rules). This walks you to the "left" edge
of a strip,`
`leaving a trail of bread crumbs that you will follow in the next
step`
`going to the right.``
`

`Now the calls to StripWalkRight(&t_state) will output the triangle
strip.`
`The implementation is almost the same as StripWalkLeft(), but you
now`
`stop when the neighbor does not exist or has a label that is not
==marker`
`or ==0. At the beginning of StripWalkRight(), instead of
setting`
`t->mark=marker`
`you set t->mark= -marker to avoid infinite loops. Also needed
is one OpenGL`
`call`
`to output an additional strip vertex.``
`

`Hopefully this sketch of the stripping procedure should get you
started.``
`

`-`
` ``
`

`
`

`Updated Feb 6, 2001`