GenesisEngine: Coordinate Systems

(I originally posted this on my MSDN blog.)

This is not going to be a general-interest post so if you’re not interested in my GenesisEngine project or planetary terrain rendering, feel free to move along.

Ok, so someone recently sent me an email and asked me to write about the various coordinate systems in GenesisEngine.  I defined them all briefly with comments in the source code (QuadMesh.cs) but there’s a lot more that could be said on the subject.

As I’ve said before, I’m not doing any original research here – I’m just borrowing ideas from other places and re-implementing them for my own amusement.  For original material you might start with Jochen Winzen’s Interactive Visualization of a Planetary System paper and then move on to blogs such as the one for the Britonia project.  This is a pretty complicated topic and I just barely understand it myself.

The problem

The biggest challenge one faces when building a real-time planetary terrain rendering system is that of numerical precision.  First you want to be able to move around on the surface of the planet about six feet off the ground.  For that we’d like to have terrain mesh resolution of about two meters or so, which requires at least centimeter-level precision in the vertices so we can get smooth elevation changes and have everything fit together nicely.

An earth-sized planet has a radius of 6,378,100 meters or 637,810,000 centimeters.  That number contains nine significant digits, but unfortunately the 32-bit floating point data type only stores approximately seven significant digits.  Of course you can use a float to store very large (and very small) numbers, far larger than 637,810,000, but you can only express those very large numbers with seven digits of precision which means you’re forced to round off the least significant digits when you go above the single millions.  Rounding is fine when you’re observing the entire planet at once from far out in space but it’s completely unacceptable when you’re walking around on the surface.

To be absolutely clear, the issue here is the number of significant digits, not the order of magnitude of the values.  Someone might say, “Oh, just use a coordinate system calibrated in kilometers so then your vertices only have to go out to 6378 or so.  Problem solved!”  Actually, no, because in order to let users walk around on a realistic surface your elevation heights are going to have to be specified as something like 6378.17473 km here and 6378.17485 km there, but a float can only store 6378.174 km.  The last two digits just disappear.

We also have a 64-bit double-precision floating point data type available to us.  A double can represent about 16 digits of precision, which turns out to be enough to model the entire solar system out to Neptune and beyond at millimeter-level precision.  Yeah, there’s a big difference between 7 digits and 16 digits, isn’t there?  So doubles solve our problem.  Great!

Well, there’s one catch.  XNA, and DirectX underneath it, and the graphics hardware underneath that, deals only with 32-bit floats.  We can’t build a mesh with double-precision vertices and just tell XNA to render that mesh.  We have to build a mesh using floats.  We’re free to calculate the vertex positions and elevations using doubles but when we go to create an XNA VertexBuffer with that data we have to convert all of the numbers to floats which chops off digits that we really need to keep.  Hmmm, what to do?

On top of that, we also need to wrestle with the fact that curves and spherical objects are hard to work with in a discrete format.  Planes and grids we can easily do.  Spheres, not so much.  It would be great to do the heavy lifting in “flat plane” mode and only switch over to a spherical shape when we absolutely have to.

The solution to both problems is to work with several different coordinate systems, each of which are appropriate at a different stage of the mesh-generation pipeline.  Whenever you look at spatial data you always have to keep in mind which coordinate system is being used so that you can understand it in context.

So let’s walk through the entire pipeline of a vertex in GenesisEngine from start to finish.  This discussion would be helped a lot with some cool diagrams but I’m lazy so you’ll have to make do with a whole lot of words.  Bear with me.

The planet in GenesisEngine is fundamentally represented by six square, flat quad-trees.  Well, they start out as flat then become spherical somewhere along the way.  If the camera gets too close and we need more detail than we have, we can split the quad into four equal pieces and repeat the following process for each one of the sub-quads.  I’m not talking about the quad-tree subdivision in this post so all you really need to understand is that the planet starts out as a box, like this:


If we get a little closer each of the quads get split into four sub-quads, like this:


The green border lines indicate the area of the original quad and the red border lines indicate boundaries between sub-quads, which helps us keep track of where we are when things go spherical-shaped later on.

Quad grid space

The first coordinate system we have to deal with is what I named “quad grid” coordinates, which is simply the integer column and row of the vertex on the quad.  This is a column-major system so the first number gives you the horizontal location and the second number gives you the vertical location.  See the next screenshot below which makes the triangles and the vertices that define the triangles a little clearer.  (I disabled updates for this shot in order to get in closer without having the quad split.)  Each quad has a square grid of 65 by 65 vertices and the triangles are simply lines drawn between them.  So a vertex starts life with an identity that says, “I’m the vertex on the 23rd column and the 37th row of this quad.”  It doesn’t matter how big or small or how many levels deep the quad is because it’s all relative to the quad itself.


Unit plane space

However, the quad doesn’t exist in total isolation.  It is a member of a group of 1..N quads that make up one of the six faces of the planet, so the next step is to describe this vertex in terms of where it is relative to the face.  I called this coordinate system the “unit plane” system.  Let’s define the face to have a extent from –1 to 1 in both directions, so 0,0 is exactly in the center of the face.  (I guess that’s technically not a unit plane because it’s two units wide and long, but oh well.)  Each quad has a certain position on that face and so does each of its vertices.

In the case where we haven’t split the original quad then the extents of the quad are the same as the extents of the face, from –1 to 1 in both directions.  So the coordinates of the upper left vertex on this quad are -1, 1 and the lower right is 1, -1, and the rest of the vertices are evenly distributed in 64 equal steps between those extremes in both directions.

In the case where we have split the quad into four sub-quads, each of them covers one quarter of the face.  (Refer to the sub-divided picture above.)  So if we examine the upper left quad and consider its upper left vertex, it’s “unit plane” coordinates would be -1, 1 as with the parent quad but its lower right vertex would be 0, 0, which also happens to be the coordinates of the upper left vertex of the lower right quad.

It’s actually slightly more complicated in practice because we have six faces each facing a different direction in 3D space.  The vertices in “unit plane” space are actually 3D vertices that describe a cube two units wide, long, and deep.  For each face, two of the three x-y-z components vary to describe the locations on the face and the other is fixed at 1 or –1 as appropriate.  0,0,0 is the center of the cube (i.e. the center of the planet).  These numbers are unitless so far and would be the same for planets of any real-world size.

Unit sphere space

Now that we have our vertices laid out in nice, evenly spaced rows and columns across our faces, it’s time to get spherical.  Conceptually it’s not difficult to transform our vertices from box-like to sphere-like; we just bulge out the middles of each face and/or pull in the edges and presto, we have a sphere.

There are a couple of ways you can do the math here.  The most straightforward way is to to take each vertex in “unit plane” space, treat it vector, and convert it to a unit length vector.  Tada, instant sphere.  This works ok but results in triangles of uneven sizes over the extents of the face.  There is a better conversion algorithm that requires a bit more math but gives a nicer result and doesn’t cost that much more relative to everything else that’s going on.  Follow the link to learn more about that.

Ok, so now we have our vertices transformed to “unit sphere” space, as in the picture below:


Planet space

Now it’s time to start thinking about real-world sizes.  How big do you want this planet to be?  In GenesisEngine the convention is that one unit in “planet space” coordinates represents one meter of length.  To convert from “unit sphere” coordinates to “planet space” coordinates we simply multiply the vertex vector by the desired radius of the planet in meters.  If we want to do varied terrain heights, which we do because that’s kind of the point of this project, we would add the terrain height at this location to the planet radius before multiplying with vertex vector.  Now we have a lovely full-sized planet.

Mesh space

Up to this point all of the math has been done using doubles.  Now it’s getting close to the time when we have to convert all of these numbers into a mesh we can render, but this is where we run into our issue with significant digits.  We’ve done all of this work to get beautifully precise numbers that describe every elegant rise and fall of our terrain.  We don’t really want to dump them into the meat grinder of 32-bit floats.

The trick here is to temporarily change the origin point of the vertex data so that the origin is exactly in the center of each mesh, or to say it another way, we’re going to translate the geometry of each quad down to the planetary origin so that the overall range of numbers isn’t so large.  This puts us into “mesh space” which is the same as “planet space” except that the planet is now disassembled and all of its individual quads are jumbled up on top of each other at the center of the planet.  We also remember where each quad mesh is supposed to be located on the planet’s surface so we can reconstruct it in its proper location later.

What good does that do us?  I said above that merely changing the order of magnitude of our coordinate system doesn’t help us with the significant digit problem, which is still true.  What we’re doing here is something different; we’re keeping the order of magnitude of the coordinate system the same but we’re collapsing the range of values we need to express into just a small part of that range, which reduces the number of significant digits we need to express them.

Think of it this way: how many significant digits do we need to express the number 12,464,297?  Eight digits, too many for a float.  But what if we subtract out the 12,000,000 part and hold it somewhere off to the side?  How many significant digits do we need to represent 464,297?  Only six digits, which does fit in a float.  That doesn’t solve anything if you’re talking only one big number but if you have 65 x 65 = 4,225 big numbers, all of which are relatively close to the same size, then you can subtract some appropriate big number from all of them and be left with 4,225 small numbers to deal with.

Rendering big numbers without big numbers

Ok, once the vertices are translated into “mesh space” we finally have numbers that both (indirectly) represent the real world and are suitable for storage in floats.  We can build our vertex buffers from these vertices and send them to the graphics card.

But the mesh data we send to the graphics card is all jumbled up in the same small area.  It’s not planet-like, so how do we reconstruct the planet?  We could set our world matrix to translate each XNA mesh back to where it’s supposed to be but that kills our precision.  Well, this is where we play a coordinate-system trick with the camera.  Instead of moving the view frustum around in our real-world coordinate space, we keep the view frustum fixed at the origin point and we rotate and translate every piece of geometry in the world around the view point.  It’s kind of like driving down the freeway and imaging that your car is fixed in space while the earth spins beneath it.

How does that solve our problem?  Well, any geometry that’s close to the camera is going to be rendered close to the origin (i.e. using small numbers) on the graphics card, so those vertices won’t blow the float digit budget.  Any geometry that’s far away from the camera will be rendered far away from the origin (i.e. using big precision-losing numbers) on the graphics card but that doesn’t matter because they’re far away and small.  Rounding is ok if the viewer isn’t close enough to see the damage.

Specifically, we take a mesh that’s encoded to be centered around the origin, and we get its original location relative to the center of the planet (which is still stored in doubles).  We also get the location of the camera in real-world space (also stored in doubles).  We subtract the camera location from the original location of the mesh which gives us the location of the mesh relative to the camera (in “camera space”, I guess), and that tells us how much we need to translate the encoded mesh away from the real-world origin in order to appear in the view frustum exactly the same way it would appear if both the mesh and the view frustum were where they were supposed to be in real-world space (which they’re not).

The end result is that all of the numbers stored in 32-bit floats never represent full coordinate values in the world.  They only represent differences from some reference value.  We can manipulate the reference value to tell us where we need to tell the graphics card to draw the geometry such that everything appears to be in its proper place, even though to the graphics card itself the perspective is quite different.  The important numbers (the ones we see up close and personal) always stay relatively small while in float form and everybody’s happy.

As I said, it’s complicated.



GenesisEngine: The Task Parallel Library Is Great But Threading Can Still Bite You

(I originally posted this on my MSDN blog.)

The new Task Parallel Library in the .Net Framework 4.0 is intended to simplify the process of adding concurrency to your application and it does a great job of it.  I’m really impressed by how well it hides the mechanics of threading and lets you reason declaratively about what you want to do, not how it needs to be done.

My real-time engine, um . . . wasn’t

In GenesisEngine, I had a serious need to do some background processing.  As you move around in the world the engine dynamically generates appropriately-detailed quad nodes based on where you are, and generating the terrain meshes for those nodes takes a lot of CPU work.  Previously I was splitting nodes into more detailed ones on the main game loop thread during the Update pass and it caused a lot of stuttering in the frame rate.  In fact, if you were close the surface you basically couldn’t move in real time; you had to turn updates off, move the camera, turn updates back on, and wait for the detail nodes to be generated for your new location.  Yeah, it pretty much sucked.

Clearly it would be desirable to have all of that expensive node splitting and mesh generation happen on a separate thread.  First, my laptop has two CPU cores and one of them was just sitting there idle.  My desktop machine at work has four cores so three of them were twiddling their thumbs.  The future is multi-core so let’s get on that train; ok, we need concurrency.

Second, I’d like the the game engine to continue to draw smoothly with whatever quad nodes it has at the moment and then later pick up more detailed nodes as they become available.  I don’t want the draw loop to be delayed at all by waiting for nodes to be split.  That means that not only does the node splitting need to be concurrent but it also needs to be asynchronous.

The Task Parallel Library to the rescue

Ok, so that didn’t sound like fun.  I’ve written a decent amount of threading code from scratch in my time and while the principles are fairly straightforward the details can be an absolute killer.  I wasn’t looking forward to doing it again in GenesisEngine so, being a properly lazy developer, I decided to wait until I could use the Task Parallel Library to do it for me.  Once Visual Studio 2010 was out and the XNA 4.0 CTP was released, I had all the pieces in place to fix my problem.  Many excellent articles have already been written on how to use the TPL so I won’t bother with a general tutorial; I’ll just present a walkthrough of how I used it to solve a specific problem.

First, let’s look at the code I was trying to make concurrent and asynchronous:

private void Split()
    var subextents = _extents.Split();
    foreach (var subextent in subextents)
        var node = _quadNodeFactory.Create();
        node.Initialize(_planetRadius, _planeNormalVector, _uVector, _vVector,
                        subextent, Level + 1);
    _hasSubnodes = true;

The Split() method is pretty straightforward:

  1. Divide the extents of the node into four smaller extents.
  2. Creates new child nodes for each sub-extent.
  3. Add the new nodes to the list of children.
  4. Set a flag indicating that it now has children.

The strategy for the new implementation using the Task Parallel Library is pretty similar:

  1. Divide the extents of the node into four smaller extents.
  2. Create four tasks to run asynchronously and in parallel.  Each task will create one child node.
  3. Create another task that will run when all of the first four tasks have completed.  This task will add the new nodes to the list of children and set flags indicating that that split is finished and the node now has children.
private void Split(DoubleVector3 cameraLocation, DoubleVector3 planetLocation)
    var tasks = CreateBackgroundSplitTasks(cameraLocation, planetLocation);

List<Task> CreateBackgroundSplitTasks(DoubleVector3 cameraLocation, DoubleVector3 planetLocation)
    _splitInProgress = true;
    var subextents = _extents.Split();
    var tasks = new List<Task>();
    foreach (var subextent in subextents)
        var capturedExtent = subextent;
        var task = Task.Factory.StartNew(() =>
            var node = _quadNodeFactory.Create();
            node.Initialize(_planetRadius, _planeNormalVector, _uVector, _vVector,
                            capturedExtent, Level + 1);
            node.Update(cameraLocation, planetLocation);
            return node;
    return tasks;

void CreateSplitCompletionTask(List<Task> tasks)
    _splitCompletionTask = Task.Factory.ContinueWhenAll(tasks.ToArray(),
    finishedTasks =>
        foreach (var task in finishedTasks)
        _hasSubnodes = true;
        _splitInProgress = false;

The second implementation clearly has more code but considering what’s being accomplished here, it’s remarkably compact and elegant.  In CreateBackgroundSplitTasks(), we first set a flag indicating that a split is in progress.  This is so that when the main game thread does the next Update pass it won’t try to split the node again.  Next, we use the StartNew() method of the TPL task factory to create and run tasks that create new nodes.  The work that the task performs is defined in a lambda expression.  Each task starts running immediately and independently in the background and will return a reference to the newly-created node when it’s done.  Finally, CreateBackgroundSplitTasks() returns a list of all of the tasks it started.

How does the node know when all of its children are created and it’s safe to draw them?  For that we create a fifth task that is dependent on all of the previous four.  In CreateSplitCompletionTask() we use the very handy ContinueWhenAll() method of the TPL task factory to create another task that will only run after all of our node-creation tasks have completed.  This task puts all of the created nodes into the child list and sets some flags indicating that the split is finished and that the node now has children.

The great thing about the TPL is that nowhere in this code did I need to think about which threads are going to run which blocks of code, or how many threads I should start up, or even the mechanics of how the different threads should signal to each other.  How many cores do I have on this machine?  Don’t know, don’t care.  It’s all very declarative in nature.  It’s just, “Go away and do these four things in parallel as fast as you can, and when they’re all done, do this fifth thing.”  That’s it.


You still have to think about the consequences

Using the TPL does NOT give you a get-out-of-jail-free card.  The TPL lets you elegantly express your good concurrency intentions but, as they say, the road to hell is paved with just such.

If you read both versions of Split() above you’ll notice that the TPL version does something that the single-threaded, synchronous version did not: after a node is created, we immediately call Update() on the node.  It took me quite a while to infer the need for that.

When I first wrote the concurrent implementation it seemed to run and generate child nodes ok but there was occasionally a strange artifact where a recently split node wouldn’t be drawn correctly at first, leaving a black hole where it was supposed to be.  The artifact didn’t happen for all new nodes, only a few, and it only lasted for one frame so it was difficult to see exactly what was going on.  I eventually deduced the problem by carefully reasoning about the possible interleaving of operations in the parallel implementation.

Due to the way the synchronous implementation was written, newly-created nodes were guaranteed to be updated before they were drawn.  This didn’t happen in Split() but happened later in the update pass so it was kind of an implicit contract.  In the parallel implementation, on the other hand, there was no particular guarantee that a new node would be updated before being drawn the first time.  It turned out that due to timing the background split operation would usually complete between the draw and update passes so it would get updated in the next pass and then drawn, and everything was fine.  Sometimes, however, the background split would complete after an update pass but before a draw pass, and thus would be drawn without ever having been updated.  The code didn’t crash in this case (it would have been easier to debug if it had), but it didn’t draw as expected either.

The fix was to ensure that each node gets both initialized and updated in the background task before declaring the split operation complete.  The new nodes are updated with information that is slightly out of date but that doesn’t seem to matter in practice.

This was one of those tricky concurrency bugs that seems pretty trivial when I describe it in hindsight but it was pretty baffling when I first saw it.  The lesson: the TPL lets you easily implement concurrency but it doesn’t (and can’t) free you from the responsibility of understanding all the myriad of potential race conditions, deadlocks, and other assorted multi-threading nastiness.  Sorry, even if you have a nifty library to do the heavy lifting, you still need to understand what it’s doing under the covers and how it will impact your code.

Concurrency changes contracts

The asynchronous implementation created another problem I had to solve.  Some of my unit tests started failing.  It turned out this was because the tests were causing a node to split and then testing that the correct things happened (child nodes created, etc).  The tests were originally written with the assumption that once the call to Split() returned, the post-conditions could be immediately asserted, but with the async implementation that was no longer true.

The fix here was to capture a reference to the split completion task and make it available to the unit tests so that they could initiate a Split then explicitly wait for the completion task to finish.  This necessitated a bit of code in the QuadNode class that serves no other purpose than to aid unit testing but I don’t see any way around it.  The contract of Split() changed when I changed its implementation.  This aspect of the contract wasn’t expressed in the method signature but it changed all the same and required a corresponding change in the tests.  This just emphasizes the point that you have to think about the consequences every time you pull out the TPL.

The TPL is goodness

The TPL is a very fine piece of work and it makes implementing concurrency and parallelism dramatically easier.  Sure, I still ran into a few hiccups but that’s just the nature of the problem space.  I was very pleased with the result after I was finished; I can now move around in GenesisEngine and the frame rate stays pretty smooth at all times.  The splitting and merging of nodes happens in the background and will eventually catch up with my movements.  The engine scales very well with additional cores; on a quad-core processor it’s about three times as fast at generating new terrain as it is on a dual-core.  (The main game thread runs as fast as possible right now and takes up most of one core, leaving one core available for terrain generation on a dual-core processor and three cores available on a quad-core.)  The right thing just happens magically.

There’s still a lot of work to do to make my implementation less naive.  For instance, if a node split is scheduled in the background but the camera moves out of range before it happens, the split request should be discarded without actually doing it.  Also, priority should be given to nodes that are currently in the viewing frustum so the user sees the maximum amount of detail at all times.  The TPL isn’t quite magical enough to do all that for me but I expect it’ll give me the tools to do it in a compact and elegant way.

GenesisEngine: Listen To The Tests!

(I originally posted this on my MSDN blog.)

As I wrote last time, I made a bit of a mess in my GenesisEngine project by jamming too many responsibilities into one class.  I’m working on straightening that out and ran across some interesting observations already.  I’m not finished yet but I’ll share what I’ve discovered so far.

Performing the surgery

I decided to first separate the quad tree node responsibilities from the mesh generation responsibilities since there didn’t seem to be a lot of entangling between them and it appeared to be a straightforward exercise.  It turned out that there actually was a fair bit of entanglement in the split/merge logic and the tests helped me identify that and sort it out.  I’m a big believer in TDD and I’m still often surprised at how much clear feedback my unit tests give me about the quality of my design . . . if I take the time to listen!

Side note: as I’ve said before, I’m working on the GenesisEngine project and blogging about it in part because I wanted to provide some real-world examples that are a bit more complex and interesting than the typical toy problems you see in “intro to ” materials.  The downside of real-world examples is that it’s a lot harder to paste in a bit of code that adequately illustrates what I’m talking about, since, well, it’s complex.  I’ll do my best but if you’re really interested in understanding what’s going on you should probably inspect the diffs on GitHub or grab the code and study it.

So what happened?  The first step in breaking up my SRP problem was to create a new QuadMesh class to handle the generation and management of the terrain mesh data.  I moved the mesh code from QuadNode to QuadMesh and also created new QuadMeshRenderer and QuadMeshSpecs classes plus several other ancillary files.  Once that was done I had to resolve several compiler errors because it turned out that QuadNode.Update() relied on the presence of the mesh data which was no longer there.

Here’s the original version of QuadNode.Update():

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation,
                   ClippingPlanes clippingPlanes)
    var cameraRelationship = GetRelationshipToCamera(cameraLocation);
    DetermineVisibility(cameraLocation, planetLocation, cameraRelationship.ClosestVertex);
    if (_isVisible)
        if (clippingPlanes.Near > cameraRelationship.ClosestDistance)
            clippingPlanes.Near = cameraRelationship.ClosestDistance;
        if (clippingPlanes.Far < cameraRelationship.FurthestDistance)
            clippingPlanes.Far = cameraRelationship.FurthestDistance;
    var distanceFromCamera = cameraRelationship.ClosestDistance;
    if (_isVisible && distanceFromCamera < RealWidth() * 1 && !_hasSubnodes
        && Level = RealWidth() * 1.2 && _hasSubnodes)
    if (_hasSubnodes)
        foreach (var subnode in _subnodes)
            subnode.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);

The GetRelationshipToCamera() method returned a private CameraRelationship DTO, which looked like this:

private class CameraRelationship
public DoubleVector3 ClosestVertex { get; set; }
public DoubleVector3 FurthestVertex { get; set; }
public double ClosestDistance { get; set; }
public double FurthestDistance { get; set; }

The compiler errors were in QuadNode.GetRelationshipToCamera().  The basic idea here is that QuadNode used to be looking at the mesh data and figuring out the distance from the camera to the closest vertex and the furthest vertex, and then was using that data to do several things:

  1. Figure out whether the node is visible
  2. Set the clipping planes appropriately if this node is closer or father than the clipping planes already are
  3. Decide whether to split or merge the node based on the ratio of camera distance to the real-space width of the node.

Complications set in

Ok, so obviously the GetRelationshipToCamera method needs to move to QuadMesh because it’s inspecting the mesh data, and the CameraRelationship class needs to be promoted to public so it can be shared between QuadNode and QuadMesh.  QuadNode.Update() would call QuadMesh.GetRelationshipToCamera() and use the returned CameraRelationship DTO as it has been before.  Simple.  I made those changes (among others) and got everything to compile.  There was only one change to QuadNode.Update(), which now looked like this:

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation, ClippingPlanes clippingPlanes)
    var cameraRelationship = _mesh.GetRelationshipToCamera(cameraLocation);
    // Other stuff is the same . . .

I then looked at my failing tests.  Hmmm.  All of my specs related to splitting and merging were failing because the stubbed-out QuadMesh object was returning null from GetRelationshipToCamera().  That’s not going to work.  To solve that, I would need to create a CameraRelationship object in the spec context, populate it with specific numbers that would cause QuadNode to make the correct decision, and configure GetRelationshipToCamera() on the QuadMesh stub to return it.  That means I’d have to think really hard about what the numbers ought to be in order to provoke the desired behavior in each spec context.  Yuck.

The good news is that I’m lazy and that sounded like a lot of work.  Way too much work, in fact.  I thought about it for a couple of seconds and remembered the golden rule of TDD: “If you have to work hard to write your tests, you’re doing it wrong.”

Laziness FTW!

Ok, what am I doing wrong here?  I have QuadNode going out to QuadMesh, retrieving a bunch of data, and making decisions based on that data.  What kind of decisions?  The same ones I listed above:

  1. Is the node visible to the camera?
  2. Do the clipping planes need to be adjusted to include this node?
  3. What is the ratio of camera distance to the real-space width of the node?

These decisions all have something to do with the mesh data:

  1. The visibility of the node is determined by the mesh because while a quad node has a 2D area, a mesh has a 3D volume.  A large mountain may stick up over the horizon and be visible.
  2. The clipping plane adjustments are determined by the mesh for the same reason: the mesh is 3D.
  3. The camera distance part of the ratio is determined by the closest part of the node, which again is determined by the 3D mesh.

It’s at about this point that I got a mental image in my head of my unit test suite as a grizzled old sensei glowering at me, saying, “Have you learned nothing!  Leave me and meditate upon the design principles!  Perhaps you will find wisdom.”

I was trying to violate at least two principles with my stupid approach.  First, the Law of Demeter, or “only talk to your immediate neighbors.”  QuadNode was reaching through QuadMesh into the CameraRelationship object to get data.

Second, the Tell, Don’t Ask principle, or “don’t ask for information you need to do something, ask the object holding the data to do it for you.”  Rather than telling QuadMesh to make decisions based on its private data and inform QuadNode of the results as I should have done, I was grabbing data from QuadMesh, moving it to QuadNode, and making the decision there.

Mind your own business

Ok, so how to fix it?  The fixes were pretty simple once I had the principles firmly in my mind:

  1. Ask the QuadMesh whether it is visible to the camera.  It has all the information needed to make that decision.
  2. Forward the clipping planes to the QuadMesh and have it modify them if necessary.
  3. Have QuadMesh calculate the ratio of of camera distance to the real-space width of the node and return that number.  (This is technically still getting data from QuadMesh but it’s data that’s easily stubbed out and I’m ok with this until I find something wrong with it.)

Here’s the new QuadNode.Update() method which properly tells QuadMesh to do work and make decisions on its behalf:

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation,
                   ClippingPlanes clippingPlanes)
    _mesh.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);
    if (_mesh.IsVisibleToCamera && _mesh.WidthToCameraDistanceRatio < 1 && !_hasSubnodes
        && Level  1.2 && _hasSubnodes)
    if (_hasSubnodes)
        foreach (var subnode in _subnodes)
            subnode.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);

There’s another interesting lesson here as well, derived from Tell, Don’t Ask: it’s ok to create public members on the class that are highly specific to questions that other classes need to ask, as long as doing so helps you to hide private information.  The QuadMesh.WidthToCameraDistanceRatio is a very specific sort of property.  If I were designing this class as part of a generalized public framework this wouldn’t be something it would occur to me to implement.  But I’m not designing a public framework; I’m designing a highly specific set of application classes that work with each other to solve a problem.  In this case my goal should be to keep as much information hidden as possible (in this case, like the distance from the camera to the mesh) and only expose answers to questions or processed information that answers a specific question.  This reduces coupling, increases cohesion, and makes the code more flexible and maintainable in the long run.

Side node: I’m maybe still not getting to the heart of Tell, Don’t Ask, since I’m still querying properties on QuadMesh rather than sending commands to QuadMesh, but it’s the best that I understand how to do right now.

It’s just magical how good unit tests will guide you to quality designs and warn you away from bad designs if you take the time to listen.  You’d think I’d get used to it after awhile but the novelty hasn’t worn off for me yet.  It’s ridiculously awesome.  I giggle like a kid every time I see it.

If you want to examine the diff or download the source as it is after this fix, you can find it here.

GenesisEngine: Yes, SRP Violations Hurt

(I originally posted this on my MSDN blog.)

In the process of my continuous learning about agile development, one of my biggest problems is that it’s easy to find materials that say, “Do this, don’t do that,” but offer only trivial examples at best.  I’m always wishing for some non-trivial examples of what the principles, or the violation of the principles, look like in the real world.  Part of the reason why I put GenesisEngine up on GitHub and am blogging about it here is to provide some slightly-less-than-trivial examples of good techniques, but just as importantly, examples of mistakes and how to fix them.

True confessions

So, I have a confession to make.

I work on GenesisEngine in my spare (ha!) time and progress has been kind of slow.  I spent a fair amount of time up front on infrastructure and was careful to build it all with TDD and SOLID principles.  I had a good time building that stuff but really, all of the interesting parts have to do with the generation and rendering of the terrain itself.  Everything else is just plumbing.

So after spending quite a number of weeks telling my friends that I was working on this nifty terrain engine project but having only a featureless white ball to show them, I really, really wanted to get something working that resembled actual planetary terrain.  The problem was moderately complex and I was growing impatient.  I started cutting corners.  My QuadNode class started out fairly small but it quickly accumulated a lot of responsibilities and started to sprawl all over the place.  I was violating the Single Responsibility Principle, and frankly, I made a mess.

Warning signs

One of the early warning signs that you have a fat class that does too many thing is that it’s not fun to write tests for it.  Rather than just having simple inputs and outputs to manage in your tests, you have to construct elaborate chains of actions just to get your object into the state that you want to test.  The tests aren’t elegant statements of intent; they’re full of confusing noise.

You’ll also see a lot of combinatorial explosion going on where you get a ridiculous number of different contexts that say, “when A is true and B is true and C is true”, then “when A is true and B is true and C is false”, and so on through all the combinations of states.  It’s tedious to write tests for each combination, especially when they’re messy tests anyway.

As I got deeper into the functionality of my quad tree terrain generation and rendering system, I started to clearly see those warning signs in my code.  But . . . I just wanted to get something working.  Like, now, darn it!  I was tired of waiting, and I resisted the obvious need to refactor the QuadNode class because it would take more time than I was willing to spend.  Rather than stopping to figure out how many responsibilities I had running around in QuadNode and then figuring out how to tease them apart into separate classes, I simply stopped writing tests for that class.  Once I did that then it was easy to not build the Perlin noise generation system test-first either.


In my non-technical life I’m into long-distance backpacking and in that world we have a term for when you’re about half a day out from town after four or five days and 100+ miles on the trail, and someone says the magic word.  “Pizza.”  Or maybe “hamburgers”.  The technical term for what happens then is “stampede”.  All common sense and self-preservation go right out the window and everyone hurtles down the trail at breakneck speed in an effort to get to town.  Sometimes people punish their bodies in ways they end up regretting later.

We stampede in software development, too.  We spend a lot of time being careful, doing thing right, making steady progress, but at some point close to the end we sometimes say, “Ah, screw it, let’s just hack it together and make it work!”  The result is usually something we call technical debt.  You build up a pile of messy stuff that you’ve got to go back and fix later.

I guess that’s not always a bad thing.  If you’re trying to hit an aggressive deadline and you need to just throw yourself forward right at the end, building up some technical debt is a valid way to do that.  Or if, like me, you just want to see something working and you’re not willing to wait, you can hack something together to scratch that itch.

The really evil thing about technical debt is not the short-term impact of creating it.  You can sometimes derive a lot of benefit from technical debt in the short term.  No, the evil thing about technical debt is when you don’t immediately go back and clean it up once your short-term goal is realized.

Anatomy of an SRP violation

Right now the QuadNode class is 472 text lines long.  Visual Studio code analysis reports that it has one of the worst maintainability indexes of any class in the project.  It has at least three big responsibilities jammed into it right now:

  1. As the name implies, it acts as a node in the quad tree.
  2. It also owns the job of generating heightfield data, vertex buffers, and index buffers.  This clearly has nothing to do with #1.
  3. It also has to decide when it’s appropriate to split itself into four node children or to merge and delete its children.  I first thought that was a trivial aspect of #1 but it turns out to be a huge deal in its own right.

Here’s one of the QuadNode spec contexts.  When a node is updated, it may decide to do certain things based on the state of the world.  In this case, when a non-leaf node (that is, a node that has children) is far enough away from the camera, the children nodes should be removed and disposed because we don’t need that level of detail any more.

public class when_a_nonleaf_node_is_updated_and_the_camera_is_far : QuadNodeContext
    public static DoubleVector3 _nearCameraLocation;
    public static DoubleVector3 _farCameraLocation;
    Establish context = () =>
        _nearCameraLocation = DoubleVector3.Up * 11;
        _farCameraLocation = DoubleVector3.Up * 15 * 10 * 2;
        _node.InitializeMesh(10, Vector3.Up, Vector3.Backward, Vector3.Right, _extents, 0);
        _node.Update(new TimeSpan(), _nearCameraLocation, DoubleVector3.Zero, _clippingPlanes);
    Because of = () =>
        _node.Update(new TimeSpan(), _farCameraLocation, DoubleVector3.Zero, _clippingPlanes);
    It should_remove_subnodes = () =>
    It should_dispose_subnodes = () =>
        foreach (var subnode in _node.Subnodes)
            ((IDisposable)subnode).AssertWasCalled(x => x.Dispose());

This is not a horrible test set.  Believe me, I’ve seen (and written!) worse.  But let’s look at a couple of things that it’s trying to tell me:

  • The process of setting up a non-leaf node in the context is built on indirect side-effects.  Instead of just telling my class under test, “Hey, assume you’re a non-leaf node”, I have to initialize the node’s mesh, then call .Update() with camera coordinates that are near enough to cause the node to split itself and generate children, then call .Update() again with different camera coordinates that are far enough to cause the node to merge its children.  The spec isn’t able to say what it means explicitly; it’s very roundabout.  Someone unfamiliar with the code base would probably have to put in significant effort to understand how the spec works.
  • There’s no way to determine whether the QuadNode we’re testing decided to merge its children except by inspecting its children.  Again, this is relying on indirect side-effects.  There’s no way to get a clear statement from the class that says, “Yes, I’ve decided to merge!”, which is really what I’m interested in testing here.
  • This spec context is one of four that test a combinatorial set of conditions:
    • When a non-leaf node is far away from the camera
    • When a non-leaf node is close to the camera
    • When a leaf node is far away from the camera
    • when a leaf node is close to the camera
    • when a leaf node is at the maximum allowable tree depth and is close to the camera
  • There is another factor that isn’t even mentioned in these specs because I didn’t want to deal with a doubling of the condition set.  A node should only be split if it’s not over the horizon and out of sight, and it should be merged if it does get too far over the horizon even if the camera isn’t far enough to cause a merge on its own.  That would turn my five contexts into nine.  Yuck.

The implementation of .Update() in QuadNode is about as circuitous as these specs would lead you to believe.  There’s a lot of stuff going on in Update but it’s not clearly explained.  There are quite a few tests and branches and it’s not very maintainable.

So what’s the root problem here?  The root problem is that I violated the Single Responsibility Principle.  The decision of whether to split or merge a quad node is a good-sized responsibility all on its own.  There are different ways to make that decision and it’s probably something I’ll want to fiddle with a lot over time since it heavily impacts performance and memory footprint.  I probably need a SplitMergeStrategy class for the QuadNode to depend on, or maybe even separate SplitStrategy and MergeStrategy classes.

What would that buy me?  First, it would help break apart the combinatorial set.  The QuadNode wouldn’t have to care anything about the position of the camera or whether it’s below the horizon.  All it would have to know is that if it’s a leaf node, make a call to SplitStrategy, otherwise make a call to MergeStrategy.  If the return value is true, do the appropriate thing.

SplitStrategy and MergeStrategy, for their part, wouldn’t have to know whether they’re being called by a leaf or non-leaf node.  They trust the QuadNode to take care of that question.  They just need to think about the camera distance and the horizon and respond with yes or no.  Not only does that reduce the combinatorial set but it also makes the inputs and outputs very explicit.  Inputs are numbers, output is a boolean.  No mysterious multiple calls to QuadNode.Update to set up the context and no mysterious poking at child nodes to determine the results.

Cleaning up my mess

The technical debt I incurred certainly accomplished my short-term goal.  I’ve got a working proof of concept of a planetary terrain engine and I feel satisfied at reaching that milestone.  However, now I have a problem.  The implementation of my terrain generation is very naive and does all of its work on the main thread.  At low altitudes this causes so much stuttering as to render the program virtually unusable unless you first turn off updates, move somewhere, then turn updates back on and wait awhile.  The fix for that is obviously to a) enlist my other cores for terrain generation and b) do the generation asynchronously so that camera movement and frame rate aren’t impacted, even if I have to wait a bit for higher levels of detail to show up.

Well, yes, that’s a great plan except that my QuadNode class is a mess.  The code that I need to make more complex with threading and async logic is exactly the code that’s already overly-complex and obtuse and isn’t fully covered by tests.  Ah, ok, now we see the downside of technical debt.  You get a quick spike of progress and then a long, slow, painful slide into hell.

I’ve promised myself that before I do any more significant work on this project, I’m going to clean up my mess and break QuadNode into multiple classes with single responsibilities.  I’m curious to see how it turns out.  If you want to take a closer look at the code as it is at the time of this writing, the permalink to the current tree is here.

GenesisEngine: Using WPF in XNA and other non-WPF applications

(I originally posted this on my MSDN blog.)

There are a couple of posts on the excellent Pandemonium game development blog (which sadly seems to have not been updated recently) that talk about the importance of making your game engine easily configurable and and diagnosable.  That’s important for any application, of course, but it’s particularly critical for graphics engine where things happen in real-time and a lot of what you see on the screen is not easily interpreted to root causes.  Diagnostic and configuration tools help you figure out what’s going on with your engine.

For GenesisEngine, I knew I wanted to have two debugging features:

  1. The ability to easily view the current configuration options and change them at runtime.
  2. The ability to view statistics and diagnostic information that would help me understand what the app is doing.

As I noted before, XNA doesn’t give you much help out of the box when it comes to building a UI with buttons, checkboxes, textboxes, and all those other things that we take for granted in standard Windows apps.  Development tools are important but I didn’t want to spend a lot of time building them.  Because I’m ok with my app being Windows-only right now, it made sense to try to use a Windows-based presentation system, like, say WPF.

The problem was that the XNA and WPF systems are very, very different and there wasn’t a whole lot of material that explained how to glue them together in one app.  Fortunately, the answer is pretty simple even if it was a little hard to find so I’ll share it here to help out anyone else who may be wondering the same thing.

To be clear, my approach here is to display WPF windows from an XNA application.  Embedding an XNA surface inside a WPF application is a whole different subject!  And actually this has nothing to do with XNA: the approach found below will work for any kind of application where you want to control the main app thread yourself and run WPF on a secondary thread.

In order for WPF to work correctly, it needs a few things:

  1. A separate STA thread
  2. A thread dispatcher object for that thread
  3. A message pump

Here’s my WindowManager that makes those things happen:

public class WindowManager : IWindowManager, IDisposable
    IContainer _container;
    IScreenCustodian _settingsCustodian;
    IScreenCustodian _statisticsCustodian;
    Dispatcher _windowDispatcher;
    public WindowManager(IContainer container)
        _container = container;
        _windowDispatcher.Invoke((Action)(() =>
            // We pull these out of the container here instead of doing normal
            // constructor injection because we need them to be created on this thread.
            _settingsCustodian =
            _statisticsCustodian =
    public void ShowAllWindows()
        _windowDispatcher.Invoke((Action)(() =>
    void StartUIThread()
        var dispatcherCreatedEvent = new ManualResetEvent(false);
        var thread = new Thread(() =>
            _windowDispatcher = Dispatcher.CurrentDispatcher;
        thread.IsBackground = true;
    public void Dispose()
        if (_windowDispatcher != null)

There are a few notable things here.  First, all of the WPF-related objects need to be created on the WPF thread.  I’m pulling them all out of my IoC container which means that they have to be pulled from the container on the WPF thread, not on the main app thread, which means that my WindowManager has to retrieve them from the container itself rather than having them injected.  Side node: I may be over-relying on the container again here but I have a very simple UI system at the moment so I haven’t run into major problems.

Second, when the WindowManager creates the UI thread it sets it to use the STA threading model which WPF requires.  It also makes it a background thread so that it won’t keep the application alive if the main thread quits.  That’s appropriate for GenesisEngine but maybe not for other apps.  The Event object is used to verify that the UI thread is indeed created and running before we continue.

Third, we call Dispatcher.Run to start the message pump on the UI thread.  If this isn’t done then WPF won’t work.

Fourth, all interaction between the main app thread and the WPF elements has to go through Dispatch.Invoke to marshal the calls onto the UI thread.  You can see that in the ShowAllWindows method.

Lastly, the WindowManager is disposable so that it can cleanly shut down the dispatcher’s message pump when appropriate.  Actually, I suspect I still have an issue with clean shutdown somewhere because occasionally the MSpec runner will complain about mysterious errors when cleaning up my unit tests but I haven’t yet invested a lot of time in chasing down the root cause.

This code seems to work pretty well to create and display WPF windows for my XNA app.  I’m not doing a whole lot with them yet; the statistics window updates itself once per second and shows a few interesting numbers but the settings window isn’t hooked up to anything yet.  I’ll make more use of them shortly but the infrastructure appears to be working.

GenesisEngine: Input

(I originally posted this on my MSDN blog.)

Once I had the event aggregator set up in GenesisEngine I could think about how to turn keyboard and mouse input into events that other parts of the app could respond to.

The XNA framework doesn’t offer as much help in this area as you might be used to in Windows Forms or WPF.  There isn’t any built-in windowing, or UI controls, or a commanding system so you pretty much have to build your own from scratch.  This isn’t a terribly difficult task, I suppose, but I like the way mine turned out.


All XNA offers for input is a KeyboardState and MouseState struct every update cycle that contain the current state of the keyboard (the current up or down state of every key) and the current state of the mouse cursor (where it currently is and whether each button is currently up or down).

In order to figure out interesting things like did was a certain key just now pressed this update or has it been held down for awhile, or how far did the mouse move between the last update and this one, you’ve got to track both the last state and the current one and check the differences yourself.  The XnaInputState class handles this responsibility but it’s pretty trivial so I won’t list it here.


The InputMapper class is a bit more interesting.  It stores mappings between states and event messages that should be sent when those states occur, where states in this case mean a key was pressed, or a key is being held down, or the mouse cursor moved.  The mappings are set up in code right now but could be loaded from a config file in the future.  This is from the main Genesis program class:

void SetInputBindings()
    // TODO: we don't specify which mouse button must be down
    // (hardcoded to right button ATM),
    // this can be extended when we need to.

When a message and an input state are mapped together, InputMapper stores them in lists for later use.  Specifically, it stores a set of delegates that will be executed when the correct input conditions are detected and these delegates send the proper messages to the event aggregator to be forwarded to whomever is interested in them.

I’m creating and storing a delegate that sends an event message rather than simply storing the type of the message because that was the only way I could figure out how to call EventAggregator.SendMessage with a strongly-typed message object.  Essentially I have to capture a generic type parameter, save it away, and later pass it to another generic method without losing its type information.  Creating a delegate at save time accomplishes that.  I’m not thrilled with how obtuse it makes the code but it’s livable for now.  I wouldn’t mind finding a better solution, though.

public void AddKeyPressMessage(Keys key) where T : InputMessage, new()
    _keyPressEvents.Add(new KeyEvent { Key = key, Send = x =>
        _eventAggregator.SendMessage(new T { InputState = x}) });
public void AddKeyDownMessage(Keys key) where T : InputMessage, new()
    _keyDownEvents.Add(new KeyEvent { Key = key, Send = x =>
        _eventAggregator.SendMessage(new T { InputState = x }) });
public void AddMouseMoveMessage() where T : InputMessage, new()
    _mouseMoveEvents.Add(new MouseMoveEvent { Send = x =>
        _eventAggregator.SendMessage(new T { InputState = x }) });
private class KeyEvent
    public Keys Key;
    public Action Send;
private class MouseMoveEvent
    public Action Send;

During each update, InputMapper is told to handle input and is passed an IInputState reference.  Based on this input state, it finds any message-sending delegates who’s conditions match the current conditions and executes those delegates.  InputMapper doesn’t know anything about who’s interested in input events, it just fires them.

public void HandleInput(IInputState inputState)

private void SendKeyPressMessages(IInputState inputState)
    foreach (var keyEvent in _keyPressEvents.Where(keyEvent =>

I like how the responsibilities are clearly separated in this system:

  1. Tracking input state changes – XnaInputState
  2. Firing events based on the current input state – InputMapper
  3. Actually sending the events to listeners – EventAggregator
  4. Receiving and acting on events – Implementers of IListener

I think it would be interesting to see how to incorporate the new Reactive Extensions into the input system.  Rather than checking the current input state against a set of mappings every time, the InputMapper would set up some LINQ expressions against an input event sequence.  I haven’t tried using the Reactive Extensions yet but from what I’ve read so far it seems like it should simplify the concepts here.

GenesisEngine: The Event Aggregator

(I originally posted this on my MSDN blog.)

GenesisEngine is still a pretty small code base at this point but there are some design elements that I’m pretty happy with.  I’ll run through a series of posts describing these parts so that people can learn from them or maybe critique them and find ways to make them even better.

Event Aggregator

I lifted the design of the event aggregator directly from Jeremy Miller’s posts on the subject and from his implementation in StoryTeller.  There’s not much I can add to what Jeremy’s already said but I’ll summarize the concept.

Here’s the very small interface for my aggregator:

public interface IEventAggregator
    void SendMessage(T message);
    void AddListener(object listener);

An event, or message, can be any type you want to use.  You can put as much context data as you wish into your message object.  When you send a message, the message object is forwarded to all listeners who have stated a desire to receive that type of message.

When you add listeners to the event aggregator you don’t explicitly list what messages you want the listener to receive.  Instead, the listener’s class definition is marked up by implementing one or more flavors of IListener, like so:

public interface IListener
    void Handle(T message);

public class Settings : ISettings,
    // Class stuff here

The event aggregator looks for those interfaces to figure out which messages the listener is interested in:

public void SendMessage(T message)
    IEnumerable recipients;
    lock (_lockObject)
        recipients = FindEligibleListeners();
    SendMessageToRecipients(message, recipients);

private IEnumerable FindEligibleListeners()
    var eligibleListeners = new List();
    foreach (var weakReference in _listeners)
        // We need to create a strong reference before testing aliveness
        // so that the GC doesn't yank it out from under us.  Don't convert
        // this to a LINQ expression because it doesn't guarentee that behavior
        var strongReference = weakReference.Target as IListener;
        if (strongReference != null)
    return eligibleListeners;

I use StructureMap as my IoC container and I’m using Jeremy’s neat trick of auto-registering listeners when they are created by the container, so most of the time I don’t even have to explicitly add listeners to the aggregator:

public class EventAggregatorTypeInterceptor : TypeInterceptor
    public object Process(object target, IContext context)
        return target;

    public bool MatchesType(Type type)
        return type.ImplementsInterfaceTemplate(typeof(IListener));

I don’t have a RemoveListener() method on my event aggregator right now for two reasons:

  1. The event aggregator holds weak references to all listeners so it’s not necessary to explicitly remove them for garbage-collection purposes.
  2. So far I haven’t had a need to remove a listener before the end of its lifetime so there’s been no need to implement that functionality yet.

I’m very happy with this eventing design.  It’s simple, has low overhead, and just feels elegant to me.

The source for GenesisEngine is available here.