GenesisEngine: Listen To The Tests!

(I originally posted this on my MSDN blog.)

As I wrote last time, I made a bit of a mess in my GenesisEngine project by jamming too many responsibilities into one class.  I’m working on straightening that out and ran across some interesting observations already.  I’m not finished yet but I’ll share what I’ve discovered so far.

Performing the surgery

I decided to first separate the quad tree node responsibilities from the mesh generation responsibilities since there didn’t seem to be a lot of entangling between them and it appeared to be a straightforward exercise.  It turned out that there actually was a fair bit of entanglement in the split/merge logic and the tests helped me identify that and sort it out.  I’m a big believer in TDD and I’m still often surprised at how much clear feedback my unit tests give me about the quality of my design . . . if I take the time to listen!

Side note: as I’ve said before, I’m working on the GenesisEngine project and blogging about it in part because I wanted to provide some real-world examples that are a bit more complex and interesting than the typical toy problems you see in “intro to ” materials.  The downside of real-world examples is that it’s a lot harder to paste in a bit of code that adequately illustrates what I’m talking about, since, well, it’s complex.  I’ll do my best but if you’re really interested in understanding what’s going on you should probably inspect the diffs on GitHub or grab the code and study it.

So what happened?  The first step in breaking up my SRP problem was to create a new QuadMesh class to handle the generation and management of the terrain mesh data.  I moved the mesh code from QuadNode to QuadMesh and also created new QuadMeshRenderer and QuadMeshSpecs classes plus several other ancillary files.  Once that was done I had to resolve several compiler errors because it turned out that QuadNode.Update() relied on the presence of the mesh data which was no longer there.

Here’s the original version of QuadNode.Update():

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation,
                   ClippingPlanes clippingPlanes)
{
    var cameraRelationship = GetRelationshipToCamera(cameraLocation);
    DetermineVisibility(cameraLocation, planetLocation, cameraRelationship.ClosestVertex);
    if (_isVisible)
    {
        if (clippingPlanes.Near > cameraRelationship.ClosestDistance)
        {
            clippingPlanes.Near = cameraRelationship.ClosestDistance;
        }
        if (clippingPlanes.Far < cameraRelationship.FurthestDistance)
        {
            clippingPlanes.Far = cameraRelationship.FurthestDistance;
        }
    }
    var distanceFromCamera = cameraRelationship.ClosestDistance;
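    // The depth limit (MaximumLevel) and the Split() call are assumed placeholders;
    // the original expressions are not recoverable from this listing.
    if (_isVisible && distanceFromCamera < RealWidth() * 1 && !_hasSubnodes
        && Level < MaximumLevel)
    {
        Split();
    }
    else if (distanceFromCamera >= RealWidth() * 1.2 && _hasSubnodes)
    {
        Merge();
    }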
    if (_hasSubnodes)
    {
        foreach (var subnode in _subnodes)
        {
            subnode.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);
        }
    }
}

The GetRelationshipToCamera() method returned a private CameraRelationship DTO, which looked like this:

private class CameraRelationship
{
    public DoubleVector3 ClosestVertex { get; set; }
    public DoubleVector3 FurthestVertex { get; set; }
    public double ClosestDistance { get; set; }
    public double FurthestDistance { get; set; }
}

The compiler errors were in QuadNode.GetRelationshipToCamera().  The basic idea here is that QuadNode used to be looking at the mesh data and figuring out the distance from the camera to the closest vertex and the furthest vertex, and then was using that data to do several things:

  1. Figure out whether the node is visible
  2. Set the clipping planes appropriately if this node is closer or farther than the clipping planes already are
  3. Decide whether to split or merge the node based on the ratio of camera distance to the real-space width of the node.

Complications set in

Ok, so obviously the GetRelationshipToCamera method needs to move to QuadMesh because it’s inspecting the mesh data, and the CameraRelationship class needs to be promoted to public so it can be shared between QuadNode and QuadMesh.  QuadNode.Update() would call QuadMesh.GetRelationshipToCamera() and use the returned CameraRelationship DTO as it has been before.  Simple.  I made those changes (among others) and got everything to compile.  There was only one change to QuadNode.Update(), which now looked like this:

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation, ClippingPlanes clippingPlanes)
{
    var cameraRelationship = _mesh.GetRelationshipToCamera(cameraLocation);
    // Other stuff is the same . . .
}

I then looked at my failing tests.  Hmmm.  All of my specs related to splitting and merging were failing because the stubbed-out QuadMesh object was returning null from GetRelationshipToCamera().  That’s not going to work.  To solve that, I would need to create a CameraRelationship object in the spec context, populate it with specific numbers that would cause QuadNode to make the correct decision, and configure GetRelationshipToCamera() on the QuadMesh stub to return it.  That means I’d have to think really hard about what the numbers ought to be in order to provoke the desired behavior in each spec context.  Yuck.

The good news is that I’m lazy and that sounded like a lot of work.  Way too much work, in fact.  I thought about it for a couple of seconds and remembered the golden rule of TDD: “If you have to work hard to write your tests, you’re doing it wrong.”

Laziness FTW!

Ok, what am I doing wrong here?  I have QuadNode going out to QuadMesh, retrieving a bunch of data, and making decisions based on that data.  What kind of decisions?  The same ones I listed above:

  1. Is the node visible to the camera?
  2. Do the clipping planes need to be adjusted to include this node?
  3. What is the ratio of camera distance to the real-space width of the node?

These decisions all have something to do with the mesh data:

  1. The visibility of the node is determined by the mesh because while a quad node has a 2D area, a mesh has a 3D volume.  A large mountain may stick up over the horizon and be visible.
  2. The clipping plane adjustments are determined by the mesh for the same reason: the mesh is 3D.
  3. The camera distance part of the ratio is determined by the closest part of the node, which again is determined by the 3D mesh.

It’s at about this point that I got a mental image in my head of my unit test suite as a grizzled old sensei glowering at me, saying, “Have you learned nothing!  Leave me and meditate upon the design principles!  Perhaps you will find wisdom.”

I was trying to violate at least two principles with my stupid approach.  First, the Law of Demeter, or “only talk to your immediate neighbors.”  QuadNode was reaching through QuadMesh into the CameraRelationship object to get data.

Second, the Tell, Don’t Ask principle, or “don’t ask for information you need to do something, ask the object holding the data to do it for you.”  Rather than telling QuadMesh to make decisions based on its private data and inform QuadNode of the results as I should have done, I was grabbing data from QuadMesh, moving it to QuadNode, and making the decision there.

Mind your own business

Ok, so how to fix it?  The fixes were pretty simple once I had the principles firmly in my mind:

  1. Ask the QuadMesh whether it is visible to the camera.  It has all the information needed to make that decision.
  2. Forward the clipping planes to the QuadMesh and have it modify them if necessary.
  3. Have QuadMesh calculate the ratio of camera distance to the real-space width of the node and return that number.  (This is technically still getting data from QuadMesh but it’s data that’s easily stubbed out and I’m ok with this until I find something wrong with it.)

Here’s the new QuadNode.Update() method which properly tells QuadMesh to do work and make decisions on its behalf:

public void Update(TimeSpan elapsedTime, DoubleVector3 cameraLocation, DoubleVector3 planetLocation,
                   ClippingPlanes clippingPlanes)
{
    _mesh.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);
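    // The depth limit (MaximumLevel) and the Split() call are assumed placeholders;
    // the original expressions are not recoverable from this listing.
    if (_mesh.IsVisibleToCamera && _mesh.WidthToCameraDistanceRatio < 1 && !_hasSubnodes
        && Level < MaximumLevel)
    {
        Split();
    }
    else if (_mesh.WidthToCameraDistanceRatio > 1.2 && _hasSubnodes)
    {
        Merge();
    }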
    if (_hasSubnodes)
    {
        foreach (var subnode in _subnodes)
        {
            subnode.Update(elapsedTime, cameraLocation, planetLocation, clippingPlanes);
        }
    }
}

There’s another interesting lesson here as well, derived from Tell, Don’t Ask: it’s ok to create public members on a class that are highly specific to questions that other classes need to ask, as long as doing so helps you to hide private information.  QuadMesh.WidthToCameraDistanceRatio is a very specific sort of property.  If I were designing this class as part of a generalized public framework, it wouldn’t occur to me to implement it.  But I’m not designing a public framework; I’m designing a highly specific set of application classes that work with each other to solve a problem.  My goal here should be to keep as much information hidden as possible (such as the distance from the camera to the mesh) and to expose only processed information that answers a specific question.  This reduces coupling, increases cohesion, and makes the code more flexible and maintainable in the long run.
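
To make that concrete, here is a minimal sketch of what a mesh that answers questions instead of handing out data could look like.  This is not the actual GenesisEngine code: Vec3, MeshSketch, the constructor, and this simplified ClippingPlanes are stand-ins I made up for illustration, and I’ve left the visibility check out entirely.  The only things borrowed from the listings above are the WidthToCameraDistanceRatio name, the clipping plane comparisons, and the definition of the ratio as camera distance divided by real-space width.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the engine's DoubleVector3 and ClippingPlanes types.
public struct Vec3
{
    public double X, Y, Z;
    public Vec3(double x, double y, double z) { X = x; Y = y; Z = z; }

    public double DistanceTo(Vec3 other)
    {
        double dx = X - other.X, dy = Y - other.Y, dz = Z - other.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz);
    }
}

public class ClippingPlanes
{
    public double Near = double.MaxValue;
    public double Far = double.MinValue;
}

// Sketch of a mesh that keeps its vertices and camera distances private and
// exposes only the answers a quad node needs.
public class MeshSketch
{
    readonly List<Vec3> _vertices;
    readonly double _realWidth;

    public MeshSketch(IEnumerable<Vec3> vertices, double realWidth)
    {
        _vertices = vertices.ToList();
        _realWidth = realWidth;
    }

    // Closest camera distance divided by the real-space width of the mesh.
    public double WidthToCameraDistanceRatio { get; private set; }

    public void Update(Vec3 cameraLocation, ClippingPlanes clippingPlanes)
    {
        double closest = _vertices.Min(v => v.DistanceTo(cameraLocation));
        double furthest = _vertices.Max(v => v.DistanceTo(cameraLocation));

        // The mesh widens the clipping planes itself, so the raw distances
        // never leave this class. (Visibility is ignored in this sketch.)
        if (clippingPlanes.Near > closest) clippingPlanes.Near = closest;
        if (clippingPlanes.Far < furthest) clippingPlanes.Far = furthest;

        WidthToCameraDistanceRatio = closest / _realWidth;
    }
}
```

With a shape like this, a QuadNode spec only has to stub one number (the ratio) to provoke a split or a merge, instead of inventing a whole set of plausible vertex distances.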

Side note: I’m maybe still not getting to the heart of Tell, Don’t Ask, since I’m still querying properties on QuadMesh rather than sending commands to QuadMesh, but it’s the best that I understand how to do right now.

It’s just magical how good unit tests will guide you to quality designs and warn you away from bad designs if you take the time to listen.  You’d think I’d get used to it after a while but the novelty hasn’t worn off for me yet.  It’s ridiculously awesome.  I giggle like a kid every time I see it.

If you want to examine the diff or download the source as it is after this fix, you can find it here.

GenesisEngine: Yes, SRP Violations Hurt

(I originally posted this on my MSDN blog.)

In the process of my continuous learning about agile development, one of my biggest problems is that it’s easy to find materials that say, “Do this, don’t do that,” but offer only trivial examples at best.  I’m always wishing for some non-trivial examples of what the principles, or the violation of the principles, look like in the real world.  Part of the reason why I put GenesisEngine up on GitHub and am blogging about it here is to provide some slightly-less-than-trivial examples of good techniques, but just as importantly, examples of mistakes and how to fix them.

True confessions

So, I have a confession to make.

I work on GenesisEngine in my spare (ha!) time and progress has been kind of slow.  I spent a fair amount of time up front on infrastructure and was careful to build it all with TDD and SOLID principles.  I had a good time building that stuff but really, all of the interesting parts have to do with the generation and rendering of the terrain itself.  Everything else is just plumbing.

So after spending quite a number of weeks telling my friends that I was working on this nifty terrain engine project but having only a featureless white ball to show them, I really, really wanted to get something working that resembled actual planetary terrain.  The problem was moderately complex and I was growing impatient.  I started cutting corners.  My QuadNode class started out fairly small but it quickly accumulated a lot of responsibilities and started to sprawl all over the place.  I was violating the Single Responsibility Principle, and frankly, I made a mess.

Warning signs

One of the early warning signs that you have a fat class that does too many things is that it’s not fun to write tests for it.  Rather than just having simple inputs and outputs to manage in your tests, you have to construct elaborate chains of actions just to get your object into the state that you want to test.  The tests aren’t elegant statements of intent; they’re full of confusing noise.

You’ll also see a lot of combinatorial explosion going on where you get a ridiculous number of different contexts that say, “when A is true and B is true and C is true”, then “when A is true and B is true and C is false”, and so on through all the combinations of states.  It’s tedious to write tests for each combination, especially when they’re messy tests anyway.

As I got deeper into the functionality of my quad tree terrain generation and rendering system, I started to clearly see those warning signs in my code.  But . . . I just wanted to get something working.  Like, now, darn it!  I was tired of waiting, and I resisted the obvious need to refactor the QuadNode class because it would take more time than I was willing to spend.  Rather than stopping to figure out how many responsibilities I had running around in QuadNode and then figuring out how to tease them apart into separate classes, I simply stopped writing tests for that class.  Once I did that then it was easy to not build the Perlin noise generation system test-first either.

Stampede!

In my non-technical life I’m into long-distance backpacking and in that world we have a term for when you’re about half a day out from town after four or five days and 100+ miles on the trail, and someone says the magic word.  “Pizza.”  Or maybe “hamburgers”.  The technical term for what happens then is “stampede”.  All common sense and self-preservation go right out the window and everyone hurtles down the trail at breakneck speed in an effort to get to town.  Sometimes people punish their bodies in ways they end up regretting later.

We stampede in software development, too.  We spend a lot of time being careful, doing things right, making steady progress, but at some point close to the end we sometimes say, “Ah, screw it, let’s just hack it together and make it work!”  The result is usually something we call technical debt.  You build up a pile of messy stuff that you’ve got to go back and fix later.

I guess that’s not always a bad thing.  If you’re trying to hit an aggressive deadline and you need to just throw yourself forward right at the end, building up some technical debt is a valid way to do that.  Or if, like me, you just want to see something working and you’re not willing to wait, you can hack something together to scratch that itch.

The really evil thing about technical debt is not the short-term impact of creating it.  You can sometimes derive a lot of benefit from technical debt in the short term.  No, the evil thing about technical debt is when you don’t immediately go back and clean it up once your short-term goal is realized.

Anatomy of an SRP violation

Right now the QuadNode class is 472 text lines long.  Visual Studio code analysis reports that it has one of the worst maintainability indexes of any class in the project.  It has at least three big responsibilities jammed into it right now:

  1. As the name implies, it acts as a node in the quad tree.
  2. It also owns the job of generating heightfield data, vertex buffers, and index buffers.  This clearly has nothing to do with #1.
  3. It also has to decide when it’s appropriate to split itself into four node children or to merge and delete its children.  I first thought that was a trivial aspect of #1 but it turns out to be a huge deal in its own right.

Here’s one of the QuadNode spec contexts.  When a node is updated, it may decide to do certain things based on the state of the world.  In this case, when a non-leaf node (that is, a node that has children) is far enough away from the camera, the children nodes should be removed and disposed because we don’t need that level of detail any more.

[Subject(typeof(QuadNode))]
public class when_a_nonleaf_node_is_updated_and_the_camera_is_far : QuadNodeContext
{
    public static DoubleVector3 _nearCameraLocation;
    public static DoubleVector3 _farCameraLocation;
    Establish context = () =>
    {
        _nearCameraLocation = DoubleVector3.Up * 11;
        _farCameraLocation = DoubleVector3.Up * 15 * 10 * 2;
        _node.InitializeMesh(10, Vector3.Up, Vector3.Backward, Vector3.Right, _extents, 0);
        _node.Update(new TimeSpan(), _nearCameraLocation, DoubleVector3.Zero, _clippingPlanes);
    };
    Because of = () =>
        _node.Update(new TimeSpan(), _farCameraLocation, DoubleVector3.Zero, _clippingPlanes);
    It should_remove_subnodes = () =>
        _node.Subnodes.Count.ShouldEqual(0);
    It should_dispose_subnodes = () =>
    {
        foreach (var subnode in _node.Subnodes)
        {
            ((IDisposable)subnode).AssertWasCalled(x => x.Dispose());
        }
    };
}

This is not a horrible test set.  Believe me, I’ve seen (and written!) worse.  But let’s look at a couple of things that it’s trying to tell me:

  • The process of setting up a non-leaf node in the context is built on indirect side-effects.  Instead of just telling my class under test, “Hey, assume you’re a non-leaf node”, I have to initialize the node’s mesh, then call .Update() with camera coordinates that are near enough to cause the node to split itself and generate children, then call .Update() again with different camera coordinates that are far enough to cause the node to merge its children.  The spec isn’t able to say what it means explicitly; it’s very roundabout.  Someone unfamiliar with the code base would probably have to put in significant effort to understand how the spec works.
  • There’s no way to determine whether the QuadNode we’re testing decided to merge its children except by inspecting its children.  Again, this is relying on indirect side-effects.  There’s no way to get a clear statement from the class that says, “Yes, I’ve decided to merge!”, which is really what I’m interested in testing here.
  • This spec context is one of five that test a combinatorial set of conditions:
    • When a non-leaf node is far away from the camera
    • When a non-leaf node is close to the camera
    • When a leaf node is far away from the camera
    • When a leaf node is close to the camera
    • When a leaf node is at the maximum allowable tree depth and is close to the camera
  • There is another factor that isn’t even mentioned in these specs because I didn’t want to deal with a doubling of the condition set.  A node should only be split if it’s not over the horizon and out of sight, and it should be merged if it does get too far over the horizon even if the camera isn’t far enough to cause a merge on its own.  That would turn my five contexts into nine.  Yuck.

The implementation of .Update() in QuadNode is about as circuitous as these specs would lead you to believe.  There’s a lot of stuff going on in Update but it’s not clearly explained.  There are quite a few tests and branches and it’s not very maintainable.

So what’s the root problem here?  The root problem is that I violated the Single Responsibility Principle.  The decision of whether to split or merge a quad node is a good-sized responsibility all on its own.  There are different ways to make that decision and it’s probably something I’ll want to fiddle with a lot over time since it heavily impacts performance and memory footprint.  I probably need a SplitMergeStrategy class for the QuadNode to depend on, or maybe even separate SplitStrategy and MergeStrategy classes.

What would that buy me?  First, it would help break apart the combinatorial set.  The QuadNode wouldn’t have to care anything about the position of the camera or whether it’s below the horizon.  All it would have to know is that if it’s a leaf node, make a call to SplitStrategy, otherwise make a call to MergeStrategy.  If the return value is true, do the appropriate thing.

SplitStrategy and MergeStrategy, for their part, wouldn’t have to know whether they’re being called by a leaf or non-leaf node.  They trust the QuadNode to take care of that question.  They just need to think about the camera distance and the horizon and respond with yes or no.  Not only does that reduce the combinatorial set but it also makes the inputs and outputs very explicit.  Inputs are numbers, output is a boolean.  No mysterious multiple calls to QuadNode.Update to set up the context and no mysterious poking at child nodes to determine the results.
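
Here’s a rough sketch of how that separation might look.  None of this exists in GenesisEngine yet: the interface names, method signatures, NodeSketch, and the idea of passing the level into the split decision are all assumptions on my part.  The 1 and 1.2 width multipliers are just lifted from the existing Update() logic, and the “merge when hidden” rule is the horizon behavior described above.

```csharp
using System;

// Hypothetical interfaces; only the SplitStrategy/MergeStrategy idea is from the post.
public interface ISplitStrategy
{
    bool ShouldSplit(double cameraDistance, double nodeWidth, bool isVisible, int level);
}

public interface IMergeStrategy
{
    bool ShouldMerge(double cameraDistance, double nodeWidth, bool isVisible);
}

public class DistanceSplitStrategy : ISplitStrategy
{
    readonly int _maximumLevel;

    public DistanceSplitStrategy(int maximumLevel) { _maximumLevel = maximumLevel; }

    // Split a visible node once the camera is closer than one node width,
    // unless the node is already at the maximum depth.
    public bool ShouldSplit(double cameraDistance, double nodeWidth, bool isVisible, int level)
    {
        return isVisible && level < _maximumLevel && cameraDistance < nodeWidth;
    }
}

public class DistanceMergeStrategy : IMergeStrategy
{
    // Merge when the camera is beyond 1.2 node widths or the node is hidden.
    public bool ShouldMerge(double cameraDistance, double nodeWidth, bool isVisible)
    {
        return !isVisible || cameraDistance > nodeWidth * 1.2;
    }
}

// The node only knows whether it is a leaf; the strategies know the rest.
public class NodeSketch
{
    readonly ISplitStrategy _split;
    readonly IMergeStrategy _merge;

    public bool HasSubnodes { get; private set; }
    public int Level { get; private set; }

    public NodeSketch(ISplitStrategy split, IMergeStrategy merge, int level)
    {
        _split = split;
        _merge = merge;
        Level = level;
    }

    public void Update(double cameraDistance, double nodeWidth, bool isVisible)
    {
        if (!HasSubnodes)
        {
            if (_split.ShouldSplit(cameraDistance, nodeWidth, isVisible, Level))
            {
                HasSubnodes = true;   // stands in for Split()
            }
        }
        else if (_merge.ShouldMerge(cameraDistance, nodeWidth, isVisible))
        {
            HasSubnodes = false;      // stands in for Merge()
        }
    }
}
```

The strategy specs become one-liners (numbers in, boolean out), and the node specs can use stub strategies that simply return true or false, so the combinatorial contexts fall apart into a handful of small, independent ones.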

Cleaning up my mess

The technical debt I incurred certainly accomplished my short-term goal.  I’ve got a working proof of concept of a planetary terrain engine and I feel satisfied at reaching that milestone.  However, now I have a problem.  The implementation of my terrain generation is very naive and does all of its work on the main thread.  At low altitudes this causes so much stuttering as to render the program virtually unusable unless you first turn off updates, move somewhere, then turn updates back on and wait a while.  The fix for that is obviously to a) enlist my other cores for terrain generation and b) do the generation asynchronously so that camera movement and frame rate aren’t impacted, even if I have to wait a bit for higher levels of detail to show up.

Well, yes, that’s a great plan except that my QuadNode class is a mess.  The code that I need to make more complex with threading and async logic is exactly the code that’s already overly complex and opaque and isn’t fully covered by tests.  Ah, ok, now we see the downside of technical debt.  You get a quick spike of progress and then a long, slow, painful slide into hell.

I’ve promised myself that before I do any more significant work on this project, I’m going to clean up my mess and break QuadNode into multiple classes with single responsibilities.  I’m curious to see how it turns out.  If you want to take a closer look at the code as it is at the time of this writing, the permalink to the current tree is here.

GenesisEngine: Behavior-oriented Language

(I originally posted this on my MSDN blog.)

As I wrote in my previous post, BDD is largely about preserving the flow of intent from your user stories to your unit tests (specifications, in BDD parlance) to your product code.  As developers, we’re in the habit of switching over from user intent (features that solve problems) to developer intent (mechanics of the code) when we write tests, but preserving as much user intent as possible all the way through is a lot better for maintainability and it helps drive better initial designs, too.  It’s the same as the ubiquitous language in DDD.  Don’t abandon the ubiquitous language; stick with it as long as possible.

In other words, don’t focus on how the code works, think about how the system behaves.  The code will follow naturally.

It turns out, though, that it’s surprisingly hard to do this well.  At least, I find that I often have to remind myself to leave implementation language out of my context and specification names and to focus on how a user would describe the behavior in domain language.

I recently prepared a presentation on MSpec for my group at work and I used the Camera specs in the GenesisEngine project as an example of specs written in the behavior-focused style.  There’s nothing like putting your work in front of other people to make you take a fresh look at it with a critical eye!  As I read over my Camera specs, I realized that I had let some implementation language sneak in when I wasn’t looking.  In other places I had been pretty vague about the behavior that was actually expected.

For instance, consider this context:

[Subject(typeof(Camera))]
public class when_view_parameters_are_set_by_look_at : CameraContext
{
    Because of = () =>
        _camera.SetViewParameters(new DoubleVector3(0, 1, 1), DoubleVector3.Zero);
    It should_set_the_camera_location_correctly = () =>
        _camera.Location.ShouldEqual(new DoubleVector3(0, 1, 1));
    It should_set_the_camera_yaw_correctly = () =>
        _camera.Yaw.ShouldEqual(0f);
    It should_set_the_camera_pitch_correctly = () =>
        _camera.Pitch.ShouldEqual((float)(-Math.PI / 4));
    It should_set_the_camera_roll_correctly = () =>
        _camera.Roll.ShouldEqual(0f);
    It should_set_the_camera_view_transformation_correctly = () =>
        _camera.OriginBasedViewTransformation.ShouldEqual(
            GenerateOriginBasedViewMatrix(_camera.Location, _camera.Yaw,
                                          _camera.Pitch, _camera.Roll));
}

It should set location/yaw/pitch/roll/transformation correctly?  What the heck does that mean?  That tells very little about what my intent actually was.  I was just being lazy and didn’t want to bother with trying to carefully describe the intent.

Actually, I bet what I was thinking was something like, “Hmm, my expectation here is that these properties should be set to specific numbers.  I don’t want to state those numbers in the spec names, though, because that’s a test detail.  I’ll just say it should set the properties ‘correctly’ because that sounds more generalized.”

But what was my real intent for the behavior of the camera when the view parameters are set via a look-at point?  Well, the real intent is that the camera should go to the requested location and set its orientation to whatever values are needed to face toward the look-at point from that location, and finally generate a new view transformation based on the new camera state.  Ok, now that’s a description that’s phrased in terms of the problem domain, not the implementation domain.  Let’s see if we can improve those specs:

[Subject(typeof(Camera))]
public class when_view_parameters_are_set_by_look_at : CameraContext
{
    Because of = () =>
        _camera.SetViewParameters(new DoubleVector3(0, 1, 1), DoubleVector3.Zero);
    It should_go_to_the_requested_location = () =>
        _camera.Location.ShouldEqual(new DoubleVector3(0, 1, 1));
    It should_set_the_yaw_to_face_toward_the_look_at_point = () =>
        _camera.Yaw.ShouldEqual(0f);
    It should_set_the_pitch_to_face_toward_the_look_at_point = () =>
        _camera.Pitch.ShouldEqual(-MathHelper.Pi / 4);
    It should_set_the_roll_to_face_toward_the_look_at_point = () =>
        _camera.Roll.ShouldEqual(0f);
    It should_generate_a_view_transformation_for_the_current_state = () =>
        _camera.OriginBasedViewTransformation.ShouldEqual(
            GenerateOriginBasedViewMatrix(_camera.Location, _camera.Yaw,
                                          _camera.Pitch, _camera.Roll));
}

That’s better.

In the location spec, I got away from the implementation language of “setting the location (property)” and used the domain language of “going to a location (in the world)”.  Very similar, but different perspectives.  For the orientation components, I described the intent of facing in a particular direction.  And for the view transformation, I called out the fact that the transformation is dependent on the new state.

Now, a lot of you may be looking askance at me right now.  Isn’t this nitpicking pretty silly?  Well, sure, it’s not earthshattering or anything.  I didn’t fix a bug or implement a new feature with these changes.  But I think I made the code a little bit cleaner, and that’s the real reason why I started this project in the first place.  It’s not about terrain rendering, it’s about fine-tuning my techniques.  Lesson learned: avoid writing specs that say something should be done “correctly”.  Describe what the correct behavior actually is.

The diff of the improvements I made to the Camera specs can be found in two parts here and here.

BDD Unit Testing is Not That Complicated

(I originally posted this on my MSDN blog.)

One of the first sessions at the Alt.Net Seattle conference was one on Behavior-Driven Development, or BDD.  Actually, we had three suggested sessions all related to BDD that we combined into one session.  In hindsight that was probably a bit of a mistake: we had a couple of different groups of people looking for different things (BDD at the unit test level vs. BDD at the acceptance test level), which caused a bit of controversy.

I think that at the unit test level, BDD really isn’t that different than normal TDD that we all know and love.  All it really brings to the table is a strong emphasis on the arrange-act-assert structure and an even stronger emphasis on the behaviors you’re trying to build in your code rather than the mechanics of the code itself.  In other words, BDD asks that you think in terms of what the user wants to do and how you’re going to enable them to do it.  You give clear, explicit names to each scenario that you need to implement and you also give clear names to each expectation that you have for the scenario.  The overall point is simply to write tests that people can actually read.

Anyway, Charlie Poole (one of the developers of NUnit) made a comment to the effect of, “Well, I’ve been doing that sort of thing in my unit tests for years already.  Why do we even have to have a special name for this?”  I also noticed a lot of other people asking things like, “Well, what about SOLID principles?  Do they still apply?  How about mocking frameworks or IoC containers?  Can I still use those?”

This confusion is really unnecessary, and Charlie’s right: it’s unfortunate that we even have a name for it that makes it sound like it’s something different than TDD.  At least at the unit test level, BDD is not a brand new way of writing tests.  It’s just the same old red-green-refactor workflow that we’ve always used, with a stronger emphasis on expressing customer-oriented intentions.  When other developers have to pick up your code and maintain it later, they’ll know why your tests exist and what user value they map to, and when the tests break, it’ll be obvious what needs to be fixed.  You still use all the same state-based and interaction-based testing techniques in your tests and the same SOLID principles in your product code.  Nothing changes.

Relax – it’s not that complicated.

 

GenesisEngine: Using WPF in XNA and other non-WPF applications

(I originally posted this on my MSDN blog.)

There are a couple of posts on the excellent Pandemonium game development blog (which sadly seems to have not been updated recently) that talk about the importance of making your game engine easily configurable and diagnosable.  That’s important for any application, of course, but it’s particularly critical for a graphics engine, where things happen in real time and a lot of what you see on the screen is not easily traced back to root causes.  Diagnostic and configuration tools help you figure out what’s going on with your engine.

For GenesisEngine, I knew I wanted to have two debugging features:

  1. The ability to easily view the current configuration options and change them at runtime.
  2. The ability to view statistics and diagnostic information that would help me understand what the app is doing.

As I noted before, XNA doesn’t give you much help out of the box when it comes to building a UI with buttons, checkboxes, textboxes, and all those other things that we take for granted in standard Windows apps.  Development tools are important but I didn’t want to spend a lot of time building them.  Because I’m ok with my app being Windows-only right now, it made sense to try to use a Windows-based presentation system, like, say, WPF.

The problem was that the XNA and WPF systems are very, very different and there wasn’t a whole lot of material that explained how to glue them together in one app.  Fortunately, the answer is pretty simple even if it was a little hard to find so I’ll share it here to help out anyone else who may be wondering the same thing.

To be clear, my approach here is to display WPF windows from an XNA application.  Embedding an XNA surface inside a WPF application is a whole different subject!  And actually this has nothing to do with XNA: the approach found below will work for any kind of application where you want to control the main app thread yourself and run WPF on a secondary thread.

In order for WPF to work correctly, it needs a few things:

  1. A separate STA thread
  2. A thread dispatcher object for that thread
  3. A message pump

Here’s my WindowManager that makes those things happen:

using System;
using System.Threading;
using System.Windows.Threading;

public class WindowManager : IWindowManager, IDisposable
{
    IContainer _container;
    IScreenCustodian _settingsCustodian;
    IScreenCustodian _statisticsCustodian;
    Dispatcher _windowDispatcher;

    public WindowManager(IContainer container)
    {
        _container = container;
        StartUIThread();

        // Invoke blocks until the delegate has finished running on the UI thread.
        _windowDispatcher.Invoke((Action)(() =>
        {
            // We pull these out of the container here instead of doing normal
            // constructor injection because we need them to be created on this thread.
            _settingsCustodian =
                _container.GetInstance<IScreenCustodian>();
            _statisticsCustodian =
                _container.GetInstance<IScreenCustodian>();
        }));
    }

    public void ShowAllWindows()
    {
        // Marshal the calls onto the UI thread; WPF objects must only be touched there.
        _windowDispatcher.Invoke((Action)(() =>
        {
            _settingsCustodian.ShowInactive();
            _statisticsCustodian.ShowInactive();
        }));
    }

    void StartUIThread()
    {
        var dispatcherCreatedEvent = new ManualResetEvent(false);
        var thread = new Thread(() =>
        {
            // CurrentDispatcher creates the dispatcher for this thread on first access.
            _windowDispatcher = Dispatcher.CurrentDispatcher;
            dispatcherCreatedEvent.Set();

            // Start the message pump; this call blocks until the dispatcher shuts down.
            Dispatcher.Run();
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();

        // Don't return until the UI thread has published its dispatcher.
        dispatcherCreatedEvent.WaitOne();
    }

    public void Dispose()
    {
        if (_windowDispatcher != null)
        {
            // Stops the message pump so that Dispatcher.Run returns on the UI thread.
            _windowDispatcher.InvokeShutdown();
        }
    }
}

There are a few notable things here.  First, all of the WPF-related objects need to be created on the WPF thread.  I’m pulling them all out of my IoC container, which means that they have to be pulled from the container on the WPF thread, not on the main app thread.  That in turn means that my WindowManager has to retrieve them from the container itself rather than having them injected.  Side note: I may be over-relying on the container again here but I have a very simple UI system at the moment so I haven’t run into major problems.

Second, when the WindowManager creates the UI thread it sets it to use the STA threading model, which WPF requires.  It also makes it a background thread so that it won’t keep the application alive if the main thread quits.  That’s appropriate for GenesisEngine but maybe not for other apps.  The ManualResetEvent makes StartUIThread wait until the UI thread is running and has published its dispatcher before we continue.

Third, we call Dispatcher.Run to start the message pump on the UI thread.  If this isn’t done then WPF won’t work.

Fourth, all interaction between the main app thread and the WPF elements has to go through Dispatcher.Invoke to marshal the calls onto the UI thread.  You can see that in the ShowAllWindows method.
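
One thing to keep in mind is that Invoke blocks the caller until the UI thread has run the delegate.  For work that the main loop doesn’t need to wait on, Dispatcher.BeginInvoke queues the delegate and returns immediately.  Here’s a minimal sketch of that idea.  The class and method names are made up for illustration and aren’t part of GenesisEngine, and it uses the same thread startup pattern as StartUIThread:

```csharp
using System;
using System.Threading;
using System.Windows.Threading;

// Hypothetical illustration only; not GenesisEngine code.
public static class BeginInvokeSketch
{
    // Same startup pattern as StartUIThread, but returning the dispatcher.
    public static Dispatcher StartUiDispatcher()
    {
        Dispatcher dispatcher = null;
        var ready = new ManualResetEvent(false);
        var thread = new Thread(() =>
        {
            dispatcher = Dispatcher.CurrentDispatcher;
            ready.Set();
            Dispatcher.Run();
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.IsBackground = true;
        thread.Start();
        ready.WaitOne();
        return dispatcher;
    }

    // Simulates a few "frames" that each post an update to the UI thread.
    // Returns how many of the posted updates actually ran on the UI thread.
    public static int RunDemo(int frames)
    {
        var ui = StartUiDispatcher();
        int mainThreadId = Thread.CurrentThread.ManagedThreadId;
        int handledOnUiThread = 0;

        for (int frame = 0; frame < frames; frame++)
        {
            // Queue the work and return right away; the loop never waits on WPF.
            ui.BeginInvoke(DispatcherPriority.Normal, (Action)(() =>
            {
                if (Thread.CurrentThread.ManagedThreadId != mainThreadId)
                {
                    Interlocked.Increment(ref handledOnUiThread);
                }
            }));
        }

        // Items at the same priority run in the order they were queued, so a
        // blocking Invoke at Normal priority acts as a flush of the posts above.
        ui.Invoke(DispatcherPriority.Normal, (Action)(() => { }));

        ui.InvokeShutdown();
        return handledOnUiThread;
    }
}
```

Calling BeginInvokeSketch.RunDemo(3) posts three updates without blocking between them and only waits once at the end.  In a real game loop you’d skip the flush entirely and just let the UI thread catch up on its own schedule.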

Lastly, the WindowManager is disposable so that it can cleanly shut down the dispatcher’s message pump when appropriate.  Actually, I suspect I still have an issue with clean shutdown somewhere because occasionally the MSpec runner will complain about mysterious errors when cleaning up my unit tests but I haven’t yet invested a lot of time in chasing down the root cause.
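
One possible tightening of the shutdown path is to hold on to the Thread and wait for it to actually exit after asking the dispatcher to shut down.  This is a sketch under my own assumptions, not a confirmed fix for the MSpec errors, and UiThreadHost is a hypothetical stand-in class rather than the real WindowManager:

```csharp
using System;
using System.Threading;
using System.Windows.Threading;

// Hypothetical illustration only; not the GenesisEngine WindowManager.
public sealed class UiThreadHost : IDisposable
{
    readonly Thread _thread;
    Dispatcher _dispatcher;

    public UiThreadHost()
    {
        var ready = new ManualResetEvent(false);
        _thread = new Thread(() =>
        {
            _dispatcher = Dispatcher.CurrentDispatcher;
            ready.Set();
            Dispatcher.Run();   // returns once the dispatcher has shut down
        });
        _thread.SetApartmentState(ApartmentState.STA);
        _thread.IsBackground = true;
        _thread.Start();
        ready.WaitOne();
    }

    public Dispatcher Dispatcher { get { return _dispatcher; } }

    public bool IsRunning { get { return _thread.IsAlive; } }

    public void Dispose()
    {
        if (_dispatcher == null)
        {
            return;   // already disposed
        }

        _dispatcher.InvokeShutdown();

        // Wait (with an arbitrary upper bound) for the UI thread to finish
        // unwinding.  Skip the wait if Dispose was called on the UI thread
        // itself, since a thread can't usefully join itself.
        if (Thread.CurrentThread != _thread)
        {
            _thread.Join(TimeSpan.FromSeconds(5));
        }

        _dispatcher = null;
    }
}
```

The difference from the WindowManager above is just the Join: after Dispose returns, the UI thread has either exited or the timeout has expired, so a test runner is less likely to tear things down while the thread is still unwinding.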

This code seems to work pretty well to create and display WPF windows for my XNA app.  I’m not doing a whole lot with them yet; the statistics window updates itself once per second and shows a few interesting numbers but the settings window isn’t hooked up to anything yet.  I’ll make more use of them shortly but the infrastructure appears to be working.