Further thoughts on short sprints

(I originally posted this on my MSDN blog.)

I wrote about my preference for short Scrum sprints last month, where “short” means no more than two weeks.  Recently there was another internal email thread on roughly the same subject where people listed concerns they had about moving their teams to two-week sprints.  I’ve combined, paraphrased, and edited the questions and included my answers below.

The environment isn’t nailed down and requirements are still up in the air.  I need four weeks in order to maintain control.

When you have a lot of uncertainty in your project and your requirements are up in the air, that’s precisely the time when you want to have short sprints so that you have frequent adjustment points.

The whole idea of a sprint is that you build a crisp short-term plan based on facts that you know and can rely on.  If there’s a lot of uncertainty in your environment, which is easier: to build one plan that will still be accurate and useful four weeks later or to build a two-week plan, execute it, and then build another two-week plan using additional facts that you’ve learned during the first sprint?  Long sprints are for stable environments where the need for flexibility and responsiveness is low.  Short sprints are for changing environments where you don’t know a lot right now but in two weeks you’ll know more.

Planning for four-week sprints is painful as it is; I don’t want to do that twice as often!

With shorter sprints, your planning for each sprint should get shorter as well.  In fact, I’ve found that when going from four-week to two-week sprints your planning can be reduced by more than half, because you simply don’t need all of the process that a four-week sprint requires.

For example, in a four-week sprint it’s important to carefully estimate the number of required hours for each task, track the number of hours you spend on each task, and generate a burn-down chart so that you can tell whether you’re tracking to the plan.  Most teams have some point at about halfway through the sprint where they evaluate the burn-down chart and adjust the sprint commitments depending on how they’re doing, because four-week plans rarely survive unchanged.

Well, instead of doing that, how about if you skip the burn-down chart and just build a new plan at the two-week point?  You can save all the effort of detailed tracking and you get a more up-to-date plan as well.  Remember, building a two-week plan isn’t nearly as expensive as building a four-week plan so you’re doing about the same amount of planning (or probably less) overall.

How far off track can you get in two weeks, anyway?  Certainly not as far as in four weeks, so there’s not as much need for oversight.  And if you do start to get wildly off-track, just glancing at a sprint task board or a list of completed stories will probably tell you that you’ve got problems because the total number of items is small enough that you can understand them at a glance and eyeball it pretty reliably.

Meeting proliferation – yuck!

The same goes for meetings.  There may be more meetings but each one is much shorter because there’s less to cover.  If it’s hard to get all stakeholders together for a demo every two weeks, you might schedule a big public demo every four weeks (after two two-week sprints).  Short meetings tend to be more productive because they’re crisp and people don’t get burned out.  Four-hour planning meetings (or six hours, or twelve!) are way too painful to be productive.

I have multiple teams in different geographic locations that need to stay in sync.  Won’t short sprints hinder that?

Syncing multiple teams ought to be easier with short sprints because no team is ever more than two weeks away from being able to make major adjustments to their plan in response to the needs of another team.  I’m not sure how long sprints would help the syncing issue.  Long sprints give you the illusion that you know exactly where you’re going to be four weeks from now, but you probably don’t.  You can make pretty decent predictions for several weeks or even several months into the future using product burndown velocity, and that reminds everyone of what they really are – predictions.  Not guarantees.  Now, predictions are useful.  I’m not saying that you shouldn’t think about the future at all.  I’m just saying that you need to be realistic about your level of certainty.

I’m not sure my team is ready to transition to two-week sprints.  As scrum master, should I just mandate it?

I would not try to impose two-week sprints over the objections of the team.  One of the fundamental principles of Scrum is that the team should be self-organizing as much as possible.  If there’s general agreement that short sprints wouldn’t work for some reason, then imposing them probably won’t be successful.  That doesn’t mean you can’t keep evangelizing the idea and hope to eventually change people’s minds, though.

If your organization is still feeling shaky about Scrum in general and people are still climbing the learning curve, then you should probably just stick with whatever sprint system you’re using at the moment unless it’s obviously broken.  People can only absorb a certain amount of change at once.  It might be wise to let your recent changes soak in a bit before you muck around with the system.

Anything else I should know about short sprints?

The one thing that short sprints really do demand is that you have to be able to write very good user stories that are small, well-defined, but still vertical slices that deliver real business value.  That’s not an easy skill to develop, but I think it’s totally worthwhile because it pushes you to really understand the features at the top of your product backlog.  Big, vaguely-defined user stories are hard to get done in two weeks, so they make you feel like maybe you need three or four week sprints, but I think the right answer is to not work with big, vaguely-defined user stories.  There’s almost always a way to break them up in a way that makes sense if you take the time to thoroughly understand them, and only good things can come of thoroughly understanding your top-ranked stories.

Ok, there’s probably one other thing that short sprints demand.  They demand that your entire team be familiar with, comfortable with, and committed to agile principles.  If your team really wants to work in a high-planning, waterfall system and they’re just doing this Scrum thing because someone higher up told them they had to, then long sprints at least give them some of the long-term planning that they desire.  Short sprints will just make them even more uncomfortable than they already are.  That says nothing about the viability of short sprints – it’s about people’s comfort zones.

To sum up, the whole premise of Scrum is that you make crisp firm plans based on facts that you definitely know, and you avoid making firm plans based on anything you don’t know or only pretend to know.  Planning beyond the knowledge horizon doesn’t really change what you know, it just tricks you into thinking you know more than you do.  The key is to execute hard on the facts in front of you and stay loose on everything else.  And remember, people are the ultimate limiting factor so don’t drive them faster than they’re willing to go.

Everything I’ve said about two-week sprints applies to one-week sprints, only more so.  The ultimate conclusion to this line of thinking is Lean/Kanban development where you don’t have time-boxed sprints at all; you just have single-piece workflow and a pull model.  I haven’t really gone there yet because I’m still consolidating my grasp of Scrum principles but a lot of the industry thought-leaders are already there.

 

What goal does your culture value?

(I originally posted this on my MSDN blog.)

There have been several blog posts written recently on the topic of TDD and whether it ultimately makes you more productive or just slows you down.  I don’t have much to add to that discussion but I found a comment left by Ben Rady on one of Bob Martin’s posts and thought it was excellent (the comment, not the post, though the post was good too):

TDD slows you down if your goal is to be “done coding”. If your definition of done amounts to “It compiles, and nobody can prove that it doesn’t work” then writing a bunch of tests to prove whether or not it works makes your job harder, not easier. Sadly, in many organizations this is what “done” means, and so good developers wind up fighting against their environment, while the bad ones are encouraged to do more of the same.

If, on the other hand, your goal is to deliver a working system, then TDD makes you go faster. This is the only useful definition of “done” anyway, because that’s where you start making money.

This was particularly interesting because it reminded me of something I heard someone say here at work last week – that Microsoft has a far higher tester-to-developer ratio than a lot of other leading software companies.  Those other companies have a quality standard comparable to Microsoft’s, but somehow they achieve it with many fewer testers.  Why is that?

I’ve spent most of my career working as a developer of test tools in test organizations at Microsoft, so I have a huge amount of respect for the great job that Microsoft testers do.  But, having worked here for fifteen years, I believe that a large part of the work our test teams do is avoidable; it’s the unfortunate result of our traditionally developer-centric culture, which has a lengthy history of focusing on the “done coding” goal rather than the “working system” goal.  We need so many testers because they have to spend a large part of their time wrangling the devs into moving in the right direction.

I’m not sure if it’s cause or effect, but there’s definitely a strong correlation between our “done coding” culture and the strong wall we have between the development and testing disciplines at Microsoft.  Developers write product code and testers write test code and never the twain shall meet.  Developers are often completely ignorant of the tools and automated test suites that the testers use to test the builds.  If a test tool gets broken by a product change, it’s pretty rare that a developer would either know or care.  I’m pretty sure there’s a better way to do it.

To be fair, there’s nothing particularly unusual about Microsoft’s historical culture; that’s the way virtually the entire industry operated fifteen years ago.  But in the past several years the industry (or a significant part of it, anyway) has made large strides forward and Microsoft is still playing catch-up.  Again, to be fair, Microsoft is an enormous company with many different micro-cultures; there are plenty of teams at Microsoft who are very high-functioning, where developers take complete responsibility for delivering working systems, and where testers have the time to do deep and creative exploration of edge cases because the features just work.  But from where I sit that doesn’t appear to be part of our broad corporate culture yet.

A lot of people are working hard to change that, and it is changing.  As frustrating as it can be to deal with our historical cultural baggage, it’s also fascinating to watch a culture change happen in real time.  I’m glad to be here and to be a small part of it.

Edit:  I’m proud to say that Microsoft does value quality software quite a lot.  It’s just that we take the long way around to achieving that quality; we’re apt to try to “test it in” after it’s written rather than focusing on reducing the need for testing in the first place.  That’s the problem I’m talking about here.

 

Software Development is NP-hard

(I originally posted this on my MSDN blog.)

Here’s one more thought on the subject of complexity in software development: software development is NP-hard.

Software development (in the sense of building large projects end-to-end) has these characteristics:

  1. A proposed solution can be easily proved correct or not correct.
  2. The cost of searching for the correct solution grows exponentially as the problem set grows in size.
  3. There are no known shortcuts that make the process of searching for the correct answer dramatically easier.

What’s the best way to deal with NP-hard problems?

Successive approximation and heuristics.
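
To make that concrete, here’s a toy sketch of what I mean by a heuristic, using first-fit-decreasing bin packing as a stand-in NP-hard problem: you don’t search for the provably optimal answer, you apply a cheap rule of thumb and settle for a good-enough result.

// First-fit-decreasing: a classic heuristic for the NP-hard bin packing problem.
// It doesn't guarantee the minimum number of bins, but it's fast and usually close.
// (Requires System.Collections.Generic and System.Linq.)
public static List<List<double>> Pack(IEnumerable<double> itemSizes, double binCapacity)
{
    var bins = new List<List<double>>();
    foreach (double item in itemSizes.OrderByDescending(size => size))
    {
        // Put the item in the first bin that still has room for it; open a new bin otherwise.
        var bin = bins.FirstOrDefault(b => b.Sum() + item <= binCapacity);
        if (bin == null)
        {
            bin = new List<double>();
            bins.Add(bin);
        }
        bin.Add(item);
    }
    return bins;
}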

 

Irreducible Complexity in Software Development

(I originally posted this on my MSDN blog.)

My previous post talked about how software development can’t be modeled by any process that’s significantly less complex than the development process itself.  I’d like to expand on that a bit more.

Irreducible Complexity

I think people are attracted to modeling and detailed design documents because they’re overwhelmed by the amount of complexity they’re facing in their project and they hope that the models will be significantly less complex than the software they’re used to model.  That may be true, but you can’t lose substantial amounts of complexity without also losing important detail and nuance, and it’s the details that have impacts on the project out of all proportion to their size.

For models to be able to completely express a program, they have to be approximately as complex as the program they’re expressing.  If they’re significantly less complex then they’re not adequate to fully express a working program and you’ve still got a lot more work to do after you finish your models, and that work is likely to invalidate the model you built in pretty short order.  As Bertrand Meyer famously said, “Bubbles don’t crash.”

Compress This!

A useful analogy might be that of compressing data.  Most raw data can be compressed to some extent.  But the pigeonhole principle tells us that any general-purpose lossless compression algorithm that makes at least one input file smaller will make some other input file larger.  In other words, there’s a fundamental limit to the amount of lossless compression that can be applied to any data set; if that weren’t true, you could recursively compress any data set to zero bytes.
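
If you want to see that limit in practice, here’s a quick toy sketch that gzips a buffer of random data twice; the second pass typically makes the output slightly larger, not smaller, because there’s no redundancy left to squeeze out.

// Requires System, System.IO, and System.IO.Compression.
static byte[] Gzip(byte[] data)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }
}

static void Main()
{
    var random = new byte[100000];
    new Random().NextBytes(random);   // effectively incompressible input

    byte[] once = Gzip(random);
    byte[] twice = Gzip(once);

    // Expect "once" to be about the same size as the input (or a bit larger),
    // and "twice" to be larger still; you can't recursively compress to zero.
    Console.WriteLine("original: {0}, once: {1}, twice: {2}", random.Length, once.Length, twice.Length);
}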

If you close one eye and squint real hard, you could view the history of programming languages to this point as an exercise in data compression.  We went from assembly to C to C++ to C#, and hundreds of other languages, and at each step we figured out how to make the languages more succinctly expressive of what we want to do.  That’s great!  But at some point we’re going to run into that fundamental data compression limit, where a more abstract language actually makes the amount of effort larger, not smaller.  (Some would argue that Cobol managed to hit that limit a long time ago.)

I suspect that’s what happens when people try to extensively “model” software in documentation or any artifact other than code.  It seems like it ought to be simpler but it turns out to be more complex than just writing the code in the first place.

Planning for Battle

That’s not to say that design artifacts are useless.  They’re great for thinking in broad terms and making rough plans.  Just don’t rely on them for more than they’re capable of doing.  As Eisenhower said, “In preparing for battle I have always found that plans are useless, but planning is indispensable.”

 

Software Development is Like Weather Forecasting

(I originally posted this on my MSDN blog.)

A recent internal email thread asked about the difference in philosophy between Agile development and Waterfall-style development (or anything that promotes BDUF).  There’s the Agile Manifesto which clearly articulates the basic assumptions of that movement, but what would a Waterfall Manifesto look like?

Someone observed that there are some actual ideas behind the Waterfall model that make it attractive to organizations.  Software costs a lot to develop so we should do everything possible to push down the cost of development.  If we can avoid making any major mistakes during the development process, that should logically drive down cost.  So let’s lay out a comprehensive, detailed design before we write any code, generate tons of documentation, and lock our feature set so it doesn’t change.  It all sounds good in theory but it usually breaks down in practice, and here’s why.

Waterfall-style engineering starts with the fundamental assumption that software development can be adequately modeled or approximated using some abstraction that’s cheaper, faster, or less error-prone than actual software development.  If you can satisfactorily solve the problem using this abstraction, then you just translate the abstraction into real code and you’re finished with less expense than if you’d simply written the code.

However, it’s starting to look like software development can’t be adequately approximated with any model that’s simpler than the development process itself.  The most efficient way to understand what a software system needs to look like is to actually build the software system in question.  Software development is fundamentally an experimental, empirical discipline.

An analogy might be weather forecasting.  We used to hope that with the right tools we’d be able to build perfect weather models that would tell us if it’s going to rain on Tuesday four weeks from today.  Turns out that we can’t do that, and barring a fundamental change in our understanding, we’ll probably never be able to.  The only way to find out if it’s going to rain on Tuesday four weeks from now is to wait four weeks (well, maybe four weeks minus a couple of days) and see what happens.  We don’t know how to build adequate approximations of weather systems that don’t catastrophically break down over time.

Of course, short-term weather forecasting is still useful as long as you understand that nothing’s guaranteed.  In the same way, planning documents and design exercises and the like are still useful in software development as long as you understand that they’re a pretty poor representation of reality and they’ll break down if you use them to predict the long-term future.  Use them to identify the next few steps in the process, take those steps, then build a new plan.

 

Using WCF With Authenticated SSL

(I originally posted this on my MSDN blog.)

My smart card renewal tool uses an authenticated SSL connection to communicate with a WCF web service hosted in IIS; that is, a client certificate is required for connection and IIS automatically maps that certificate into a domain account identity so I can impersonate it.

Using WCF in this situation is pretty straightforward but there are a few fiddly details that may not be obvious at first.  Here’s how to do it.  As with the smart card stuff, I’m not an expert on this topic.  What follows is just what I’ve discovered through research and experimentation.

The Client

On the client side you have the ServiceModel configuration section in your app.config file and a few lines of code to connect to the server.  The configuration looks like this:

<system.serviceModel>
  <client>
    <endpoint name="IMyService"
        address="https://myserver/myservice/MyService.svc"
        binding="wsHttpBinding"
        bindingConfiguration="wsHttpBindingAuthenticated"
        contract="MyApp.Contracts.IMyService">
    </endpoint>
  </client>
  <bindings>
    <wsHttpBinding>
      <binding name="wsHttpBindingAuthenticated" useDefaultWebProxy="true">
        <security mode="Transport">
          <transport clientCredentialType="Certificate"/>
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
</system.serviceModel>

 

In this example the name of the endpoint is the name of the contract I’m exposing through WCF.  It can be anything you like, but I like to use the contract name to reduce confusion.  The address setting is the URL of the .svc file hosted in IIS.  The contract setting is the fully-qualified name of the contract interface.  The rest of it just tells WCF to rely on the SSL connection for security and to present a client certificate upon connection.

The code in the client is pretty trivial.  It sets up a channel factory, configures the client certificate to use, and creates a channel.  Very simple.

var channelFactory = new ChannelFactory<IMyService>("IMyService");
channelFactory.Credentials.ClientCertificate.Certificate = certificate.X509Certificate;
IMyService myService = channelFactory.CreateChannel();

 

However, when you actually make calls on your service you’ll need to trap all kinds of possible exceptions and do something intelligent with them.  You should go through all the possibilities you can think of – server name not resolved, server not responding, IIS disabled, incorrect service URL, client network down, etc., etc.  I have nearly 100 lines of code in my app just to catch various exceptions that might be thrown when I try to use my WCF service and to turn them into friendly error messages that suggest possible causes and solutions.
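
As a rough illustration, the call site ends up looking something like this.  This is only a sketch: DoSomething and ShowFriendlyError are placeholder names of mine, but the exception types are the ones WCF actually throws for these failures.

// Requires System and System.ServiceModel.
try
{
    myService.DoSomething();
}
catch (EndpointNotFoundException ex)
{
    // DNS failure, server offline, IIS stopped, or a bad service URL.
    ShowFriendlyError("The service could not be reached.  Check the service URL and your network connection.", ex);
}
catch (System.ServiceModel.Security.SecurityNegotiationException ex)
{
    // Client certificate rejected, not mapped to an account, or an SSL trust problem.
    ShowFriendlyError("The server did not accept your client certificate.", ex);
}
catch (TimeoutException ex)
{
    ShowFriendlyError("The service did not respond in time.", ex);
}
catch (CommunicationException ex)
{
    // Catch-all for other channel-level failures; keep it last because the
    // more specific WCF exceptions above derive from it.
    ShowFriendlyError("Communication with the service failed.", ex);
}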

The Server

The server is slightly more complicated but not too bad if you know exactly what needs to be done.  First is the ServiceModel configuration in the web.config file:

<system.serviceModel>
  <services>
    <service name="MyApp.Service.MyService"
             behaviorConfiguration="MyApp.Service.MyServiceBehavior">
      <endpoint address=""
                binding="wsHttpBinding"
                bindingConfiguration="wsHttpBindingAuthenticated"
                contract="MyApp.Contracts.IMyService" />
    </service>
  </services>
  <behaviors>
    <serviceBehaviors>
      <behavior name="MyApp.Service.MyServiceBehavior">
        <serviceMetadata httpGetEnabled="false"/>
        <serviceDebug includeExceptionDetailInFaults="false" />
        <serviceCredentials>
          <clientCertificate>
            <authentication mapClientCertificateToWindowsAccount="true" />
          </clientCertificate>
        </serviceCredentials>
      </behavior>
    </serviceBehaviors>
  </behaviors>
  <bindings>
    <wsHttpBinding>
      <binding name="wsHttpBindingAuthenticated">
        <security mode="Transport">
          <transport clientCredentialType="Certificate"/>
        </security>
      </binding>
    </wsHttpBinding>
  </bindings>
</system.serviceModel>

 

In this configuration the service name is the fully-qualified name of the concrete class that implements the service contract interface.  The endpoint doesn’t need to specify an address since that will be taken care of by IIS, and the contract setting refers to the fully-qualified name of the contract interface just as in the client configuration.

The behavior section turns off some WCF options and turns on the option to automatically map the client certificate to the corresponding Windows domain account.

The binding section is the same as in the client config and tells WCF to rely on SSL for security and to expect a client certificate as proof of identity.

The .svc file is just one line:

<%@ ServiceHost Service="MyApp.Service.MyService" %>

 

The .svc file is the file that the client URL will point to; this is the entry point that IIS uses to figure out which concrete class to instantiate and expose to incoming requests.

The concrete class that implements the service contract has no special magic in it at all so I won’t bother to show it here.  The only requirement is that you implement a default constructor on the class.  IIS will instantiate an instance of the class for you and invoke the method that the client requested.
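
If it helps to picture it, a bare-bones skeleton looks roughly like this; the names match the configuration above, and the operation itself is just a placeholder of mine.

namespace MyApp.Service
{
    // An ordinary implementation of the contract interface.  The implicit
    // default constructor is all IIS needs in order to instantiate it.
    public class MyService : MyApp.Contracts.IMyService
    {
        public string DoSomething(string input)
        {
            // Placeholder; your contract's operations go here.
            return "Hello, " + input;
        }
    }
}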

IIS

There is also some configuration you need to do in IIS.

First, I assume you have a proper SSL certificate set up for your server.  If you want to create a self-signed certificate for your development environment, there are excellent instructions available on how to do so.

Second, you need to install the “Client Certificate Mapping Authentication” component for the Web Server server role.  In Windows Server 2008, you can find the web server role in the Server Manager:

[Screenshot: Server Manager showing the Web Server (IIS) role]

Click the “Web Server (IIS)” role, click “Add Role Services”, and make sure that both the Windows Authentication and the Client Certificate Mapping Authentication services are selected.  You don’t want IIS Client Certificate Mapping Authentication unless you want to explicitly map multiple certificates to a single domain account.

[Screenshot: Role Services list with Windows Authentication and Client Certificate Mapping Authentication selected]

In IIS Manager, first look at the Authentication page for your web server (not for the web application) and enable both “Active Directory Client Certificate Authentication” and “Windows Authentication”.

[Screenshot: Authentication page for the web server with both options enabled]

Next, look at the Authentication page for your web application and make sure that “Windows Authentication” is enabled and that “Anonymous Authentication” is disabled:

[Screenshot: Authentication page for the web application with Windows Authentication enabled and Anonymous Authentication disabled]

View the SSL page for your web application and make sure that it’s configured to require SSL and to require a client certificate:

[Screenshot: SSL Settings page with SSL and client certificates required]

The final step is that your .svc file (and nothing else) needs to be configured to allow anonymous authentication.  Doing this in IIS Manager is a little non-obvious the first time.

  1. In your web application, switch to Content View.
  2. Find your .svc file, right-click on it, and choose “Switch To Features View”.
  3. Now the .svc file will be displayed in the left-hand tree view under your web application.  Select it and look at the Authentication page for this one file (not the entire web app).  Enable Anonymous Authentication and turn off everything else:

[Screenshot: Authentication page for the .svc file with only Anonymous Authentication enabled]

Impersonation

After you successfully build, deploy, and configure all of that, you should be able to connect to your web service using a client certificate and IIS will automatically map that to a Windows domain account.  If you check ServiceSecurityContext.Current.WindowsIdentity in your web service code, you’ll see the name of the domain account that the certificate was issued to.  However, you’re still not running in full impersonation mode!  In order to have the web service fully act as the user you need to call ServiceSecurityContext.Current.WindowsIdentity.Impersonate() and then do your work, like so:

using (ServiceSecurityContext.Current.WindowsIdentity.Impersonate())
{
    // Act as the user here.
}

 

While impersonating the user you can do anything you’d like with resources on the web server machine.  However, if you want to touch anything on a different machine you’ll need to set up constrained Kerberos delegation between the two machines, because that counts as a double hop (the hop from the client machine to the web server is the first one) and double hops are disallowed by default.  I’ll post about that soon.

 

Renewing Smart Card Certificates Via The Internet

(I originally posted this on my MSDN blog.)

About six months ago I published some example code for enrolling for smart card certificates across domains.  Of course, once you’re able to enroll for a smart card certificate across domains, at some point you’ll also need to renew that certificate across domains or remotely via the internet.  Renewing within a domain is trivial – you just set up the template for autoenrollment and you’re good to go.  Doing the same thing without domain support is a difficult and poorly-documented exercise, so I’m sharing my hard-won knowledge with you.

My scenario:

  1. Users are located externally and connect via the public internet to secure web servers in my domain.
  2. Users use smart cards to authenticate to the web server.
  3. Users need to renew their smart card certificates periodically.

Here’s some guidance on how to do it.

The Disclaimer

As stated before, I’m not a smartcard, certificate, or security expert.  This code seems to work for me but may be wrong in some unknown aspect.  Corrections are welcome!

The Concept

The basic idea goes like this:

  1. The client builds a CMC renewal request for the certificate.
  2. The client adds the certificate template OID to the CMC request so that the server will know which template to use.
  3. The client gets a string representation of the CMC request and sends it across the network to the server via an authenticated SSL connection (client certificate is required).
  4. The server impersonates the user, submits the request to the CA server, and returns the response to the client.
  5. The client installs the response and deletes the old certificate.

The Code

This code uses the new-ish Certificate Enrollment API (certenroll) that’s available only on Vista+ and Windows Server 2008+.  It won’t run on XP or Server 2003.  I’m not going to post a complete, working program here since it’s fairly bulky with other concerns.  I’ll just cover the key parts that deal with the renewal process itself, without a lot of error handling and other things.  Hopefully I haven’t excluded anything important!

Renewal Processor

The first part is the renewal processor class that drives the high-level renewal workflow of creating a request, sending it to the server, installing the response, and rolling back the request if anything goes wrong.  The IRenewalService interface handles all the magic of talking to the server, which is a topic for another time.
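
For orientation, here’s roughly what the collaborating interfaces look like; this is just a sketch showing the members the processor actually calls.  The renewal processor itself follows.

// Sketches only, inferred from how the renewal processor uses these types.
public interface ICertificateRenewalRequestFactory
{
    // Builds a CMC renewal request for an existing certificate.
    ICertificateRenewalRequest Create(ICertificate certificate);
}

public interface IRenewalService
{
    // Sends the base64-encoded request to the web service and wraps the response.
    ICertificateRenewalResponse Enroll(ICertificateRenewalRequest request);
}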

public class RenewalProcessor : IRenewalProcessor
{
    ICertificateRenewalRequestFactory _requestFactory;
    IRenewalService _renewalService;

    public RenewalProcessor(ICertificateRenewalRequestFactory requestFactory, IRenewalService renewalService)
    {
        _requestFactory = requestFactory;
        _renewalService = renewalService;
    }

    public void Renew(ICertificate certificate)
    {
        ICertificateRenewalRequest request = null;
        ICertificateRenewalResponse response = null;

        try
        {
            request = _requestFactory.Create(certificate);
            response = _renewalService.Enroll(request);
            InstallResponse(certificate, response);
        }
        catch
        {
            // Roll back the request so we don't leave an orphaned key container on the card.
            if (HasOutstandingRequest(request, response))
            {
                request.Cancel();
            }
            throw;
        }
    }

    private bool HasOutstandingRequest(ICertificateRenewalRequest request, ICertificateRenewalResponse response)
    {
        return request != null && (response == null || !response.IsInstalled);
    }

    private void InstallResponse(ICertificate certificate, ICertificateRenewalResponse response)
    {
        response.Install();
        certificate.Delete();
    }
}

Certificate Facade

The second part is a facade that helps me manage the certificates.  Some facilities are provided by .Net and other facilities are provided by certenroll, so this facade glues them together into a cohesive entity for me.

public class Certificate : ICertificate
{
    public Certificate(X509Certificate2 certificateToWrap)
    {
        this.X509Certificate = certificateToWrap;
    }

    public string ToBase64EncodedString()
    {
        byte[] rawBytes = this.X509Certificate.GetRawCertData();
        return Convert.ToBase64String(rawBytes);
    }

    public string TemplateOid
    {
        get
        {
            var managedTemplateExtension = (from X509Extension e in this.X509Certificate.Extensions
                                            where e.Oid.Value == "1.3.6.1.4.1.311.21.7"
                                            select e).First();
            string base64EncodedExtension = Convert.ToBase64String(managedTemplateExtension.RawData);

            IX509ExtensionTemplate extensionTemplate = new CX509ExtensionTemplate();
            extensionTemplate.InitializeDecode(EncodingType.XCN_CRYPT_STRING_BASE64, base64EncodedExtension);
            return extensionTemplate.TemplateOid.Value;
        }
    }

    public X509Certificate2 X509Certificate { get; private set; }

    public void Delete()
    {
        X509Store store = new X509Store(StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadWrite);
        store.Remove(this.X509Certificate);
        this.PrivateKey.Delete();
    }

    private IX509PrivateKey PrivateKey
    {
        get
        {
            RSACryptoServiceProvider managedPrivateKey = (RSACryptoServiceProvider)this.X509Certificate.PrivateKey;

            IX509PrivateKey key = new CX509PrivateKey();
            key.ContainerName = managedPrivateKey.CspKeyContainerInfo.UniqueKeyContainerName;
            key.ProviderName = "Microsoft Base Smart Card Crypto Provider";
            key.Open();
            return key;
        }
    }
}

Renewal Request

The third part is the renewal request.  This is the class that builds a CMC renewal request for the certificate and knows how to cancel an in-progress request if necessary.  Canceling is important because creating a request creates a new key container on the smart card; if you have several aborted attempts without cleaning up, you can fill up the card with empty containers and be unable to renew your certificate.  The renewal service (not shown) will get the base64-encoded string representation of the request and send it to the server for enrollment.

public class CertificateRenewalRequest : ICertificateRenewalRequest
{
    private ICertificate _certificateToRenew;
    private IX509PrivateKey _requestPrivateKey;

    public CertificateRenewalRequest(ICertificate certificateToRenew)
    {
        _certificateToRenew = certificateToRenew;
    }

    public string ToBase64EncodedString()
    {
        IX509CertificateRequestCmc cmcRequest = CreateCmcRequest();
        IX509Enrollment enrollment = CreateEnrollment(cmcRequest);
        string base64EncodedRequest = enrollment.CreateRequest(EncodingType.XCN_CRYPT_STRING_BASE64);
        CacheRequestPrivateKey(enrollment);
        return base64EncodedRequest;
    }

    public void Cancel()
    {
        // Canceling the request means we need to delete the private key created by the
        // enrollment object if we got that far.
        if (_requestPrivateKey != null)
        {
            _requestPrivateKey.Delete();
        }
    }

    private IX509CertificateRequestCmc CreateCmcRequest()
    {
        string base64EncodedCertificate = _certificateToRenew.ToBase64EncodedString();

        IX509CertificateRequestCmc cmcRequest = new CX509CertificateRequestCmc();
        var inheritOptions = X509RequestInheritOptions.InheritNewSimilarKey |
                             X509RequestInheritOptions.InheritRenewalCertificateFlag |
                             X509RequestInheritOptions.InheritSubjectFlag |
                             X509RequestInheritOptions.InheritExtensionsFlag |
                             X509RequestInheritOptions.InheritSubjectAltNameFlag;
        cmcRequest.InitializeFromCertificate(X509CertificateEnrollmentContext.ContextUser, true, base64EncodedCertificate, EncodingType.XCN_CRYPT_STRING_BASE64, inheritOptions);

        AddTemplateExtensionToRequest(cmcRequest);
        return cmcRequest;
    }

    private IX509Enrollment CreateEnrollment(IX509CertificateRequestCmc cmcRequest)
    {
        IX509Enrollment enrollment = new CX509Enrollment();
        enrollment.InitializeFromRequest(cmcRequest);
        return enrollment;
    }

    private void AddTemplateExtensionToRequest(IX509CertificateRequestCmc cmcRequest)
    {
        CX509NameValuePair templateOidPair = new CX509NameValuePair();
        templateOidPair.Initialize("CertificateTemplate", _certificateToRenew.TemplateOid);
        cmcRequest.NameValuePairs.Add(templateOidPair);
    }

    private void CacheRequestPrivateKey(IX509Enrollment enrollment)
    {
        IX509CertificateRequest innerRequest = enrollment.Request.GetInnerRequest(InnerRequestLevel.LevelInnermost);
        _requestPrivateKey = ((IX509CertificateRequestPkcs10)innerRequest).PrivateKey;
    }
}

Renewal Response

The renewal response is pretty simple – all it needs to do is install the response to the smart card.

public class CertificateRenewalResponse : ICertificateRenewalResponse
{
    private string _base64EncodedResponse;

    public CertificateRenewalResponse(string base64EncodedResponse)
    {
        _base64EncodedResponse = base64EncodedResponse;
        IsInstalled = false;
    }

    public bool IsInstalled { get; private set; }

    public void Install()
    {
        var enrollment = new CX509Enrollment();
        enrollment.Initialize(X509CertificateEnrollmentContext.ContextUser);
        enrollment.InstallResponse(InstallResponseRestrictionFlags.AllowNone, _base64EncodedResponse, EncodingType.XCN_CRYPT_STRING_BASE64, null);
        IsInstalled = true;
    }
}

Request Processor

The request processor is the server-side WCF component that receives the renewal request, enrolls it with the CA, and returns the response to the client.  It impersonates the user when it does the enrollment and relies on Kerberos delegation to transfer the user’s credentials to the CA.

public class RequestProcessor : IRequestProcessor
{
    public string Enroll(string base64EncodedRequest)
    {
        ICertRequest2 requestService = new CCertRequest();
        RequestDisposition disposition = RequestDisposition.CR_DISP_INCOMPLETE;
        string configuration = GetCAConfiguration();

        // Submit the cert request in the security context of the caller – this REQUIRES Kerberos delegation to be correctly set up in the domain!
        using (ServiceSecurityContext.Current.WindowsIdentity.Impersonate())
        {
            disposition = (RequestDisposition)requestService.Submit((int)Encoding.CR_IN_BASE64 | (int)Format.CR_IN_CMC, base64EncodedRequest, null, configuration);
        }

        if (disposition == RequestDisposition.CR_DISP_ISSUED)
        {
            string base64EncodedCertificate = requestService.GetCertificate((int)Encoding.CR_OUT_BASE64);
            return base64EncodedCertificate;
        }
        else
        {
            string message = string.Format(CultureInfo.InvariantCulture, "Failed to get a certificate for the request.  {0}", requestService.GetDispositionMessage());
            throw new InvalidOperationException(message);
        }
    }

    private string GetCAConfiguration()
    {
        CCertConfig certificateConfiguration = new CCertConfig();
        return certificateConfiguration.GetConfig((int)CertificateConfiguration.CC_DEFAULTCONFIG);
    }

    private enum RequestDisposition
    {
        CR_DISP_INCOMPLETE = 0,
        CR_DISP_ERROR = 0x1,
        CR_DISP_DENIED = 0x2,
        CR_DISP_ISSUED = 0x3,
        CR_DISP_ISSUED_OUT_OF_BAND = 0x4,
        CR_DISP_UNDER_SUBMISSION = 0x5,
        CR_DISP_REVOKED = 0x6,
        CCP_DISP_INVALID_SERIALNBR = 0x7,
        CCP_DISP_CONFIG = 0x8,
        CCP_DISP_DB_FAILED = 0x9
    }

    private enum Encoding
    {
        CR_IN_BASE64HEADER = 0x0,
        CR_IN_BASE64 = 0x1,
        CR_IN_BINARY = 0x2,
        CR_IN_ENCODEANY = 0xff,
        CR_OUT_BASE64HEADER = 0x0,
        CR_OUT_BASE64 = 0x1,
        CR_OUT_BINARY = 0x2
    }

    private enum Format
    {
        CR_IN_FORMATANY = 0x0,
        CR_IN_PKCS10 = 0x100,
        CR_IN_KEYGEN = 0x200,
        CR_IN_PKCS7 = 0x300,
        CR_IN_CMC = 0x400
    }

    private enum CertificateConfiguration
    {
        CC_DEFAULTCONFIG = 0x0,
        CC_UIPICKCONFIG = 0x1,
        CC_FIRSTCONFIG = 0x2,
        CC_LOCALCONFIG = 0x3,
        CC_LOCALACTIVECONFIG = 0x4,
        CC_UIPICKCONFIGSKIPLOCALCA = 0x5
    }
}

Things To Watch Out For

There are a few tricky issues that I ran across.

WCF and Authenticated SSL

I’m using WCF to communicate between the client and the server.  Doing that on top of an authenticated SSL connection is a subject all its own and I hope to post on that separately.

Multiple PIN Prompts

You’ll get multiple prompts for your PIN during the renewal process; either two or three depending on which OS you’re running on and how you organize your calls.  The basic problem is that the smart card CSP and the SSL CSP are separate and don’t talk to each other so they can’t reuse a PIN that the other one gathered.  At a minimum you’ll get one PIN prompt from SSL and another from the smart card system.

If you’re running on Windows 7, you might even get three prompts if you do the logical thing and first build your request, then connect to the server and send the request, then install the response.  This is because in previous versions of Windows each CSP would cache the PIN you entered, but Windows 7 actually converts the PIN to a secure token and caches that.  Unfortunately there’s only one global token cache but the CSPs can’t use tokens generated by others, so first the smart card CSP prompts you and caches a token, then SSL prompts you and caches its own token (overwriting the first one), then the smart card system prompts you again (because its cached token is gone).

The way to minimize the problem is to do a dummy web service call first, just to force SSL to set up the connection, and then do all the smart card work.  Once SSL has an active connection you can make multiple web service calls on it without incurring additional PIN prompts.
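
In other words, something like this at the top of the renewal workflow, assuming a no-op Ping operation on the service contract (any cheap call will do):

// Force the SSL handshake (and its PIN prompt) to happen up front, before any
// certenroll calls touch the smart card.  "Ping" is an assumed no-op operation.
myService.Ping();

// Now that the SSL channel is established, the remaining PIN prompts come only
// from the smart card CSP while building the request and installing the response.
renewalProcessor.Renew(certificate);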

Impersonation

The idea here is that we first set up the certificate template on the CA to allow users to submit their own renewal requests by signing the request with the old, valid certificate.  Then we submit certificate renewal requests with the user’s credentials and the only certificate they can request is a renewal of the one they’ve already got.  Because we’re not using an enrollment agent service account here, there’s no risk of an elevation of privileges that would allow a user to get any other certificate.

This plan relies on Kerberos delegation between the web server and the CA server in order for the enrollment to be submitted in the correct context because it’s a double-hop that’s not allowed by default.  This is an entire subject in itself and will hopefully be a blog post sometime soon.

However, there are two bugs I ran into while attempting to use Kerberos delegation to the CA.  The first bug is that the call to ICertConfig.GetConfig() will fail with a “file not found” error when run while impersonating the user.  That makes no sense to me and is probably a Windows bug, but fortunately you can easily call it before you start the impersonation.

The second bug is that certificate enrollment requests will fail on a Windows Server 2003 CA when the user’s credentials are based on a client certificate and delegated from another machine.  The call will make it to the CA, but the CA will fail the enrollment with error 0x80070525: “The specified account does not exist.”  This is a bug in Windows Server 2003 and was fixed in Windows Server 2008.  (Yes, some of our infrastructure is kind of old, but we’re upgrading it now.)

The Root Of All Evil in Scrum

(I originally posted this on my MSDN blog.)

Sometimes the Scrum process kind of breaks down.  Maybe there’s confusion over backlog items, or some people end up with nothing to do, or there’s a general sense of spinning the wheels but not getting anywhere.  I’ve seen these kinds of symptoms in my own work and observed them in other teams.

Personally, I think the most common cause for general malaise in the Scrum process is a lack of clear, tight focus on delivering business value.  Whenever you let the team go off the rails and start building infrastructure components with no immediate consumer, or you invest a lot of effort into tracking hours spent on non-development tasks, or you find yourself at your end-of-sprint demo talking a lot about work you did but having nothing to actually, um, demo – pain is sure to follow.

If Scrum isn’t working for you, the first thing you should do is see if you can draw a clear line between every task on your sprint backlog and some concrete, deliverable feature that your customers(*) actually care about.  If you can’t, stop and clarify your goals until you can.  If there’s no believable rationale for a particular task that your customer would get excited about, just drop it.  It’s not important and it’s distracting you from your mission.  If you don’t have a plan for delivering something that’s done, demo-able, and potentially shippable at the end of the sprint, narrow the scope until you know how to do so.  If you have some kind of required process in place and you can’t succinctly explain exactly how that process enables you to deliver working software in a better, faster, and cheaper manner, then drop the process.  It’s not helping.

It’s not always easy to cast things in terms of measurable business value.  Sometimes it takes a lot of skill to do so in a meaningful way.  Thinking about narrow vertical feature slices helps, but sometimes it takes a lot of hard work to identify the appropriate slices.  In the end, though, it’s absolutely worth it.

(* Of course, you also need to keep a complete list of your customers in mind.  An end-user won’t directly care if your project has good diagnostic logging, but your Operations team definitely will.  They’re customers too!)

 

Tom DeMarco: Software Engineering Is Dead

(I originally posted this on my MSDN blog.)

This is a little late but there was an interesting internal thread about Tom DeMarco’s recent article in IEEE Software entitled “Software Engineering: An Idea Whose Time Has Come and Gone?”  In it he recants his early writing on the topic of metrics and control in software engineering projects and says that software projects are fundamentally experimental and uncontrollable.  Consistency and predictability in software development are largely unattainable goals.  Much of what we think of when we think about “software engineering” is a dead end.

Microsoft is fairly big on software engineering so the article caused a bit of a ruckus with some folks.  Someone snidely noted that software engineering seems to have worked well for Windows 7 and questioned DeMarco’s motive for writing the article.  I thought Windows 7 was an interesting example to bring up.  How could a gigantic software project like Windows function without software engineering?  What does it have to say about DeMarco’s claim that software engineering is dead?

The Windows 7 project is widely considered a success, especially compared to the Vista project which had many well-publicized problems.  Now, I don’t know much about the Windows engineering process because I’ve never worked on the Windows team, but my impression from the comments of people who are familiar with it leads me to believe that the success of Windows 7 vs. Vista was due in large part to a reduction and streamlining of process and a change in focus from centralized to decentralized control.

For example, here’s a blog post by Larry Osterman (a senior Windows developer) where he says:

This is where one of the big differences between Vista and Windows 7 occurs: In Windows 7, the feature crew is responsible for the entire feature. The crew together works on the design, the program manager(s) then writes down the functional specification, the developer(s) write the design specification and the tester(s) write the test specification. The feature crew collaborates together on the threat model and other random documents. Unlike Windows Vista where senior management continually gave “input” to the feature crew, for Windows 7, management has pretty much kept their hands off of the development process.

Larry goes on to describe the kinds of control and process that the Windows 7 engineering system did have, and of course there was still a lot of it.  But my impression (again as an outsider) is that the process was directed more toward setting a bar and leaving the precise means of hitting the bar up to the people doing the work.

In DeMarco’s article, he says:

Can I really be saying that it’s OK to run projects without control or with relatively little control? Almost. I’m suggesting that first we need to select projects where precise control won’t matter so much. Then we need to reduce our expectations for exactly how much we’re going to be able to control them, no matter how assiduously we apply ourselves to control. […] So, how do you manage a project without controlling it? Well, you manage the people and control the time and money.

Seems like when you compare the engineering process of Vista vs. Windows 7, Vista was more about management trying to directly control the details of the engineering process and Windows 7 was more about managing the people, time, and money.

Larry wrote about Windows 7:

A feature is not permitted to be checked into the winmain branch until it is complete. And I do mean complete: the feature has to be capable of being shipped before it hits winmain – the UI has to be finished, the feature has to be fully functional, etc.  […]  Back in the Vista day, it was not uncommon for feature development to be spread over multiple milestones – stuff was checked into the tree that really didn’t work completely. During Win7, the feature crews were forced to produce coherent features that were functionally complete – we were told to operate under the assumption that each milestone was the last milestone in the product and not schedule work to be done later on. That meant that teams had to focus on ensuring that their features could actually be implemented within the milestone as opposed to pushing them out.

and Tom wrote in his article:

You say to your team leads, for example, “I have a finish date in mind, and I’m not even going to share it with you.  When I come in one day and tell  you the project will end in one week, you have to be ready to package up and deliver what you’ve got as the final product.  Your job is to go about the project incrementally, adding pieces to the whole in the order of their relative value, and doing integration and documentation and acceptance testing incrementally as you go.”

Sounds to me like they’re saying the same thing.

I think DeMarco’s article upset some people because of the provocative title but the content really does match real-world experience, even on gigantic projects.  He’s not saying that engineering (broadly defined) is bad or that all control is bad, and he’s not denying that processes needs to scale with the size of the project.  Rather, Tom’s famous statement that “you can’t control what you can’t measure” has been used to justify a lot of sins in the name of software engineering, and it’s that unbalanced and control-freak-oriented brand of engineering that he’s saying is dead.  Judging from Windows 7, I’d say he’s absolutely right.

We can’t predictively measure or control some of the most important aspects of a software development project, and if we try, we soon discover that the “observer effect” from physics applies to us as we end up changing the thing we’re trying to measure (and inevitably not for the better).  The best we can do is to anecdotally evaluate how we’re doing by considering our past, inspecting our results, and making course corrections for the future.

That’s what agile software development is all about.

 

Tracking Sprint Progress

(I originally posted this on my MSDN blog.)

Over the past few months there have been several interesting email threads on internal aliases at work that have helped clarify my thinking about various Agile topics.  I thought I’d share some of the things I wrote in hopes that it’ll be useful for someone.

The first one I want to share has to do with tracking Scrum sprint progress.  A few people, including me, were advocating a low-tech approach to tracking sprint tasks using sticky notes on a whiteboard and not bothering with hours spent on each task.  Someone asked, “If you don’t track estimated vs. actual hours spent on each task, how do you create a sprint burndown chart?”

My answer:

If your sprints are fairly short, like two weeks, and if your stories are fairly small and granular, then you can do a rough burndown chart just showing the number of story points that have been completed so far in the sprint and get almost all of the value you need from the chart.  I’m personally convinced that a lot of teams spend a lot of time collecting detailed metrics and then don’t actually do anything useful enough with those metrics to justify the time spent generating them.

What does a detailed hours-spent burndown chart buy you?  If your sprints are long (say four+ weeks), then it helps you do mid-course corrections and cut stories if you’re not trending well.  It might also help you identify chronic estimation problems so you can work on fixing them.

I think both of those benefits can be gained by simply shortening the length of your sprint cycle.  Over a two-week sprint, you’re not likely to drift very far off course.  At the end of the two weeks, where you might have a mid-course correction in a long sprint, you simply plan another sprint based on where you are now.  And if people made bad estimates of story sizes that caused you to over-commit and not finish the sprint backlog, both the estimation and the surprises discovered during implementation should still be fresh in people’s minds and you can discuss them in your retrospective.  You don’t need detailed metrics to capture that information and hold it for weeks until you can get around to discussing it.

Sprint burndown charts based on hours spent vs. hours estimated look really cool and they appeal to the innate engineer/geek sense in all of us.  I’ve just found that when I weigh the benefit that data actually gives me against the hassle it costs everyone to collect it, it’s generally not worth it.

So to be specific, I prefer to put my stories up on a whiteboard with sticky notes for individual tasks in a column underneath each story.  We don’t estimate the tasks in hours.  (During the sprint planning meeting, we do a quick double-check of the decomposed tasks against the original size estimate for the story to make sure we’re still comfortable with it.)  Then as the sprint progresses, people grab task stickies and move them to an “in progress” row as they work on them, and then to a “done” row when they’re done.  A quick glance at the board will give us a ballpark feel for where we are, which in a two-week sprint is all we need.  If we want a slightly better feel for how we’re trending, we might build a burndown chart based on story points completed vs. scheduled.
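
And if we do want that chart, the arithmetic is about as simple as it gets; here’s a throwaway sketch (with made-up numbers) of the story-point burndown calculation:

// Made-up sprint: 30 committed story points, points completed per working day.
int committedPoints = 30;
int[] completedPerDay = { 0, 3, 3, 0, 5, 2, 4, 3, 5, 3 };

int done = 0;
for (int day = 0; day < completedPerDay.Length; day++)
{
    done += completedPerDay[day];
    Console.WriteLine("Day {0,2}: {1,2} points remaining", day + 1, committedPoints - done);
}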

There’s very little overhead with this system, it’s pleasant to use, we don’t have to harass devs about entering their hours every day, and it gives us the information we need to deliver business value in a predictable manner, which is the whole point.