Why AAA style mocking is better than Record / Playback

by Rasmus Kromann-Larsen May 24, 2009 21:06

If you've followed me on twitter for more than a couple of days, you will most probably have heard my grumbling each time I run into issues using record / playback mocking - so I thought I'd write a short post on my experiences with both and why I think that I keep bumping into issues with record / playback.

Phases of a test

If you take a look at the normal flow of a test method without mocking, it will usually perform some kind of setup, then perform some action that invokes the code under test - and then at the end, make some assertions about the state of some component that you want verified. While some tests continue after this point, this is where you should stop if you're following the "one assert per test" rule - this is sort of the single responsibility principle for tests. This flow is also the inspiration for the AAA name - first you Arrange your test setup, then you Act upon the class under test - and then you Assert something about the state. Here's a simple example without mocking:

[Test]
public void CanRemoveCategories()
{
    // Arrange
    var collection = new CategoryCollection("test");

    // Act
    collection.RemoveCategory("test");

    // Assert
    Assert.That(collection.Count, Is.EqualTo(0));
}

This test is chronologically sound, it makes sense and it is easy to read - but then again, this is a state-based test, I said I was going to talk about behavioral tests - mock tests.

Phase confusion

Many people find that mocking is rather difficult to understand and that it often very hard to read and understand. Since our tests act as an API-description for our code - and since we want to be able to figure out how to fix our failed tests - readability is important. Now, let's look at one of my tests from a Record / Playback point of view. I'm using Rhino Mocks as my mocking framework in this test:

[Test]
public void ShouldIgnoreRespondentsThatDoesNotExistRecordPlayback()
{
    var guid = Guid.NewGuid();
    IEventRaiser executeRaiser;

    using(_mocks.Record())
    {
        Expect.Call(_view.Respondents).Return(new[] {guid.ToString()});
        Expect.Call(_repository.GetById(guid)).Return(null);

        _view.ExecuteOperation += null;
        executeRaiser = LastCall.IgnoreArguments()
            .Repeat.Any()
            .GetEventRaiser();

        Expect.Call(_view.OperationErrors = null)
            .IgnoreArguments()
            .Constraints(List.IsIn("Non-existant respondent: " + guid));
    }

    using(_mocks.Playback())
    {
        new BulkRespondentPresenter(_view, _repository);
        executeRaiser.Raise(null, EventArgs.Empty);
    }
}

Now, at a glance, can you tell me what this test is really doing? There's something with a view and a repository and we can probably deduce quite a bit from the test name. But it's rather hard to separate the different phases I talked about before Arrange, Act and Assert. Below, I've tried to annotate the test with the phases:

[Test]
 public void ShouldIgnoreRespondentsThatDoesNotExistRecordPlayback()
 {
     // Arrange
     var guid = Guid.NewGuid();
     // Part of Act
     IEventRaiser executeRaiser;

     using(_mocks.Record())
     {
         // Arrange (or Assert?)
         Expect.Call(_view.Respondents).Return(new[] {guid.ToString()});
         Expect.Call(_repository.GetById(guid)).Return(null);

         // Part of Act
         _view.ExecuteOperation += null;
         executeRaiser = LastCall.IgnoreArguments()
             .Repeat.Any()
             .GetEventRaiser();

         // Assert
         Expect.Call(_view.OperationErrors = null)
             .IgnoreArguments()
             .Constraints(List.IsIn("Non-existant respondent: " + guid));
     }

     using(_mocks.Playback())
     {
         // Arrange
         new BulkRespondentPresenter(_view, _repository);
         // Act
         executeRaiser.Raise(null, EventArgs.Empty);
     }
 }

No wonder it's hard to read and understand. The phases are mixed all over - and the Asserts are in the middle of the test - this is nothing like the natural flow of the previous test without mocking. I usually like to have the phases separated in my test with comments as well, but it's just not possible in this test. I wrote this test up rather quickly, so there might be a better way of doing it that I am missing - if there is, please yell at me :-)

Sorting out the confusion

AAA mocking, as the name suggests, is all about clearing out the confusion in that last test - it's about maintaining the original test flow. It just so happens also to have some other benefits, that I will get into later in the post. I've written the same test as above in an AAA style, this time with Moq, since I'm trying it out at the moment, but Rhino Mocks has similar syntax. Moq is pretty heavy on lambda expressions, but even if you haven't worked with those yet, I'm sure you will grasp the idea. If you want a general introduction to mocking with Moq, Justin Etheredge has a small series about it.

[Test]
public void ShouldIgnoreRespondentsThatDoesNotExist()
{
    // Arrange
    var guid = Guid.NewGuid();
    _viewMock.Setup(x => x.Respondents).Returns(new[] { guid.ToString() });
    _repositoryMock.Setup(x => x.GetById(guid)).Returns(() => null);

    // Act
    _viewMock.Raise(x => x.ExecuteOperation += null, EventArgs.Empty);

    // Assert
    _viewMock.VerifySet(x => x.OperationErrors =
        It.Is<IList<string>>(l => l.Contains("Non-existant respondent: "+guid)));
}

Those comments are actually in my original test as well - and in my test live template I generate all my tests with. If you compare this test to the one above, you will see that it has more or less the same components, but this time, they're arranged in a way that makes sense for the next reader of the test. The fact that the test is shorter is also slightly unfair, since my first test used an event raiser, which involves "many" lines of code. Also the separation of the phases allowed me to move the actual construction of the presenter our of the actual test and into shared setup code.

So what techniques did AAA mocking introduce to help alleviate the pains? First of all, the mocks no longer has states - that's what record and playback really refer to: A mock in record state will record calls made on it and then expect them to be called again during the playback state. Furthermore, it cleanly separates mock setup from mock expectations.

What is gained?

So what did we gain with AAA style mocking over the traditional record / playback style?

  • The main selling point for me is readability and test simplicity - it is much easier for me to explain mocking to someone else with AAA.
  • If you have done any fairly advanced record / playback mocking, you will find that when the mocks have states, it will often result in subtle test failures.
  • Clean separation of test phases.
  • Greatly improved ability to move shared code out of tests. Since you have the first part of your test handling setup, extending this part to start before the actual test (in a setup method) is no problem. With record / playback mocking, you will often run into state failures if you attempt it.

kick it on DotNetKicks.com

Tags:

Development | Testing

Code Coverage – What You Need To Know

by Rasmus Kromann-Larsen February 23, 2009 00:34

People often talk about the percentage of code coverage they have from unit tests, whether it be actual coverage or the goal they or their project have set. Having a high coverage percentage is often seen as a quality, but code coverage is not really a metric that gives much value in itself. In this post I want to investigate the different types of code coverage that exists and address some of the problems with code coverage.

Types of Code Coverage

There are various forms of code coverage, not just the one we seem to always talk about. A short introduction to some of the existing coverage types:

Statement coverage – How many percent of statements / lines have been covered?

Branch coverage – How many percent of branches (control structures) have been covered? In the case of an if statement, has the expression evaluated to both true and false. There are variations of branch coverage that talk about more detailed decision coverage, like covering all permutations of true and false in an expression that contain stuff like AND and OR.

Path coverage – How many percent of all possible paths have been covered? This may not sound too different from the two above, but it is actually much more complicated. An example could be a function with two simple if statements (A and B) after each other. To obtain full path coverage, you would have to get all four permutations of true and false in the two if statements (A B - !A B - A !B - !A !B) whereas in statement coverage you would only have to exercise the if statements in isolation. Mind you that this is the simple case – if statements with multiple sub expressions, nested ifs and loops are much worse.

The coverage type we usually talk about is actually statement coverage – but as you can see from just looking at branch and path coverage, much is left to be desired. Coverage types like path coverage are also hard to measure, as many loops will have a near-infinite number of paths.

Another thing that is very hard to measure with code coverage is multi-threaded behavior. It is definitely not caught in our statement coverage and if getting full path coverage is hard, consider getting full path coverage with multiple threads.

One Problem – Test Quality

The problem with using code coverage on its own is that code coverage tells you nothing about the quality of the tests that cover the code. Code coverage tells you how much of your code has actually been executed, but it tells you nothing about the asserts that were made in the tests – that is, it says nothing about the correctness of the code.

The perfect example of this is writing a test with zero asserts (state or interaction). This test will produce a certain percentage of coverage, while ensuring nothing except that the code can successfully execute. While such sanity tests can be useful in some tricky cases with exceptions, this is often a worthless test – it has no quality whatsoever, it doesn’t verify any intent of the programmer who wrote the code it is testing.

The thing that is easy to sell about code coverage is that is is reasonably easy to measure. It is easy to set a percentage, a goal and then try to obtain this goal. 100% code coverage is a great goal – it might be unrealistic in most situations, but it is a good thing to strive towards. But if the tests suck or the programmers who write the tests become lazy the code coverage will be nothing but misleading.

The problem with test quality is that measuring it is really really hard – at least programmatically. How can you even begin to measure how much this code matches the intent of the programmer. Usually the number of asserts that make sense for a given test match only a very low number of the values in the system. And even if we could measure test quality using a program, we might as well do away with the tests and just measure programmer intent versus the real program.

The solution, to me, is discipline and good engineering principles. Code coverage can be really valuable to a team that treat their test code like production code, sharing ownership of the code and doing regular inspections / code reviews to ensure that the test quality is high.

The Inverse Property

One of the things I like best about code coverage is it’s quality as an inverse property – that is, a tool that can specifically tell me which parts of my code that I have not tested. This is a clear signal that you need to be more careful when touching this code.

This is also one of the reasons that I actually like to remove tests that either are very low quality or haven’t kept up to date with the intent of the code they are testing. The ideal solution is to rewrite the tests to match the intent of the code / desired quality, but this is not always realistic. To me, such tests are more harmful than no tests at all – they give a false sense of security and will only confuse if anyone look at them. At least if there is no tests my coverage tool can tell me there is a problem in this part of the system.

And while we are at it, it is actually amusing that some people treat test code like it is nowhere near production code and then the same people seem to think that it is blasphemy to delete even a single test. Treat your test code like your normal code, delete / rewrite it if it is obsolete or doesn’t make sense. Maintaining code that doesn’t add value to the system makes no sense.

Conclusion

Code coverage is a useful metric, but often not in isolation. If you use it then be aware of the implications of the way you’re using it, increasing test quality is much more valuable to me than reaching a specific coverage percentage, although it is good to have some sort of goal. Write tests to verify intent, not to increase the coverage percent – use coverage to find the intents not covered.

kick it on DotNetKicks.com

Tags:

Development | Testing

Testing a Visitor - Mocking and Test Readability

by Rasmus Kromann-Larsen January 22, 2009 21:53

The other day I was using TDD to write a visitor for an object graph at work. I often use mocks a lot and was using mocks in this particular batch of tests as well. However, in the end creating my own fake class turned out to be much better (in my opinion). Favoring state-based testing over interaction-based testing (sometimes) can really simplify the noise within a test and provide clarity.

What I wanted to test was that objects of the correct types were visited from the parent object and was thus producing a number of tests that look something like this:

private MockRepository mocks;

[SetUp]
public void Setup()
{
    mocks = new MockRepository();
}

[Test]
public void ShouldVisitObject1BelowRootObject()
{
    var rootObject = ObjectMother.Build<RootObject>();

    var object1 = ObjectMother.Build<Object1>();
    rootObject.AddObject(object1);

    var visitor = mocks.DynamicMock<IVisitor>();

    using (mocks.Record())
    {
        Expect.Call(delegate { visitor.Visit(object1); });
    }

    using (mocks.Playback())
    {
        rootObject.Accept(visitor);
    }
}

Now this was a pretty big object graph, so there were a lot of tests that looked very much like this one. What I am doing here is interaction-based testing, using a mock to verify that my class under test calls a particular method. There is quite a bit of mock noise in this test, stuff that is related to setting up mocks and our expectations. Noise like this can be much worse - and it was worse in my production code.

In this particular model, all the objects have a collection of sub items - and this collection conveniently contained some the various object types to visit. So I had a lot of tests that very almost identical to the one above. I thought I'd be clever and refactor my tests so I wouldn't have to duplicate as much code for each of these particular cases. After wrestling with it for a bit I came up with this:

private MockRepository mocks;

[SetUp]
public void Setup()
{
    mocks = new MockRepository();
}

[Test]
public void ShouldVisitObject1BelowRootObject()
{
    TestVisitor<RootObject, Object1>(
        delegate(IVisitor visitor, Object1 obj)
        {
            visitor.Visit(obj);
        });
}

It still looks kind of weird, with the delegate floating around (using C# 2.0 at work for now, so no lambda syntax). The test is still arguably even less readable. The delegate is actually used in the helper method to add the expectation to the mock, since the visitor has overload methods for concrete classes (this is the whole idea of visitor, implementing double dispatch - promise I will post about it soon), you can't really abuse generics.

The helper method is shown below and is also kind of hairy.

private delegate void VisitExpectDelegate<T>(IVisitor visitor, T child);

private void TestVisitor<TParent, TChild>(VisitExpectDelegate<TChild> visitExpectDelegate)
    where TParent : AbstractObject
    where TChild : AbstractObject
{
    var parent = ObjectMother.Build<TParent>();
    var child = ObjectMother.Build<TChild>();
    parent.AddObject(child);

    var visitor = mocks.DynamicMock<IVisitor>();

    using (mocks.Record())
    {
        visitExpectDelegate(visitor, child);
    }

    using (mocks.Playback())
    {
        parent.Accept(visitor);
    }
}

Another small thing that annoyed me besides the readability issue is that the code only works for a very specific case, checking that sub items are visited below a parent object. Furthermore, trying to mock visitor behavior of a deeper class hierarchy can turn into a mocking headache real fast.

I asked a colleague about the readability of my test and if he thought it was acceptable. He suggested that I tried faking (creating my own class) the visitor instead of mocking it. In essence, he suggested that I'd use state-based testing over my current interaction-based method.

The result was this:

private CountingVisitor visited;

[SetUp]
public void Setup()
{
    visited = new CountingVisitor();
}

[Test]
public void ShouldVisitObject1BelowRootObject()
{
    var rootObject = ObjectMother.Build<RootObject>();
    rootObject.AddObject(ObjectMother.Build<Object1>());

    rootObject.Accept(visited);

    Assert.That(visited.TotalCount, Is.EqualTo(2));
    Assert.That(visited.GetCount<Object1>(), Is.EqualTo(1));
}

My fake was basically just a visitor that would count how many times it had visited a given type of object and the total count of visits it had made. For me this test is so much more readable - build an object graph, give it to the visitor, make asserts about the visit count using NUnit's "Assert.That" syntax. The cool thing about this is that it makes no assumptions about sub items and can actually be used for any visitor that visits any object graph. It could also test deeper object graphs with ease. I am aware that it doesn't test which particular instances are visited, but I didn't feel that it would add much value to add this to the visitor, although it is possible.

The fake visitor looks like this and is really just a few generic tricks and a dictionary.

private class CountingVisitor : IVisitor
{
    private readonly Dictionary<Type, int> _count = new Dictionary<Type, int>();
    private int _totalCount = 0;

    private void Add<T>(T obj)
    {
        if (!_count.ContainsKey(typeof(T))) _count[typeof(T)] = 0;
        _count[typeof (T)] += 1;
        _totalCount += 1;
    }

    public int TotalCount
    {
        get { return _totalCount; }
    }

    public int GetCount<T>()
    {
        if(!_count.ContainsKey(typeof(T))) return 0;
        return _count[typeof (T)];
    }

    public void Visit(RootObject obj) { Add(obj); }
    public void Visit(Object1 obj) { Add(obj); }
    public void Visit(Object2 obj) { Add(obj); }
}

So remember the state-based testing, mocks are useful animals and sometimes they will be the only reasonable way or the easiest and you should use them, but other types of fakes (particularly hand-crafted ones - I think this one is actually called a "spy") can really give a good boost in readability and flexibility with a minimal code effort.

kick it on DotNetKicks.com

Tags: , , , ,

Development | Testing

Powered by BlogEngine.NET 1.6.1.0
Theme by Mads Kristensen | Modified by Mooglegiant | Adjusted by Rasmus Kromann-Larsen

About Me

I am a danish .NET developer blogging about the technical side of my life, mostly .NET stuff, but also fundamental topics like design patterns, principles and productivity boosters.

In addition, I am a core group member of Aarhus .NET User Group.