Wednesday, March 8, 2006

Introduction to Unit Testing

Notes for a lecture given to Brandeis University’s COSI 22a.

What Is Unit Testing, and Why Should I Care?

Unit testing is the process of writing tests for individual bits of your program, in isolation. A “bit” is a small piece of functionality. We’ll discuss how small later. How can you know whether or not your program works if you don’t test it? If you’ve ever lost points on a programming assignment because something didn’t work right, you could’ve saved yourself from that by testing your program.

If you go on to take COSI 31a, you will do better on the programming assignments if you write tests! More importantly, it’s a good habit to get into as a programmer. Having tests for your code turns programming from an art — “gee, it looks right and seems to work, I think I’m done” — to a science — “this is the evidence I have to support the claim that my program is behaving correctly.”

Unit testing is one of the easier ways to get into all the nooks and crannies of your code and make sure it’s doing the right thing. The act of writing tests often helps reveal areas where it isn’t clear what it means to do “the right thing.”

What to Test

To figure out what to test, start by thinking about what it means for your program to work. If you have a formal specification, that’s a great place to start. For your homework assignments, you’ve had such a specification: the Java API reference for whichever class you were supposed to be implementing.

You should also think about what all the different parts of the task are. You want at least one test for every public method in every public class. One way to measure the quality of unit tests is a metric called coverage. Coverage measures how much of your code is hit when you run your tests. Consider the following code for the function isNegative:

boolean isNegative(int n) {
    if (n > 0)
        return false;
    else
        return true;
}

If you wrote one test for this function, which tested n = -5, you would only have 50% coverage, because that test exercises only one of the two branches (the return false branch is never executed). To achieve complete coverage, you also need a test for a positive n, say n = 5. Conceptually, you’re not fully testing the function if you only test that it returns true for negative numbers; you also need to test that it returns false for positive numbers. Otherwise, it could be replaced by a function that always returned true and your test suite (the collection of all of your tests) would have no idea! This was a common error I saw in the homeworks: a lot of people were doing things like only testing isEmpty() on an empty list.
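
Such a two-branch test can be sketched as a plain main method (no test framework assumed; the isNegative body is copied from the example above):

```java
// Hand-rolled test covering both branches of isNegative.
public class IsNegativeTest {
    // The function under test, reproduced from the example above.
    static boolean isNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;
    }

    public static void main(String[] args) {
        // One test per branch: a negative input and a positive input.
        if (!isNegative(-5))
            System.out.println("FAIL: isNegative(-5) should be true");
        if (isNegative(5))
            System.out.println("FAIL: isNegative(5) should be false");
        System.out.println("done");
    }
}
```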

There’s one trap I should mention here. If you’re writing your test suite and thinking about how to achieve maximum coverage, one way to do it is to look at the source for your class while you’re writing the test suite and go through every method and branch. The problem with this is that it ties your test suite to implementation details of your code. It’s important to think about the logical cases of the underlying problem you’re solving. Consider the isNegative example. What does it return for n = 0? According to a mechanical coverage check, you don’t need to add a test for that, since you’ve already tested both branches in the code. The zero case is easy to get wrong, though: it’s the boundary between negative and positive. A good rule of thumb is to always write specific tests for boundary conditions. The isNegative above does the wrong thing, and it’s very easy to miss unless you explicitly check isNegative(0).

The way to find the boundary cases, the corner cases, and the weird inputs that will give you problems is to have a detailed mental picture of what a particular method is supposed to do. If you understand what it really means to test whether a number is negative, it should occur to you that 0 is an interesting case to check. Think about ways to implement the functionality, and ways to implement it incorrectly. When comparing two lists, you should probably test not only cases like {1, 2, 3} == {1, 2, 3, 4}, but also {1, 2, 3, 4} == {1, 2, 3}, because catching one but not the other is a common mistake to make. Figuring out what the easy mistakes are is hard. Of course, figuring out the hard mistakes is harder.
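
To make the boundary concrete, here’s a side-by-side of the buggy version above and a corrected one; the names buggyIsNegative and fixedIsNegative are mine, for illustration:

```java
// The zero boundary: the buggy version above classifies 0 as negative.
public class IsNegativeBoundaryTest {
    // The version from the coverage example: wrong for n == 0.
    static boolean buggyIsNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;  // also reached when n == 0, which is not negative
    }

    // A corrected version: only values strictly below zero are negative.
    static boolean fixedIsNegative(int n) {
        return n < 0;
    }

    public static void main(String[] args) {
        // Both versions pass the "obvious" tests...
        System.out.println(buggyIsNegative(-5) + " " + fixedIsNegative(-5));
        // ...but only an explicit boundary test exposes the difference at 0.
        if (buggyIsNegative(0))
            System.out.println("buggy version wrongly says 0 is negative");
        if (!fixedIsNegative(0))
            System.out.println("fixed version correctly says 0 is not negative");
    }
}
```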

Also make sure to test the side effects and error conditions. If a method is supposed to throw particular exceptions on particular invalid inputs, does it? If LinkedList.addAll(Collection) is supposed to return true to indicate that the list was modified, does it return false when the collection is empty? A well-written spec makes this job a lot easier. Look at the documentation for the method and make sure you’re testing that it does everything that the documentation specifies, and exactly what the documentation specifies.
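
As a sketch of both techniques, using the real LinkedList class: the addAll call checks a documented return value, and the try/catch pattern checks that a documented exception actually gets thrown:

```java
import java.util.Collections;
import java.util.LinkedList;

// Testing a documented return value and a documented exception.
public class SpecTest {
    public static void main(String[] args) {
        LinkedList<Integer> list = new LinkedList<Integer>();

        // The spec says addAll returns true only if the list was modified,
        // so adding an empty collection must return false.
        boolean changed = list.addAll(Collections.<Integer>emptyList());
        if (changed)
            System.out.println("FAIL: addAll of empty collection returned true");

        // The spec says get() throws IndexOutOfBoundsException for a bad
        // index; the test fails if the exception is *not* thrown.
        try {
            list.get(0);
            System.out.println("FAIL: expected IndexOutOfBoundsException");
        } catch (IndexOutOfBoundsException e) {
            // expected: an empty list has no element 0
        }
        System.out.println("done");
    }
}
```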

Another source of tests is bugs. When you find a bug, it indicates something that you forgot to test. When this happens, write a test case for it. You should do this before fixing the bug to verify that the test case fails when the bug is present. Then fix the bug and make sure that the test case starts passing. Things that you got wrong once are things that you’re liable to get wrong again as things change. These sorts of tests are called regression tests, because they’re testing that your quality is always moving forward and never regressing.
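
A hypothetical example of such a regression test: suppose a listsEqual helper once had the length bug described earlier (it only compared elements up to the shorter list’s length). The regression test pins down the exact input that exposed the bug, so the bug can’t sneak back in:

```java
import java.util.Arrays;
import java.util.List;

// A regression test capturing a once-found, now-fixed bug.
public class RegressionTest {
    // The fixed function; the original buggy version skipped the
    // size check and only compared up to the shorter list's length.
    static boolean listsEqual(List<Integer> a, List<Integer> b) {
        if (a.size() != b.size())
            return false;
        for (int i = 0; i < a.size(); i++)
            if (!a.get(i).equals(b.get(i)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // The exact inputs that originally exposed the bug, in both orders.
        List<Integer> shorter = Arrays.asList(1, 2, 3);
        List<Integer> longer = Arrays.asList(1, 2, 3, 4);
        if (listsEqual(shorter, longer) || listsEqual(longer, shorter))
            System.out.println("FAIL: lists of different lengths compared equal");
        else
            System.out.println("pass");
    }
}
```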

How to Test It

Take a look at the included PizzaTest class and Pizza documentation. I’ve written a package, Pizza, for determining a set of toppings that will make a group of people happy when they’re trying to order a pizza. Full source code for Pizza is on the web; see below for the URL.

The test suite is structured into groups of tests which test units of functionality. The simple classes, Topping and ToppingConstraint, have one group for each class. Pizza has a few different groups. I isolated each group so that it doesn’t depend on anything done in any of the other groups. Each group that needs to construct a Pizza initializes its own topping list. This way the test groups aren’t dependent on each other and a failure in one small area of the test suite won’t randomly break a bunch of tests that should work. In order for a test suite to be useful, you want it to help you figure out exactly what is failing. There are trade-offs, though. I use Topping.equals(Object), even in tests for completely unrelated things. These tests will break if Topping.equals is broken. It would be a lot of extra busywork to avoid using Topping.equals, and it couldn’t be done without tying myself to the internal makeup of Topping. I shouldn’t need to rewrite the entire test suite if another attribute is added to toppings! One solution to this would be to indicate in some fashion that some of the other groups of tests, such as the applyConstraints() tests, depend on the toppings() tests, and we shouldn’t even bother running the applyConstraints() tests if the toppings() tests fail. There are frameworks to help you write unit tests, such as JUnit, which allow you to express this.

The first test group, mustSetToppings(), is testing that an error condition is generated under circumstances that it should be and not generated under other circumstances. It’s also a good example of how to test whether or not an exception is thrown.

The second test group, toppings(), tests the Topping class. It’s a fairly trivial class, but we test it anyway; it’s nice not to have to worry about whether or not it’s working. (The test suite can get things wrong too, of course, so don’t get overconfident.) Note that the way equality of toppings is defined, two toppings are equal only if they have both the same name and the same type. The tests for Topping.equals(Object) therefore cover cases where the toppings have the same name but different types and vice versa, not just a case where they’re completely identical and a case where they’re completely different. This way, if one of the two checks is broken, we will know exactly which: the “different names, same type” test will fail if the name check wrongly reports a match, the “same name, different types” test will fail if the type check does, and the identical-toppings test will fail if either check wrongly reports a mismatch.
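
The test matrix described above can be sketched with a simplified stand-in for Topping (the real class lives in the Pizza source; this cut-down version, with just a name and a type, is mine):

```java
// The four-case equals() test matrix, on a simplified Topping stand-in.
public class ToppingEqualsTest {
    static class Topping {
        final String name;
        final String type;
        Topping(String name, String type) { this.name = name; this.type = type; }
        public boolean equals(Object o) {
            if (!(o instanceof Topping))
                return false;
            Topping t = (Topping) o;
            // Equal only if both the name and the type match.
            return name.equals(t.name) && type.equals(t.type);
        }
        public int hashCode() { return name.hashCode() * 31 + type.hashCode(); }
    }

    static void check(boolean actual, boolean expected, String label) {
        System.out.println((actual == expected ? "pass: " : "FAIL: ") + label);
    }

    public static void main(String[] args) {
        Topping a = new Topping("pepperoni", "meat");
        // Four cases: identical, same name/different type,
        // different name/same type, completely different.
        check(a.equals(new Topping("pepperoni", "meat")), true,  "identical");
        check(a.equals(new Topping("pepperoni", "veg")),  false, "same name, different type");
        check(a.equals(new Topping("mushroom",  "meat")), false, "different name, same type");
        check(a.equals(new Topping("mushroom",  "veg")),  false, "completely different");
    }
}
```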

applyConstraints() is the most complicated test group. This makes sense: it’s testing the really hard bit of Pizza. The individual tests are straightforward; the tricky part was figuring out which tests there should be. To come up with those test cases, I spent a lot of time thinking about the different ways in which this could go wrong. I intentionally picked a loosely-specified problem to make this job more interesting. The problem that Pizza is attempting to solve, how it’s supposed to work, what sorts of results it should return… these are all somewhat open to interpretation. That’s often what you have to do when you’re programming. A lot of the time, you’ll get a vague problem, and you have to figure out how to solve it. Sometimes these are “business requirements” handed to you by your boss; sometimes it’s you thinking that it would be cool to do “foo”. The homeworks and labs have been based on fairly detailed specifications, and there have still been ambiguities! It took me around four hours to write all of this code (Pizza, PizzaTest, and PizzaMain), and at least an hour, maybe two, of that was writing the applyConstraints() tests. Most of that time was figuring out what tests I needed to write!

Further Resources

These lecture notes and all associated source code are in the public domain.