Introduction to Unit Testing
Notes for a lecture given to Brandeis University’s COSI 22a.
What Is Unit Testing, and Why Should I Care?
Unit testing is the process of writing tests for individual bits of your program, in isolation. A “bit” is a small piece of functionality. We’ll discuss how small later. How can you know whether or not your program works if you don’t test it? If you’ve ever lost points on a programming assignment because something didn’t work right, you could’ve saved yourself from that by testing your program.
If you go on to take COSI 31a, you will do better on the programming assignments if you write tests! More importantly, it’s a good habit to get into as a programmer. Having tests for your code turns programming from an art — “gee, it looks right and seems to work, I think I’m done” — to a science — “this is the evidence I have to support the claim that my program is behaving correctly.”
Unit testing is one of the easier ways to get into all the nooks and crannies of your code and make sure it’s doing the right thing. The act of writing tests often helps reveal areas where it isn’t clear what it means to do “the right thing.”
What to Test
To figure out what to test, start by thinking about what it means for your program to work. If you have a formal specification, that’s a great place to start. For your homework assignments, you’ve had such a specification, the Java API reference for whichever class you were supposed to be implementing.
You should also think about what all the different parts of the task are.
You want at least one test for every public method in every public class.
One way to measure the quality of unit tests is a metric called coverage. Coverage measures how much of your code is hit when you run your tests. Consider the following code for the function isNegative:

    if (n > 0)
        return false;
    else
        return true;

If you wrote one test for this function, which tested n = -5, you would only have 50% coverage, because only two of the four lines are hit by that test (the “return false” branch is never executed). To achieve complete coverage, you also need a test for a positive n, say n = 5. Conceptually, you’re not fully testing the function if you only test that it returns true for negative numbers; you also need to test that it returns false for positive numbers. Otherwise, it could be replaced by a function that always returned true, and your test suite (the collection of all of your tests) would have no idea! This is a common error I saw in the homeworks: a lot of people were doing things like only testing isEmpty() on an empty list.
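As a sketch of what such a pair of tests might look like in plain Java (no test framework; the class name and the check helper are mine for illustration, and the function under test is the buggy version above):

```java
// Two tests that together cover both branches of isNegative.
public class IsNegativeTest {
    // The function under test, exactly as written above (bug included).
    static boolean isNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;
    }

    public static void main(String[] args) {
        // Hits the "return true" branch.
        check(isNegative(-5), "isNegative(-5) should be true");
        // Hits the "return false" branch; together these reach 100% line coverage.
        check(!isNegative(5), "isNegative(5) should be false");
        System.out.println("All tests passed.");
    }

    static void check(boolean condition, String message) {
        if (!condition)
            throw new AssertionError(message);
    }
}
```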
There’s one trap I should mention here. If you’re writing your test suite and thinking about how to achieve maximum coverage, one way to do it is to look at the source for your class while you’re writing the test suite and go through every method and branch. The problem with this is that it ties your test suite to implementation details of your code. It’s important to think about the logical cases of the underlying problem you’re solving. Consider the isNegative example. What does it return for n = 0? According to a mechanical coverage check, you don’t need to add a test for that, since you’ve already tested both branches in the code. The zero case is easy to get wrong, though: it’s the boundary between negative and positive. A good rule of thumb is to always write specific tests for boundary conditions. The isNegative above does the wrong thing for zero, and it’s very easy to miss unless you explicitly check isNegative(0).
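To make the failure concrete, here is a minimal sketch (plain Java; the class name is mine) that runs the buggy isNegative above on the boundary value:

```java
// Demonstrates why isNegative(0) deserves an explicit test: the
// implementation above reports 0 as negative, which is wrong.
public class BoundaryTest {
    // The buggy version from above, unchanged.
    static boolean isNegative(int n) {
        if (n > 0)
            return false;
        else
            return true;
    }

    public static void main(String[] args) {
        // A mechanical coverage check says this case is already covered,
        // but running it exposes the bug: it prints true, not false.
        System.out.println("isNegative(0) = " + isNegative(0)
                + " (should be false)");
    }
}
```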
The way to figure out where the boundary cases, the corner cases, and the weird problematic inputs are is to have a detailed mental picture of what a particular method is supposed to do. If you understand what it really means to test whether a number is negative, it should occur to you that 0 is an interesting case to check. Think about ways to implement the functionality, and ways to implement it incorrectly. When comparing the sizes of two lists, you should probably test not only cases like {1, 2, 3} == {1, 2, 3, 4}, but also {1, 2, 3, 4} == {1, 2, 3}, because catching one but not the other is a common mistake to make. Figuring out what the easy mistakes are is hard. Of course, figuring out the hard mistakes is harder.
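A sketch of testing both orderings, assuming a hypothetical sameSize method (the method name and its trivially correct implementation are mine for illustration; the point is the symmetric pair of tests, which would catch a buggy version that only walks the first list):

```java
import java.util.Arrays;
import java.util.List;

// Tests a hypothetical sameSize in both directions.
public class ListSizeTest {
    // Stand-in method under test; an incorrect version might, say,
    // only notice extra elements in one of the two lists.
    static boolean sameSize(List<?> a, List<?> b) {
        return a.size() == b.size();
    }

    public static void main(String[] args) {
        List<Integer> shorter = Arrays.asList(1, 2, 3);
        List<Integer> longer  = Arrays.asList(1, 2, 3, 4);
        // Test both orderings: a one-directional bug passes one
        // of these and fails the other.
        check(!sameSize(shorter, longer), "shorter vs. longer");
        check(!sameSize(longer, shorter), "longer vs. shorter");
        check(sameSize(shorter, shorter), "list vs. itself");
        System.out.println("All tests passed.");
    }

    static void check(boolean condition, String message) {
        if (!condition)
            throw new AssertionError(message);
    }
}
```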
Also make sure to test the side effects and error conditions. If a method is supposed to throw particular exceptions on particular invalid inputs, does it? If LinkedList.addAll(Collection) is supposed to return true to indicate that the list was modified, does it return false when the collection is empty? A well-written spec makes this job a lot easier. Look at the documentation for the method and make sure you’re testing that it does everything that the documentation specifies, and exactly what the documentation specifies.
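Here is a minimal sketch of checking that return-value contract against the real java.util.LinkedList (the check helper and the topping string are mine; the addAll behavior is what the Collection documentation specifies):

```java
import java.util.Collections;
import java.util.LinkedList;

// Checks addAll against what its documentation promises:
// it returns true exactly when the list was modified.
public class AddAllTest {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();

        // Adding an empty collection must not modify the list,
        // so addAll should return false.
        boolean changed = list.addAll(Collections.<String>emptyList());
        check(!changed, "addAll(empty) should return false");

        // Adding a non-empty collection modifies the list: true.
        changed = list.addAll(Collections.singletonList("pepperoni"));
        check(changed, "addAll(non-empty) should return true");
        check(list.size() == 1, "list should contain one element");
        System.out.println("All tests passed.");
    }

    static void check(boolean condition, String message) {
        if (!condition)
            throw new AssertionError(message);
    }
}
```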
Another source of tests is bugs. When you find a bug, it indicates something that you forgot to test. When this happens, write a test case for it. You should do this before fixing the bug to verify that the test case fails when the bug is present. Then fix the bug and make sure that the test case starts passing. Things that you got wrong once are things that you’re liable to get wrong again as things change. These sorts of tests are called regression tests, because they’re testing that your quality is always moving forward and never regressing.
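As a sketch, a regression test for the isNegative(0) bug found earlier might look like this once the fix is in place (the class name and the n < 0 fix are mine; the test is written first, watched to fail against the buggy version, then kept forever after the fix):

```java
// A regression test pinning down the isNegative(0) bug, so the
// boundary behavior cannot quietly break again later.
public class RegressionTest {
    // Fixed implementation: 0 is no longer reported as negative.
    static boolean isNegative(int n) {
        return n < 0;
    }

    public static void main(String[] args) {
        // Against the old buggy version, this line threw; now it passes.
        if (isNegative(0))
            throw new AssertionError("regression: isNegative(0) should be false");
        System.out.println("Regression test passed.");
    }
}
```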
How to Test It
Take a look at the included PizzaTest class and Pizza documentation. I’ve written a package, Pizza, for determining a set of toppings that will make a group of people happy when they’re trying to order a pizza. Full source code for Pizza is on the web; see below for the URL.
The test suite is structured into groups of tests which test units of functionality. The simple classes, Topping and ToppingConstraint, have one group for each class. Pizza has a few different groups. I isolated each group so that it doesn’t depend on anything done in any of the other groups. Each group that needs to construct a Pizza initializes its own topping list. This way the test groups aren’t dependent on each other, and a failure in one small area of the test suite won’t randomly break a bunch of tests that should work. In order for a test suite to be useful, you want it to help you figure out exactly what is failing. There are trade-offs, though. I use Topping.equals(Object), even in tests for completely unrelated things. These tests will break if Topping.equals is broken. It would be a lot of extra busywork to avoid using Topping.equals, and it couldn’t be done without tying myself to the internal makeup of Topping. I shouldn’t need to rewrite the entire test suite if another attribute is added to toppings! One solution to this would be to indicate in some fashion that some of the other groups of tests, such as the applyConstraints() tests, depend on the toppings() tests, and we shouldn’t even bother running the applyConstraints() tests if the toppings() tests fail. There are frameworks to help you write unit tests, such as JUnit, which allow you to express this.
The first test group, mustSetToppings(), is testing that an error condition is generated under circumstances where it should be, and not generated under other circumstances. It’s also a good example of how to test whether or not an exception is thrown.
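The usual plain-Java idiom for that looks like the following sketch; the setToppings method and its behavior here are stand-ins I made up for illustration, not the actual Pizza API:

```java
// The classic pattern for testing that an exception is thrown:
// if the call returns normally, the test itself fails.
public class ExceptionTest {
    // Stand-in for the method under test: rejects null input.
    static void setToppings(String[] toppings) {
        if (toppings == null)
            throw new IllegalArgumentException("toppings must not be null");
    }

    public static void main(String[] args) {
        // Error case: the call must throw, so reaching the line
        // after it is itself a test failure.
        try {
            setToppings(null);
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            // This is the behavior the spec requires; the test passes.
        }

        // Non-error case: a valid call must NOT throw.
        setToppings(new String[] { "mushrooms" });
        System.out.println("All tests passed.");
    }
}
```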
The second test group, toppings(), tests the Topping class. It’s a fairly trivial class, but we test it anyway. It’s nice not to have to worry about whether or not it’s working. The test suite can get things wrong, of course, so don’t get overconfident. Note that, the way equality of toppings is defined, they must have both the same name and the same type, so the tests for Topping.equals(Object) cover cases where they have the same name but different types and vice versa, not just a case where they’re completely different and a case where they’re completely identical. We also test the case of them being completely different. This way, if, say, the name comparison is broken, we will know exactly what went wrong: the “completely identical” test will fail if the name comparison is broken to return false negatives, and the “different names, same types” test will fail if it’s broken to return false positives.
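A sketch of those four cases, using an assumed two-field Topping with a name and a type (I haven’t reproduced the real class here, so the fields, constructor, and topping values are mine; only the test-case structure is the point):

```java
import java.util.Objects;

// The four equality cases described above, against an assumed Topping.
public class ToppingEqualsTest {
    static class Topping {
        final String name, type;
        Topping(String name, String type) {
            this.name = name;
            this.type = type;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Topping)) return false;
            Topping t = (Topping) o;
            return name.equals(t.name) && type.equals(t.type);
        }
        @Override public int hashCode() {
            return Objects.hash(name, type);
        }
    }

    public static void main(String[] args) {
        Topping a = new Topping("mushroom", "vegetable");
        // Completely identical: must be equal.
        check(a.equals(new Topping("mushroom", "vegetable")), "identical");
        // Same name, different type: catches an equals that ignores type.
        check(!a.equals(new Topping("mushroom", "meat")), "same name, different type");
        // Different name, same type: catches an equals that ignores name.
        check(!a.equals(new Topping("onion", "vegetable")), "different name, same type");
        // Completely different.
        check(!a.equals(new Topping("sausage", "meat")), "completely different");
        System.out.println("All tests passed.");
    }

    static void check(boolean condition, String message) {
        if (!condition)
            throw new AssertionError(message);
    }
}
```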
applyConstraints() is the most complicated test group. This makes sense; it’s testing the really hard bit of Pizza. The individual tests are straightforward; the tricky part was figuring out which tests there should be. To come up with those test cases, I spent a lot of time thinking about the different ways in which this could go wrong. I intentionally picked a loosely specified problem to make this job more interesting. The problem that Pizza is attempting to solve, how it’s supposed to work, what sorts of results it should return… these are all somewhat open to interpretation. That’s often what you have to do when you’re programming. A lot of times, you’ll get a vague problem, and you have to figure out how to solve it. Sometimes these are “business requirements” handed to you by your boss; sometimes it’s you thinking that it would be cool to do “foo”. The homeworks and labs have been based on fairly detailed specifications, and there have still been ambiguities! It took me around four hours to write all of this code (Pizza, PizzaTest, and PizzaMain), and at least an hour, maybe two, was spent writing the applyConstraints() tests. Most of that time was figuring out what tests I needed to write!
Further Resources
- Pizza source code
- Some other types of testing: integration testing, stress testing, fuzz testing, performance testing
- JUnit is a framework for doing unit testing in Java. It handles a lot of the grunt work for you. There are some additional packages for it that will automatically measure the coverage of your test suite.
These lecture notes and all associated source code are in the public domain.