Software testing: The good, the bad and the ugly

Recently I started working on a large-scale legacy project, and I’ve had some time to understand how things work there. It is a fairly old JEE5-based tech setup and there are plenty of areas that can cause headaches. But a particular pain point is the way testing is handled. Yes, there are tests. There are quite a number of them. Some 9000 get executed every time you commit a change. They run for almost an hour, so you can’t run them with every build on your local machine. The vast majority of these tests aren’t unit tests. Each one covers a specific scenario, but normally tests it against the full application, which has to run in a container. Sometimes there are interdependencies between them, resulting in randomly failing tests.

I have worked on a number of projects in the past and none of them were perfect. But given the state of software testing in my current project, they now feel like paradise. There’s an upside to it, however. Confronted with this situation, I remembered a book that I bought last year: Working Effectively with Unit Tests (WEWUT) by Jay Fields. It’s an excellent collection of best practices and patterns that can be applied when writing tests. I picked up Jay’s book again and recommended it to some of my colleagues as well. I’m quite sure that it will not be possible to improve the existing tests by simply applying a few patterns and best practices; there would have to be a lot of changes to the project to even enable developers to write helpful tests. However, I think that it might be helpful to contrast the situation described above with some of what Jay advocates in his book.

Motivations for testing

There are different motivations for writing tests. Sometimes it’s done just because the project requires a certain level of code coverage, or, when fixing a bug, to provide a test that demonstrates that the fix works and ensures the bug will not happen again. When things get done because someone told you to, the results are likely to be of questionable value. In our case, a lot of tests address previous bugs in the system. Almost all of those tests are what Jay calls Broad Stack Tests, so running them takes a long time, and when a test fails, you’re often left in the dark about the actual reason for the failure.

So what are good reasons for writing tests?

Creating a safety net

The most common motivator for writing tests is probably to create a safety net that enables developers to change code after it has been created. When writing a test with this reasoning in mind, it’s important to understand which parts of your system benefit from tests. Writing, understanding and maintaining tests comes with a cost, and if tests are written badly, this cost can be significant. And it’s not only a financial cost, it’s a killer for developer motivation as well.

Fast feedback

For me it’s equally important that I get fast feedback through tests. No matter whether you write new code or change something, a well-written test suite can save a lot of time. To get fast feedback, tests need to be run frequently. It’s not enough to execute a test suite once before you commit something. There might be a few long-running tests that you don’t want to execute constantly, but they should definitely not be the majority of your tests.

Documentation

Another important reason for writing tests is documentation: Well-written tests should describe how a component or a system is expected to behave. For this reason it’s important to have concise names for tests. And more importantly, a developer must be able to understand what a test is actually doing without spending hours digging through countless lines of code. In my current project, if I want to understand something, I have to go through several business documents or ticket descriptions in an issue tracker, read up and see how they match the code that was actually implemented. Bad-quality code adds to the problem and it’s absolutely ridiculous how much time gets lost in the process.

Test-Driven Development

If you practice TDD, there are other drivers for your tests: You want to use your tests to create well-structured components by breaking up problems into smaller units. You want to approach a problem by defining the simplest thing that can possibly work, and you want to create a development process that enables quick iteration. Unfortunately, many developers use the term TDD simply as a synonym for writing tests, while it really is a software development process.

Customer Acceptance

Some of the tests that exist in my new project are implemented on top of business definitions. I do find value in this type of test in general, but it’s important to be careful about how these tests are set up. In our case there are immensely complicated Excel files that define input, intermediate and output values. Developers are then expected to implement tests on top of that. I’ve used customer acceptance tests before and they were always a bit nasty to work with, but we were using JBehave, a test library that uses a Gherkin-style syntax. Test specifications could be read in a natural way and we didn’t try to go into too much detail. It’s worth noting that these were not unit tests and we only had a few of them.

Rules and patterns

Jay has some very specific recommendations in his book. Before you can try to apply them, though, you have to make the decision to actually write unit tests. This seems to be the biggest issue with my current project: Only rarely does a test focus on a single class or some other type of unit. Once you have arrived there, I find the following pieces of advice very helpful. WEWUT has more detailed explanations of all of them and introduces a number of other concepts as well.

DAMP, not DRY

Test code is different from production code. While pieces of production code collaborate to fulfill a task, test code does not do that. Tests live on their own and each has a specific task: to test a piece of functionality. If you are writing tests you should aim to create little universes. It’s important that tests aren’t coupled. For that reason it’s not as important to remove all duplication from test code. It’s more important that a developer looking at a specific test is able to quickly understand what the test is doing without having to look up information in a different location of the code base. The Don’t-Repeat-Yourself (DRY) rule should be used very carefully in a testing context. Jay gives the advice that you should only apply DRY in a testing context if the result can be used for tests globally. When thinking about tests, we should remember the acronym DAMP instead: Descriptive And Meaningful Phrases.

One concrete example of DRY misuse that violates the little universes guideline is the use of setup methods (such as methods annotated with @Before in JUnit 4): A developer looking at a test doesn’t immediately see that setup code is invoked before the test is run and will have to spend more time analyzing what’s going on. Also, setup methods often execute code that is only needed for a few of the tests.

Creating test data

There are different approaches to creating test data. Some use fixtures, others just create objects in individual tests. The solution advocated in WEWUT is test data builders. They are nothing new and provide a way to create data objects with sensible defaults that can be adjusted in every individual test. By using a builder class, you avoid the test coupling that comes with common fixtures, and you won’t have to change multiple tests when a constructor changes. There are different syntaxes for creating builders. Some just use static methods to create a builder:

anOrder().with(aCustomer().build()).build();

Jay uses a helper class called a, which contains the test builders and is used to start the construction process:

a.order.w(a.customer.build()).build();

This makes it easier to use IDE features such as auto-completion, but with a lot of domain classes the helper class could grow quite big, so personally I like the first approach better.
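To make the first variant a bit more concrete, here is a minimal sketch of what such builders could look like. Customer, Order, their fields and the default values are all made up for illustration; this is not code from the book or from my project.

```java
// Hypothetical domain classes, invented for this sketch.
final class Customer {
    final String name;
    final boolean premium;

    Customer(String name, boolean premium) {
        this.name = name;
        this.premium = premium;
    }
}

final class Order {
    final Customer customer;
    final int quantity;

    Order(Customer customer, int quantity) {
        this.customer = customer;
        this.quantity = quantity;
    }
}

// Builder with sensible defaults; a test overrides only what it cares about.
final class CustomerBuilder {
    private String name = "Jane Doe";
    private boolean premium = false;

    static CustomerBuilder aCustomer() {
        return new CustomerBuilder();
    }

    CustomerBuilder withName(String name) {
        this.name = name;
        return this;
    }

    CustomerBuilder asPremium() {
        this.premium = true;
        return this;
    }

    // The only place that knows the Customer constructor.
    Customer build() {
        return new Customer(name, premium);
    }
}

final class OrderBuilder {
    private Customer customer = CustomerBuilder.aCustomer().build();
    private int quantity = 1;

    static OrderBuilder anOrder() {
        return new OrderBuilder();
    }

    OrderBuilder with(Customer customer) {
        this.customer = customer;
        return this;
    }

    OrderBuilder withQuantity(int quantity) {
        this.quantity = quantity;
        return this;
    }

    // The only place that knows the Order constructor.
    Order build() {
        return new Order(customer, quantity);
    }
}
```

With static imports for anOrder() and aCustomer(), a test that only cares about a premium customer boils down to anOrder().with(aCustomer().asPremium().build()).build(), and everything else falls back to the defaults. If a constructor gains a parameter, only the corresponding build() method has to change.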

Solitary and Sociable tests

Another important concept that is expanded on in WEWUT is the separation of test types into Solitary tests and Sociable tests. As the name suggests, solitary unit tests should only test the functionality of one class; everything else should be stubbed or mocked. Conversely, sociable unit tests test interactions between components, but do not go into great detail on every single collaborator. Having a lot of solitary tests accompanied by a smaller number of sociable tests helps reduce cascading failures. Solitary tests should be easy to understand, while sociable unit tests can be a bit more complex. Jay advises fixing solitary tests first, and only then looking at failures from sociable unit tests. Using this approach can help to reduce the amount of time spent understanding and maintaining tests.
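Here is a rough sketch of the difference, using a hand-written stub instead of a mocking library. PriceCalculator, TaxRates and StaticTaxRates are hypothetical names that I made up for this example, as is the rate table. The checks are plain static methods with a home-made check() helper, so the snippet runs without a test framework.

```java
// Hypothetical collaborator interface, invented for this sketch.
interface TaxRates {
    int percentFor(String country);
}

// A "real" collaborator with a made-up rate table; only the sociable check uses it.
final class StaticTaxRates implements TaxRates {
    @Override
    public int percentFor(String country) {
        return "DE".equals(country) ? 19 : 0;
    }
}

// The class under test. Amounts are integer cents to avoid rounding issues.
final class PriceCalculator {
    private final TaxRates taxRates;

    PriceCalculator(TaxRates taxRates) {
        this.taxRates = taxRates;
    }

    int grossCents(int netCents, String country) {
        return netCents + netCents * taxRates.percentFor(country) / 100;
    }
}

final class PriceCalculatorChecks {
    // Stand-in for a test framework's assertEquals.
    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Solitary: the collaborator is replaced by a stub, so only
    // PriceCalculator's own logic can make this fail.
    static void grossAddsTaxFromRateProvider() {
        TaxRates tenPercent = country -> 10;
        check(1100, new PriceCalculator(tenPercent).grossCents(1000, "XX"));
    }

    // Sociable: the real collaborator is wired in, so a bug in either
    // class can make this fail.
    static void grossUsesRealRateTable() {
        check(1190, new PriceCalculator(new StaticTaxRates()).grossCents(1000, "DE"));
    }
}
```

If StaticTaxRates breaks, only the sociable check fails. If PriceCalculator breaks, both fail, which is why it makes sense to look at the solitary failure first.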

Test names

I’ve stated before that test code is different from production code. Jay goes as far as saying that it’s not immediately clear why tests are implemented using methods. Unlike in production code, test methods are not called from other test code. They are only required so that a testing framework can invoke them. But since they are formally required, they should have meaningful names so that someone looking at tests can understand what a test is doing. Method names are basically documentation.

Test structure and Assertions

A helpful pattern for writing understandable tests is the AAA rule. It stands for Arrange-Act-Assert and it’s a guideline to improve readability by structuring tests to first set up a context, then execute the component that should be tested and afterwards assert the result.
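As a small illustration, this is roughly how the three phases could look in a test. Cart is a hypothetical class invented for this sketch, and the local assertEquals is only a stand-in for the one a test framework would provide.

```java
// Hypothetical class under test, invented for this sketch.
final class Cart {
    private int totalCents = 0;

    void add(int priceCents, int quantity) {
        totalCents += priceCents * quantity;
    }

    int totalCents() {
        return totalCents;
    }
}

final class CartChecks {
    // Stand-in for a test framework's assertEquals.
    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void totalIsPriceTimesQuantity() {
        // Arrange: set up the context.
        Cart cart = new Cart();

        // Act: execute the behavior under test.
        cart.add(250, 3);

        // Assert: check the result against a literal.
        assertEquals(750, cart.totalCents());
    }
}
```

The blank lines between the phases do most of the work; the comments are there for this post and would probably be dropped in real test code.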

Equally important is another piece of advice: One assertion per test. I haven’t followed this approach strictly myself until now, but I can see how it helps keep the size of a test small: Every test is focused on exactly one thing. If you need to check more than one value to test your code, maybe it’s a good idea to revisit the implementation of the component that you are testing. If a test has more than one assertion, it will stop at the first one that is violated, and you will have to fix and rerun repeatedly to work your way through the test. If every test has only one assertion, you’ll get all violations with one test run and gain more insight immediately.
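The effect is easy to reproduce with a tiny hand-rolled runner. Everything in this sketch is hypothetical: SwappedSize is a deliberately buggy class, and countFailures only mimics what a test framework does when it runs each test and records the ones that throw.

```java
// Deliberately buggy, hypothetical class: width and height are swapped.
final class SwappedSize {
    final int width;
    final int height;

    SwappedSize(int width, int height) {
        this.width = height;  // bug on purpose
        this.height = width;  // bug on purpose
    }
}

final class OneAssertionDemo {
    // Stand-in for a test framework's assertEquals.
    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Runs every test and returns how many of them failed.
    static int countFailures(Runnable... tests) {
        int failures = 0;
        for (Runnable test : tests) {
            try {
                test.run();
            } catch (AssertionError e) {
                failures++;
            }
        }
        return failures;
    }

    // One test with two assertions: it stops at the first violation.
    static int failuresWithCombinedTest() {
        return countFailures(() -> {
            SwappedSize size = new SwappedSize(3, 4);
            assertEquals(3, size.width);   // fails here
            assertEquals(4, size.height);  // never reached
        });
    }

    // Two tests with one assertion each: both violations are reported.
    static int failuresWithSplitTests() {
        return countFailures(
            () -> assertEquals(3, new SwappedSize(3, 4).width),
            () -> assertEquals(4, new SwappedSize(3, 4).height));
    }
}
```

The combined variant reports a single failure and hides the broken height until the width is fixed, while the split variant reports both problems in the same run.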

Combine this with another rule: Expect literals. Don’t use variables, constants and complex objects in your assertions; stick with literals instead. That way a developer will not need to look up the internals of an object. My current project applies the same static analysis rules to test code as to production code. I strongly think that this is wrong. If you use literals in production code that’s a problem, because often it’s not clear what meaning is associated with them and the same value could be required somewhere else. Tests work differently: You should not reuse code between tests anyway, and you are using example values in test code. You can use custom assertions if there is a need to assert a complex object.

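// Assumes JUnit 4, where the failure message is the first argument.
import static org.junit.Assert.assertEquals;

public class CustomAssert {
  // Compares a Money against a plain double literal; delta 0 means exact match.
  public static void assertMoney(double expected, Money candidate) {
      assertEquals("Unexpected money value!", expected, candidate.toDouble(), 0);
  }
}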

You can then use this assertion in your tests, saving you the trouble of converting to a literal value in every test:

@Test
public void addMoney() {
  assertMoney(3.0, new Money(1.0).add(new Money(2.0)));
}

@Test
public void subtractMoney() {
  assertMoney(2.0, new Money(4.0).subtract(new Money(2.0)));
}

Having a custom assertion for this is a good example of well-applied DRY in tests. It’s usable globally for potentially every test in your application, and as a developer you will probably not need to look into the method definition to understand what’s happening, because the API is quite clear and you have probably seen it in action before.
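The Money class itself is not shown in the snippets above. A minimal, hypothetical version that would satisfy those tests could look like the sketch below. It only mirrors the methods the tests call (add, subtract, toDouble); for real amounts you would likely prefer BigDecimal or integer cents over double.

```java
// Hypothetical minimal Money class, just enough for the tests above.
final class Money {
    private final double amount;

    Money(double amount) {
        this.amount = amount;
    }

    // Returns a new instance instead of mutating this one.
    Money add(Money other) {
        return new Money(amount + other.amount);
    }

    Money subtract(Money other) {
        return new Money(amount - other.amount);
    }

    // Used by the custom assertion to compare against a literal.
    double toDouble() {
        return amount;
    }
}
```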

TL;DR

For different reasons it’s widely accepted that complex software systems need automated testing. But often test code is given less attention than production code and the result can be very frustrating. Jay’s book helped me to better understand what good tests can look like. WEWUT is not a step-by-step tutorial that will tell you what you have to do; of course you will always have to think about the context that you are working in. But I think it enlarges our toolkit and can help us write better and more maintainable tests for our software.