The automated tests on my project have never all been green at the same time. Fact! I’m talking about over the course of two years. This is a very sad state of affairs.
Why? Surely it can’t be that hard to get tests passing. Before I start, I should give an idea of the scope of these tests. There are unit tests which are run as part of the product build. If these fail, then the product build fails. Therefore they are 99.9% reliable (nothing’s ever 100%!).
Our product deals with SQL Server and version control systems, so we have engine/integration tests that run against all of the version control systems we support.
And lastly we have GUI tests that run against almost all of the version control systems we support and a smattering of OS/SQL Server combinations.
So add all these up and we have 25 different test runs that run each and every night, and although they parallelise, they take about 10 hrs to run.
Over the past couple of months, we’ve made concerted efforts to either fix failing tests or delete them. After all, tests that fail all the time (assuming it’s not a product failure) are no good. Intermittently failing tests are arguably worse, because each time one fails you need to check that it isn’t failing because of a newly introduced issue. The danger is that you assume they have failed due to their intermittent/unstable nature and ignore them, when in fact they have failed because of a bug that was introduced yesterday.
So we have improved the tests significantly, and 50% of the test projects completely passed the night before last. YEY!
But last night only 3 of them passed… SIGH! Why? As has become a daily ritual, I trawled through the projects and investigated the failures… it turns out one of our database servers had become incommunicado, throwing the test run into disarray.
We host all of our test database servers, version control systems, etc. in Hyper-V VMs that are rolled back to a known good snapshot every night and are used by our project only. So we are doing everything we can to make sure the environments are in a good state before running the tests.
But even so, this dependency on external systems will always be a ‘random’ point of failure. I’m in no doubt that even if we fixed all of the issues in the tests, and got the actual tests 100% reliable (OK, I said nothing’s ever 100%, but you know what I mean!), there would still be the chance that one of the database servers has broken, or perhaps a VM didn’t roll back properly, or perhaps a repository on one of the version control systems decided to stop working.
What can we do? This is a question we’ve been asking for a while, and it prompted another question: “Why do we need to talk to SQL Server, version control, etc.?”, to which the answer is “To make sure our product behaves how we expect it to, and to do that it needs to work, and to do that it needs to talk to a database and a VCS!”.
So what if it didn’t have to talk to a real SQL Server and a real VCS? What if we could separate the database and version control interaction layers in the tests like they are in the product, and mock out a database and version control system? We would then have tests which specifically test the communication layers to the database and VCS, but the bulk of the tests, which are testing our product, would not have this dependency.
This would have several benefits. Firstly, it would mean we would have total control over what the mocked database/VCS objects make available to our product. Secondly, there would be no overhead from creating databases and creating/restoring repositories, which would make the tests significantly quicker (we think!). And thirdly, it can only help to make the tests more reliable and consistent.
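To make the idea a bit more concrete, here is a minimal sketch in Python of what product logic tested against fakes might look like. It is not our actual code: the names `Database`, `VersionControl`, `FakeDatabase`, `FakeVersionControl` and `commit_changes` are all made up for illustration, and the "product logic" is a deliberately tiny stand-in.

```python
from typing import Dict, List, Protocol


class Database(Protocol):
    """Hypothetical database layer: maps object name -> SQL definition."""
    def read_objects(self) -> Dict[str, str]: ...


class VersionControl(Protocol):
    """Hypothetical VCS layer: latest committed files plus a commit call."""
    def latest(self) -> Dict[str, str]: ...
    def commit(self, files: Dict[str, str], message: str) -> None: ...


class FakeDatabase:
    """In-memory fake; the test decides exactly what the 'database' contains."""
    def __init__(self, objects: Dict[str, str]) -> None:
        self._objects = dict(objects)

    def read_objects(self) -> Dict[str, str]:
        return dict(self._objects)


class FakeVersionControl:
    """In-memory fake that records commit messages so tests can inspect them."""
    def __init__(self, files: Dict[str, str]) -> None:
        self._files = dict(files)
        self.messages: List[str] = []

    def latest(self) -> Dict[str, str]:
        return dict(self._files)

    def commit(self, files: Dict[str, str], message: str) -> None:
        self._files.update(files)
        self.messages.append(message)


def commit_changes(db: Database, vcs: VersionControl, message: str) -> List[str]:
    """Stand-in product logic: commit objects that differ from the VCS copy."""
    committed = vcs.latest()
    changed = {
        name: sql
        for name, sql in db.read_objects().items()
        if committed.get(name) != sql
    }
    if changed:
        vcs.commit(changed, message)
    return sorted(changed)


# A product test that needs no server, no VM and no repository.
db = FakeDatabase({
    "dbo.Users": "CREATE TABLE Users (Id INT)",
    "dbo.Orders": "CREATE TABLE Orders (Id INT)",
})
vcs = FakeVersionControl({"dbo.Users": "CREATE TABLE Users (Id INT)"})
assert commit_changes(db, vcs, "add orders") == ["dbo.Orders"]
assert vcs.messages == ["add orders"]
```

The point of the sketch is that the test sets up its whole "environment" in two lines, so a failure can only come from the logic under test and not from a server that went missing overnight.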
The downside is that it would mean trying to mock out SQL Server… that would not be easy, but then it’s probably not as hard as we think. We would also have to be careful to write some really good tests for the database and VCS interaction layers.
But we think this is the way to go. After writing test automation for more years than I care to remember, I’m convinced that removing the unpredictability that comes with relying on external systems can only be a good thing!
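One way to keep a mocked database honest is to run the same "contract" checks against both the fake and the real interaction layer, so the two can’t quietly drift apart. The sketch below is an assumption about how that might look, not something we have built: `InMemoryDatabase`, `SqliteDatabase` and `check_database_contract` are hypothetical names, and SQLite is used purely as a stand-in for the real SQL Server layer so that the example runs anywhere.

```python
import sqlite3
from typing import Callable, Dict, Protocol


class DatabaseLayer(Protocol):
    """Hypothetical interface shared by the fake and the real layer."""
    def create_object(self, name: str, sql: str) -> None: ...
    def read_objects(self) -> Dict[str, str]: ...


class InMemoryDatabase:
    """The fake that the bulk of the product tests would use."""
    def __init__(self) -> None:
        self._objects: Dict[str, str] = {}

    def create_object(self, name: str, sql: str) -> None:
        self._objects[name] = sql

    def read_objects(self) -> Dict[str, str]:
        return dict(self._objects)


class SqliteDatabase:
    """Stand-in for the real layer (assumption: SQLite instead of SQL Server)."""
    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")

    def create_object(self, name: str, sql: str) -> None:
        # The engine takes the object name from the CREATE statement itself.
        self._conn.execute(sql)

    def read_objects(self) -> Dict[str, str]:
        rows = self._conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
        )
        return {name: sql for name, sql in rows}


def check_database_contract(make_db: Callable[[], DatabaseLayer]) -> None:
    """Checks that every implementation, fake or real, is expected to pass."""
    db = make_db()
    assert db.read_objects() == {}

    db.create_object("users", "CREATE TABLE users (id INTEGER)")
    db.create_object("orders", "CREATE TABLE orders (id INTEGER)")
    objects = db.read_objects()
    assert sorted(objects) == ["orders", "users"]
    assert "id INTEGER" in objects["users"]

    # Mutating the returned mapping must not change the layer's own state.
    objects.clear()
    assert sorted(db.read_objects()) == ["orders", "users"]


# The same contract runs against the fake and the "real" implementation.
for factory in (InMemoryDatabase, SqliteDatabase):
    check_database_contract(factory)
```

Only this small set of contract tests would still need a real server; if the fake ever stops behaving like the real thing, these are the tests that should go red.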