> Not all bugs can be reproduced by an automated test
While that may be true in general, I'd say the real lesson in this case is that order-dependent tests are evil. This case study pretty much fits the definition: the test result changes depending on which other tests ran before it, because state is shared across tests. It certainly is possible to write automated tests for this: reset that state to its default before each test!
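To make "reset that state before each test" concrete, here is a minimal sketch using Python's `unittest`. The `config` dict and `DEFAULT_CONFIG` are hypothetical stand-ins for whatever state leaked between tests in the case study, not anything taken from it.

```python
import unittest

# Hypothetical shared state standing in for whatever leaked between tests
# in the case study (a module-level cache, a singleton, global config, ...).
DEFAULT_CONFIG = {"retries": 3, "verbose": False}
config = dict(DEFAULT_CONFIG)


class ConfigTests(unittest.TestCase):
    def setUp(self):
        # Restore the shared state to its default before every test, so no
        # test can observe mutations made by an earlier one.
        config.clear()
        config.update(DEFAULT_CONFIG)

    def test_mutates_state(self):
        config["retries"] = 0
        self.assertEqual(config["retries"], 0)

    def test_expects_default(self):
        # Passes in any order only because setUp resets the state.
        self.assertEqual(config["retries"], 3)
```

Without the `setUp` reset, `test_expects_default` would pass or fail depending on whether `test_mutates_state` happened to run first.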
One thing Google added to its CI setup fairly recently (in the past year or so) is a job that periodically reruns tests in a randomized order, which increases the chance that order-dependent tests get caught. (Early on, I remember being annoyed by all the failures it triggered due to floating-point error accumulation in a monitoring library. Do I really care about an error of 1e-10 when my noise is >10?)
That's a long-winded way of saying that you can also catch order-dependent tests by shuffling the order in which they run, although built-in support varies by testing framework. (From a brief search, it looks like RSpec and JUnit both support randomization, and xUnit forces it.)
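If your framework has no built-in shuffling, the idea is easy to sketch by hand. The following is a rough illustration with Python's `unittest`, not how any of the frameworks above implement it. `LeakyTests`, `counter`, `run_in_order`, and `run_shuffled` are all made-up names for this example. The seed is recorded so that a failing order can be replayed.

```python
import random
import unittest

counter = {"value": 0}  # hypothetical shared state that no test resets


class LeakyTests(unittest.TestCase):
    def test_a_expects_fresh_state(self):
        self.assertEqual(counter["value"], 0)

    def test_b_increments(self):
        counter["value"] += 1
        self.assertEqual(counter["value"], 1)


def run_in_order(test_case, names):
    """Run the named tests of test_case in exactly the given order."""
    counter["value"] = 0  # stand-in for starting a fresh process per run
    result = unittest.TestResult()
    unittest.TestSuite(test_case(name) for name in names).run(result)
    return result


def run_shuffled(test_case, seed):
    """Shuffle the test order with a recorded seed so failures can be replayed."""
    names = list(unittest.TestLoader().getTestCaseNames(test_case))
    random.Random(seed).shuffle(names)
    return names, run_in_order(test_case, names)


if __name__ == "__main__":
    for seed in range(5):
        names, result = run_shuffled(LeakyTests, seed)
        status = "ok" if result.wasSuccessful() else "FAILED"
        print(f"seed={seed} order={names} {status}")
```

In the default alphabetical order both tests pass. Once a shuffle puts `test_b_increments` first, `test_a_expects_fresh_state` fails, which is exactly the hidden dependency you want surfaced.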
That said, I absolutely agree with the other conclusions, especially that binary search is immensely useful for debugging (even at Google scale, where a monorepo means tens of thousands of unrelated commits to search through), and I definitely empathize with debugging often taking much longer than the bugfix itself.
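For a sense of why binary search holds up even at that scale, here is a small sketch of the idea behind tools like `git bisect`. `first_bad` and `is_bad` are hypothetical names, and the sketch assumes the first commit is known good, the last is known bad, and everything after the first bad commit is also bad.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return hi


if __name__ == "__main__":
    checks = 0

    def is_bad(commit):
        # Stand-in for "check out this commit and run the failing test".
        global checks
        checks += 1
        return commit >= 13337

    print(first_bad(list(range(20000)), is_bad), "found after", checks, "checks")
```

With 20,000 commits this needs at most 15 calls to `is_bad`, which is why the approach stays practical even when each check means a full build and test run.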