> Not all bugs can be reproduced by an automated test

While that may be true, I'd say that in this case the culprit is an order-dependent test, and order-dependent tests are evil. This case study pretty much fits the definition -- the test result changes depending on which other tests ran before it (due to state shared across tests). It certainly is possible to write automated tests for this -- reset that state to its default before each test!
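To make the "reset that state" point concrete, here is a minimal sketch using Python's `unittest`. The `config` dict and `reset_config` helper are hypothetical stand-ins for whatever state leaks between tests in the case study; nothing here is taken from the original code.

```python
import unittest

# Hypothetical module-level shared state, standing in for whatever
# leaks between tests in a real code base.
_DEFAULT_CONFIG = {"retries": 3}
config = dict(_DEFAULT_CONFIG)


def reset_config():
    """Restore the shared state to its defaults."""
    config.clear()
    config.update(_DEFAULT_CONFIG)


class ConfigTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so no test sees leftovers
        # from whichever test happened to run before it.
        reset_config()

    def test_mutates_state(self):
        config["retries"] = 0
        self.assertEqual(config["retries"], 0)

    def test_expects_default(self):
        self.assertEqual(config["retries"], 3)


def run_in_order(names):
    """Run the named test methods in exactly the given order."""
    suite = unittest.TestSuite(ConfigTests(name) for name in names)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

With the `setUp` reset in place, both orderings pass. Remove it, and `test_expects_default` fails whenever it runs after `test_mutates_state`.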

One thing that Google recently (in the past year or so) added to its CI setup is a job that periodically randomizes the order in which tests are run, thereby increasing the chance that order-dependent tests are caught. (Early on, I remember being annoyed by all the failures it triggered due to floating point error accumulation in a monitoring library -- do I really care about an error of 1e-10 when my noise is >10?)

That's a long-winded way of saying that it's also possible to catch order-dependent tests by shuffling the order in which they run, although built-in support varies by testing framework. (From a brief search, it looks like RSpec and JUnit both support randomization; xUnit forces it.)
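If your framework has no built-in shuffling, a rough version is easy to sketch by hand. The example below uses Python's `unittest` with a seeded shuffle so that a failing order can be replayed. The `counter` state, the test class, and `run_shuffled` are all hypothetical names made up for illustration, and this is not how Google's CI job works.

```python
import random
import unittest

# Hypothetical shared state that the tests deliberately never reset.
counter = {"value": 0}


class OrderDependentTests(unittest.TestCase):
    def test_a_expects_fresh_state(self):
        self.assertEqual(counter["value"], 0)

    def test_b_increments(self):
        counter["value"] += 1
        self.assertEqual(counter["value"], 1)


def run_shuffled(test_case, seed):
    """Run test_case's methods in a seed-determined random order.

    Returns (order, passed). Logging the seed lets you replay the
    exact order that exposed a failure.
    """
    names = list(unittest.TestLoader().getTestCaseNames(test_case))
    random.Random(seed).shuffle(names)
    counter["value"] = 0  # stand-in for starting a fresh process
    result = unittest.TestResult()
    unittest.TestSuite(test_case(name) for name in names).run(result)
    return names, result.wasSuccessful()
```

In the default alphabetical order these tests pass. They fail only when `test_b_increments` happens to run first, which is exactly the kind of hidden dependency a shuffled run surfaces.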

That said, I absolutely agree with the other conclusions, especially that binary search for debugging is immensely useful (even when dealing with Google-scale tens of thousands of unrelated commits due to the monorepo), and I definitely empathize with debugging often taking much longer than the bugfix itself.
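For anyone who hasn't seen why binary search scales so well here, a minimal sketch follows. It assumes the first commit is known good, the last is known bad, and every commit after the first bad one is also bad. `first_bad` and `is_bad` are hypothetical names, with `is_bad` standing in for "check out this commit and run the failing test".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]
```

Each check halves the range, so ten thousand commits take about 14 test runs instead of thousands. `git bisect` automates the same idea on a real repository.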