"I ran a t-test on the untreated / treated samples and the difference is significant! The treatment worked!"
...but the data table shows a clear trend over time across both groups because the samples were being irradiated by intense sunlight from a nearby window. The null model didn't account for that possibility, so the null was rejected, just not because the treatment worked.
That's a relatively trivial example, and you can already imagine ways in which it could have occurred innocently and not-so-innocently. Most of the time it isn't so straightforward. The #1 culprit I see is failure to account for some kind of obvious correlation, but the ways in which a null hypothesis can be dogshit are as numerous and subtle as the possible statistical modeling mistakes in the universe, because they are the same thing.
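To make the sunlight example concrete, here is a minimal simulation sketch. It assumes one plausible mechanism that the thread doesn't spell out: the untreated samples were measured earlier in the day than the treated ones, so the drift from the window falls unevenly across the two groups. With a true treatment effect of zero, the unmodeled drift alone is enough to make a plain two-sample t-test reject.

```python
# Sketch only: an unmodeled drift (e.g. sunlight warming the samples) making a
# two-sample t-test reject despite zero true treatment effect.
# Assumption (not stated in the thread): untreated samples were measured in
# the morning, treated samples in the afternoon, so the drift is confounded
# with group membership. All numbers are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 20
drift_per_hour = 0.5          # effect of the sunlight on the measurement
noise_sd = 1.0
true_treatment_effect = 0.0   # the treatment does nothing

# Untreated samples measured during hours 0-3, treated during hours 4-7.
t_untreated = rng.uniform(0, 3, n)
t_treated = rng.uniform(4, 7, n)

untreated = drift_per_hour * t_untreated + rng.normal(0, noise_sd, n)
treated = (drift_per_hour * t_treated + true_treatment_effect
           + rng.normal(0, noise_sd, n))

t_stat, p_value = stats.ttest_ind(treated, untreated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # typically p << 0.05

# Regressing the measurement on time alongside a group indicator would expose
# the drift; the bare t-test's null model simply has no term for it.
```

The point isn't the specific mechanism, it's that the t-test's null model (two i.i.d. samples differing only in mean) was the wrong model for the data-generating process, so rejecting it tells you very little about the treatment.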
I think you're observing an issue with experimental designs that fail to actually challenge the null hypothesis, rather than with poor null hypotheses themselves. In other words, papers building experiments that don't actually test the hypothesis. There was a major example of this with COVID. A typical way observational studies assessed vaccine efficacy was to compare overall outcomes between normalized samples of unvaccinated and vaccinated individuals who came to the hospital. Unvaccinated individuals generally had worse outcomes, therefore the vaccines must be effective.
This logic was used repeatedly, but it fails to account for numerous obvious biases. For instance, unvaccinated people are generally going to be less proactive about seeking medical treatment, so the average severity of a case that sends them to the hospital is going to be substantially higher than for a vaccinated individual, with correspondingly worse expected outcomes. It's not like this is some big secret: most papers mentioned this issue (among many others) in the discussion, but ultimately made no effort to control for it.
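A rough sketch of that particular selection effect, with numbers and thresholds invented purely for illustration (not taken from any study): even if the vaccine has zero effect on severity, restricting the comparison to people who show up at the hospital makes the unvaccinated group look worse whenever they wait for a more severe case before seeking care.

```python
# Sketch of the care-seeking bias described above. All numbers are made up.
# Baked-in assumptions: the vaccine has zero effect on disease severity, and
# the only difference between groups is the severity at which they go to
# the hospital.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Underlying disease severity, identical distribution in both groups.
severity_vax = rng.gamma(shape=2.0, scale=1.0, size=n)
severity_unvax = rng.gamma(shape=2.0, scale=1.0, size=n)

# Vaccinated people seek care at lower severity than unvaccinated people.
hospital_vax = severity_vax[severity_vax > 2.0]
hospital_unvax = severity_unvax[severity_unvax > 3.5]

# "Bad outcome" probability depends only on severity, not on vaccination.
def p_bad_outcome(severity):
    return 1.0 - np.exp(-0.3 * severity)

bad_vax = rng.random(hospital_vax.size) < p_bad_outcome(hospital_vax)
bad_unvax = rng.random(hospital_unvax.size) < p_bad_outcome(hospital_unvax)

print(f"bad outcomes, vaccinated in-hospital:   {bad_vax.mean():.2%}")
print(f"bad outcomes, unvaccinated in-hospital: {bad_unvax.mean():.2%}")
# The unvaccinated group looks worse even though the vaccine did nothing,
# purely because of who ends up in the hospital sample.
```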
I'm not great at stats so I don't understand this example. Wouldn't the sunlight affect both groups equally? How can an equal exposure to sunlight create a significant difference?