I didn't say it was easy; I just said it struck a balance relative to trying to openly publish the whole dataset. Yes, it obviously comes with administrative overhead. So did dealing with the initial researcher. If the institution can manage the one, it can manage the other.
> As for the synthetic datasets that's basically just having tests
An appropriate synthetic dataset would inevitably be part of a great test suite, but it's also pretty easy to write narrow unit tests that embed, rather than stretch, the same assumptions and biases that are in the code (i.e., easy enough that even people who write code for a living do it).
An independent project/practice for synthesizing sample datasets from the real dataset lowers the bar and clarifies the best practice for releasing a dataset that a verifier could actually use to spot simple bugs, edge cases, and algorithm issues. Ideally, yes, this practice also nudges the researchers to run their program over generated sample datasets and to pay attention to whether the results make sense.
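To make that concrete, here is a rough sketch of one way a sample dataset could be synthesized from a real one. It assumes the data is tabular (a list of records with the same fields) and that sampling each column independently is acceptable. The `synthesize` function, the field names, and the records are all hypothetical, and this is not a privacy guarantee on its own: rare values can still identify someone, so a real release would need more care.

```python
import random


def synthesize(rows, n, seed=0):
    """Build n synthetic rows by sampling each column independently.

    Every synthetic value is drawn from the values observed in that
    column, so per-column distributions are roughly preserved while the
    link between the fields of any one real record is broken.
    """
    if not rows:
        return []
    rng = random.Random(seed)  # seeded so a verifier can regenerate the same sample
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    return [
        {key: rng.choice(values) for key, values in columns.items()}
        for _ in range(n)
    ]


# Hypothetical stand-in for the real dataset that cannot be published.
real = [
    {"age": 34, "dose_mg": 10.0, "outcome": "improved"},
    {"age": 51, "dose_mg": 20.0, "outcome": "unchanged"},
    {"age": 47, "dose_mg": None, "outcome": "improved"},  # missing value kept on purpose
]

sample = synthesize(real, n=5, seed=42)
```

Independent sampling throws away correlations between columns, so it only helps catch bugs in parsing, missing-value handling, and similar edge cases. Checking whether the analysis itself gives sensible results would need a generator that preserves more of the real structure.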