The original repo is about using a subset of a language to compare language implementations. I can see the point in that. But language benchmarks like this are close to useless and very easy to get wrong anyway. For example, if you actually cared about performance for the bounce example, you would never write it like this in C. Bouncing 100 balls in a loop 50 times with 4 ifs just tests the branch predictor. There is nothing to learn from this in practice.
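For concreteness, here is a minimal C sketch of the kernel being described (100 balls, 50 iterations, 4 bounds checks per ball). The struct layout, box size, and seed are my assumptions from the description above, not the repo's exact code:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical ball state; fields assumed from the description. */
typedef struct {
    int x, y, x_vel, y_vel;
} Ball;

/* One step: move the ball, then the 4 ifs that clamp it to the box. */
static bool ball_bounce(Ball *b) {
    const int limit = 500; /* assumed box size */
    bool bounced = false;
    b->x += b->x_vel;
    b->y += b->y_vel;
    if (b->x > limit) { b->x = limit; b->x_vel = -abs(b->x_vel); bounced = true; }
    if (b->x < 0)     { b->x = 0;     b->x_vel =  abs(b->x_vel); bounced = true; }
    if (b->y > limit) { b->y = limit; b->y_vel = -abs(b->y_vel); bounced = true; }
    if (b->y < 0)     { b->y = 0;     b->y_vel =  abs(b->y_vel); bounced = true; }
    return bounced;
}

int main(void) {
    srand(74755); /* fixed seed, assumed, so runs are repeatable */
    Ball balls[100];
    for (int i = 0; i < 100; i++) {
        balls[i] = (Ball){ rand() % 500, rand() % 500,
                           rand() % 300 - 150, rand() % 300 - 150 };
    }
    int bounces = 0;
    for (int iter = 0; iter < 50; iter++) {   /* 50 iterations */
        for (int i = 0; i < 100; i++) {       /* 100 balls */
            if (ball_bounce(&balls[i])) bounces++;
        }
    }
    printf("bounces: %d\n", bounces);
    return 0;
}
```

Written flat like this, the hot path really is dominated by those four predictable branches, which is the point of the criticism.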
Respectfully disagree. This is a compiler engineering tool backed by peer-reviewed research (DLS'16, 112+ citations) and used in 30+ academic publications across PLDI, OOPSLA, and ECOOP; interpreting it correctly requires some background in controlled experimental methodology and compiler optimization. Perhaps that context clarifies its purpose.

The goal is to assess compiler effectiveness for a common set of core language abstractions (objects, closures, arrays), not to represent application-level performance or to claim that production C code would be written this way. Your "branch predictor" criticism actually validates the benchmark's design: if different language implementations handle the same branching patterns with dramatically different performance, that reveals genuine differences in their compilers.
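To make the "core abstractions" point concrete, here is a hedged illustration (not the suite's actual code): the same bounce update written through an object-style indirect call. In languages with real objects and closures, every call site like this is something the compiler must devirtualize and inline to perform well, and whether it manages to is exactly the kind of question the suite probes:

```c
#include <stdio.h>

/* Hypothetical object with a method slot, standing in for what a
 * dynamic language gives you implicitly on every method call. */
typedef struct Ball Ball;
struct Ball {
    int x, x_vel;
    void (*step)(Ball *self); /* assumed "method" field for illustration */
};

static void ball_step(Ball *self) {
    self->x += self->x_vel;
    if (self->x < 0 || self->x > 500) self->x_vel = -self->x_vel;
}

int main(void) {
    Ball b = { 250, 7, ball_step };
    for (int i = 0; i < 50 * 100; i++) {
        b.step(&b); /* indirect call: can the optimizer see through it? */
    }
    printf("%d\n", b.x);
    return 0;
}
```

In C the indirection is explicit and easy for the compiler to eliminate; in a dynamic language the equivalent dispatch is implicit everywhere, and how well an implementation removes it is a genuine compiler difference the benchmark can surface.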