@chrislattner out of curiosity, could you share which CPU the numbers in https://docs.modular.com/mojo/notebooks/Matmul.html were obtained on, and/or what fraction of that machine's peak performance they represent? Speedups over naive Python feel like a bit of a strawman: I can get a 3500x improvement over that baseline with practically vanilla MLIR, a similar schedule, and no autotuning.
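
To make "fraction of peak" concrete, here is a rough sketch of the arithmetic I have in mind. All the machine parameters and the timing below are hypothetical placeholders, not measurements from the notebook or from my MLIR run; the 32 FLOPs/cycle/core figure assumes fp32 with two 256-bit FMA units per core, which may not match the actual hardware.

```python
def matmul_flops(m, n, k):
    """Floating-point operations in an (m x k) @ (k x n) matmul:
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def achieved_gflops(m, n, k, seconds):
    """Throughput actually reached by a matmul that took `seconds`."""
    return matmul_flops(m, n, k) / seconds / 1e9


def peak_gflops(cores, ghz, flops_per_cycle_per_core):
    """Theoretical peak: cores * clock (GHz) * FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle_per_core


def fraction_of_peak(m, n, k, seconds, cores, ghz, flops_per_cycle_per_core):
    """Ratio of achieved throughput to theoretical peak (0.0 to 1.0)."""
    return achieved_gflops(m, n, k, seconds) / peak_gflops(
        cores, ghz, flops_per_cycle_per_core
    )


# Hypothetical example: 1024^3 matmul in 50 ms on an assumed
# 8-core, 3.0 GHz CPU with 32 fp32 FLOPs/cycle/core.
frac = fraction_of_peak(1024, 1024, 1024, 0.05, cores=8, ghz=3.0,
                        flops_per_cycle_per_core=32)
print(f"fraction of peak: {frac:.1%}")
```

A number like this is independent of how slow the baseline is, which is why it is easier to compare across implementations than a speedup over naive Python.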