Hacker News

What fundamentals would make the JIT, this specific JIT, faster? Because if it's demonstrably slower, it raises the question of whether it can be faster, or is inherently slower than a decent optimisation path through a compiler.

At this point it's a great didactic tool and a passion project, surely? Or does it have advantages in other dimensions, like runtime size, debugging, and .pyc coverage, or in thread-safe code, or ...



The article points out that they have only begun adding optimisers to the JIT compiler.

Unoptimised JIT < optimised interpreter (at least in this instance)

They are working on it presumably because they think there will eventually be speed-ups in general, or at least for certain popular workloads.


The article also specifically calls out machine code generation as a separate thing. I confess that somewhat surprises me, as I would expect generating machine code to be a main source of speed-up for a JIT? That, and counter-based choices on which optimizations to perform?

Still, to directly answer the first question: even if there aren't obvious performance improvements immediately, if folks want to work on this, I see no reason not to explore it. If we are lucky, we'll find improvements we didn't expect.


> I confess that somewhat surprises me, as I would expect getting machine code generated would be a main source of speed up for a JIT?

My understanding is that the basic copy-and-patch approach, without any other optimizations, doesn't actually give that much. The difference between an interpreter running opcodes A, B, C and a JIT emitting machine code for the opcode sequence A, B, C is very small: the CPU will execute roughly the same instructions for both. The only difference is that the JIT avoids doing an op dispatch between each op, and that's already not very expensive due to jump threading in the interpreter. Meanwhile, the JIT adds a possible extra cost of more work whenever you need to jump from the JIT back to the fallback interpreter.
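To make the "only dispatch disappears" point concrete, here is a toy sketch (made-up opcodes, not CPython's real ones): a dispatch-loop interpreter next to the straight-line function a template JIT would effectively paste together for one fixed opcode sequence. The work per op is identical; only the per-op lookup and indirect call go away.

```python
# A toy stack-machine interpreter: the dispatch loop looks up each
# opcode before running it (hypothetical opcodes, for illustration).
def interpret(code, stack):
    handlers = {
        "PUSH1": lambda s: s.append(1),
        "DUP": lambda s: s.append(s[-1]),
        "ADD": lambda s: s.append(s.pop() + s.pop()),
    }
    for op in code:          # dispatch: one lookup + indirect call per op
        handlers[op](stack)
    return stack

# What a template JIT effectively produces for the fixed sequence
# PUSH1, DUP, ADD: the same work, pasted together with no dispatch.
def jitted(stack):
    stack.append(1)                          # PUSH1
    stack.append(stack[-1])                  # DUP
    stack.append(stack.pop() + stack.pop())  # ADD
    return stack

assert interpret(["PUSH1", "DUP", "ADD"], []) == jitted([]) == [2]
```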

But what the JIT allows is to codegen machine code corresponding to more specialized ops that wouldn’t be that beneficial in the interpreter (as more and smaller ops make it much worse for icaches and branch predictors). For example standard CPython interpreter ops do very frequent refcount updates, while the JIT can relatively easily remove some sequences of refcount increments followed by immediate decrements in the next op.
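The refcount example can be sketched as a peephole pass over a symbolic op stream; the opcode names here are invented for illustration and are not CPython's real micro-ops:

```python
# Hedged sketch: drop an INCREF that is immediately cancelled by a
# DECREF of the same value in the next op. Op names are hypothetical.
def elide_refcounts(ops):
    out = []
    for op in ops:
        if out and out[-1][0] == "INCREF" and op == ("DECREF", out[-1][1]):
            out.pop()            # the pair cancels: emit neither op
            continue
        out.append(op)
    return out

ops = [("LOAD", "x"), ("INCREF", "x"), ("DECREF", "x"), ("STORE", "y")]
assert elide_refcounts(ops) == [("LOAD", "x"), ("STORE", "y")]
```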

Or maybe I misunderstood the question, then in other words: in principle copy-and-patch’s code generation is quite simple, and the true benefits come from the optimized opcode stream that you feed it that wouldn’t have been as good for the interpreter.


Right, that is basically what I was asking. Essentially, I expected the machine code to be a bit of an unrolling of the interpreter over the opcodes that a piece of code is executing.

That my intuition is wrong here doesn't shock me, I should add. It was still a surprise and it will get me to update my idea on what the interpreter is doing.


A byte code interpreter is, very approximately, a lookup table of byte code instructions that dispatches each instruction to highly optimized assembly.

This will almost certainly outperform a straight translation to poorly optimized machine code.

Compilers are structured in conceptual (and sometimes distinct) layers. For a classic statically-typed language with only compile-time optimizations, the compiler front-end will parse the language into an abstract syntax tree (AST), via a parse tree or directly, and then convert the AST into the first of what may be several intermediate representations (IRs). This is where a lot of optimization is done.
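Python happens to expose its own front-end stage, which makes the first layer easy to poke at; the `ast` module shows source text parsed into the AST that would then be lowered to an IR (bytecode, in CPython's case):

```python
import ast

# Front end in action: source text -> abstract syntax tree.
tree = ast.parse("answer = 20 + 22")
print(ast.dump(tree.body[0], indent=2))
# The AST is then compiled down to bytecode, CPython's IR, where
# most of the interpreter-level optimization happens.
```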

Finally, the last IR is lowered to assembly, which includes register allocation and some other (peephole) optimization techniques. This is separate from the IR manipulation, so you don't have to write separate optimizers for different architectures.
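A peephole pass is just a local pattern rewrite over the lowered instruction stream. A minimal sketch over made-up pseudo-assembly (the instruction names are invented for the example):

```python
# Illustrative peephole pass over hypothetical pseudo-assembly:
# a push whose value is immediately discarded by a bare pop can be
# deleted, a typical local rewrite done after lowering.
def peephole(asm):
    out = []
    for ins in asm:
        if ins == "pop" and out and out[-1].startswith("push"):
            out.pop()            # "push X; pop" cancels to nothing
            continue
        out.append(ins)
    return out

assert peephole(["push r1", "pop", "mov r2, r3"]) == ["mov r2, r3"]
```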

There are aspects of a tracing JIT compiler that are quite different, but it will still use IR layers to optimize and have architecture-dependent layers for generating machine code.


Right, I guess my main surprise is that the PyPy bytecode interpreter is as fast as it is. My understanding is obviously outdated on how it is implemented; but I thought its claim to fame was that it was purely written in Python. I'm assuming the subset of Python it is implemented in is fairly restricted? That, or my understanding was wrong in other ways. :D



The way I understand it, the machine code generator emits machine code for some particular piece of bytecode (or whatever the JIT IR is). This is almost like an assembler, and probably has templates that it expands. It is important for this machine code to be fast, but each template is at a pretty low level and lacks the context for structural optimizations. The optimizer works at a higher level of abstraction, and can make those structural optimizations. You can get very large speed-ups when you can remove code that isn't necessary, or emit equivalent code that has lower complexity or memory overhead. Typical examples of things optimizers do are:

* use registers instead of memory for function arguments

* constant folding

* function inlining

* loop unrolling

I don't know if that's exactly how it works for this particular effort, but that would be my expectation.
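One of the optimizer examples above, constant folding, can be sketched against Python's own AST. This is only an analogy: CPython's real optimizer works on bytecode and micro-ops, not on the AST like this toy pass does.

```python
import ast

# A minimal constant folder over Python's AST: when both operands of
# a binary op are literals, evaluate the op at "compile time".
class Fold(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            try:
                value = eval(compile(ast.Expression(node), "<fold>", "eval"))
            except Exception:
                return node       # e.g. division by zero: leave it alone
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("y = 2 * 3 + x")
folded = ast.fix_missing_locations(Fold().visit(tree))
print(ast.unparse(folded))  # y = 6 + x
```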


In JavaScript, an unoptimizing JIT (no regalloc, no optimizations that look at patterns of ops, no analysis) is faster than the interpreter because it eliminates opcode dispatch.

Adding more optimizations improves things from there.

But the point is, a JIT can be a speedup just because it isn’t an interpreter (it doesn’t dynamically dispatch ops).



