that's not all there is to it, but I think that "the rest of it" is just additional fine-tuning.
Benchmarks make good fixed targets for fine-tuning, and I think that Sonnet gets significantly more fine-tuning than Opus. Sonnet has more users, which is a strategic reason to focus on it, and it's less expensive to fine-tune, if the API prices of the two models are any indicator.
And I would expect Opus 4 to be much the same.