I don't have confidence that systems built on top of a specific model will behave the same on a newer version of that model. That's unlike, say, the Go programming language, where backwards compatibility is something you can generally count on (and the exceptions are well documented).

I wouldn't want to be in charge of regression testing an LLM-based enterprise software app when bumping the underlying model.