Have an opinion on the design, imagine something, then tell it to do just that, then iterate. It's when you're unspecific you get the generic, bland and typical LLM design, you just have to be subjective and influence it in some (human) direction.
If the alternative is "burn more tokens on finding issues than the attackers do", formal verification starts to look comparatively feasible cost. Think of it as setting an upper bound on cost, vs just burning more and more tokens.
AI assistants would reduce effort of verification too.
I suspect the main trade-off is structured data versus text parsing. While CLIs are composable, relying on stdout is brittle for anything complex. MCP enforces a schema (types), which acts as a contract between the model and your backend. If you're building reliable pipelines rather than just one-off scripts, that structure is pretty critical to avoid parsing errors downstream.
reply