I have experimented with instructing CC to doubt itself greatly and presume it is not validating anything properly.
This caused it to throw out good validation ideas along with working code.
I want to believe there is some sweet spot.
The constant “Aha!”-type responses, followed by self-validating prose insisting the answer is at hand or within reach, can be intoxicating and cannot be trusted.
The product also seems to be in constant tuning flux: some sessions produce great progress, while in others the AI seems as if it is deliberately trying to steer you into traffic.
Anthropic has alluded to this being the result of load. In their memo about new limits for Max users, they mentioned that abuse of the subscription tiers resulted in subpar product experiences. It’s possible they meant response times, overloaded 500 responses, or lower-than-normal TPS, but there are many anecdotal accounts of CC suddenly having a bad day from “longtime” users, including myself.
I don’t understand how load would impact the actual model’s performance.
It seems like only load-based impacts on individual session context would result in degraded outputs. But I know nothing about serving LLMs at scale.
Can anyone explain how high load might result in an unchanged product performing objectively worse?
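The only mechanism I can imagine (pure speculation on my part, nothing Anthropic has confirmed) is numerical: under heavy load, servers batch requests differently, and different batch sizes can change the order in which GPU kernels sum floating-point values. Floating-point addition is not associative, so logits can differ in their last bits between runs, which could occasionally flip a sampled token. A trivial illustration:

```python
# Floating-point addition is not associative: grouping the same
# three values differently produces two different results.
a = 0.1 + (0.2 + 0.3)  # one reduction order
b = (0.1 + 0.2) + 0.3  # another reduction order

print(a == b)  # False: the two sums differ in the last bits
```

Whether tiny logit perturbations like this could actually account for the "bad day" effect, rather than just occasional token-level flips, is exactly what I'm unsure about.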