An experience not exclusive to cloud vendors :) Even better when the vendor thro...

pixl97 · 2025-11-14T21:41:49 1763156509

I've had customers with load-related bugs for years simply because they'd reboot when the problem happened. When dealing with the F100, it seems there is a rather limited number of people in these organizations who can troubleshoot complex issues; either that, or they lock them away out of sight.

perching_aix · 2025-11-14T23:09:18 1763161758

It is a tough bargain to be fair, and it is seen in other places too. From developers copying their work out of their local git repo, recloning from remote, then pasting it back, all the way to phone repair just meaning "here's a new device, we synced all your data across for you", it's fairly hard to argue with the economics and the effectiveness of this approach.

With all the enterprise solutions being distributed, loosely coupled, self-healing, redundant, and fault-tolerant, issues like this essentially just slot in perfectly. Compound this with man-hours (especially expert ones) being a lot harder to justify for any one particular bump in tail latency, and the equation is just really not there for all this.

What gets us specifically to look into things is either the issue being operationally gnarly (e.g. frequent, impacting, or both), or management being swayed enough by principled thinking (or at least pretending to be). I'd imagine it's the same elsewhere. The latter would mostly happen if fixing a given thing becomes an office political concern, or a corporate reputation one. You might wonder if those individual issues ever snowballed into a big one, but it turns out human nature takes care of that just "sufficiently enough" before it would manifest "too severely". [0]

Otherwise, you're looking at fixing / RCA'ing / working around someone else's product defect on their behalf, and giving your engineers a "fun challenge". Fun doesn't pay the bills, and we rarely saw much in return from the vendor in exchange for our research. I'd love to entertain the idea that maybe behind closed doors the negotiations went a little better because of these, but for various reasons, I really doubt it in hindsight.

[0] as delightfully subjective as those get of course

hobs · 2025-11-15T00:44:51 1763167491

If I had a nickel for every time I had to explain that rebooting a database server is usually the wrong choice, I would have quite a fortune.