You've got an additional problem though, which is that this tells us you have tw...

dkersten · on June 1, 2019

As a DO user who was planning on ramping up usage in the coming weeks and months, this is what scares me and what is making me seriously reconsider.

xvector · on June 1, 2019

Do not use DO. The very fact that their default response to suspected spam is to cause prod downtime is so bizarre and unacceptable that it does not make any sense whatsoever for a business to rely on them.

dkersten · on June 1, 2019

Thanks, I’ll stick with AWS then.

sneak · on June 1, 2019

https://github.com/fog/fog/issues/2525

https://news.ycombinator.com/item?id=6983097

Running anything business or privacy critical on DO is madness.

nathan-io · on June 3, 2019

Indeed, this was bad. I assume they were trying to extend SSD lifetime by reducing writes.

It's fair to note that scrubbing is now the default behavior when a droplet is destroyed, so they did listen to the feedback.

https://ideas.digitalocean.com/ideas/DO-I-1947

sneak · on June 4, 2019

The SSD thing is a red herring.

You do not need to scrub or write anything to not provide user A’s data to user B in a multi-tenant environment. Sparse allocation can easily return nulls to a reader even while the underlying block storage still contains the old data.
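To illustrate the sparse-allocation point above, here is a minimal sketch in Python. It is not how DO or any real hypervisor implements block storage; `PhysicalDisk`, `SparseVolume`, and the tiny `BLOCK_SIZE` are hypothetical names and values chosen for illustration. The idea is that a volume only maps logical blocks its current owner has written, so reads of unmapped blocks return zeros even though the shared backing store is never scrubbed.

```python
BLOCK_SIZE = 16  # hypothetical, tiny block size to keep the example readable


class PhysicalDisk:
    """Shared backing store. Freed blocks are NOT scrubbed in this sketch."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))


class SparseVolume:
    """A tenant's view of the disk: only blocks this tenant wrote are mapped."""

    def __init__(self, disk: PhysicalDisk, num_blocks: int) -> None:
        self.disk = disk
        self.num_blocks = num_blocks
        self.mapping = {}  # logical block -> physical block

    def _check(self, lba: int) -> None:
        if not 0 <= lba < self.num_blocks:
            raise IndexError("logical block out of range")

    def write(self, lba: int, data: bytes) -> None:
        self._check(lba)
        # Assumption: whole-block writes only, so a freshly mapped block
        # never exposes stale bytes through a partially written block.
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if lba not in self.mapping:
            # Raises IndexError if the free pool is exhausted.
            self.mapping[lba] = self.disk.free.pop()
        self.disk.blocks[self.mapping[lba]] = bytes(data)

    def read(self, lba: int) -> bytes:
        self._check(lba)
        phys = self.mapping.get(lba)
        if phys is None:
            # Never written by this tenant: return zeros, whatever the
            # underlying physical blocks still contain.
            return bytes(BLOCK_SIZE)
        return self.disk.blocks[phys]

    def destroy(self) -> None:
        # Return blocks to the pool without writing to them.
        self.disk.free.extend(self.mapping.values())
        self.mapping.clear()


if __name__ == "__main__":
    disk = PhysicalDisk(num_blocks=4)
    tenant_a = SparseVolume(disk, num_blocks=4)
    tenant_a.write(0, b"A" * BLOCK_SIZE)
    tenant_a.destroy()

    tenant_b = SparseVolume(disk, num_blocks=4)
    print([tenant_b.read(i) for i in range(4)])
```

In this sketch, destroying a volume costs no writes at all, which is why the SSD-wear argument does not apply to it.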

They were just incompetent.

On top of all of that, when I pointed out that what they were doing was absolute amateur hour clownshoes, they oscillated between telling me it was a design decision working as intended (and that it was fine for me to publicize it), and that I was an irresponsible discloser by sharing a vulnerability.

Then they made a blog post lying about how they hadn’t leaked data when they had.

Nope.

buzzerbetrayed · on June 1, 2019

As someone who has been blown off by DO support, you hit the nail on the head.

mekane8 · on June 3, 2019

So well written. This is exactly what's so scary about this whole thing.

iamaelephant · on June 1, 2019

I think it says a lot that this CTO joker flew in, regurgitated the standard-issue "we will endeavor to do better" apology and left without answering any of the very legitimate follow-up questions. I would never deal with an organisation that behaves like these guys.

thanatos_dem · on June 1, 2019

Is there any response that would satisfy you?

pdimitar · on June 1, 2019

Yes.

"This will not happen again, ever".

People's livelihoods are at stake in DO's hosting. Canned responses and brutal account lockouts should have NEVER been on the table to begin with.

thanatos_dem · on June 1, 2019

That’d be unrealistic for any company to claim, and if any company I worked with did claim that, I would run for the hills.

That’s akin to saying “we’ll never ship a bug”, or “we have an SLO of 100%”. That’s impossible for anyone to claim. Same goes for the response handling. There is clearly a lot of room for improvement there, but if you’re insisting on not getting a canned response, that means a human needs to be involved at some point. Humans will at times be slow to respond. Humans will at times make mistakes. This is just an unavoidable reality.

I get that mob mentality is strong when shit hits the fan publicly, but have a bit of empathy and think about what reasonable solutions you may come up with if you were to be in their situation, rather than asking for a “magic bullet”.

I could see a good response here being an overhaul of their incident response policy, especially in terms of L1 support. Probably by beefing up the L2 staffing, and escalating issues more often and more quickly. L2 support is generally product engineers rather than dedicated support staff/contractors, so it’s more expensive to do for sure, but having engineers closer to the “front line” in responding to issues closes the loop better for integrating fixes into the product, and identifying erroneous behavior more quickly.

pdimitar · on June 1, 2019

Sure, me and a lot of others react rather strongly in these situations. I agree with that but you already seem to understand the reasons.

However, can you say with a straight face that the very generic message left here by DO's CTO instills confidence in you about how they will handle such situations in the future?

Techies hate lawyer/corporate weasel talk. The least that person could do was do their best to speak plainly without promising the sky and the moon.

thanatos_dem · on June 1, 2019

I would prefer a generic message and a promise of follow-up once all the facts are known over a rushed response that may be incorrect.

I’m an engineering manager in an infrastructure team (not at all affiliated with Digital Ocean, tho full disclosure, I do have one droplet for my personal website). I know how postmortems generally work, and it’s messy enough to track down root cause even when it’s not some complex algorithm like fraud detection going off the rails.

I’d rather get slow information than misinformation, but I understand the frustration in not being able to see the inner workings of how an incident is being handled.

pdimitar · on June 1, 2019

I applaud people like you. Seriously.

And I agree with your premise. However, my practice has shown that postmortems are watered-down evasive PR talk, many times.

If you look at this through the eyes of a potential startup CTO, wouldn't you be worried about the lack of transparency?

And finally, why is such an abrupt account lockdown even on the table, at all? You can't claim you are doing your best when it's very obvious that you are just leaving your customers at the mercy of very crude algorithms -- and those, let's be clear on that, could have been designed so that an account is never locked down without human approval at the final step.
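A rough sketch of what "human approval at the final step" could look like, in Python. Everything here is hypothetical (the `Action` enum, the `AbuseSignal` score, the 0.8 threshold); nothing is known about how DO's detection actually works. The point is only that the automated stage can at most flag an account, and a lock requires an explicit human sign-off.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"
    FLAG_FOR_REVIEW = "flag_for_review"
    LOCK = "lock"


@dataclass
class AbuseSignal:
    account_id: str
    score: float  # hypothetical abuse score in [0, 1]


def automated_decision(signal: AbuseSignal, threshold: float = 0.8) -> Action:
    """The algorithm may flag an account but is never allowed to return LOCK."""
    if signal.score >= threshold:
        return Action.FLAG_FOR_REVIEW
    return Action.NONE


def final_decision(signal: AbuseSignal, human_approved: Optional[bool]) -> Action:
    """Lock only when the account was flagged AND a human approved the lock.

    human_approved is None while the review is still pending.
    """
    automated = automated_decision(signal)
    if automated is Action.FLAG_FOR_REVIEW and human_approved is True:
        return Action.LOCK
    if automated is Action.FLAG_FOR_REVIEW and human_approved is None:
        return Action.FLAG_FOR_REVIEW  # stays online while awaiting review
    return Action.NONE
```

With this shape, a false positive from the algorithm costs a reviewer some time rather than costing the customer production downtime.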

What I'm saying is that even at this early stage when we know almost nothing, it's evident that this CTO here is not being sincere. It seems DO just wants to maximally automate away support, to their customers' detriment.

Whatever the postmortem ends up being it still won't change the above.

Moru · on June 3, 2019

Our line so far has been to change service providers if we start getting copy-paste answers from support. We always make sure we can get hold of a human on the phone even without a big uptime contract. This has so far led us to small companies that are not overrun by free accounts used as spam or SEO accounts. That means they have no need for automatic shutdown of accounts and instead you get a phone call if something goes wrong.

pdimitar · on June 3, 2019

This is how I would go about it as well. But I imagine that's a big expense for non-small companies, not only in money but also in the time of valuable professionals who could have spent it improving the bottom line.

I too value lesser-known providers. The human factor in support is priceless.

geezerjay · on June 3, 2019

> Is there any response that would satisfy you?

Do you believe that a PR response made in damage control mode that actually changes nothing is something that's satisfactory?

I mean, apparently this screwup was so damaging that it killed a company. What part of the PR statement addresses that precedent?

perlgeek · on June 1, 2019

How about "we will reimburse the company for any damages if we find it was our fault"?

isoprophlex · on June 1, 2019

It's been 7 hours, some follow-up answers would be nice...

thanatos_dem · on June 1, 2019

7 hours, on a Friday night in the headquarters time zone. This issue is resolved and is clearly not widespread, so does getting a response on Monday or Tuesday vs right now make any difference?

Companies are made of people. Let the people have a life. Their night is shitty enough as is after this, I guarantee you.

rat9988 · on June 1, 2019

The thing is, my business doesn't want to deal with people. It wants to deal with a business made of multiple people to guarantee service availability. If he cannot answer, surely someone else at DigitalOcean can?

skywhopper · on June 1, 2019

You are being unreasonable here. He promised a postmortem. I’d much rather wait a few days to get a clearly written, comprehensive analysis of the problems than to get an immediate stream of confusing and contradictory raw data.

If you have ever been involved in post facto analysis of a process breakdown like this you know how hard it is to get the full picture immediately. Rushing something out does no one any favors.