We don't know the structure of their DB and whether failover is important or not, so we don't know if the DB can be reliably pulled as a flat file backup and still have consistent data.
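To be concrete about what "consistent" means here: if it happened to be, say, Postgres (pure assumption on my part), the dump tool already handles it, because pg_dump reads everything inside a single transaction snapshot instead of copying live data files. A minimal sketch, with all names made up:

```python
# Minimal sketch, assuming (purely for illustration) the DB is Postgres.
# pg_dump reads its data within one transaction snapshot, so the output file
# is internally consistent even while the application keeps writing.
import subprocess
from datetime import datetime, timezone

stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
outfile = f"/backups/appdb-{stamp}.dump"  # hypothetical path

subprocess.run(
    [
        "pg_dump",
        "-Fc",                        # custom format: compressed, restorable with pg_restore
        "-h", "db.example.internal",  # hypothetical host
        "-U", "backup_user",          # hypothetical read-only role
        "-d", "appdb",                # hypothetical database name
        "-f", outfile,
    ],
    check=True,  # raise if pg_dump exits non-zero
)
```

If it's MySQL, MongoDB, or something else, an equivalent exists but the flags and the consistency guarantees differ, which is exactly why we can't say from the outside whether a naive flat-file copy would have been safe.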
We also don't know how big the dataset is or how often it changes. Sometimes "backup over your home cable connection" just isn't practical.
Cron jobs can (and do) silently fail in all kinds of annoying and idiotic ways.
And as most of us are all too painfully aware, sometimes you make less-than-ideal decisions when weighing a long pipeline of customer bug reports and feature requests against addressing the potential situation that could sink you but has maybe a 1-in-10,000 chance of happening on any given day.
But yes, granted that as a quick stop-gap solution it's better than nothing.
> We also don't know how big the dataset is or how often it changes.
I'm going to take a stab at small and infrequently.
> Every 2-3 months we had to execute a python script that takes 1s on all our data (500k rows), to make it faster we execute it in parallel on multiple droplets ~10 that we set up only for this pipeline and shut down once it's done.
Yeah, probably. But we shouldn't be calling these guys out for not taking the "obvious and simple" solution when we aren't 100% certain that it would actually work. That happens too often on HN, and then sometimes the people involved pop in to explain why it's not so simple, and everyone goes "...oh." Seems like we should learn something from that. I've gone with "don't assume it's as simple as your ego would lead you to believe."
I suggested that solution because everyone is saying "they're only a two-man shop so they don't have the time and money to do things properly". Anyone can find the time and money to do the above, and there's a 90% chance that it would save them in a situation like this.
Even if they lost some data, even if the backup silently failed and hadn't been running for two months, it's the difference between a large inconvenience and literally your whole business disappearing.
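For the sake of argument, here's roughly the level of effort I'm talking about (still pretending it's Postgres, and every path, hostname, and address below is made up). The alerting is the whole point, since a backup job that fails silently for two months is the failure mode people keep bringing up:

```python
# Rough sketch of the "dumb but good enough" off-site backup, run nightly from
# cron on a machine outside the hosting provider. Paths, hostnames, and mail
# addresses are all hypothetical.
import smtplib
import subprocess
from datetime import datetime, timezone
from email.message import EmailMessage
from pathlib import Path

BACKUP_DIR = Path("/home/backups")  # hypothetical destination
MIN_SANE_SIZE = 1_000_000           # a suspiciously tiny dump counts as a failure


def alert(subject: str, body: str) -> None:
    """Complain loudly instead of failing silently (assumes a local MTA)."""
    msg = EmailMessage()
    msg["From"] = "backups@example.com"
    msg["To"] = "founders@example.com"
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def main() -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    outfile = BACKUP_DIR / f"appdb-{stamp}.dump"
    try:
        subprocess.run(
            ["pg_dump", "-Fc", "-h", "db.example.internal", "-d", "appdb", "-f", str(outfile)],
            check=True,
            timeout=3600,
        )
        size = outfile.stat().st_size
        if size < MIN_SANE_SIZE:
            raise RuntimeError(f"dump is only {size} bytes")
    except Exception as exc:  # catch everything: the goal is to never fail quietly
        alert("BACKUP FAILED", repr(exc))
        raise


if __name__ == "__main__":
    main()
```

Pair that with the occasional test restore and you've covered the "large inconvenience vs. business gone" gap, even if it's nobody's idea of a proper DR plan.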