
> Horizontal scaling

I think the importance of horizontal scaling is overhyped. 99% of PostgreSQL applications are at a size where a single machine can easily handle the workload.

To enable horizontal scaling, you need to make so many tradeoffs that I don't think it's worth it for most applications.



I see people responding to this thread with examples of high-volume workflows that use a small number of machines for the database layer.

I think those examples are missing the point. One of the examples uses a Redis cluster, which is a sign that the DB alone can't support the workflow.

Another example is using a database for certificate issuance. Which I suspect uses one table, or a few tables, that can be sharded and don't suffer from lock contention or joins across multiple tables.

I wouldn't be brave enough to count the number of cases where horizontal scaling is needed, but I would say it is definitely not zero, especially for read replicas when high availability is needed.
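The read-replica setup mentioned here usually means the application routes writes to the primary and spreads reads across replicas. A minimal sketch of that routing idea (the DSN strings and class are hypothetical placeholders, not a real driver API):

```python
import itertools

class ReplicaRouter:
    """Send writes to the primary; round-robin reads over replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def dsn_for(self, is_write: bool) -> str:
        # Writes must hit the primary; reads can tolerate replica lag.
        return self.primary if is_write else next(self._replicas)

router = ReplicaRouter("postgres://primary/db",
                       ["postgres://replica1/db", "postgres://replica2/db"])
print(router.dsn_for(is_write=False))  # postgres://replica1/db
print(router.dsn_for(is_write=True))   # postgres://primary/db
```

Note this gains availability and read throughput only; writes still go through a single machine, which is exactly why it is a lighter tradeoff than full horizontal sharding.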


I'd be willing to believe 99% of applications, yes - for the same reason that Wordpress can claim to drive some ridiculous percentage of websites - but since the larger use cases tend to come from larger organizations, 99% of developers don't work on systems where one instance suffices. There's a reasonably large number of companies for which that's not true, and there's not very many good "next step" options when you're too big for vertical scaling but too small to fund building your own engine.


Where did you get the 99% metric? Most companies, even those with a single SaaS product, have insane amounts of data these days.

Not just application data; there is also a whole lot of analytical data collected at every step of the product usage cycle.


> Where did you get the 99% metric? Most companies, even those with a single SaaS product, have insane amounts of data these days.

I will just leave this link here ....

https://letsencrypt.org/2021/01/21/next-gen-database-servers...


And this one -- StackOverflow, one single live db:

https://stackexchange.com/performance


From managing databases over the last two decades (starting with Ingres/mSQL), hardware (RAM/CPU/NVMe) has been growing much faster than the data needs. I remember the first time we put 2 TB of RAM into a machine to handle most analytics in memory.

From my experience, TimescaleDB is fast and takes in a lot of data. If you're multi-tenant, it's usually easy to shard.
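Multi-tenant sharding is easy precisely because every row carries a tenant ID, so each tenant's data (and its joins) can live entirely on one shard. A minimal sketch of the routing idea, assuming hash-based tenant-to-shard assignment (the tenant names are made up):

```python
import hashlib

def shard_for_tenant(tenant_id: str, num_shards: int) -> int:
    """Map a tenant to a shard with a stable hash.

    Uses sha256 rather than Python's built-in hash(), which is
    randomized per process and so unsuitable for routing.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Every query for a given tenant lands on the same shard, so
# per-tenant joins stay local and no cross-shard coordination is needed.
shard = shard_for_tenant("acme-corp", 8)
print(shard)
```

The scheme breaks down exactly where the thread says sharding gets hard: queries that join data across tenants, or global locks, force cross-shard coordination.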

And we do data warehousing on BigQuery; no need to have your own machine and manage the database.

Of course there are people who need unlimited horizontal scaling.


TimescaleDB also offers a good number of supplementary functions on top of the PostgreSQL core product to help with time series data analysis... saves a lot of SQL acrobatics!
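One of those supplementary functions is time_bucket(), which truncates timestamps down to fixed-width intervals for aggregation. A plain-Python equivalent of the idea, just to illustrate what the SQL acrobatics would otherwise compute (not TimescaleDB's implementation):

```python
from datetime import datetime, timezone

def time_bucket(interval_seconds: int, ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its fixed-width bucket,
    mirroring what TimescaleDB's time_bucket() does in SQL."""
    epoch = ts.timestamp()
    bucket_start = epoch - (epoch % interval_seconds)
    return datetime.fromtimestamp(bucket_start, tz=timezone.utc)

ts = datetime(2021, 1, 21, 10, 37, 12, tzinfo=timezone.utc)
print(time_bucket(300, ts))  # 2021-01-21 10:35:00+00:00
```

Grouping rows by the bucketed value is what turns raw time series data into per-interval aggregates (averages, counts, and so on).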


Generally you don't have your main application database also serve as your data warehouse for lots of reasons.

Generally you offload that to purpose-built systems that favor those aforementioned tradeoffs, or onto services that run them for you, e.g. BigQuery, Snowflake, etc.

Your main application database is unlikely to need sharding unless you really do have a phenomenal number of customers, or you need regional sharding to meet legal requirements about data sovereignty, for example.


If you do SaaS, you are already in the 1%.


You are right, the metric should probably be 99.999999999999999%. 99% is way too conservative.



