I've lived most of my adult life in houses with forced air furnaces (albeit powered via natural gas, not propane), and what you are saying is inaccurate regarding indoor air pollution unless your furnace is in need of immediate replacement.
A modern furnace works via a heat exchanger, so the combustion-produced pollutants never mix with the indoor air being pushed through. All pollutants are expelled outside via a properly functioning chimney. This is one reason why you should have the furnace (and the chimney) inspected annually. Aging heat exchangers will show hotspots well before there is any possibility of the air streams mixing, giving plenty of time to plan for a replacement. Of course there is always a possibility of failure, which is why you should have a carbon monoxide detector.
I know I'm going to sound crazy here, but there is one more alternative. How about: Reduce, Reuse, Repair, Recycle?
I recently got a sewing machine for an unrelated project, and around the same time I ordered it, one of those reusable cloth bags ripped because I put too many heavy things in it. When the sewing machine arrived, I decided to see if I could fix the bag for practice. It turned out to be surprisingly quick and easy. I didn't use any extra material besides the thread, and I believe the bag is much stronger now.
It's all about convenience, and the fact that we're trained from birth to be good little obedient consumers. Talk to your grandparents: back then they all had sewing machines and fixed their clothes and their shoes. Things were expensive and cherished; now it's all cheap junk you have to consume as fast as possible before getting your next hit from Amazon. Now that everything is cheap and abundant, why would people bother?
Whenever the solution requires other people to act together at some expense (time, in this case), you run into problems. Many people care right up until it's no longer just words.
One of the essential items to have in your house is a sewing/repair kit. For things like bags you don't even need a sewing machine; you can fix them by hand. You don't even need to know how to sew, just stick the needle and thread through a couple of times until it's fixed.
Side note: people (men, mostly, still) who don't know how to use sewing machines are missing out on perhaps the most transformative, clever, empowering machine ever made. You could teach an entire curriculum just on the history, design, manufacturing and use of the sewing machine, and barely scratch the surface.
They are quite simply marvels. (Great Veritasium video about them too)
While most of what you say about Ceph is correct, I want to strongly disagree with your view of not filling Ceph above 66%. It really depends on implementation details. If you have 10 nodes, then yeah, maybe that's a good rule of thumb. But if you're running 100 or 1000 nodes, there's no reason to waste so much raw capacity.
With upmap and balancer it is very easy to run a Ceph cluster where every single node/disk is within 1-1.5% of the average raw utilization of the cluster. Yes, you need room for failures, but on a large cluster it doesn't require much.
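For anyone who hasn't played with the balancer, here's a rough sketch of checking that spread yourself (my own snippet, not anything official; the JSON field names match recent Ceph releases but may differ on older ones) by parsing the output of "ceph osd df --format json":

    import json
    import subprocess

    # Ask Ceph for per-OSD usage and report how tightly the balancer is
    # holding utilization together across the cluster.
    raw = subprocess.run(
        ["ceph", "osd", "df", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    osds = [n for n in json.loads(raw)["nodes"] if n.get("status") == "up"]
    utils = [o["utilization"] for o in osds]  # percent used per OSD
    avg = sum(utils) / len(utils)
    print(f"avg {avg:.1f}%  min {min(utils):.1f}%  "
          f"max {max(utils):.1f}%  spread {max(utils) - min(utils):.1f}%")

With the balancer in upmap mode ("ceph balancer mode upmap", "ceph balancer on"), that spread should sit in the low single digits of percent on a healthy cluster.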
80% is definitely achievable, 85% should be as well on larger clusters.
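To make the scale argument concrete, here's a back-of-envelope sketch (my own numbers and simplifications, not the parent's): if one host out of N fails, its data gets re-replicated across the remaining N-1 hosts, so average utilization grows by roughly N/(N-1), and you want that to stay under the backfillfull/full ratios (Ceph defaults are around 0.90/0.95).

    # How full can you run and still absorb the loss of one host
    # without crossing a chosen ceiling (e.g. the backfillfull ratio)?
    def max_safe_fill(num_hosts: int, hosts_lost: int = 1, ceiling: float = 0.90) -> float:
        survivors = num_hosts - hosts_lost
        return ceiling * survivors / num_hosts

    for n in (3, 10, 100, 1000):
        print(f"{n:>5} hosts: safe to fill to ~{max_safe_fill(n):.0%}")
    # 3 hosts -> ~60%, 10 -> ~81%, 100 -> ~89%, 1000 -> ~90%

Which is roughly why 66% makes sense as a rule of thumb for tiny clusters but wastes a lot of raw capacity at 100+ nodes.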
Also, re scale: depending on how small we're talking, of course, but I'd rather have a small Ceph cluster with 5-10 tiny nodes than a single Linux server with LVM if I care about uptime. It makes scheduled maintenance much easier, and a disk failure on a regular server means a RAID group (or ZFS/btrfs?) rebuild. With Ceph, even at fairly modest scale, you can have very fast recovery times.
Source: I've been running production workloads on Ceph at Fortune 50 companies for more than a decade, and yes, I'm biased towards Ceph.
I defer to your experience and agree that it really depends on implementation details (and design). I've only worked on a couple of Ceph clusters built by someone else who left: around 1-2PB, 100-150 OSDs, <25 hosts, and not all with the same disks in them. They started falling over because some OSDs filled up, and I had to quickly learn about upmap and rebalancing. I don't remember exactly how full they were, but numbers around 75-85% were involved, so from that experience I get nervous around 75%. We'd suddenly commit 20TB of backup data and that's a 2% swing. It was a regular pain in the neck and stress point, and the problems of that creaking, amateurishly managed, under-invested Ceph cluster caused several outages and some data corruption. Just having some more free-space slack in it would have spared us. [1]
That whole situation probably gets easier the bigger the cluster gets; any system with three "units" that has to tolerate one failing can only have 66% usable, while with a hundred "units", 99% is usable. Too much free space only wastes money; too full is a service-down disaster. For that reason I would prefer to err towards too much free space rather than too little.
Other than Ceph, I've only worked on systems where one disk failure needs one hot-spare disk to rebuild, and anything else is handled by a separate backup and DR plan. With Ceph, depending on the design, it might need free space to handle a host or rack failure; that's pretty new to me and also leads me to prefer more free space rather than less. With a hundred "units" of storage grouped into 5 failure domains, only 80% is usable; again, that probably improves with scale and experienced design.
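As a toy illustration of that arithmetic (my framing, not a Ceph formula), the usable fraction is just (failure domains - tolerated failures) / failure domains:

    # Fraction of raw capacity that is safely usable if the surviving
    # failure domains must be able to re-home the data from the failed ones.
    def usable_fraction(domains: int, failures: int = 1) -> float:
        return (domains - failures) / domains

    print(f"{usable_fraction(3):.0%}")    # 3 units, tolerate 1 failing    -> ~67%
    print(f"{usable_fraction(100):.0%}")  # 100 units, tolerate 1 failing  -> 99%
    print(f"{usable_fraction(5):.0%}")    # 100 units in 5 failure domains -> 80%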
If I had 10,000 nodes, I'd rather have 10,100 nodes and better sleep than play "how close to full can I get this thing" and be constantly on edge waiting for a problem that takes down a 10,000-node cluster and all the things that needed such a big cluster. I'm probably taking some advice from Reddit threads about 3-node Ceph/Proxmox setups, which say 66%, and from YouTube videos about Ceph at CERN - I think their use case is a bursty massive dump of particle accelerator data to ingest, followed by a quieter period of read-heavy analysis and reporting, so they need to keep enough free space for large swings. My company's use case was more backup-data churn: lower peaks, less tidal, quite predictable, and we did run much fuller than 66%. We're now down below 50% used as we migrate away, and the clusters are much more stable.
[1] It didn't help that we had nobody familiar with Ceph once the builder had left. The clusters had been running a long time, had been partially upgraded through different versions, and had one of everything: some S3 storage, some CephFS, some RBDs with XFS to use block cloning, some N+1 pools, some erasure-coding pools, some physical hardware and some virtual machines, some services in Docker containers but not all, multiple frontends hooked together by password-based SSH, some parts running over IPv6 and some over IPv4, none with DNS names, some frontends with redundant multiple back-end links and others with only one, and no management will to invest or pay for support/consultants. A well-designed, well-planned, management-supported cluster with skilled admins can likely run with finer tolerances.
I only want to add a small suggestion. I get that large distributed production systems will occasionally go down, but it would be great if you could look into reducing the delay before your status page gets updated.
By my count there was at least a 35-minute delay between when things broke and when the status page (https://fastmailstatus.com) was updated.
Also, I think it would have been nice to have a bit more explanation of this event than simply "database issues" [1]. Knowing that this was related to an upgrade would have made me feel a bit better between when the status page was updated and when the issue was resolved.
Thank you for your hard work and an excellent email service!
Maybe they don't realize how important you are. This is a failure in their VIP program; imo they should expedite that over a bug that affects all users.
Ha, cool. I imagine in an age of AI and LLMs, a lot of people are going to feel special at different points, despite the cost of that going to nearly zero.
It started out as not being able to search, but the situation is quickly deteriorating and now I'm unable to open pretty much any email message.
Some content seems to briefly show up and then quickly disappears; after that, it's as if the cache has been invalidated and you can't get back into it.
Luckily for me the Drafts folder's showing, so I was able to send (well, I assume it's sent) the Single Most Important Email I need to send today, which I'd spent about half an hour getting right just as the interface imploded...
Edit: apparently "hybrid apps" using webviews are allowed as long as they're not "thin wrappers" for websites and provide meaningful functionality. See also: the Capacitor framework.
I clicked "back" after looking at the page to point this out! If there were a Visor with a modern GSM/LTE radio, the ability to tether a tablet when necessary, and enough horsepower to do email and SMS using Graffiti input, I think I'd use it as a daily driver. The Visor was really great.