
Yeah that's where I started out in 2021. Been at it for almost 5 years now, last three of which full time. I'm indexing about 1.1 billion documents now off a single server.

Hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions.

Devlog here:

https://www.marginalia.nu/tags/search-engine/

And search engine itself:

https://marginalia-search.com/

(... though it's running a bit sub-optimally right now, as I'm using a ton of CPU cores to migrate the index to use postings list compression. That will take about 4-5 days, I think.)
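
For anyone unfamiliar with the term: postings list compression usually means storing each term's sorted document IDs as gaps between neighbors, then packing those gaps into as few bytes as possible. Here's a minimal sketch in Python, assuming plain delta + varint encoding. That's a common textbook scheme and not necessarily what Marginalia actually uses; the function names are made up for illustration.

```python
def encode_postings(doc_ids):
    """Delta-encode sorted doc IDs, then varint-encode each gap.

    Hypothetical helper: assumes doc_ids are non-negative and sorted ascending.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("doc_ids must be sorted in ascending order")
        prev = doc_id
        # Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Inverse of encode_postings: rebuild the doc IDs from the varint gaps."""
    doc_ids = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            # Continuation bit set: the next byte holds higher-order bits.
            shift += 7
        else:
            # Last byte of this gap: add it to the running doc ID.
            prev += gap
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


ids = [3, 7, 8, 1000, 1_000_000]
packed = encode_postings(ids)
assert decode_postings(packed) == ids
```

In this toy case the five IDs pack into 8 bytes, versus 40 bytes as raw 64-bit integers. Small gaps cost one byte each, which is why dense postings lists compress well.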





Curious what (and how much) hardware you're running this on.

Currently running off

AMD EPYC 7543 x2 for 64 cores/128 threads

512 GB RAM

~ 90 TB of PM9A3 SSDs across 12 physical devices

Storage is not very full though. I'm probably using about a third of it at this point.


I assume you're able to monetize it, since you work on it full time?

I'm mostly living off grants and donations at this point, but the plan down the line is to polish it up well enough to make some money off providing an API like the one Google is making it a hassle to access with this change :-)


