Yeah that's where I started out in 2021. Been at it for almost 5 years now, last three of which full time. I'm indexing about 1.1 billion documents now off a single server.
Hard part is doing it at any sort of scale and producing useful results. It's easy to build something that indexes a few million documents. Pushing into billions is a bigger challenge, as you start needing a lot of increasingly intricate bespoke solutions.
(... though it operates a bit sub-optimally right now, as I'm using a ton of CPU cores to migrate the index to use postings list compression. That will take about 4-5 days, I think.)
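For anyone unfamiliar with postings list compression: a common textbook approach is to store the gaps between sorted document IDs rather than the IDs themselves, then write each gap as a variable-length integer so small gaps take a single byte. Below is a minimal Python sketch of that idea. It is an illustration only, not necessarily the scheme used for the index described above, and the function names `encode_postings` and `decode_postings` are made up for this example.

```python
from typing import Iterable, List


def encode_postings(doc_ids: Iterable[int]) -> bytes:
    """Delta-encode sorted, non-negative doc IDs, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("doc IDs must be non-negative and sorted ascending")
        prev = doc_id
        # Varint: 7 payload bits per byte, high bit set means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data: bytes) -> List[int]:
    """Reverse encode_postings: rebuild each gap, then sum gaps back into IDs."""
    doc_ids: List[int] = []
    current = 0  # running document ID
    gap = 0      # gap being assembled from varint bytes
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # continuation bit set, keep reading
        else:
            current += gap      # last byte of this gap
            doc_ids.append(current)
            gap = 0
            shift = 0
    return doc_ids


if __name__ == "__main__":
    ids = [3, 7, 130, 131, 100000]
    packed = encode_postings(ids)
    print(len(packed), "bytes for", len(ids), "doc IDs")
    print(decode_postings(packed) == ids)
```

The saving comes from frequent terms: their document IDs are dense, so most gaps are small and fit in one byte instead of a fixed 4 or 8. Production engines often use block-based schemes instead, which trade some simplicity for faster decoding.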
I'm mostly living off grants and donations at this point, but the plan down the line is to polish it up well enough to make some money by providing an API like the one Google is making a hassle to access with this change :-)
You might find YaCy interesting. It’s meant to be a decentralised search engine where users scrape the internet and can search other users' indexes in a kind of torrent-like way.
I found it didn’t really work as a real search engine but it was interesting.
Well, you'll get blocked in some places, but it's not too big of a deal. If you're running an above-board operation, it's surprising how often you can just email the admin, explain what you're doing, and ask to be unblocked.