There are also Toshi[1] and Sonic[2] in Rust, and Vector[3] as a Logstash alternative. There is an issue[4] proposing to integrate Vector with Sonic and Toshi. Maybe Zinc can pursue this goal too. Always good to see people who realize that Java is an unwieldy monster that will eat all your memory. Native is the way to go for big systems.
Java will use as much memory as you give it. Because you told it to use that much, and using more generally gives better performance.
"Native" or not has nothing to do with the garbage collector. Go is a GCed language too, but I really don't think its collector is as performant or tunable as Java's.
Yeah... I guess. But I could write the same article about how Java doesn't need value types and extreme escape analysis because it has a generational garbage collector. 99% of garbage really does end up in the eden generation and is literally free to clean up, no different than a stack.
Once you put aside all the words and see how things stack up in the real world, Go and Java are pretty comparable, usually with the JVM having a slight edge after the JIT is nice and warm. They go about it in different ways, but there's nothing wrong with that.
It was criticized quite heavily on the relevant HN thread.
But basically, Go can get away with a less advanced GC because it has value types and relies on them quite often. Java, on certain workloads, does require the state-of-the-art GCs that the JVM provides, and a search engine is likely one such workload.
Elasticsearch restarting due to out of memory errors was a common occurrence at my last gig where we used it heavily.
The difference between Java GC and Go GC is stark. In several years, I've never had to touch a GC knob on Go, but in Java it is expected, and not usually worth the effort.
It has not been expected for about a decade (and even before that, it was mostly blog articles for very specific workloads, which went out of date soon after publication).
The default G1 GC is perfectly fine for the vast majority of workloads as is. In the rare case where tuning is needed, the target pause time setting can be used to shift the preference between latency and throughput.
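For anyone who has not touched it, the target pause time setting is the HotSpot flag `-XX:MaxGCPauseMillis`. Here is a minimal sketch of assembling a launch command with it; the helper name and `app.jar` are made up for illustration, and 200 ms is G1's default goal.

```python
def g1_launch_command(jar_path, pause_target_ms=200):
    """Build a `java` argv that selects G1 and sets its pause time goal.

    -XX:+UseG1GC and -XX:MaxGCPauseMillis are real HotSpot flags; the helper
    name and the jar path are illustrative assumptions.
    """
    if pause_target_ms <= 0:
        raise ValueError("pause target must be a positive number of milliseconds")
    return [
        "java",
        "-XX:+UseG1GC",
        f"-XX:MaxGCPauseMillis={pause_target_ms}",
        "-jar",
        jar_path,
    ]


if __name__ == "__main__":
    # A lower target asks G1 to favor latency over throughput.
    print(" ".join(g1_launch_command("app.jar", pause_target_ms=50)))
```

The value is a goal the collector tries to meet, not a guarantee, so it is usually the only knob worth reaching for before anything more exotic.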
Java is also native when you make use of an AOT compiler; these have existed in various forms since 2000.
In that regard, a Java runtime compiled in AOT form is hardly any different from a Go runtime.
As for the lack of value types, the solution is on the way, and apparently many keep forgetting that writing a JNI-based library is much better than throwing everything away and starting from scratch in random language X.
They do indeed, except in Java anything related to memory corruption, type confusion, integer overflow and implicit conversions is out of the picture.
Meilisearch is another really good one. To be honest, I'm also working on something kind of related in my spare time, but by the looks of it I'm another 4-5 months away from having a fully operational alpha version to open source. It's a slightly different concept though: kind of a mixture between document-oriented databases (think Couchbase, Mongo) and Elasticsearch. I'm currently trying to get a half-baked working version that I want to use for another project with a friend of mine, and use that to test some of the concepts at a larger scale. I opted for Rust for a million and one reasons (performance, not dealing with garbage collectors, pleasant to work with), but if we set that aside, I'm with you when it comes to Java: even without knowing what it's written in, you can immediately tell that Elasticsearch is a product of Java. It's incredibly clunky no matter how many resources you give it or how powerful your hardware is. My personal workstation is a dual 14-core Xeon with 64 GB of RAM, and even on it, it feels like trying to run a demanding game on an old Asus Eee PC with an Intel Atom CPU. 4 frames per minute if you are lucky. Java is a monster, I completely agree, and it's made worse by the convoluted paradigms, the syntax, and the Frankenstein's monster that is the JVM. Good thing Kotlin came about: at least the syntax is nice.
Yeah as a developer/programmer - I always have this feeling that I should go into all native code like using c++, d, f, rust, crystal, nim, v, zig ( this is my choice as of now ) - just to get that last drop of hardware juice. I was also looking into luajit as it can have easy ffi and near native performance.
I still think Java has a lot of value to bring or to learn and use for any programmer.
Between the ecosystem of libraries, software and tooling, its cross-platform nature, and the advent of GraalVM, Java is still very much alive and capable!
As for JVM guys, Kotlin seems to be an easy migration.
>(Kibana is not supported with zinc. Zinc provides its own UI).
1. I was hoping for a drop-in replacement for Elasticsearch. A new/different API means Zinc can't leverage existing tools that use Elasticsearch.
2. I don't like that you're bundling zinc with a UI; that disadvantages anyone else trying to build a better UI and often (usually?) leads to tying the db too closely to the UI (or vice versa)
Does it support boolean queries? Bluge, the library backing Zinc, appears to have such a searcher, and I can see a few references in Zinc's code, but is it exposed for search expressions?
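For reference, this is roughly what a boolean query looks like in the Elasticsearch query DSL, as a minimal sketch. The `bool`, `must`, `should`, `must_not` and `match` keys are Elasticsearch's own; the helper name and the field names are made up, and whether Zinc accepts this shape is exactly the open question.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Build an Elasticsearch-style bool query from (field, text) pairs.

    Each pair becomes a `match` clause. Empty clause groups are omitted.
    """
    clauses = {}
    groups = (("must", must), ("should", should), ("must_not", must_not))
    for name, terms in groups:
        if terms:
            clauses[name] = [{"match": {field: text}} for field, text in terms]
    return {"query": {"bool": clauses}}


if __name__ == "__main__":
    # Documents that mention "search" in the title but not "java" in the body.
    query = bool_query(must=[("title", "search")], must_not=[("body", "java")])
    print(json.dumps(query, indent=2))
```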
I am a staunch supporter of software which runs on minimal hardware or resources. Seems like an interesting project, looking forward to the distributed features!
I wonder if this would be a good lightweight alternative to ES for a local development setup. At $work we use ES for deployed environments, but have thus far avoided running it locally because of the resource requirements.
I have Elasticsearch running in Docker most of the time for local development. It's fine. You can run it with as little as half a GB of RAM. It won't be using much CPU unless you start throwing millions of documents at it for indexing. Our production cluster uses two 1 GB VMs. The whole setup costs us about $60/month. We have about 7 million documents indexed in there.
People here seem to assume Elasticsearch requires lots of memory and CPU. It actually scales down very nicely in addition to its famous ability to scale up very well. Any decent developer laptop should have no issues running this. Running VS Code is more of a burden on my laptop. And IntelliJ is even worse.
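A minimal sketch of what a small local setup like that can look like, assuming the official Docker image. `ES_JAVA_OPTS` and `discovery.type=single-node` are documented Elasticsearch settings; the helper name is made up, and the image tag below is a placeholder you would fill in yourself.

```python
def es_dev_container_command(image, heap_mb=512, port=9200):
    """Build a `docker run` argv for a single-node Elasticsearch with a small heap.

    The heap is pinned by setting -Xms and -Xmx to the same value through
    ES_JAVA_OPTS. The helper name and defaults are illustrative assumptions.
    """
    if heap_mb <= 0:
        raise ValueError("heap size must be a positive number of megabytes")
    heap = f"-Xms{heap_mb}m -Xmx{heap_mb}m"
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:9200",
        "-e", "discovery.type=single-node",
        "-e", f"ES_JAVA_OPTS={heap}",
        image,
    ]


if __name__ == "__main__":
    # Placeholder tag: substitute the version you actually run.
    image = "docker.elastic.co/elasticsearch/elasticsearch:<version>"
    print(" ".join(es_dev_container_command(image)))
```

Note that the heap is only part of the process footprint, so the container itself needs some headroom above the heap figure.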
I would say a fraction of the features. If you don't need those features, that's fine of course. But in my experience, I end up using a lot of those features when dealing with real world customer requirements. I guess if you implement search for some website where search quality just doesn't matter, light weight is fine. For everything else, using proper solutions might be the wiser thing.
In terms of performance you see a lot of comments that are stating X is faster than Y. I'd take such comments with a large grain of salt. Unless those lightweight alternatives actually do the same things you can't really compare the performance. It's not that hard to make indexing fast in Elasticsearch if you disable all the features that make it slower. Of course if you don't actually have those features it's going to be fast by default because it simply isn't doing anywhere close to the same things.
Elasticsearch is actually pretty damn fast even when you do use its many features. The reason is that it relies on in memory caches, thread pools, etc. and that a lot of very smart people have been implementing and optimizing very efficient algorithms in Lucene for the last 25 years. Elasticsearch actually can run with as little as 256MB but of course you are not going to be able to cache a lot of data with that and performance will suffer accordingly. Mostly large heap sizes with Elasticsearch are all about using larger caches. It also relies on memory mapped files and OS file caches for that. That's what allows it to work with extremely large data sets and still provide query responses in milliseconds.
There's no reason you wouldn't be able to do the same with a native implementation, of course. But it would be naive to assume you'd end up using a lot less memory or CPU. Using caches and memory mapped files in the same way would be pretty core to matching performance and features. Not using them would make it pretty hard to get even close.
Yeah. There's a lot that Elasticsearch can do. There are libraries and applications that can do bits of what ES can do, but no-one I'm aware of (and I watch this space) is even slightly close to building a genuine replacement (I'm not counting the AWS fork, for obvious reasons)
How much sense does it make for another product to try to be even slightly close to elasticsearch?
My experience with ES has been somewhat mixed, exactly because of them trying to do everything at the same time, and because a lot of the specific things we needed were under-documented and involved much more trial and error than should be necessary.
I for one welcome more focused products, when the need again arises for something in this space.
I see the search engine space as similar to the relational database space… more features is really a good thing, since it helps you answer more questions about your data… just my 2 cents
I feel ES has been trying really hard to walk away from the features Solr is good at. While ES still supports multiple languages and custom tokenization chains and even custom pre-processing chains (somewhat equivalent to Solr's UpdateRequestProcessors), I felt that they were very deeply buried in the configuration when I looked at ES a year ago.
ES is truly focusing on metrics and things and does have some features to make those use cases easier that Solr would probably need a lot of configuration/customization for.
So, Solr is about search. ES is about a specific set of use cases that rely very heavily on search.
And Lucidworks Fusion (commercial alternative to ES) is about big data and ML and full multi-tool pipeline on top of Solr.
ES wins hands down in search in my experience because the companies that were doing search stuff "before it was cool" were mostly running stuff like Autonomy DRE, which works well but is expensive and proprietary.
There is a huge market just in going after that kind of customer.
We have alternatives to Elasticsearch, as listed here. What we do not have is an alternative to Kibana that works with Elasticsearch/OpenSearch and the Elasticsearch alternatives here.
[1] https://github.com/toshi-search/Toshi
[2] https://github.com/valeriansaliou/sonic
[3] https://vector.dev/
[4] https://github.com/vectordotdev/vector/issues/988