Zinc Search engine. A lightweight alternative to Elasticsearch written in Go

xvilka · on Dec 4, 2021

There are also Toshi[1] and Sonic[2] in Rust, and Vector[3] as a Logstash alternative. There is an issue[4] proposing to integrate Vector with Sonic and Toshi. Maybe Zinc can pursue this goal too. Always good to see people who realize that Java is an unwieldy monster that will eat all your memory. Native is the way to go for big systems.

[1] https://github.com/toshi-search/Toshi

[2] https://github.com/valeriansaliou/sonic

[3] https://vector.dev/

[4] https://github.com/vectordotdev/vector/issues/988

pkulak · on Dec 4, 2021

Java will use as much memory as you give it. Because you told it to use that much, and using more generally gives better performance.

"Native" or not has nothing to do with the garbage collector. Go is a GCed language too, but I really don't think it's as performant or tuneable as Java's.

stiray · on Dec 4, 2021

Please check this; it explains the difference between the Java and Go garbage collectors quite nicely:

https://itnext.io/go-does-not-need-a-java-style-gc-ac99b8d26...

pkulak · on Dec 4, 2021

Yeah... I guess. But I could write the same article about how Java doesn't need value types and extreme escape analysis because it has a generational garbage collector. 99% of garbage really does end up in the eden generation and is literally free to clean up, no different than a stack.

Once you put aside all the words and see how things stack up in the real world, Go and Java are pretty comparable, usually with the JVM having a slight edge after the JIT is nice and warm. They go about it in different ways, but there's nothing wrong with that.

kaba0 · on Dec 4, 2021

It was criticized quite heavily on the relevant HN thread.

But basically, Go can get away with a less advanced GC because it has value types and relies on them quite often. Java and certain workloads, however, do require the state-of-the-art GCs that the JVM provides, and a search engine is likely one such workload.

tomohawk · on Dec 4, 2021

And then some.

Elasticsearch restarting due to out of memory errors was a common occurrence at my last gig where we used it heavily.

The difference between Java GC and Go GC is stark. In several years, I've never had to touch a GC knob on Go, but in Java it is expected, and not usually worth the effort.

kaba0 · on Dec 4, 2021

It hasn't been expected at all for about a decade (and even before that, it was mostly blog articles for very specific workloads, which went out of date soon after publication).

The default G1 GC is perfectly fine for the vast majority of workloads as is. In the rare case where tuning is needed, the target pause time setting can be used to shift the balance between latency and throughput.

pjmlp · on Dec 4, 2021

Java is also native when you make use of an AOT compiler; these have existed in various forms since 2000.

In that regard, a Java runtime compiled in AOT form is hardly any different from a Go runtime.

As for the lack of value types, the solution is on the way, and apparently many keep forgetting that writing a JNI-based library is much better than throwing everything away and starting from scratch in random language X.

wildoats · on Dec 4, 2021

Java isn't the worst choice. It's mostly safe, reasonable to work with, and quite fast.

RAM isn't a bottleneck on most servers these days. Even the cheapest EC2 tends to have 2GB which will run nearly any Java app fine.

Sure, Rust is better. But it will take years to replicate what's already out there in Java.

I'll take a database written in Java over C any day given equivalent choices.

rapsey · on Dec 4, 2021

> Sure, Rust is better. But it will take years to replicate what's already out there in Java.

The Rust-based search engine Tantivy is handily faster than Lucene and supports pretty much the same features.

pjmlp · on Dec 4, 2021

So it also replicates Zookeeper capabilities and is available in SaaS versions like SearchStax?

rapsey · on Dec 4, 2021

ZooKeeper? What does that have to do with search?

pjmlp · on Dec 4, 2021

A lot. That is the standard way Solr and Lucene clusters are managed.

rapsey · on Dec 4, 2021

Well, OK, but that is an external dependency irrespective of language. Might as well use etcd too.

pjmlp · on Dec 4, 2021

Nope, because those dependencies are also written in Java and are always deployed together, so a shop using Lucene and Solr is skilled in ZooKeeper.

Where is a SearchStax-like SaaS service using etcd?

mike_d · on Dec 4, 2021

> RAM isn't a bottleneck on most servers these days.

It isn't hard to scale, but it is the most expensive thing to scale.

wildoats · on Dec 4, 2021

But you don't usually need to scale it. We run 1 million+ LoC Java monoliths on machines with 2GB.

CodeGlitch · on Dec 4, 2021

> I'll take a database written in Java over C any day given equivalent choices.

I know it's not equivalent, but Redis is written in C and it's amazing what you can do with it. It is also incredibly efficient at what it does.

I don't get the hate towards C. Is it the security issues? (These exist in Java too you know).

pjmlp · on Dec 4, 2021

They do indeed, except in Java anything related to memory corruption, type confusion, integer overflow and implicit conversions are out of the picture.

wildoats · on Dec 4, 2021

It's far easier to write an app safe from memory corruption in Java than C.

Java is as safe as Rust. It's just 1/2 the speed and uses 2-3X the RAM. It was the best C alternative for databases until a couple of years ago.

axegon_ · on Dec 4, 2021

Meilisearch is another really good one. To be honest, I'm also working on something kind of related in my spare time, but by the looks of it I'm another 4-5 months away from having a fully operational alpha version to open source. It's a slightly different concept, though: kind of a mixture between document-oriented databases (think Couchbase, Mongo) and Elasticsearch. I'm currently trying to get a half-baked working version that I want to use for another project with a friend of mine, and to use that to test some of the concepts at a larger scale.

I opted for Rust for a million and one reasons (performance, not dealing with garbage collectors, pleasant to work with), but if we set that aside, I'm with you when it comes to Java: even without knowing what it's written in, you can immediately tell that Elasticsearch is a product of Java. It's incredibly clunky no matter how many resources you give it or how powerful your hardware is. My personal workstation is a dual 14-core Xeon with 64GB of RAM, and even on it it feels like trying to run a demanding game on an old Asus Eee PC with an Intel Atom CPU: 4 frames per minute if you are lucky.

Java is a monster, I completely agree, and it's made worse by the convoluted paradigms, syntax and the Frankenstein's monster that is the JVM. Good thing Kotlin came about - at least the syntax is nice.

bootcat · on Dec 5, 2021

Yeah as a developer/programmer - I always have this feeling that I should go into all native code like using c++, d, f, rust, crystal, nim, v, zig ( this is my choice as of now ) - just to get that last drop of hardware juice. I was also looking into luajit as it can have easy ffi and near native performance.

I still think Java has a lot of value to bring or to learn and use for any programmer.

The ecosystem of libraries, software, tooling, cross platform nature and with the advent of GraalVM - Java is still very alive and capable !

As for JVM guys - Kotlin seems to be an easy migration.

BTW - what do you think about dart ?

fithisux · on Dec 4, 2021

You can always write it in Java/Kotlin and make it native with GraalVM.

You do not necessarily need Rust. I still have mixed feelings about it; I am not a big fan. Golang is good too.

francoismassot · on Dec 4, 2021

Another alternative in Rust is Quickwit[1]. Only search is currently distributed, but distributed indexing is coming soon.

Disclaimer: I'm a cofounder.

[1] https://github.com/quickwit-inc/quickwit

pjz · on Dec 3, 2021

>(Kibana is not supported with zinc. Zinc provides its own UI).

1. I was hoping for a drop-in replacement for Elasticsearch. A new/different API means Zinc can't leverage existing tools that use Elasticsearch.

2. I don't like that you're bundling zinc with a UI; that disadvantages anyone else trying to build a better UI and often (usually?) leads to tying the db too closely to the UI (or vice versa)

jjeaff · on Dec 3, 2021

If the API really is compatible, then Kibana should work. Just sounds like they don't guarantee it will always work.
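As a reference point for what "compatible" would have to cover on the ingestion side: the Elasticsearch bulk API takes newline-delimited JSON, with an action line followed by a document line for each record. Here is a minimal sketch of building such a body. The index name `articles` and the sample documents are made up for illustration, and whether Zinc accepts this body unchanged (and at which URL) is an assumption to check against its documentation.

```python
import json


def build_bulk_payload(index_name, documents):
    """Serialize documents into an Elasticsearch-style bulk body.

    Each document becomes two lines: an "index" action line naming the
    target index, then the document source itself.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data, only here to show the shape of the output.
    docs = [{"title": "Zinc"}, {"title": "Bluge"}]
    print(build_bulk_payload("articles", docs), end="")
```

The resulting string would be sent as the request body of a bulk call; the HTTP part is left out on purpose because the endpoint and authentication details differ between servers.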

yed · on Dec 3, 2021

According to the site:

> Compatibility with elasticsearch APIs for ingestion of data (single record and bulk API)

rzzzt · on Dec 3, 2021

Does it support boolean queries? Bluge, the library backing Zinc, appears to have such a searcher, and I can see a few references in Zinc's code, but is it exposed for search expressions?
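For anyone unsure what a boolean query means in this context, here is a toy sketch over an in-memory inverted index. It is not Bluge's or Zinc's API; the function names and the `must` / `should` / `must_not` parameters are illustrative only, loosely modeled on how such queries are usually described.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_search(index, must=(), should=(), must_not=()):
    """Evaluate a simple boolean query and return matching document ids.

    Simplification: `should` terms are only consulted when there are no
    `must` terms. Real engines also use them for scoring.
    """
    if must:
        # AND: intersect the postings of every required term.
        result = set.intersection(*(index.get(t, set()) for t in must))
    else:
        # OR: union the postings of the optional terms.
        result = set().union(*(index.get(t, set()) for t in should))
    # NOT: drop any document containing an excluded term.
    for term in must_not:
        result -= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "zinc is written in go",
        2: "elasticsearch is written in java",
        3: "toshi is written in rust",
    }
    index = build_index(docs)
    print(sorted(boolean_search(index, must=["written"], must_not=["java"])))
```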

bootcat · on Dec 3, 2021

I am a staunch supporter of software that runs on minimal hardware or resources. Seems like an interesting project; looking forward to distributed features!

ochoseis · on Dec 4, 2021

I wonder if this would be a good lightweight alternative to ES for a local development setup. At $work we use ES for deployed environments, but have thus far avoided running it locally because of the resource requirements.

jillesvangurp · on Dec 4, 2021

I have Elasticsearch running in Docker most of the time for local development. It's fine. You can run it with as little as half a GB of RAM. It won't use much CPU unless you start throwing millions of documents at it for indexing. Our production cluster uses two 1GB VMs. The whole setup costs us about $60/month. We have about 7 million documents indexed in there.

People here seem to assume Elasticsearch requires lots of memory and CPU. It actually scales down very nicely, in addition to its famous ability to scale up very well. Any decent developer laptop should have no issues running this. Running VS Code is more of a burden on my laptop. And IntelliJ is even worse.

keyle · on Dec 3, 2021

Ha.

Lovely, but the missing features are basically what make Elasticsearch great!

    Missing features: 
    Clustering and High Availability

Nothing that can't be fixed though.

bootcat · on Dec 3, 2021

Yes, I agree - it's comparatively easy to create software that is optimal as long as it runs on a single machine.

The distributed systems and the related constraints and guarantees are what take the resources!

(I would assume he would use a system based on Raft to offer the above, or something like Infinispan, Helix or Hazelcast.)

cyberge99 · on Dec 4, 2021

I wonder if a Nomad/Consul stack could make this clusterable.

Andys · on Dec 4, 2021

I would embed NATS, a Golang message queue that offers clustering and persistence, a great building block for clustering support.

ordiel · on Dec 3, 2021

When people say "lightweight alternative", most of the time they are also implying fewer features. Things are not lightweight for no reason.

jillesvangurp · on Dec 4, 2021

I would say a fraction of the features. If you don't need those features, that's fine of course. But in my experience, I end up using a lot of those features when dealing with real-world customer requirements. I guess if you implement search for some website where search quality just doesn't matter, lightweight is fine. For everything else, using proper solutions might be the wiser thing.

In terms of performance you see a lot of comments that are stating X is faster than Y. I'd take such comments with a large grain of salt. Unless those lightweight alternatives actually do the same things you can't really compare the performance. It's not that hard to make indexing fast in Elasticsearch if you disable all the features that make it slower. Of course if you don't actually have those features it's going to be fast by default because it simply isn't doing anywhere close to the same things.

Elasticsearch is actually pretty damn fast even when you do use its many features. The reason is that it relies on in memory caches, thread pools, etc. and that a lot of very smart people have been implementing and optimizing very efficient algorithms in Lucene for the last 25 years. Elasticsearch actually can run with as little as 256MB but of course you are not going to be able to cache a lot of data with that and performance will suffer accordingly. Mostly large heap sizes with Elasticsearch are all about using larger caches. It also relies on memory mapped files and OS file caches for that. That's what allows it to work with extremely large data sets and still provide query responses in milliseconds.

There's no reason you wouldn't be able to do the same with a native implementation, of course. But it would be naive to assume you'd end up using a lot less memory or CPU. Spending memory and CPU on caches and thread pools is pretty core to matching performance and features; not doing so would make it pretty hard to get even close.
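To make the memory-mapped file point concrete, here is a small sketch of searching a file through a read-only memory map, so that the operating system's page cache keeps hot regions in RAM instead of the application holding the whole file on its heap. This only illustrates the general technique; it is not how Lucene or Elasticsearch lay out or read their index files, and the helper name is made up.

```python
import mmap
import os
import tempfile


def find_in_mapped_file(path, needle):
    """Return the byte offset of `needle` in the file at `path`, or -1.

    The file is mapped read-only, so pages are loaded on demand and cached
    by the OS rather than copied into a Python bytes object up front.
    Note: mapping an empty file raises ValueError.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped.find(needle)


if __name__ == "__main__":
    # Hypothetical sample file, created only for the demonstration.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"alpha\nbeta\ngamma\n")
    try:
        print(find_in_mapped_file(tmp.name, b"beta"))
    finally:
        os.remove(tmp.name)
```

The design point is that memory used this way shows up as OS file cache rather than process heap, which is part of why heap size alone says little about how much RAM a search engine benefits from.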

vosper · on Dec 3, 2021

Yeah. There's a lot that Elasticsearch can do. There are libraries and applications that can do bits of what ES can do, but no-one I'm aware of (and I watch this space) is even slightly close to building a genuine replacement (I'm not counting the AWS fork, for obvious reasons)

smoe · on Dec 3, 2021

How much sense does it make for another product to try to be even slightly close to elasticsearch?

My experience with ES has been somewhat mixed, exactly because they try to do everything at the same time, and a lot of the specific things we needed were under-documented and involved much more trial and error than should be necessary.

I for one welcome more focused products, when the need again arises for something in this space.

taf2 · on Dec 4, 2021

I see search engine space similar to the relational database space… more features is really a good thing it helps you answer more questions about your data … just my 2 cents

jjeaff · on Dec 3, 2021

Well, I suppose anything could be a genuine replacement depending on how many of the features of elasticsearch the user actually needs/uses.

amelius · on Dec 3, 2021

However ... unfortunately, you can't just drop features from ES (or most software really) to make it lightweight.

wildoats · on Dec 4, 2021

Java already does this AFAIK, if you use a GC that supports class unloading (all the new ones do).

Unused classes can be unloaded from RAM and it's like they don't exist. The executable is still huge, but the memory and CPU cache footprint is lower.

PixyMisa · on Dec 4, 2021

For a lot of use cases, you can.

bpicolo · on Dec 3, 2021

Where would you say Solr falls short? In terms of search - it's definitely the case that ES has expanded beyond that (metrics, apm...)

arafalov · on Dec 3, 2021

I feel ES has been trying really hard to walk away from the features Solr is good at. While ES still supports multiple languages and custom tokenization chains and even custom pre-processing chains (somewhat equivalent to Solr's UpdateRequestProcessors), I felt that they were very deeply buried in the configuration when I looked at ES a year ago.

ES is truly focusing on metrics and things and does have some features to make those use cases easier that Solr would probably need a lot of configuration/customization for.

So, Solr is about search. ES is about a specific set of use cases that rely very heavily on search.

And Lucidworks Fusion (commercial alternative to ES) is about big data and ML and full multi-tool pipeline on top of Solr.

nelsondev · on Dec 4, 2021

Elastic makes the bulk of their money from log search. Developer productivity tools like Splunk are their main competitors.

So that’s where they’ve invested a lot in tooling and visualization.

znpy · on Dec 3, 2021

ES wins hands down in search in my experience, because the companies that were doing search stuff "before it was cool" were mostly running stuff like Autonomy DRE, which works well but is expensive and proprietary.

There is a huge market just in going after that kind of customer.

tingletech · on Dec 4, 2021

They both use Doug Cutting’s Lucene under the hood.

pjmlp · on Dec 4, 2021

Except every person has a different set of 10% features.

pageandrew · on Dec 4, 2021

But it's written in Go.

maxpert · on Dec 4, 2021

Would be nice to have some benchmarks against ES, Sonic, and Toshi. I am very much interested in stress-testing on a large dataset.

aliswe · on Dec 4, 2021

I wonder why the choice of technology is even relevant? Because of the hype factor?

kgersen · on Dec 4, 2021

A Go project is easy to contribute to. That's a huge factor for an open source project.

PixyMisa · on Dec 4, 2021

It matters for performance and reliability. Go is fine, and means you get a nice standalone binary. Java is fine, but you need Java installed.

If it were written in Node.js or PHP, on the other hand, you'd know to stay ten miles away.

nw05678 · on Dec 3, 2021

I've always wanted a full text search engine akin to "sqlite".

jasonjayr · on Dec 3, 2021

You are aware sqlite includes a full text search engine too?

https://sqlite.org/fts3.html https://sqlite.org/fts5.html
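A minimal sketch of what that looks like from Python's standard `sqlite3` module, assuming the SQLite build it links against was compiled with FTS5 enabled (common, but not guaranteed). The table name `notes` and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (title, body).
SAMPLE_NOTES = [
    ("Zinc", "a lightweight search engine written in Go"),
    ("Elasticsearch", "a distributed search engine built on Lucene"),
    ("SQLite", "an embedded relational database"),
]


def search_notes(query):
    """Index the sample rows in an in-memory FTS5 table and run `query`.

    Returns matching titles ordered by FTS5's built-in relevance rank.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # A virtual table backed by the FTS5 extension; all columns are indexed.
        conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
        conn.executemany(
            "INSERT INTO notes (title, body) VALUES (?, ?)", SAMPLE_NOTES
        )
        rows = conn.execute(
            "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()
        return [title for (title,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(search_notes("lightweight"))
    print(search_notes("search NOT lucene"))
```

The MATCH expression accepts FTS5 query syntax, including the AND, OR and NOT operators, so simple boolean search works without any extra server process.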

simonw · on Dec 4, 2021

I've been having a lot of fun with the SQLite FTS library. The best public-facing demo I have is probably the search feature on https://datasette.io - e.g. https://datasette.io/-/beta?q=fts - that's powered by my https://github.com/dogsheep/dogsheep-beta tool.

slig · on Dec 4, 2021

Typesense and Meilisearch are what you're looking for.

fizx · on Dec 3, 2021

sqlite is itself a full text search engine. It's gotten pretty good over the years.

https://www.sqlite.org/fts5.html

nwellnhof · on Dec 4, 2021

That's what Apache Lucy tried to become, but it's an abandoned project.

PixyMisa · on Dec 4, 2021

You could try Xapian. Or there's always SQLite itself.

deknos · on Dec 4, 2021

We have alternatives to Elasticsearch, as listed here. What we do not have is an alternative to Kibana that works with Elasticsearch/OpenSearch and the Elasticsearch alternatives listed here.

arafalov · on Dec 3, 2021

They had me at: "The only viable solution to search was elasticsearch".

/s