On the one hand, I've seen people (including myself) try to hack job-queue-like semantics onto Kafka many a time, and it always hits issues once redelivery or backoff comes up. So it's nice to see them considering making this a first-class citizen of Kafka.
On the other hand, Kafka isn't the only player in the queue game nowadays. If you need message queue and job queue semantics combined (which you likely do), just use Pulsar.
I think the most likely use case, the one making me happy they're working on this, is reducing infra spend by not needing a separate tool/guarantees/storage for queues alongside whatever Kafka is more made for.
I'm just hoping librdkafka gets top-tier support for this feature in a timely manner.
RabbitMQ has implemented Streams and "Super Streams":
> Super streams are a way to scale out by partitioning a large stream into smaller streams. They integrate with single active consumer to preserve message order within a partition. Super streams are available starting with RabbitMQ 3.11.
One way to think about it is who does the routing. My understanding of RabbitMQ from 10 years ago was that RMQ pushes to connected consumers, and only to one at a time? You'd need a fanout setup where consumers do more of the work. And lower throughput overall.
With Kafka it "just" keeps appending to a dumb (but huge) circular buffer. But you can have multiple consumers read off this buffer, and they can start at any point. The downside is consumers have to maintain their own offsets (in some storage), but there is now a big decoupling between producer and consumer. This contributes a large part to the high throughput too (and consumers can go at their own pace; of course, if they are too slow they can fall off the log).
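A toy in-memory sketch of that decoupling (purely illustrative, not the real Kafka API): the broker only appends, and each consumer independently tracks how far it has read.

```python
# Toy model of a Kafka-style append-only log: the broker only appends,
# and each consumer independently tracks its own read position.
class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read(self, offset, max_records=10):
        # Consumers pull from whatever offset they choose.
        return self.records[offset:offset + max_records]

log = Log()
for i in range(5):
    log.append(f"event-{i}")

# Two independent consumers at different positions; neither affects the other.
slow_offset, fast_offset = 2, 5
print(log.read(slow_offset))   # slow consumer still sees event-2..event-4
print(log.read(fast_offset))   # fast consumer is caught up: []
```

Nothing is removed on read, which is exactly why a slow consumer costs the broker nothing extra until retention kicks in.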
> Downside is customers have to maintain their own offsets (in some storage)
Minor correction: you can maintain the offsets yourself if you want, but usually it's not necessary because Kafka can do it for you.
The abstraction Kafka provides is that, for each consumer group and for each (topic, partition) tuple, your consumer is guaranteed not to receive messages before the last offset at which you called commit(). Internally, the committed offsets are stored in a special Kafka topic of their own.
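A sketch of that contract as an in-memory model (the function names here are mine, not Kafka's API; real Kafka keeps this bookkeeping in the internal __consumer_offsets topic). The key point is that committed offsets are keyed by (group, topic, partition), so two groups reading the same partition don't interfere:

```python
# Hypothetical model of committed-offset bookkeeping, keyed the way
# Kafka keys it: (consumer group, topic, partition) -> committed offset.
committed = {}

def commit(group, topic, partition, offset):
    committed[(group, topic, partition)] = offset

def position(group, topic, partition):
    # A consumer (re)joining the group resumes from the last committed
    # offset; a brand-new group starts from the beginning (offset 0 here).
    return committed.get((group, topic, partition), 0)

commit("billing", "orders", 0, 42)
commit("audit", "orders", 0, 7)      # a second group, independent position

print(position("billing", "orders", 0))   # 42
print(position("audit", "orders", 0))     # 7
print(position("new-group", "orders", 0)) # 0
```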
Ah, is this offset maintained by the Kafka client? (I included that as "client" as well.) I thought the Kafka topic itself did not maintain any client-specific offsets (how could it?). Unless this was added in recent years. Interesting tho!
They are very different, but some people use them for similar things.
Kafka is a stream, RabbitMQ is a queue. Without getting into the details, RabbitMQ is designed to add things to a queue and pop them off when consumed. Kafka is designed to stream everything to a continuous log, and anyone can tune in where appropriate.
Kafka events are replayable, and represent the final state we're aiming for. You might get two exactly identical events that tell you what the state should look like. And there are batches of events. It's good for batch processing: "we're getting a new portion of data to train an AI model".
RabbitMQ messages are supposed to be processed/consumed/acked only once. Your app most probably won't ever get two exactly identical messages, unless you misconfigured/misused RabbitMQ. It's good for classic message processing: "user clicked something, run the job of informing subscribers that a new post has been created" (because you can't send 1000 messages from a web worker thread).
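The two deliveries above can be contrasted in a few lines of plain Python (a sketch of the semantics, not of either broker's actual API):

```python
from collections import deque

# Queue semantics (RabbitMQ-style): once a message is delivered and
# acked, it is removed from the queue and cannot be replayed.
queue = deque(["job-1", "job-2"])
msg = queue.popleft()          # deliver; process; ack removes it for good
print(list(queue))             # only job-2 remains; job-1 is gone

# Log semantics (Kafka-style): consuming is just reading at an offset;
# the record stays in the log, so replay is trivial.
log = ["event-1", "event-2"]
read_once = log[0]
read_again = log[0]            # same record, read twice, still there
print(read_once == read_again)
```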
The lack of Queue like support as a first class citizen in Kafka has kept a few things on the sidelines for me.
Native capacity to do queueing is exciting, especially the concept of share groups to allow potentially different types of queues and shares in the future.
It might not be appropriate, but it's one step closer to eating more workflow engines for lunch.
i'm having a hard time understanding what scenario requires a queue semantic instead of a stream one.
Is it the ability to parallelize consuming to a super large amount, without having to set up partitions?
I'm planning on using kafka for a job queue, and i like the idea that i can add ordering definition if i need to, and that i can keep the jobs in the queue for auditing later on if i need to. What am i missing ?
Without being limited by partitions. In Kafka your unit of parallelism is partitions but what happens when you don't care at all (or much) about ordering and just want to add or remove consumers to match your current load? Queue semantics.
In Kafka the number of partitions can go up, but not down. And even when you do that, the existing messages don't get split up to fill the new partition, so you can't burn down a backlog by adding more partitions or more consumers -- ope.
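The partition ceiling is easy to see with a simplified round-robin assignment sketch (real consumer-group assignment strategies differ, but the invariant is the same: each partition goes to exactly one consumer in the group):

```python
def assign(partitions, consumers):
    # Each partition is owned by exactly one consumer in the group;
    # consumers beyond the partition count are left with nothing.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

parts = [0, 1, 2]
a = assign(parts, ["c1", "c2", "c3", "c4"])
print(a)
# c4 sits idle: adding a 4th consumer to a 3-partition topic buys nothing.
# With queue semantics (share groups), all four could pull work.
```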