I thought the same, so I ran brotli and zstd on some PDFs I had laying around. b...

mort96 · 2026-01-22T17:29:10 1769102950

EDIT: Something weird is going on here. When compressing zstd in parallel it produces the garbage results seen here, but when compressing on a single core, it produces result competitive with Brotli (37M). See: https://news.ycombinator.com/item?id=46723158

I did my own testing where Brotli also ended up better than ZSTD: https://news.ycombinator.com/item?id=46722044

Results by compression type across 55 PDFs:

    +------+------+-----+------+--------+
    | none | zstd | xz  | gzip | brotli |
    +------|------|-----|------|--------|
    | 47M  | 45M  | 39M | 38M  | 37M    |
    +------+------+-----+------+--------+

mort96 · 2026-01-22T23:41:11 1769125271

Turns out that these numbers are caused by APFS weirdness. I used 'du' to get them which reports the size on disk, which is weirdly bloated for some reason when compressing in parallel. I should've used 'du -A', which reports the apparent size.

Here's a table with the correct sizes, reported by 'du -A' (which shows the apparent size):

    +---------+---------+--------+--------+--------+
    |  none   |  zstd   |   xz   |  gzip  | brotli |
    +---------|---------|--------|--------|--------|
    | 47.81M  | 37.92M  | 37.96M | 38.80M | 37.06M |
    +---------+---------+--------+--------+--------+

These numbers are much more impressive. Still, Brotli has a slight edge.

tracker1 · 2026-01-23T17:41:05 1769190065

Worth considering the compress/decompress overhead, which is also lower in brotli than zstd from my understanding.

Also, worth testing zopfli since it's decompression is gzip compatible.

atiedebee · 2026-01-23T12:49:38 1769172578

Ran the tests again with some more files, this time decompressing the pdf in advance. I picked some widely available PDFs to make the experiment reproducable.

  file            | raw         | zstd       (%)      | brotli     (%)     |
  gawk.pdf        | 8.068.092   | 1.437.529  (17.8%)  | 1.376.106  (17.1%) |
  shannon.pdf     | 335.009     | 68.739     (20.5%)  | 65.978     (19.6%) |
  attention.pdf   | 24.742.418  | 367.367    (1.4%)   | 362.578    (1.4%)  |
  learnopengl.pdf | 253.041.425 | 37.756.229 (14.9%)  | 35.223.532 (13.9%) |

For learnopengl.pdf I also tested the decompression performance, since it is such a large file, and got the following (less surprising) results using 'perf stat -r 5':

  zstd:   0.4532 +- 0.0216 seconds time elapsed  ( +-  4.77% )
  brotli: 0.7641 +- 0.0242 seconds time elapsed  ( +-  3.17% )

The conclusion seems to be consistent with what brotli's authors have said: brotli achieves slightly better compression, at the cost of a little over half the decompression speed.

mrspuratic · 2026-01-22T18:16:14 1769105774

> I couldn't quickly find a way to decompress them

    pdftk in.pdf output out.pdf decompress

Thoreandan · 2026-01-22T19:57:26 1769111846

Does your source .pdf material have FlateDecode'd chunks or did you fully uncompress it?

atiedebee · 2026-01-23T12:00:06 1769169606

I wasn't sure. I just went in with the (probably faulty) assumption that if it compresses to less than 90% of the original size that it had enough "non-randomness" to compare compression performance.

order-matters · 2026-01-22T16:47:25 1769100445

Whats the assumption we can potentially target as reason for the counter-intuitive result?

that data in pdf files are noisy and zstd should perform better on noisy files?

jeffbee · 2026-01-22T17:03:43 1769101423

What's counter-intuitive about this outcome?

order-matters · 2026-01-22T17:39:58 1769103598

maybe that was too strongly worded but there was an expectation for zstd to outperform. So the fact it didnt means the result was unexpected. i generally find it helpful to understand why something performs better than expected.

mort96 · 2026-01-22T17:47:29 1769104049

Isn't zstd primarily designed to provide decent compression ratios at amazing speeds? The reason it's exciting is mainly that you can add compression to places where it didn't necessarily make sense before because it's almost free in terms of CPU and memory consumption. I don't think it has ever had a stated goal of beating compression ratio focused algorithms like brotli on compression ratio.

sgerenser · 2026-01-22T18:38:44 1769107124

I actually thought zstd was supposed to be better than Brotli in most cases, but a bit of searching reveals you're right... Brotli, especially at the highest compression levels (10/11), often exceeds zstd at the highest compression levels (20-22). Both are very slow at those levels, although perfectly suitable for "compress once, decompress many" applications which the PDF spec is obviously one of them.