
It's worth mentioning that there are much faster JSON parsing libraries than the default in the Ruby stdlib. I still don't think Ruby is the best choice for raw JSON parsing, though. The last time I had to care about JSON speed, we were transforming billions of events, and the Ruby JSON library was becoming a bottleneck.


> It's worth mentioning that there are much faster JSON parsing libraries than the default in Ruby stdlib.

I am on the edge of my seat now.

Would you mind listing which libraries are much (say, an order of magnitude) faster?


Don't think it's an order of magnitude faster, but oj is supposed to be the standard for Ruby.

https://github.com/ohler55/oj
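
A minimal benchmark sketch comparing the stdlib parser with oj. The payload shape and iteration counts here are arbitrary, chosen for illustration only, and the script falls back to stdlib-only timing when the oj gem is not installed:

```ruby
require "json"
require "benchmark"

# oj is a third-party gem; degrade gracefully if it is not installed.
have_oj =
  begin
    require "oj"
    true
  rescue LoadError
    false
  end

# Synthetic payload -- size and shape are made up for this sketch.
payload = JSON.generate("events" => Array.new(5_000) { |i| { "id" => i, "type" => "click" } })

Benchmark.bm(14) do |bm|
  bm.report("stdlib JSON:") { 20.times { JSON.parse(payload) } }
  bm.report("oj:") { 20.times { Oj.load(payload) } } if have_oj
end
```

Real speedups depend heavily on document shape and Ruby version, so it's worth benchmarking against your own payloads.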


What do you think is the best way to do deep JSON comparisons? We work with 2GB JSONs all day, and it is super annoying how long they take to process.


Not parse them into a tree, to start with.

Use a streaming JSON parser, and compare them token by token unless/until they diverge, at which point you take whatever action is suitable to identify the delta.

Parsing into a tree may be necessary if you want to do more complex comparisons (such as sorting child objects), but even then, depending on your requirements, you may well be better off storing offsets into the file rather than the parsed data.

https://github.com/lloyd/yajl is an example of a streaming JSON parser (caveat: I've not benchmarked it at all), but JSON is simple enough that you could write one specifically to handle two streams.
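
As a sketch of the token-by-token idea, here is a deliberately minimal hand-rolled tokenizer and two-stream comparison. It does not validate the full JSON grammar, and it reads each input whole for brevity; a real implementation would read and tokenize in chunks:

```ruby
require "strscan"
require "stringio"

# Minimal JSON token pattern -- a sketch, not a validating grammar.
TOKEN = /
    "(?:\\.|[^"\\])*"                   # string
  | -?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?    # number
  | true | false | null
  | [\[\]{}:,]                          # structural characters
/x

# Yield one token at a time from an IO. Reads the whole input for
# brevity; a production version would scan incrementally.
def each_token(io)
  return enum_for(:each_token, io) unless block_given?
  scanner = StringScanner.new(io.read)
  until scanner.eos?
    scanner.skip(/\s+/)
    break if scanner.eos?
    token = scanner.scan(TOKEN) or raise "bad JSON near byte #{scanner.pos}"
    yield token
  end
end

# Compare two JSON documents token by token.
# Returns the index of the first differing token, or nil if identical.
def first_divergence(io_a, io_b)
  a = each_token(io_a)
  b = each_token(io_b)
  idx = 0
  loop do
    ta = (a.next rescue nil)
    tb = (b.next rescue nil)
    return nil if ta.nil? && tb.nil?  # both streams exhausted: identical
    return idx if ta != tb
    idx += 1
  end
end

first_divergence(StringIO.new('{"a":1,"b":2}'),
                 StringIO.new('{"a":1,"b":3}'))  # => 7 (the token "2" vs "3")
```

For 2GB inputs the win is that neither document is ever materialized as an object tree; memory stays bounded by the tokenizer's buffer.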


I believe this comparison benchmark could be useful for you, and you can expand it further with more tests. (Although I got downvoted for sharing a link.)

https://github.com/kostya/benchmarks/blob/master/README.md


That still parses into a tree.


Rust is absolutely wonderful for tasks like this. They don't hit any of the cases where Rust's ownership rules make things tricky, and the serde library makes deserializing JSON a piece of cake.

You end up with code which looks pretty similar to the equivalent JavaScript or Python code, but performs much faster (10x, 100x or even 1000x faster).


There's also pikkr (https://github.com/pikkr/pikkr) if you need really really fast JSON parsing.


Start with a compiled language, I guess? I don't operate on anywhere near that scale, but json-rust reaches 400 MB/s for me.

It doesn't parallelize, and you'd need enough memory for the entire structure, but of course Rust doesn't have GC overhead. You could trivially parse both files in parallel, at least.


(1) Try a language with fast allocations (C, C++, Rust, maybe Go or Java) -- anything except Python or Ruby

or

(2) Try using a streaming API (I don't know Ruby, but a quick Google search found https://github.com/dgraham/json-stream ). Note that this method will require you to massively restructure your program -- you want to avoid having all of the data in memory at once.

The streaming API might work better with jq-based preprocessing -- for example, if you want to compare two unsorted sets, it may be faster to sort them using jq, then compare line-by-line using streaming API.
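
The jq-then-compare idea can be sketched as a merge-style diff over two sorted, line-delimited streams. Here `diff_sorted_streams` is a hypothetical helper name, and the sketch assumes both inputs were pre-sorted into one JSON value per line (e.g. by `jq -c 'sort_by(.id) | .[]' big.json`):

```ruby
require "stringio"

# Merge-diff two line-delimited streams whose lines are already sorted.
# Returns [lines only in a, lines only in b]; memory stays O(diff size).
def diff_sorted_streams(io_a, io_b)
  only_a, only_b = [], []
  a = io_a.gets&.chomp
  b = io_b.gets&.chomp
  while a || b
    if b.nil? || (a && a < b)
      only_a << a                 # present only in the first stream
      a = io_a.gets&.chomp
    elsif a.nil? || b < a
      only_b << b                 # present only in the second stream
      b = io_b.gets&.chomp
    else
      a = io_a.gets&.chomp        # identical line: advance both
      b = io_b.gets&.chomp
    end
  end
  [only_a, only_b]
end

a = StringIO.new(%({"id":1}\n{"id":2}\n))
b = StringIO.new(%({"id":2}\n{"id":3}\n))
diff_sorted_streams(a, b)  # => [['{"id":1}'], ['{"id":3}']]
```

Note the comparison here is plain string ordering, which only works when jq has emitted both files with identical key ordering and formatting.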


Python is fast at parsing JSON; Go had a hard time matching its parsing speed. Additionally, you have PyPy to help.



Python is fast at doing anything that doesn't involve running Python.

That's an important caveat. Python's C JSON parser library is super-fast, but if you want to use the data for anything but a simple equality check afterwards, it'll be slow as molasses.

Or you'll write a C extension for it...


nodejs comes to mind!



