
Did a lot of data wrangling this year. The usual grep, sed, awk, jq, and even find have sped up my days significantly. Sed is among my favorites for whipping up quick, ad hoc transformations.
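
For jobs like that, a sed one-liner is often all it takes. A small sketch (the file name and log layout here are made up for illustration):

    # rewrite "key: value" pairs into "key=value" and strip trailing whitespace
    sed -E 's/([A-Za-z_]+): */\1=/g; s/ +$//' access.log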

This year I added Miller [0] to my list: a tool for processing tabular data, similar to sed, awk, etc. It handles CSV, TSV, JSON Lines, and more in a consistent way. I like the delimited key-value pairs (DKVP) format, which lets me write simple one-liners in bash to collect data (e.g. "ip=x.x.x.x,endpoint=/api/x") and use Miller to crunch the results. Not sure it saved me 100h, but it was one of the biggest time savers this year!
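
Since DKVP is Miller's default input format, records like the one above can be crunched directly. A minimal sketch (the file name and the exact fields are assumptions):

    # count requests per endpoint, most frequent first, pretty-printed
    mlr --opprint count-distinct -f endpoint then sort -nr count requests.dkvp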

[0] https://miller.readthedocs.io/en/latest/



I second Miller: besides its extensive support for various formats, it is fast. If you ever have to deal with gigabyte-sized files, Miller gives noticeable speed-ups versus jq et al.


FYI: https://github.com/BurntSushi/xsv is much faster than mlr (roughly an order of magnitude), and zsv (https://github.com/liquidaty/zsv) is faster still. Neither supports formulas, though. Disclaimer: I am one of the zsv authors.
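
For a taste of xsv (the file and column names here are hypothetical; zsv offers a similar subcommand-style CLI):

    # row count and a frequency table for one column of a CSV
    xsv count requests.csv
    xsv frequency -s endpoint requests.csv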



