I wrote a mostly-clone of Ack in C: https://github.com/ggreer/the_silver_searcher . Output format and most flags are the same. Besides the speed, most users won't notice a difference.
I spared no effort in optimizing. Pthreads, mmap(), Boyer-Moore-Horspool strstr, it's all there. Searching my ~/code (5.2GB of stuff), I get this:
ag blahblahblah 1.93s user 3.54s system 313% cpu 1.749 total
ack blahblahblah 9.75s user 2.79s system 98% cpu 12.690 total
Both programs ignore a lot of extraneous files by default (hidden files, binary files, stuff in .gitignore, etc). The real amount of data searched is closer to 500MB.
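For readers who haven't met the Boyer-Moore-Horspool search mentioned above, here is a minimal sketch of the technique in C. It is not the code from the_silver_searcher; the function name bmh_search and its signature are made up for illustration. The idea is to precompute, for every byte value, how far the search window can jump when that byte sits under the last position of the needle, so most haystack bytes are never inspected.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from ag: find the first occurrence of
 * needle (nlen bytes) in haystack (hlen bytes) with Boyer-Moore-Horspool.
 * Returns a pointer to the match, or NULL if there is none. */
const char *bmh_search(const char *haystack, size_t hlen,
                       const char *needle, size_t nlen)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i;
    size_t pos = 0;

    if (nlen == 0)
        return haystack;
    if (hlen < nlen)
        return NULL;

    /* Bytes that do not occur in the needle allow a jump of the full length. */
    for (i = 0; i <= UCHAR_MAX; i++)
        skip[i] = nlen;

    /* For every needle byte except the last, jump just far enough to line up
     * its rightmost occurrence with the byte under the window's last slot. */
    for (i = 0; i + 1 < nlen; i++)
        skip[(unsigned char)needle[i]] = nlen - 1 - i;

    while (pos <= hlen - nlen) {
        if (memcmp(haystack + pos, needle, nlen) == 0)
            return haystack + pos;
        /* Shift by the table entry for the last byte of the current window. */
        pos += skip[(unsigned char)haystack[pos + nlen - 1]];
    }
    return NULL;
}
```

Calling bmh_search(buf, len, "blahblahblah", 12) on a file's contents would return a pointer to the first hit or NULL. The longer the needle, the bigger the typical jump, which is why this tends to beat a naive byte-by-byte scan on literal patterns.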
Looks good, but from the doc I can't tell if it supports the second most useful feature of ack, that is scoped search:
ack --ruby --js foo_bar
will search only ruby and javascript files, which means .rb+.erb+.rhtml+.js+...
Also exclusion with --no-* is very useful (especially --no-sql).
This is markedly different from 'simply' ignoring irrelevant files, besides the fact that it does not need a 'project' to work (ack --ruby foo_func $(bundle show bar_gem)).
Even better, it is extensible, so I can create --stylesheets covering css+sass+scss+less, or add, say, .builder to --ruby.
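To make the scoped-search idea concrete, here is a rough sketch in C of a type-to-extension table and a lookup against it. This is neither ack's implementation (ack is written in Perl) nor anything ag ships; the struct, the table contents and the function name path_has_type are all hypothetical, and the extension lists cover only the examples named above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type table: each named type maps to a set of extensions. */
struct file_type {
    const char *name;          /* flag name, e.g. "ruby" for --ruby */
    const char *extensions[8]; /* NULL-terminated list of extensions */
};

static const struct file_type FILE_TYPES[] = {
    { "ruby",        { ".rb", ".erb", ".rhtml", NULL } },
    { "js",          { ".js", NULL } },
    /* A user-defined type, like the --stylesheets example above. */
    { "stylesheets", { ".css", ".sass", ".scss", ".less", NULL } },
};

/* Returns 1 if path's extension belongs to the named type, else 0. */
int path_has_type(const char *path, const char *type_name)
{
    const char *ext = strrchr(path, '.');
    size_t t, e;

    if (ext == NULL)
        return 0;

    for (t = 0; t < sizeof FILE_TYPES / sizeof FILE_TYPES[0]; t++) {
        if (strcmp(FILE_TYPES[t].name, type_name) != 0)
            continue;
        for (e = 0; FILE_TYPES[t].extensions[e] != NULL; e++) {
            if (strcmp(ext, FILE_TYPES[t].extensions[e]) == 0)
                return 1;
        }
    }
    return 0;
}
```

With a table like this, --ruby --js would keep a file when path_has_type() is true for either type, and a --no-* flag would drop it when the check is true. Extending a type is just another row or another extension, which is what makes the feature useful outside of any 'project'.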
Ag supports the same regexes as Ack. I use the PCRE library. I only call pcre_study once, and I use the new PCRE-JIT[1] on systems where it's available. These tweaks add up to a 3-5x speedup over Ack when regex-matching.
Thanks so much, I was staying with grep precisely because of the performance and perl dependency of Ack. Does the silver searcher compile on win32 as well?
Since I added pthreads, there's no chance that it builds on Windows anymore. I don't have a Windows machine or VM to test stuff out on. Patches welcome, though!
Did you benchmark read() vs mmap()? Most tools seem to go with read() for grep-like I/O patterns.
In fact, it looks like GNU grep has a --mmap switch, and on my Ubuntu system it's a little bit faster than the default in the simple case. But -i makes mmap slower. Maybe GNU grep just avoids mmap because of error handling (you get a segfault/bus error instead of an I/O error return when things go wrong).
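For anyone who wants to try the comparison themselves, here is a small sketch of the two approaches in C, assuming a POSIX system. Neither function comes from grep or ag; the names count_lines_read and count_lines_mmap are invented, and counting newlines just stands in for "scan every byte". It also shows the error-handling difference: read() reports failure through its return value on every call, while with mmap() only the mapping step is checked, and an I/O problem during the scan itself arrives as a signal.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count newlines by pulling the file through a fixed buffer with read().
 * Returns -1 on any error. (EINTR retry is left out to keep this short.) */
long count_lines_read(const char *path)
{
    char buf[65536];
    long lines = 0;
    ssize_t n, i;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (i = 0; i < n; i++)
            if (buf[i] == '\n')
                lines++;
    }
    close(fd);
    return n < 0 ? -1 : lines; /* a failed read() is an ordinary return value */
}

/* Count newlines by mapping the whole file and scanning it in place.
 * Returns -1 if the file cannot be opened, stat'ed or mapped. */
long count_lines_mmap(const char *path)
{
    struct stat st;
    long lines = 0;
    off_t i;
    char *data;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }
    data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* From here on there is no return code to check: an I/O error or a
     * concurrent truncation shows up as SIGBUS while touching the pages. */
    for (i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            lines++;
    munmap(data, (size_t)st.st_size);
    return lines;
}
```

Timing both functions over the same warm-cache file tree would give a rough answer for a particular machine. Results are likely to depend on file sizes and the kernel, so this is a starting point for a benchmark rather than a verdict.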