I work on Semgrep; there are a bunch of examples at https://semgrep.live if you're curious about what the syntax looks like.
For context, Semgrep started as a Facebook open-source project inspired by an Inria project named Coccinelle, which has made a couple thousand or so automatic patches to the Linux kernel over the years using a semantic patch language (http://coccinelle.lip6.fr/sp.php).
C# is high on the list; F# isn't a priority at the moment, though. Behind the scenes, we've recently switched to tree-sitter as the parser library; if there is a good F# tree-sitter library, integration becomes quite easy. I don't see one at https://tree-sitter.github.io/tree-sitter/ but perhaps there's one maintained elsewhere.
I really appreciate the semantic checks. They're especially nice for security-sensitive lint rules, but more generally they remove the hacky regular-expression feel of adding lint rules to a codebase. They've also been useful for some codebase migrations (semgrep is more precise than e.g. `git grep -w` for finding "all the places we use code pattern X that we want to stop using").
My main complaint about it is performance -- it's too slow per rule for us to replace the regular expression-based system that we run on our whole codebase, so we can't happily convert our other ~100 regular expression-based lint rules to semgrep (https://github.com/zulip/zulip/blob/master/tools/linter_lib/...).
But performance has been improving a lot over time, and I think there's potential for it to be faster (e.g., mypy, the Python type-checker, has gotten much faster in the last year or two). Because semgrep is getting active investment from a venture-funded company that I imagine will improve the performance, I expect semgrep to be a tool that most projects serious about code quality are using in a few years.
I should add that performance may also be less important to others than it is to us; we run all of our linters (currently 20 distinct linters, including eslint, prettier, pyflakes, isort, shellcheck, etc.) in parallel using https://github.com/zulip/zulint, with the goal of being able to lint the entire codebase in <30s or changed files in under 1s (obviously time depends on number of files changed).
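To make the "run everything in parallel" idea concrete, here is a minimal sketch using only the Python standard library. It is not how zulint is implemented; `run_linter` and `run_all` are hypothetical names, and the commands passed in are whatever linter invocations you already use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_linter(name, argv):
    """Run one linter command and capture its exit code and output."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return name, proc.returncode, proc.stdout + proc.stderr


def run_all(linters):
    """Run every linter concurrently.

    `linters` maps a display name to an argv list. Threads are enough here
    because each worker just waits on an external process.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(linters))) as pool:
        futures = [
            pool.submit(run_linter, name, argv) for name, argv in linters.items()
        ]
        results = [future.result() for future in futures]
    return {name: (code, output) for name, code, output in results}
```

With this shape, total wall time is roughly that of the slowest linter rather than the sum of all of them, which is why a single slow tool stands out so much.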
I wonder if this could be improved by extracting fixed strings from the pattern and only actually parsing the files that could possibly match. I think the major issue would be alias support but even that should be possible for most languages as your fixed-string extraction would notice the alias itself.
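A rough sketch of that prefilter idea, assuming a pattern syntax where `$NAME` is a metavariable and `...` is a wildcard. The helper names (`fixed_strings`, `may_match`, `candidate_files`) are made up for illustration and are not part of semgrep.

```python
import re
from pathlib import Path


def fixed_strings(pattern):
    """Extract identifier-like literals that any match must contain.

    Metavariables such as $CMD are not fixed text, so they are removed first.
    The ellipsis (...) contains no identifier characters and drops out anyway.
    """
    without_metavars = re.sub(r"\$[A-Z_][A-Z0-9_]*", " ", pattern)
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", without_metavars))


def may_match(source, needles):
    """Cheap substring test: a file missing any needle is skipped."""
    return all(needle in source for needle in needles)


def candidate_files(paths, pattern):
    """Yield only the paths worth handing to the real (slow) parser."""
    needles = fixed_strings(pattern)
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if may_match(text, needles):
            yield path
```

The filter can only skip work, never report a finding, so false positives are harmless. The risk is in the other direction: if the matcher treats differently spelled code as equivalent, a plain substring check could wrongly skip a file, so any such equivalence would need to be reflected in the needles.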
Just went through the examples. Seems really intuitive and looks like it would be a good approach for homegrown linters. Would also love to see some plugin support for editors.
VS Code and vim would be the ones I would be most concerned about as I typically jump between the two. Although a pre-commit hook is great and something I will definitely use, having this hook reporting issues in a more live manner would be a huge bonus.
Comby seems more like "parenthesis matching + search" (they don't implement a full parser for the language, just some basic required constructs to make a basic AST). I imagine this limits the resolution of the search?
Semgrep uses an AST that's equivalent to the parser of the language itself so it's much higher resolution in terms of what you can match.
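To illustrate what "higher resolution" buys you, here is a small comparison using Python's built-in `ast` module as a stand-in for a full parser. This is not semgrep's matcher, and `regex_hits` and `ast_hits` are hypothetical helpers; the point is only that a real AST knows the difference between a call and text that looks like one.

```python
import ast
import re

SOURCE = '''
# eval(x) is mentioned in this comment
message = "never call eval(x)"
result = eval(user_input)
'''


def regex_hits(source):
    """Text search: also fires inside comments and string literals."""
    return [match.start() for match in re.finditer(r"\beval\(", source)]


def ast_hits(source):
    """Syntax-aware search: only real calls to a bare name `eval`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ]
```

On `SOURCE`, the regex reports three hits (the comment, the string, and the call), while the AST walk reports only the actual call on the last line.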
Regexes are such a horrible thing to deal with when you're just trying to parse code quickly and don't want to deal with an AST. I've always wished for a library of regexes that just work.
I've always wondered if we could leverage the vast amount of GitHub code, which presumably all compiles without error or undefined behaviour on its master branches, to train some sort of neural net to better catch syntax errors.
Has anyone done something like this, or am I riding the 2016 neural net hype train still?
This isn't specifically for syntax errors, but Jacob Jackson released TabNine [0] last year, which is an autocompleter trained on files from GitHub [1].
TabNine was acquired by Codota earlier this year [2].
Nice to see more work in this direction. I used coccinelle a lot for automating changes/bug detection and I immediately missed it when working on anything that is not C.