srgn - a code surgeon
A grep-like tool which understands source code syntax and allows for manipulation in
addition to search.
Like grep, regular expressions are a core primitive. Unlike grep, additional
capabilities allow for higher precision, with options for manipulation. This
allows srgn to operate along dimensions regular expressions and IDE tooling (Rename
all, Find all references, ...) alone cannot, complementing them.
srgn is organized around actions to take (if any), acting only within precise,
optionally language grammar-aware scopes. In terms of existing tools, think of it
as a mix of tr, sed, ripgrep and tree-sitter, with a design goal of simplicity: if
you know regex and the basics of the language you are working with, you are good
to go.
The answer to "What if grep, tr, sed and tree-sitter got really drunk one night and
had a baby?"
Quick walkthrough
[!TIP]
All code snippets displayed here are verified as part of unit tests using the actual
srgn binary. What is showcased here is guaranteed to work.
The simplest srgn usage works similarly to tr:
$ echo 'Hello World!' | srgn '[wW]orld' -- 'there' # replacement
Hello there!
Matches for the regular expression pattern '[wW]orld' (the scope) are replaced (the
action) by the second positional argument. Zero or more actions can be specified:
$ echo 'Hello World!' | srgn '[wW]orld' # zero actions: input returned unchanged
Hello World!
$ echo 'Hello World!' | srgn --upper '[wW]orld' -- 'you' # two actions: replacement, afterwards uppercasing
Hello YOU!
Replacement is always performed first and specified positionally. The argument goes
last, disambiguated by -- for safety. Any other actions are applied
after and given as command line flags.
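The ordering rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not srgn's implementation: `act_in_scope` and its parameters are hypothetical names, and the replacement is treated as a literal string.

```python
import re
from typing import Optional


def act_in_scope(text: str, scope: str, replacement: Optional[str] = None,
                 upper: bool = False) -> str:
    """Apply actions only to regex matches: replace first, then uppercase."""

    def transform(match: re.Match) -> str:
        piece = match.group(0)
        if replacement is not None:
            piece = replacement    # replacement always runs first
        if upper:
            piece = piece.upper()  # any other action runs afterwards
        return piece

    return re.sub(scope, transform, text)


print(act_in_scope("Hello World!", "[wW]orld", "there"))            # Hello there!
print(act_in_scope("Hello World!", "[wW]orld"))                     # Hello World!
print(act_in_scope("Hello World!", "[wW]orld", "you", upper=True))  # Hello YOU!
```

Text outside the scope is never touched, which is why `Hello` keeps its casing in the last call.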
Multiple scopes
Similarly, more than one scope can be specified: in addition to the regex pattern, a
language grammar-aware scope can be
given, which scopes to syntactical elements of source code (think, for example, "all
bodies of class definitions in Python"). If both are given, the regular expression
pattern is then only applied within that first, language scope. This enables
search and manipulation at a precision not normally possible using plain regular
expressions, and serves a dimension different from tools such as Rename all in IDEs.
For example, consider this (pointless) Python source file:

"""Module for watching birds and their age."""

from dataclasses import dataclass


@dataclass
class Bird:
    """A bird!"""

    name: str
    age: int

    def celebrate_birthday(self):
        print("🎉")
        self.age += 1

    @classmethod
    def from_egg(egg):
        """Create a bird from an egg."""
        pass # No bird here yet!


def register_bird(bird: Bird, db: Db) -> None:
    assert bird.age >= 0
    with db.tx() as tx:
        tx.insert(bird)
which can be searched using:
$ cat birds.py | srgn --python 'class' 'age'
11: age: int
15: self.age += 1
The string age was sought and found only within Python class definitions (and not,
for example, in function bodies such as register_bird, where age also occurs and
would be nigh impossible to exclude from consideration in vanilla grep). By default,
this 'search mode' also prints line numbers. Search mode is entered if no actions are
specified, and a language such as --python is given[^3]—think of it like
'ripgrep but with syntactical language
elements'.
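To make "language scope first, regex second" concrete, here is a rough Python sketch built on the standard library's `ast` module. It is an approximation for illustration, not how srgn works internally: scopes are modelled as whole lines, and `grep_in_classes` is a hypothetical helper name. `BIRDS` holds the sample file from above.

```python
import ast
import re

BIRDS = '''"""Module for watching birds and their age."""

from dataclasses import dataclass


@dataclass
class Bird:
    """A bird!"""

    name: str
    age: int

    def celebrate_birthday(self):
        print("🎉")
        self.age += 1

    @classmethod
    def from_egg(egg):
        """Create a bird from an egg."""
        pass # No bird here yet!


def register_bird(bird: Bird, db: Db) -> None:
    assert bird.age >= 0
    with db.tx() as tx:
        tx.insert(bird)
'''


def grep_in_classes(source: str, pattern: str) -> list:
    """Return (line number, line) pairs matching `pattern` inside class definitions."""
    lines = source.splitlines()
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # The class spans the lines node.lineno..node.end_lineno (1-based).
            for number in range(node.lineno, node.end_lineno + 1):
                if re.search(pattern, lines[number - 1]):
                    hits.add((number, lines[number - 1]))
    return sorted(hits)


for number, line in grep_in_classes(BIRDS, "age"):
    print(f"{number}: {line}")
```

A plain line-by-line search for `age` would also report line 1 (the module docstring) and line 24 (inside `register_bird`); restricting the search to class definitions leaves only lines 11 and 15.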
Searching can also be performed across
lines, for example to
find methods (aka def within class) lacking docstrings:
$ cat birds.py | srgn --python 'class' 'def .+:\n\s+[^"\s]{3}' # do not try this pattern at home
13: def celebrate_birthday(self):
14: print("🎉")
Note how this does not surface either from_egg (has a docstring) or register_bird
(not a method, def outside class).
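The same multi-line pattern can be tried against a class-only slice of the file in Python. This is a sketch under the assumption that a scope can be approximated by the lines an `ast.ClassDef` node covers; `multiline_in_classes` is a hypothetical name, and srgn itself does not use Python's `ast`.

```python
import ast
import re

BIRDS = '''"""Module for watching birds and their age."""

from dataclasses import dataclass


@dataclass
class Bird:
    """A bird!"""

    name: str
    age: int

    def celebrate_birthday(self):
        print("🎉")
        self.age += 1

    @classmethod
    def from_egg(egg):
        """Create a bird from an egg."""
        pass # No bird here yet!


def register_bird(bird: Bird, db: Db) -> None:
    assert bird.age >= 0
    with db.tx() as tx:
        tx.insert(bird)
'''

PATTERN = r'def .+:\n\s+[^"\s]{3}'


def multiline_in_classes(source: str, pattern: str) -> list:
    """Return the line numbers covered by multi-line matches inside classes."""
    lines = source.splitlines()
    covered = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        text = "\n".join(lines[node.lineno - 1:node.end_lineno])
        for match in re.finditer(pattern, text):
            # Translate character offsets within the class back to file lines.
            first = node.lineno + text.count("\n", 0, match.start())
            last = node.lineno + text.count("\n", 0, match.end())
            covered.update(range(first, last + 1))
    return sorted(covered)


print(multiline_in_classes(BIRDS, PATTERN))  # [13, 14]
```

Run against the whole file instead, the same pattern would also match `register_bird`, whose body starts with `assert` rather than a docstring.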
Multiple language scopes
Language scopes themselves can be specified multiple times as well. For example, in the Rust snippet
pub enum Genre {
    Rock(Subgenre),
    Jazz,
}

const MOST_POPULAR_SUBGENRE: Subgenre = Subgenre::Something;

pub struct Musician {
    name: String,
    genres: Vec<Subgenre>,
}
multiple items can be surgically drilled down into as
$ cat music.rs | srgn --rust 'pub-enum' --rust 'type-identifier' 'Subgenre' # AND'ed together
2: Rock(Subgenre),
where only lines matching all criteria are returned, acting like a logical and
between all conditions. Note that conditions are evaluated left-to-right, precluding
some combinations from making sense: for example, searching for a Python class body
inside of Python doc-strings usually returns nothing. The inverse works as expected
however:
$ cat birds.py | srgn --py 'class' --py 'doc-strings'
8: """A bird!"""
19: """Create a bird from an egg."""
No docstrings outside class bodies are surfaced!
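One way to picture this narrowing is as a set intersection. The Python sketch below models each language scope as a set of line numbers, which is a simplification (srgn's scopes are finer-grained than whole lines), and the helper names are hypothetical.

```python
import ast

BIRDS = '''"""Module for watching birds and their age."""

from dataclasses import dataclass


@dataclass
class Bird:
    """A bird!"""

    name: str
    age: int

    def celebrate_birthday(self):
        print("🎉")
        self.age += 1

    @classmethod
    def from_egg(egg):
        """Create a bird from an egg."""
        pass # No bird here yet!


def register_bird(bird: Bird, db: Db) -> None:
    assert bird.age >= 0
    with db.tx() as tx:
        tx.insert(bird)
'''

DOCUMENTABLE = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def class_lines(source: str) -> set:
    """Line numbers covered by class definitions."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found.update(range(node.lineno, node.end_lineno + 1))
    return found


def docstring_lines(source: str) -> set:
    """Line numbers covered by module, class and function docstrings."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DOCUMENTABLE) and node.body:
            first = node.body[0]
            # A docstring is a bare string expression opening the body.
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                found.update(range(first.lineno, first.end_lineno + 1))
    return found


scoped = sorted(class_lines(BIRDS) & docstring_lines(BIRDS))
print(scoped)  # [8, 19]
```

The module docstring on line 1 is a docstring but lies outside the class, so the intersection drops it.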
The -j flag changes this behavior: from intersecting left-to-right, to
running all queries independently and joining their results, allowing you to search
multiple ways at once:
$ cat birds.py | srgn -j --python 'comments' --python 'doc-strings' 'bird[^s]'
8: """A bird!"""
19: """Create a bird from an egg."""
20: pass # No bird here yet!
The pattern bird[^s] was found inside of comments or docstrings likewise, not just
"docstrings within comments".
Working recursively
If standard input is not given, srgn knows how to find relevant source files
automatically, for example in this repository:
$ srgn --python 'class' 'age'
docs/samples/birds
11: age: int
15: self.age += 1
docs/samples/birds.py
9: age: int
13: self.age += 1
It recursively walks its current directory, finding files based on file
extensions and shebang lines, processing
at very high speed. For example, srgn --go strings '\d+' finds and prints all ~140,000
runs of digits in literal Go strings inside the Kubernetes
codebase
of ~3,000,000 lines of Go code within 3 seconds on 12 cores of M3. For more on working
with many files, see below.
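File discovery by extension and shebang line can be sketched in Python as follows. The exact rules srgn applies are not spelled out here, so the checks below (a `.py` suffix, or a first line starting with `#!` and mentioning `python`) are assumptions made for illustration, as are the function names.

```python
import os
import tempfile
from pathlib import Path


def looks_like_python(path: Path) -> bool:
    """Guess the language from the extension, falling back to the shebang line."""
    if path.suffix == ".py":
        return True
    try:
        with path.open("r", encoding="utf-8", errors="ignore") as handle:
            first_line = handle.readline()
    except OSError:
        return False
    return first_line.startswith("#!") and "python" in first_line


def find_python_files(root: Path) -> list:
    """Recursively collect files under `root` that look like Python sources."""
    found = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            candidate = Path(directory) / name
            if looks_like_python(candidate):
                found.append(candidate)
    return sorted(found)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "birds.py").write_text("age = 1\n", encoding="utf-8")
    (root / "docs" / "birds").write_text("#!/usr/bin/env python3\nage = 1\n", encoding="utf-8")
    (root / "notes.txt").write_text("age\n", encoding="utf-8")
    print([path.name for path in find_python_files(root)])  # ['birds', 'birds.py']
```

This mirrors why the output above lists both `docs/samples/birds` (no extension, found via its shebang) and `docs/samples/birds.py`.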
Combining actions and scopes
Scopes and actions can be combined almost arbitrarily (though many combinations are not going to be use- or even meaningful). For example, consider this Python snippet (for examples using other supported languages see below):

"""GNU module."""

def GNU_says_moo():
    """The GNU function -> say moo -> ✅"""

    GNU = """
      GNU
    """ # the GNU...

    print(GNU + " says moo") # ...says moo
against which the following command is run:
cat gnu.py | srgn --titlecase --python 'doc-strings' '(?<!The )GNU ([a-z]+)' -- '$1: GNU 🐂 is not Unix'
The anatomy of that invocation is:
- --titlecase (an action) will Titlecase Everything Found In Scope
- --python 'doc-strings' (a scope) will scope to (i.e., only take into consideration) docstrings according to the Python language grammar
- '(?<!The )GNU ([a-z]+)' (a scope) sees only what was already scoped by the previous option, and will narrow it down further. It can never extend the previous scope. The regular expression scope is applied after any language scope(s).

  (?<!) is negative lookbehind syntax, demonstrating how this advanced feature is available. Strings of GNU prefixed by The will not be considered.
- '$1: GNU 🐂 is not Unix' (an action) will replace each matched occurrence (i.e., each input section found to be in scope) with this string. Matched occurrences are patterns of '(?<!The )GNU ([a-z]+)' only within Python docstrings. Notably, this replacement string demonstrates:
  - dynamic variable binding and substitution using $1, which carries the contents captured by the first capturing regex group. That's ([a-z]+), as (?<!The ) is not capturing.
  - full Unicode support (🐂).
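The regex part of this invocation can be tried out with Python's `re` module, applied here directly to the two docstrings of the sample. Treat it as an approximation: Python writes the first capture group as `\1` where srgn uses `$1`, and the `titlecase` helper below is a hypothetical stand-in that only uppercases the first character of each space-separated word.

```python
import re

# Negative lookbehind: skip any "GNU" that is directly preceded by "The ".
PATTERN = re.compile(r"(?<!The )GNU ([a-z]+)")
TEMPLATE = r"\1: GNU 🐂 is not Unix"


def titlecase(text: str) -> str:
    """Uppercase the first character of every space-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def rewrite_docstring(docstring: str) -> str:
    """Replace each match first, then titlecase only the replaced part."""
    return PATTERN.sub(lambda m: titlecase(m.expand(TEMPLATE)), docstring)


# The module docstring matches and is rewritten; the trailing "." is out of scope.
assert rewrite_docstring("GNU module.") == "Module: GNU 🐂 Is Not Unix."

# "The GNU function" is excluded by the lookbehind and stays unchanged.
unchanged = "The GNU function -> say moo -> ✅"
assert rewrite_docstring(unchanged) == unchanged
```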
The command makes use of multiple scopes (language and regex pattern) and multiple actions (replacement and titlecasing). The result then reads

"""Module: GNU 🐂 Is Not Unix."""

def GNU_says_moo():
    """The GNU function -> say moo -> ✅"""

    GNU = """
      GNU
    """ # the GNU...

    print(GNU + " says moo") # ...says moo
where the changes are limited to:

- """GNU module."""
+ """Module: GNU 🐂 Is Not Unix."""

def GNU_says_moo():
    """The GNU function -> say moo -> ✅"""
[!WARNING]
While srgn is in beta (major version 0), make sure to only (recursively) process files you can safely restore.
Search mode does not overwrite files, so is always safe.
See below for the full help output of the tool.
[!NOTE]
Supported languages are
- C
- C#
- Go
- HCL (Terraform)
- Python
- Rust
- TypeScript
