Sqlean
🧹 Clean your SQL queries! 🧹
Install / Use
/learn @oliverxchen/SqleanREADME
sqlean
<p align="center"> <a href='https://coveralls.io/github/oliverxchen/sqlean'><img src='https://coveralls.io/repos/github/oliverxchen/sqlean/badge.svg' alt='Coverage Status' /></a> <a href="https://www.python.org/"><img alt="Python versions: 3.7, 3.8, 3.9, 3.10" src="https://img.shields.io/badge/python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10%20%7C%203.11b-blue"></a> <a href="https://pypi.org/project/sqlean/"><img alt="PyPI" src="https://img.shields.io/pypi/v/sqlean"></a> <a href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg"></a> <a href="https://github.com/oliverxchen/sqlean/blob/main/LICENSE"><img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-A31F34"></a> </p>sqlean is a Python CLI to clean your SQL queries. It aspires to be the black for SQL queries: an uncompromising query formatter that imposes a strict, strongly-opinionated formatting standard, with an easy workflow to achieve that standard.
Early releases of sqlean will support BigQuery dialect SQL queries with or without dbt syntax. If there's sufficient interest, other SQL dialects can be added.
WARNING: this is very early in sqlean's life and it should not be used in production.
Usage
Installing
Make sure you're in a virtual environment and install sqlean with pip:
pip install sqlean
sqlean command
The sqlean CLI has one argument and several options. The one argument in the
sqlean CLI is the TARGET which can be a file or a directory. If the TARGET
is a directory, sqlean will be applied to all SQL files within the directory and
all sub-directories.
When sqlean is run without any options, SQL files will be changed in place so that the files comply with sqlean's formatting rules. Here's an example of a successful run (with no options):
<img src="https://github.com/oliverxchen/sqlean/raw/main/docs/img/pass_summary.png" width=640 alt="Succesful run of sqlean with summary table."/>sqlean is thoroughly tested to ensure that original queries have the same
abstract syntax tree (AST) as the formatted query. In plain English, that means
that sqlean should never break a query. But sqlean is very new so if you don't
fully trust sqlean yet, you can use the dry-run option (-d/--dry-run)
to display the diff that would happen if sqlean is run for real on the file :
Because sqlean needs to build an AST, if sqlean doesn't understand an SQL query
then it is unparsable. In this case, the example query is unparsable because
there's a problem with the query, and the verbose option (-v/--verbose)
points to the problem:
Brief documentation on sqlean commands can be accessed with:
sqlean --help
Workflow
Once sqlean is production ready, it is ideally used in your continuous integration (CI) workflow so that a pull request that adds or changes SQL queries can only be merged if sqlean runs without errors (i.e. with the final message "All files passed").
When you first introduce sqlean into your workflow, many of your SQL files can be automatically fixed. In an example production query-base of over 1,600 files, nearly 60% of queries could be automatically fixed.
The other SQL files are unparsable. sqlean will need further development to treat these files and you can help by reporting the issue so that the problem can be addressed.
In the mean time though, the unparsable files can be marked so that sqlean knows
to ignore them in CI runs. You can do this automatically by running sqlean with
the ignore option (-i/--write-ignore), or just by inserting this comment as
the first line of the file that you want ignored:
# sqlean ignore
If some of the issues are addressed by a future version of sqlean, you can
upgrade sqlean and automatically remove the ignore indicators with the force
(-f/--force) option.
Configuring sqlean
CLI options are for options that can change from one run of sqlean to another.
Project level configuration will not change from one run to another, and must be
set in pyproject.toml.
One exception is the TARGET directory or file that can be both a project level
setting and also change from one run to another, so it can appear in both
pyproject.toml or as a CLI argument.
In pyproject.toml, sqlean options go in a [tool.sqlean] section. The options
are:
includes: a single path or list of paths to include when sqlean is run. This will be over-ridden by theTARGETargument in the CLI when it is supplied. The default value isNone, in which caseTARGETmust be used.indent_size: an integer that indicates the number of spaces in the indent. The default value is 2.whisper: (NOT YET IMPLEMENTED) a boolean which should be set totrueto make all SQL reserved words and function names all lower case. The default value isfalse, which makes all SQL reserved words and function names all-caps.max_line_length: (NOT YET IMPLEMENTED) an integer value which adds additional line breaks if a line is longer than this value. The default value is-1to indicate that no maximum will be applied.dialect: (NOT YET IMPLEMENTED) a string to indicate the SQL dialect used in the project. The default value is"BIGQUERY"
Why
In the last few years, the usage of SQL in big data processing has steadily increased with the emergence of computational engines like Google Cloud BigQuery, Amazon Redshift, Snowflake and others with SQL interfaces.
Leveraging on the growth of SQL, dbt has gained wide-spread adoption as a tool to orchestrate the running of SQL queries. While analytics and feature generation pipelines were previously written primarily in programming languages like Python or Scala, they are now mostly dbt code: a combination of SQL and jinja templating.
The tooling around dbt code does not have the maturity as tooling around older languages like Python. We see code formatting and styling as a critical part of tooling around a language: consistently formatted code makes it easier to collaborate within a team by making it faster and easier to read other people's code.
Black has emerged as the most popular code formatter for Python for a couple reasons:
- There are few options so there is little to argue about when setting it up for the first time.
- If two versions of a piece of code have the same AST, they will have the same black formatted output (modulo minor differences in line spacing).
The strong consistency in formatting means that sometimes the output looks a bit strange or at least different than what you're used to. But over time our brains adjust and it becomes faster to read black formatted code.
sqlean aspires to play the same rule with SQL/dbt queries. By using sqlean, you'll give up control on exactly how your queries are formatted and styled, but you'll gain strong consistency in the formatting. No more nitpicking in code reviews and no more getting used to other team members' query styles.
Current state of sqlean
These are very early, experimental days in the development of sqlean so these numbers will change. On a production code base of 1,615 queries, 58.7% could be parsed and formatted and the other 42.3% could not be parsed. It takes 8.8 seconds for sqlean to attempt to parse and fix these queries. The time increases as more queries can be parsed, so it is expected to take roughly 15 seconds when most or all of the queries can be parsed.
Alternatives
There are other SQL formatting/linting tools out there and some may be more suitable for you.
SQLFluff
SQLFluff is a dialect-flexible and configurable SQL linter. Designed with ELT applications in mind, SQLFluff also works with Jinja templating and dbt. SQLFluff will auto-fix most linting errors, allowing you to focus your time on what matters.
SQLFluff is the main alternative to sqlean with wide community adoption and active contributions (4K+ Github stars and 150+ contributors). SQLFluff has a wide array of rules that can be customized. Time to auto-format queries can vary widely. Running SQLFluff on the production code base of 1,615 queries took too long, so a subset of 617 queries was used for comparison:
| | sqlean | SQLFluff | | :---------------------- | -----: | -------: | | Time to auto-format (s) | 3.8 | 1295.2 | | # files auto-formatted | 388 | 228 | | Version | 0.0.3 | 0.13.2 |
While SQLFluff's performance depends on the starting queries, on this sample
sqlean was able to run nearly 350 times faster than SQLFluff and was able to
auto-format 70% more files than SQLFluff. SQLFluff was run with the default
rules using the command: sqlfluff fix -f --dialect bigquery.
The large difference in run-times is likely due to
Related Skills
oracle
341.8kBest practices for using the oracle CLI (prompt + file bundling, engines, sessions, and file attachment patterns).
prose
341.8kOpenProse VM skill pack. Activate on any `prose` command, .prose files, or OpenProse mentions; orchestrates multi-agent workflows.
Command Development
84.6kThis skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
Plugin Structure
84.6kThis skill should be used when the user asks to "create a plugin", "scaffold a plugin", "understand plugin structure", "organize plugin components", "set up plugin.json", "use ${CLAUDE_PLUGIN_ROOT}", "add commands/agents/skills/hooks", "configure auto-discovery", or needs guidance on plugin directory layout, manifest configuration, component organization, file naming conventions, or Claude Code plugin architecture best practices.
