Dsq
Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Install / Use
/learn @multiprocessio/DsqREADME
Not under active development
While development may continue in the future with a different architecture, for the moment you should probably instead use DuckDB, ClickHouse-local, or GlareDB (based on DataFusion).
These are built on stronger analytics foundations than projects like dsq based on SQLite. For example, column-oriented storage and vectorized execution, let alone JIT-compiled expression evaluation, are possible with these other projects.
Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more
Since Github doesn't provide a great way for you to learn about new releases and features, don't just star the repo, join the mailing list.
- About
- Install
- Usage
- Pretty print
- Piping data to dsq
- Multiple files and joins
- SQL query from file
- Transforming data to JSON without querying
- Array of objects nested within an object
- Nested object values
- Nested arrays
- REGEXP
- Standard Library
- Output column order
- Dumping inferred schema
- Caching
- Interactive REPL
- Converting numbers in CSV and TSV files
- Supported Data Types
- Engine
- Comparisons
- Benchmark
- Third-party integrations
- Community
- How can I help?
- License
About
This is a CLI companion to DataStation (a GUI) for running SQL queries against data files. So if you want the GUI version of this, check out DataStation.
Install
Binaries for amd64 (x86_64) are provided for each release.
macOS Homebrew
dsq is available on macOS Homebrew:
$ brew install dsq
Binaries on macOS, Linux, WSL
On macOS, Linux, and WSL you can run the following:
$ VERSION="v0.23.0"
$ FILE="dsq-$(uname -s | awk '{ print tolower($0) }')-x64-$VERSION.zip"
$ curl -LO "https://github.com/multiprocessio/dsq/releases/download/$VERSION/$FILE"
$ unzip $FILE
$ sudo mv ./dsq /usr/local/bin/dsq
Or install manually from the releases
page, unzip and add
dsq to your $PATH.
Binaries on Windows (not WSL)
Download the latest Windows
release, unzip it,
and add dsq to your $PATH.
Build and install from source
If you are on another platform or architecture or want to grab the latest release, you can do so with Go 1.18+:
$ go install github.com/multiprocessio/dsq@latest
dsq will likely work on other platforms that Go is ported to such as
AARCH64 and OpenBSD, but tests and builds are only run against x86_64
Windows/Linux/macOS.
Usage
You can either pipe data to dsq or you can pass a file name to
it. NOTE: piping data doesn't work on Windows.
If you are passing a file, it must have the usual extension for its content type.
For example:
$ dsq testdata.json "SELECT * FROM {} WHERE x > 10"
Or:
$ dsq testdata.ndjson "SELECT name, AVG(time) FROM {} GROUP BY name ORDER BY AVG(time) DESC"
Pretty print
By default dsq prints ugly JSON. This is the most efficient mode.
$ dsq testdata/userdata.parquet 'select count(*) from {}'
[{"count(*)":1000}
]
If you want prettier JSON you can pipe dsq to jq.
$ dsq testdata/userdata.parquet 'select count(*) from {}' | jq
[
{
"count(*)": 1000
}
]
Or you can enable pretty printing with -p or --pretty in dsq
which will display your results in an ASCII table.
$ dsq --pretty testdata/userdata.parquet 'select count(*) from {}'
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
Piping data to dsq
When piping data to dsq you need to set the -s flag and specify
the file extension or MIME type.
For example:
$ cat testdata.csv | dsq -s csv "SELECT * FROM {} LIMIT 1"
Or:
$ cat testdata.parquet | dsq -s parquet "SELECT COUNT(1) FROM {}"
Multiple files and joins
You can pass multiple files to DSQ. As long as they are supported data
files in a valid format, you can run SQL against all files as
tables. Each table can be accessed by the string {N} where N is the
0-based index of the file in the list of files passed on the
commandline.
For example this joins two datasets of differing origin types (CSV and JSON).
$ dsq testdata/join/users.csv testdata/join/ages.json \
"select {0}.name, {1}.age from {0} join {1} on {0}.id = {1}.id"
[{"age":88,"name":"Ted"},
{"age":56,"name":"Marjory"},
{"age":33,"name":"Micah"}]
You can also give file-table-names aliases since dsq uses standard
SQL:
$ dsq testdata/join/users.csv testdata/join/ages.json \
"select u.name, a.age from {0} u join {1} a on u.id = a.id"
[{"age":88,"name":"Ted"},
{"age":56,"name":"Marjory"},
{"age":33,"name":"Micah"}]
SQL query from file
As your query becomes more complex, it might be useful to store it in a file
rather than specify it on the command line. To do so replace the query argument
with --file or -f and the path to the file.
$ dsq data1.csv data2.csv -f query.sql
Transforming data to JSON without querying
As a shorthand for dsq testdata.csv "SELECT * FROM {}" to convert
supported file types to JSON you can skip the query and the converted
JSON will be dumped to stdout.
For example:
$ dsq testdata.csv
[{...some csv data...},{...some csv data...},...]
Array of objects nested within an object
DataStation and dsq's SQL integration operates on an array of
objects. If your array of objects happens to be at the top-level, you
don't need to do anything. But if your array data is nested within an
object you can add a "path" parameter to the table reference.
For example if you have this data:
$ cat api-results.json
{
"data": {
"data": [
{"id": 1, "name": "Corah"},
{"id": 3, "name": "Minh"}
]
},
"total": 2
}
You need to tell dsq that the path to the array data is "data.data":
$ dsq --pretty api-results.json 'SELECT * FROM {0, "data.data"} ORDER BY id DESC'
+----+-------+
| id | name |
+----+-------+
| 3 | Minh |
| 1 | Corah |
+----+-------+
You can also use the shorthand {"path"} or {'path'} if you only have one table:
$ dsq --pretty api-results.json 'SELECT * FROM {"data.data"} ORDER BY id DESC'
+----+-------+
| id | name |
+----+-------+
| 3 | Minh |
| 1 | Corah |
+----+-------+
You can use either single or double quotes for the path.
Multiple Excel sheets
Excel files with multiple sheets are stored as an object with key being the sheet name and value being the sheet data as an array of objects.
If you have an Excel file with two sheets called Sheet1 and Sheet2
you can run dsq on the second sheet by specifying the sheet name as
the path:
$ dsq data.xlsx 'SELECT COUNT(1) FROM {"Sheet2"}'
Limitation: nested arrays
You cannot specify a path through an array, only objects.
Nested object values
It's easiest to show an example. Let's say you have the following JSON file called user_addresses.json:
$ cat user_addresses.json
[
{"name": "Agarrah", "location": {"city": "Toronto", "address": { "number": 1002 }}},
{"name": "Minoara", "location": {"city": "Mexico City", "address": { "number": 19 }}},
{"name": "Fontoon", "location": {"city": "New London", "address": { "number": 12 }}}
]
You can query the nested fields like so:
$ dsq user_addresses.json 'SELECT name, "location.city" FROM {}'
And if you need to disambiguate the table:
$ dsq user_addresses.json 'SELECT name, {}."location.city" FROM {}'
Caveat: PowerShell, CMD.exe
On PowerShell and CMD.exe you must escape inner double quotes with backslashes:
> dsq user_addresses.json 'select name, \"location.city\" from {}'
[{"location.city":"Toronto","name":"Agarrah"},
{"location.city":"Mexico City","name":"Minoara"},
{"location.city":"New London","name":"Fontoon"}]
Nested objects explained
Nested objects are collapsed and their new column name becomes the
JSON path to the value connected by .. Actual dots in the path must
be escaped with a backslash. Since . is a special character in SQL
you must quote the whole new column name.
Limitation: whole object retrieval
You cannot query whole objects, you must ask for a specific path that results in a scalar value.
For example in the `user_addresses.
Related Skills
oracle
339.1kBest practices for using the oracle CLI (prompt + file bundling, engines, sessions, and file attachment patterns).
xurl
339.1kA CLI tool for making authenticated requests to the X (Twitter) API. Use this skill when you need to post tweets, reply, quote, search, read posts, manage followers, send DMs, upload media, or interact with any X API v2 endpoint.
prose
339.1kOpenProse VM skill pack. Activate on any `prose` command, .prose files, or OpenProse mentions; orchestrates multi-agent workflows.
Command Development
83.8kThis skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
