Octosql
OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.
Install / Use
/learn @cube2222/OctosqlREADME
<img src="https://raw.githubusercontent.com/cube2222/octosql/main/images/logo.png" width="168">OctoSQL
OctoSQL is predominantly a CLI tool which lets you query a plethora of databases and file formats using SQL through a unified interface, even do JOINs between them. (Ever needed to join a JSON file with a PostgreSQL table? OctoSQL can help you with that.)
At the same time it's an easily extensible full-blown dataflow engine, and you can use it to add a SQL interface to your own applications.

Usage
octosql "SELECT * FROM ./myfile.json"
octosql "SELECT * FROM ./myfile.json" --describe # Show the schema of the file.
octosql "SELECT invoices.id, address, amount
FROM invoices.csv JOIN db.customers ON invoices.customer_id = customers.id
ORDER BY amount DESC"
octosql "SELECT customer_id, SUM(amount)
FROM invoices.csv
GROUP BY customer_id"
OctoSQL supports a bunch of file formats out of the box, but you can additionally install plugins to add support for other databases.
octosql "SELECT * FROM plugins.available_plugins"
octosql plugin install postgres
echo "databases:
- name: mydb
type: postgres
config:
host: localhost
port: 5443
database: mydb
user: postgres
password: postgres" > octosql.yml
octosql "SELECT * FROM mydb.users" --describe
octosql "SELECT * FROM mydb.users"
You can specify the output format using the --output flag. Available values for it are live_table, batch_table, csv and stream_native.
The documentation about available aggregates and functions is contained within OctoSQL itself. It's in the aggregates, aggregate_signatures, functions and function_signatures tables in the docs database.
octosql "SELECT * FROM docs.functions fs"
+------------------+----------------------------------------+
| fs.name | fs.description |
+------------------+----------------------------------------+
| 'abs' | 'Returns absolute value |
| | of argument.' |
| 'ceil' | 'Returns ceiling of |
| | argument.' |
| ... | ... |
+------------------+----------------------------------------+
Installation
Homebrew
You can install OctoSQL using Homebrew on MacOS or Linux:
brew install cube2222/octosql/octosql
After running it for the first time on MacOS you'll have to go into Preferences -> Security and Privacy -> Allow OctoSQL, as with any app that's not notarized.
Pre-Compiled binary
You can also download the binary for your operating system directly from the Releases page.
Nix Package
The package can be installed in the local nix-profile.
nix-env -iA nixpkgs.octosql
For adhoc or testing purposes a shell with the package can be spawned.
nix-shell -p octosql
For NixOS users it is highly recommended to install the package by adding it to the list of systemPackages.
environment.systemPackages = with pkgs; [
octosql
# ...
];
Building from source
With Go in version >= 1.18 the application can be built from source.
This can be achieved by cloning the repository and running go install from the project directory.
git clone https://github.com/cube2222/octosql
cd octosql
go install
File Access
Support for multiple file types is included by default in OctoSQL:
- JSON (in JSONLines format, one object per line)
- CSV
- TSV
- Parquet
- Lines (reading a file line by line)
If your file has a matching extension, you can use its path directly as a table:
~> octosql "SELECT * FROM my/file/path.json"
or, if the extension is not right, you can use this alternative notation, where the extension is used in place of the database name:
~> octosql "SELECT * FROM `json.my/file/path.whatever`"
You can also specify additional options using the following notation: myfile.ext?key=value&key2=value2
The following options are available:
- CSV
- header: true/false (default: true) - Whether the file has a header row.
- JSON
- tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file.
- Lines
- tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file.
Reading from Standard Input
You can also pipe data in through stdin, and OctoSQL will expose it as the stdin.<file_type> table. For example:
~> echo '{"hello": "world"}' | octosql "SELECT * FROM stdin.json"
+---------+
| hello |
+---------+
| 'world' |
+---------+
~> seq 100 | octosql "SELECT SUM(int(text)) FROM stdin.lines"
+------+
| sum |
+------+
| 5050 |
+------+
Plugins
To use databases which are not included in the core of OctoSQL - like PostgreSQL or MySQL - you need to install a plugin. Installing plugins is very easy. The following command installs the latest version of the PostgreSQL plugin:
octosql plugin install postgres
Plugins are grouped into repositories, and potentially have many versions available. The above uses the default core repository and tries to install the latest version. So if 0.42.0 was the latest version, the above would be equivalent to:
octosql plugin install core/postgres@0.42.0
Browsing available and installed plugins is possible through OctoSQL itself, behind a SQL interface. The available tables are: plugins.repositories, plugins.available_plugins, plugins.available_versions, plugins.installed_plugins, plugins.installed_versions.
~> octosql "SELECT name, description FROM plugins.available_plugins LIMIT 2"
+------------------------+-------------------------------+
| available_plugins.name | available_plugins.description |
+------------------------+-------------------------------+
| 'postgres' | 'Adds support for |
| | querying PostgreSQL |
| | databases.' |
| 'random_data' | 'Generates random data |
| | for testing.' |
+------------------------+-------------------------------+
~> octosql "SELECT plugin_name, version FROM plugins.available_versions WHERE plugin_name='random_data'"
+--------------------------------+----------------------------+
| available_versions.plugin_name | available_versions.version |
+--------------------------------+----------------------------+
| 'random_data' | '0.1.0' |
| 'random_data' | '0.1.1' |
| 'random_data' | '0.2.0' |
+--------------------------------+----------------------------+
Some plugins, like the random_data plugin, can be used without any additional configuration:
~> octosql plugin install random_data
Downloading core/random_data@0.2.0...
~> octosql "SELECT * FROM random_data.users" --describe
+---------------------------------+--------------------------+------------+
| name | type | time_field |
+---------------------------------+--------------------------+------------+
| 'users.avatar' | 'String' | false |
| 'users.credit_card' | '{cc_number: String}' | false |
| 'users.date_of_birth' | 'String' | false |
| 'users.email' | 'String' | false |
| 'users.first_name' | 'String' | false |
| 'users.last_name' | 'String' | false |
| ... | ... | ... |
+---------------------------------+--------------------------+------------+
~> octosql "SELECT first_name, last_name, date_of_birth FROM random_data.users LIMIT 3"
+------------------+-----------------+---------------------+
| users.first_name | users.last_name | users.date_of_birth |
+------------------+-----------------+---------------------+
| 'Alethea' | 'Kuvalis' | '1997-01-07' |
| 'Ambrose' | 'Spencer' | '1979-04-18' |
| 'Antione' | 'Hodkiewicz' | '1980-03-04' |
+------------------+-----------------+---------------------+
Others, like the postgres plugin, require additional configuration. The configuration file is located at ~/.octosql/octosql.yml. You can find the available configuration settings for a plugin in its own documentation.
~> octosql plugin install postgres
Downloading core/postgres@0.1.0...
echo "databases:
- name: mydb
type: postgres
config:
host: localhost
port: 5432
database: postgres
user: postgres
password: mypassword" > ~/.octosql/octosql.yml
~> octosql "SELECT * FROM mydb.customers" --describe
+--------------------------+-------------
