<img src="https://raw.githubusercontent.com/cube2222/octosql/main/images/logo.png" width="168">OctoSQL

OctoSQL is predominantly a CLI tool which lets you query a plethora of databases and file formats using SQL through a unified interface, even do JOINs between them. (Ever needed to join a JSON file with a PostgreSQL table? OctoSQL can help you with that.)

At the same time it's an easily extensible full-blown dataflow engine, and you can use it to add a SQL interface to your own applications.

Demo

Usage

octosql "SELECT * FROM ./myfile.json"
octosql "SELECT * FROM ./myfile.json" --describe  # Show the schema of the file.
octosql "SELECT invoices.id, address, amount
         FROM invoices.csv JOIN db.customers ON invoices.customer_id = customers.id
         ORDER BY amount DESC"
octosql "SELECT customer_id, SUM(amount)
         FROM invoices.csv
         GROUP BY customer_id"

OctoSQL supports a bunch of file formats out of the box, but you can additionally install plugins to add support for other databases.

octosql "SELECT * FROM plugins.available_plugins"
octosql plugin install postgres
echo "databases:
  - name: mydb
    type: postgres
    config:
      host: localhost
      port: 5443
      database: mydb
      user: postgres
      password: postgres" > octosql.yml
octosql "SELECT * FROM mydb.users" --describe
octosql "SELECT * FROM mydb.users"

You can specify the output format using the --output flag. Available values for it are live_table, batch_table, csv and stream_native.

The documentation about available aggregates and functions is contained within OctoSQL itself. It's in the aggregates, aggregate_signatures, functions and function_signatures tables in the docs database.

octosql "SELECT * FROM docs.functions fs"
+------------------+----------------------------------------+
|     fs.name      |             fs.description             |
+------------------+----------------------------------------+
| 'abs'            | 'Returns absolute value                |
|                  | of argument.'                          |
| 'ceil'           | 'Returns ceiling of                    |
|                  | argument.'                             |
| ...              | ...                                    |
+------------------+----------------------------------------+

Installation

Homebrew

You can install OctoSQL using Homebrew on MacOS or Linux:

brew install cube2222/octosql/octosql

After running it for the first time on MacOS you'll have to go into Preferences -> Security and Privacy -> Allow OctoSQL, as with any app that's not notarized.

Pre-Compiled binary

You can also download the binary for your operating system directly from the Releases page.

Nix Package

The package can be installed in the local nix-profile.

nix-env -iA nixpkgs.octosql

For adhoc or testing purposes a shell with the package can be spawned.

nix-shell -p octosql

For NixOS users it is highly recommended to install the package by adding it to the list of systemPackages.

environment.systemPackages = with pkgs; [
  octosql
  # ...
];

Building from source

With Go in version >= 1.18 the application can be built from source. This can be achieved by cloning the repository and running go install from the project directory.

git clone https://github.com/cube2222/octosql
cd octosql
go install

File Access

Support for multiple file types is included by default in OctoSQL:

JSON (in JSONLines format, one object per line)
CSV
TSV
Parquet
Lines (reading a file line by line)

If your file has a matching extension, you can use its path directly as a table:

~> octosql "SELECT * FROM my/file/path.json"

or, if the extension is not right, you can use this alternative notation, where the extension is used in place of the database name:

~> octosql "SELECT * FROM `json.my/file/path.whatever`"

You can also specify additional options using the following notation: myfile.ext?key=value&key2=value2

The following options are available:

CSV
- header: true/false (default: true) - Whether the file has a header row.
JSON
- tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file.
Lines
- tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file.

Reading from Standard Input

You can also pipe data in through stdin, and OctoSQL will expose it as the stdin.<file_type> table. For example:

~> echo '{"hello": "world"}' | octosql "SELECT * FROM stdin.json"
+---------+
|  hello  |
+---------+
| 'world' |
+---------+
~> seq 100 | octosql "SELECT SUM(int(text)) FROM stdin.lines"
+------+
| sum  |
+------+
| 5050 |
+------+

Plugins

To use databases which are not included in the core of OctoSQL - like PostgreSQL or MySQL - you need to install a plugin. Installing plugins is very easy. The following command installs the latest version of the PostgreSQL plugin:

octosql plugin install postgres

Plugins are grouped into repositories, and potentially have many versions available. The above uses the default core repository and tries to install the latest version. So if 0.42.0 was the latest version, the above would be equivalent to:

octosql plugin install core/postgres@0.42.0

Browsing available and installed plugins is possible through OctoSQL itself, behind a SQL interface. The available tables are: plugins.repositories, plugins.available_plugins, plugins.available_versions, plugins.installed_plugins, plugins.installed_versions.

~> octosql "SELECT name, description FROM plugins.available_plugins LIMIT 2"
+------------------------+-------------------------------+
| available_plugins.name | available_plugins.description |
+------------------------+-------------------------------+
| 'postgres'             | 'Adds support for             |
|                        | querying PostgreSQL           |
|                        | databases.'                   |
| 'random_data'          | 'Generates random data        |
|                        | for testing.'                 |
+------------------------+-------------------------------+
~> octosql "SELECT plugin_name, version FROM plugins.available_versions WHERE plugin_name='random_data'"
+--------------------------------+----------------------------+
| available_versions.plugin_name | available_versions.version |
+--------------------------------+----------------------------+
| 'random_data'                  | '0.1.0'                    |
| 'random_data'                  | '0.1.1'                    |
| 'random_data'                  | '0.2.0'                    |
+--------------------------------+----------------------------+

Some plugins, like the random_data plugin, can be used without any additional configuration:

~> octosql plugin install random_data
Downloading core/random_data@0.2.0...
~> octosql "SELECT * FROM random_data.users" --describe
+---------------------------------+--------------------------+------------+
|              name               |           type           | time_field |
+---------------------------------+--------------------------+------------+
| 'users.avatar'                  | 'String'                 | false      |
| 'users.credit_card'             | '{cc_number: String}'    | false      |
| 'users.date_of_birth'           | 'String'                 | false      |
| 'users.email'                   | 'String'                 | false      |
| 'users.first_name'              | 'String'                 | false      |
| 'users.last_name'               | 'String'                 | false      |
| ...                             | ...                      | ...        |
+---------------------------------+--------------------------+------------+
~> octosql "SELECT first_name, last_name, date_of_birth FROM random_data.users LIMIT 3"
+------------------+-----------------+---------------------+
| users.first_name | users.last_name | users.date_of_birth |
+------------------+-----------------+---------------------+
| 'Alethea'        | 'Kuvalis'       | '1997-01-07'        |
| 'Ambrose'        | 'Spencer'       | '1979-04-18'        |
| 'Antione'        | 'Hodkiewicz'    | '1980-03-04'        |
+------------------+-----------------+---------------------+

Others, like the postgres plugin, require additional configuration. The configuration file is located at ~/.octosql/octosql.yml. You can find the available configuration settings for a plugin in its own documentation.

~> octosql plugin install postgres
Downloading core/postgres@0.1.0...
echo "databases:
  - name: mydb
    type: postgres
    config:
      host: localhost
      port: 5432
      database: postgres
      user: postgres
      password: mypassword" > ~/.octosql/octosql.yml
~> octosql "SELECT * FROM mydb.customers" --describe
+--------------------------+-------------

Octosql

Install / Use

README