Elastic announced the great news that they are working on SQL support at Elastic{ON} 2017! You can find the talk on this topic on the Elastic Website

sql4es: JDBC driver for Elasticsearch

Sql-for-Elasticsearch (sql4es) is a jdbc 4.1 driver for Elasticsearch 6.3.2 implementing the majority of the JDBC interfaces: Connection, Statement, PreparedStatment, ResultSet, Batch and DataBase- / ResultSetMetadata. The screenshot below shows SQLWorkbenchJ with a selection of SQL statements that can be executed using the driver.

SQLWorkbenchJ screenshot with examples

Usage

The sql4es driver can be used by adding the jar file, found within the releases directory of the project, to the tool/application used and load the driver with name 'nl.anchormen.sql4es.jdbc.ESDriver'. The driver expects an URL with the following format: jdbc:sql4es://host:port/index?params.

host: the hostname or ip of one of the es hosts (required)
port: an optional the port number to use for the transport client (default is 9300)
index: the optional index to set active within the driver. Most statements like SELECT, DELETE and INSERT require an active index (also see USE [index/alias] statement below). It is however possible to create new indices, types and aliases without an active index.
params: an optional set of parameters used to influence the internals of the driver (specify additional hosts, maximum number of documents to fetch in a single request etc). If your clustername is not 'elasticsearch' you should specify the clustername witin the url (see example below). Please see the Configuration section of this readme for a description of all driver specific parameters.

// register the driver and get a connection for index 'myidx'
Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
Connection con = DriverManager.getConnection("jdbc:sql4es://localhost:9300/myidx?cluster.name=your-cluster-name");
Statement st = con.createStatement();
// execute a query on mytype within myidx
ResultSet rs = st.executeQuery("SELECT * FROM mytype WHERE something >= 42");
ResultSetMetaData rsmd = rs.getMetaData();
int nrCols = rsmd.getColumnCount();
// get other column information like type
while(rs.next()){
	for(int i=1; i<=nrCols; i++){
  		System.out.println(rs.getObject(i));
	}
}
rs.close();
con.close();

The driver can also be used from applications able to load the jdbc driver. It has been tested with sqlWorkbench/J and Squirrel on an Elasticsearch 6.3.2 cluster. A description on how to use sql4es with sqlWorkbenchJ along with a number of example queries can be found at the bottom of this readme.

Connection con = DriverManager.getConnection("jdbc:sql4es://f03c93be1efeb9be9b2f46b660d10d90.eu-west-1.aws.found.io:9343/indexname?shield.user=username:password&cluster.name=f03c93be1efeb9be9b2f46b660d10d90&ssl");

Supported SQL

Simply said the sql4es driver translates SQL statements to their Elasticsearch counterparts and parses results into ResultSet implementations. The following sql statements are supported:

SELECT: fetches documents (with or without scoring) or aggregations from elasticsearch
- COUNT (DISTINCT ...), MIN, MAX, SUM, AVG
- DISTINCT
- WHERE (=, >, >=, <, <=, <>, IN, LIKE, AND, OR, IS NULL, IS NOT NULL, NOT [condition])
- GROUP BY
- HAVING
- ORDER BY
- LIMIT (without offset, offsets are not supported by sql4es)
CREATE TABLE (AS) creates an index/type and optionally indexes the result of a query into it
CREATE VIEW (AS): creates an alias, optionally with a filter
DROP TABLE/VIEW removes an index or alias
INSERT INTO (VALUES | SELECT): inserts documents into an index/type; either provided values or results of a query. Possible to UPDATE documents using INSERT by specifying existing document _id's
UPDATE: executed as an elasticsearch Upsert
DELETE FROM (WHERE): removes documents
USE: selects an index as the driver's active one (used to interpret queries)
EXPLAIN SELECT: returns the Elasticsearch query performed for a SELECT statement
Table aliases like SELECT … FROM table1 as T1, table2 t2...
- Table aliases are parsed but not used during query execution

Remarks

Elasticsearch does not support transactions. Hence executing batches cannot be rolled back upon failure (nor can statements be committed). It also takes some time for documents to be indexed fully so executing an INSERT directly followed by a SELECT might not include the inserted documents.

Some SQL statements or Elasticsearch features that are not (yet) supported:

~~UPDATE is not supported, although it is possible to update documents by inserting values for an existing _id~~
- added in 0.7.2.1: it is now possible to executes updates like UPDATE index.type SET myInt=100 WHERE myString = 'hundred'
~~Not possible to INSERT nestested objects~~
- added in 0.7.2.1 using double quotes: INSERT INTO mytype ("myObject.nestedDoc.myInt") VALUES (1)
Not possible to specify offsets (OFFSET offset or LIMIT offset, number)
~~Fields with type 'nested' are not supported because this type requires different methods to query and retrieve data.~~
- added in 0.6.2.1: Nested types are detected by the driver and queries on those fields are executed accordingly
Parent child relationships are not supported. It is currently not possible to index or retrieve fields of this type.
Elasticsearch features like ~~full text search, highlighting,~~ suggestions and templates are not supported.
- added in 0.6.2.1: full text search can be done using _search = '…' and highlighting trough SELECT highlight(some-field) FROM …
~~Count (Distinct …)~~
- Added in 0.9.2.4. It is possible to set the Elasticsearch precision threshold (see cardinality aggregations) by using the precision.threshold parameter in the connection url.

Concepts

Since elasticsearch is a NO-SQL database it does not contain the exact relational objects most people are familiar with (like databases, tables and records). Elasticsearch does however have a similar hierarchy of objects (index, type and document). The conceptual mapping used by sql4es is the following:

Database = Index
Table = Type
Record = document
Column = Field
View = Alias (this does not fit from a hierarchical perspective but does match given the purpose of views / aliases)

Elasticsearch responses, both search results and aggregations, are put into a ResultSet implementation. Any nested objects are 'exploded' into a lateral view by default; this means that nested objects are treated as joined tables which are put inside the he same row (see this page for explanation). It is possible to represent nested objects as a nested ResultSet, see the Configuration section. Note, that although objects are exploded, arrays with primitives are not! They are put in a java.sql.Array implementation supported by JDBC.

Sql4es works from an active index/alias which means that it resolves references to types from this index. If for example myIndex is currently active the query SELECT * FROM sometype will only return any results if sometype is part of myindex. Executing a SELECT on a type that does not exist within an index will return an empty result. It is possible to change the active index by executing USE [otherIndex] as described below.

QUERIES

This section describes how SQL is interpreted and converted into SE statements. The presto parser is used to parse SQL statements, please see the syntax definition on the presto website.

SELECT

/* basic syntax */
SELECT [field (AS alias)] FROM [types] WHERE [condition] GROUP BY [fields] HAVING [condition] ORDER BY [field (ASC|DESC)] LIMIT [number]

fields (AS alias): defines the fields to be retrieved from elasticsearch and put in the ResultSet. It is possible to use * to indicate all fields should be retrieved (including _id and _type). Fields can be addressed by their name, nested fields can be addressed using their hierarchical names in dotted notation like: nesteddoc.otherdoc.field. Using a star will simply fetch all fields, also nested ones, present in a document. It is possible to specify the root of an object in order to fetch all its fields. A query like SELECT nesteddoc FROM type will fetch all fields present in nesteddoc. As a result it might return hundreds of columns if nesteddoc has hundreds of fields.
types: the types to execute the query against. This can only be types present in the index or alias that is currently active (also see 'use' statement).
condition: standard SQL condition using =, >, >=, <, <=, <>, IN and LIKE operators. Sql4es does not support the NOT operator but '<>' can be used instead. Use AND and OR to combine conditions.
limit: only works on non aggregating queries. The use of offset is not supported!

/* the following wil explode any nested objects into a lateral view */
SELECT * from mytype

SELECT _id as id, myInt, myString FROM mytype WHERE myInt >= 3 OR (myString IN ('hello','hi','bye') AND myInt <= 3)

/* If nestedDoc contains 2 fields the result will be exploded to [myInt,nestedDoc.field1, nestedDoc.field2] */
SELECT myInt, nestedDoc FROM mytype WHERE myInt > 3 AND myString <> 'bye'

/* If the array contains 3 objects the resultset will contain 3 rows, despite the LIMIT used! */
SELECT array_of_nested_objects FROM mytype LIMIT 1

Tables/Types

Only types part of the active index or alias can be addressed in the FROM clause. An alias must be created if types from different indices must be acce

Sql4es

Install / Use

README