SkillAgentSearch skills...

Jsqltranspiler

Rewrite BigQuery, Redshift, Snowflake and Databricks queries into DuckDB compatible SQL (with deep transformation of functions, data types and format characters) using Java.

Install / Use

/learn @starlake-ai/Jsqltranspiler

README

JSQLTranspiler - Transpile Dialect, Resolve Columns, Show Lineage, Refactor Queries

Sonatype Nexus (Snapshots) JavaDoc Gradle CI Code Quality Coverage License Issues PRs Welcome

A pure Java stand-alone SQL Transpiler, Column- and Lineage Resolver for translating various large RDBMS SQL Dialects into a few smaller RDBMS Dialects for Unit Testing. Based on JSQLParser.

Supports SELECT queries as well as INSERT, UPDATE, DELETE and MERGE statements.

Internal Functions will be rewritten based on the actual meaning and purpose of the function (since the DuckDB Any() function does not necessarily behave like the RDBMS specific Any()). Respecting different function arguments count, order and type.

Rewrite of Window- and Aggregate-Functions with full coverage of the RDBMS specific published samples. The matrix of supported features and functions is shared on Google Sheets.

Dialects

Input: Google BigQuery, Databricks, Snowflake, Amazon Redshift

Output: DuckDB

Transpile Example

Google BigQuery specific SQL

-- BigQuery specific DATE() function
SELECT
  DATE(2016, 12, 25) AS date_ymd,
  DATE(DATETIME '2016-12-25 23:59:59') AS date_dt,
  DATE(TIMESTAMP '2016-12-25 05:30:00+07', 'America/Los_Angeles') AS date_tstz;

/* Output
"date_ymd","date_dt","date_tstz"
"2016-12-15","2016-12-15","2016-12-15"
*/

will become DuckDB compatible SQL

-- DuckDB compliant rewrite producing the same result
SELECT
  MAKE_DATE(2016, 12, 25) AS date_ymd,
  CAST(DATETIME '2016-12-25 23:59:59' AS DATE) AS date_dt,
  CAST(TIMESTAMP '2016-12-25 05:30:00+07' AS DATE) AS date_tstz;

/* Output
"date_ymd","date_dt","date_tstz"
"2016-12-15","2016-12-15","2016-12-15"
*/

Column Lineage Example

For the simplified schema definition and the given query

String[][] schemaDefinition = {
        // Table A with Columns col1, col2, col3, colAA, colAB
        {"a", "col1", "col2", "col3", "colAA", "colAB"},

        // Table B with Columns col1, col2, col3, colBA, colBB
        {"b", "col1", "col2", "col3", "colBA", "colBB"}
};

String sqlStr =
        "SELECT Case when Sum(colBA + colBB)=0 then c.col1 else a.col2 end AS total FROM a INNER JOIN (SELECT * FROM b) c ON a.col1 = c.col1";

JdbcResultSetMetaData resultSetMetaData = new JSQLColumResolver(databaseMetaData).getResultSetMetaData(sqlStr);

the ResultSetMetaData return a list of JdbcColumns, each traversable using the TreeNode interface. The resulting Column Lineage can be illustrated as:

SELECT
 └─total AS CaseExpression: CASE WHEN Sum(colBA + colBB) = 0 THEN c.col1 ELSE a.col2 END
    ├─WhenClause: WHEN Sum(colBA + colBB) = 0 THEN c.col1
    │  ├─EqualsTo: Sum(colBA + colBB) = 0
    │  │  └─Function: Sum(colBA + colBB)
    │  │     └─Addition: colBA + colBB
    │  │        ├─c.colBA → b.colBA : Other
    │  │        └─c.colBB → b.colBB : Other
    │  └─c.col1 → b.col1 : Other
    └─a.col2 : Other

Resolve * Star Operator Example

For the simplified schema definition and the given query with Star Operators

String[][] schemaDefinition = {
    // Table A with Columns col1, col2, col3, colAA, colAB
    {"a", "col1", "col2", "col3", "colAA", "colAB"},

    // Table B with Columns col1, col2, col3, colBA, colBB
    {"b", "col1", "col2", "col3", "colBA", "colBB"}
};

String sqlStr = "SELECT * FROM ( (SELECT * FROM b) c inner join a on c.col1 = a.col1 ) d;";
String resolved =  new JSQLColumResolver(schemaDefinition).getResolvedStatementText(sqlStr);

the query will be resolved and (optionally rewritten into):

SELECT  d.col1                 /* Resolved Column*/
        , d.col2               /* Resolved Column*/
        , d.col3               /* Resolved Column*/
        , d.colBA              /* Resolved Column*/
        , d.colBB              /* Resolved Column*/
        , d.col1_1             /* Resolved Column*/
        , d.col2_1             /* Resolved Column*/
        , d.col3_1             /* Resolved Column*/
        , d.colAA              /* Resolved Column*/
        , d.colAB              /* Resolved Column*/
FROM (  (   SELECT  b.col1     /* Resolved Column*/
                    , b.col2   /* Resolved Column*/
                    , b.col3   /* Resolved Column*/
                    , b.colba  /* Resolved Column*/
                    , b.colbb  /* Resolved Column*/
            FROM b ) c
            INNER JOIN a
                ON c.col1 = a.col1 ) d
;

Alternatively, the information about returned columns can be fetched as JDBC ResultsetMetaData (without actually executing this query):

import java.sql.DatabaseMetaData;

String sqlStr = "SELECT * FROM (  (  SELECT * FROM sales ) c INNER JOIN listing a ON c.listid = a.listid ) d;";
// the meta data of catalgogs, schemas, tables, columns, either virtually and physically
DatabaseMetaData databaseMetaData = ...;
ResultSetMetaData resultSetMetaData = new JSQLColumResolver(databaseMetaData).getResultSetMetaData(sqlStr);
System.out.println(resultSetMetaData.toString());

/*
"#","label","name","table","schema","catalog","type","type name","precision","scale","display size"
"1","salesid","salesid","d",,"JSQLTranspilerTest","INTEGER","INTEGER","0","32","0"
"2","listid","listid","d",,"JSQLTranspilerTest","INTEGER","INTEGER","0","32","0"
... (shortened) ...
"17","totalprice","totalprice","d",,"JSQLTranspilerTest","DECIMAL","DECIMAL(8,2)","0","8","0"
"18","listtime","listtime","d",,"JSQLTranspilerTest","TIMESTAMP","TIMESTAMP","0","0","0"
 */

PipedSQL Example

Piped SQL is a much saner and more logical way to write queries in its semantic order.

FROM Produce
|> WHERE
    item != 'bananas'
    AND category IN ('fruit', 'nut')
|> AGGREGATE COUNT(*) AS num_items, SUM(sales) AS total_sales
   GROUP BY item
|> ORDER BY item DESC;

For details, please see https://storage.googleapis.com/gweb-research2023-media/pubtools/1004848.pdf, https://cloud.google.com/bigquery/docs/reference/standard-sql/pipe-syntax and https://duckdb.org/docs/sql/query_syntax/from.html#from-first-syntax

JSQLTranspiler can rewrite PipedSQL into regular SQL, which can be executed on any normal RDBMS.

String sql =
            "(\n" +
            "  SELECT '000123' AS id, 'apples' AS item, 2 AS sales\n" +
            "  UNION ALL\n" +
            "  SELECT '000456' AS id, 'bananas' AS item, 5 AS sales\n" +
            ") AS sales_table\n" +
            "|> AGGREGATE SUM(sales) AS total_sales GROUP BY id, item\n" +
            "|> AS t1\n" +
            "|> JOIN (SELECT 456 AS id, 'yellow' AS color) AS t2\n" +
            "   ON CAST(t1.id AS INT64) = t2.id\n" +
            "|> SELECT t2.id, total_sales, color;";

    try (Statement st = connDuck.createStatement();
         ResultSet rs = st.executeQuery( JSQLTranspiler.transpileQuery(sql, JSQLTranspiler.Dialect.ANY) );
    ) {
      ResultSetMetaData resultSetMetaData = rs.getMetaData();
      Assertions.assertEquals(3, resultSetMetaData.getColumnCount());
      Assertions.assertEquals( "id", resultSetMetaData.getColumnLabel(1));
      Assertions.assertEquals( "total_sales", resultSetMetaData.getColumnLabel(2));
      Assertions.assertEquals( "color", resultSetMetaData.getColumnLabel(3));

      Assertions.assertTrue( rs.next() );
      Assertions.assertEquals(456, rs.getInt(1) );
      Assertions.assertEquals(5, rs.getInt(2) );
      Assertions.assertEquals("yellow", rs.getString(3) );
    }

SQL Refactoring example

JSQLTranspiler can refactor statements by replacing table names. The following example swaps the tablenames a and b in a query based on the physical tables of the given metadata:

-- Input:
SELECT a.*
FROM (  SELECT  a.col3
                , Sum( a.col2 )
        FROM a inner join b on a.col1=b.col1
        WHERE a.col1 = b.col1
        GROUP BY a.col3
        HAVING Sum( a.col2 ) > 0 ) AS a
;
JSQLReplacer replacer = new JSQLReplacer({{"a", "col1", "col2", "col3"}, {"b", "col1", "col2", "col3"}});
replacer.replace(sqlStr, Map.of("a", "b", "b", "a"));
-- Output:
SELECT  a.col3
        , a.sum
FROM (  SELECT  b.col3
                , Sum( b.col2 )
        FROM b
            INNER JOIN a
                ON b.col1 = a.col1
        WHERE b.col1 = a.col1
        GROUP BY b.col3
        HAVING Sum( b.col2 ) > 0 ) AS a;
;

How to use

Java Library

Maven Artifact with Snapshot support:

<repositories>
    <repository>
        <id>jsqltranspiler-snapshots</id>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
        <url>https://s01.oss.sonatype.org/content/reposit

Related Skills

View on GitHub
GitHub Stars72
CategoryData
Updated12h ago
Forks8

Languages

Java

Security Score

85/100

Audited on Mar 31, 2026

No findings