PyAthenaJDBC
============

This library is unmaintained, use PyAthena_ instead.
.. _PyAthena: https://github.com/laughingman7743/PyAthena

.. image:: https://img.shields.io/pypi/pyversions/PyAthenaJDBC.svg
   :target: https://pypi.org/project/PyAthenaJDBC/

.. image:: https://github.com/laughingman7743/PyAthenaJDBC/workflows/test/badge.svg
   :target: https://github.com/laughingman7743/PyAthenaJDBC/actions

.. image:: https://img.shields.io/pypi/l/PyAthenaJDBC.svg
   :target: https://github.com/laughingman7743/PyAthenaJDBC/blob/master/LICENSE

.. image:: https://pepy.tech/badge/pyathenajdbc/month
   :target: https://pepy.tech/project/pyathenajdbc

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
   :target: https://github.com/psf/black

PyAthenaJDBC is an `Amazon Athena JDBC driver`_ wrapper for the Python `DB API 2.0 (PEP 249)`_.

.. _DB API 2.0 (PEP 249): https://www.python.org/dev/peps/pep-0249/
.. _Amazon Athena JDBC driver: https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html

Requirements
------------

* Python

  - CPython 3.6, 3.7, 3.8, 3.9

* Java

  - Java >= 8 (JDBC 4.2)

JDBC driver compatibility
-------------------------

+---------------+---------------------+-------------------------------------------------------------------------------+
| Version       | JDBC driver version | Vendor                                                                        |
+===============+=====================+===============================================================================+
| < 2.0.0       | == 1.1.0            | AWS (Early released JDBC driver. It is incompatible with Simba's JDBC driver) |
+---------------+---------------------+-------------------------------------------------------------------------------+
| >= 2.0.0      | >= 2.0.5            | Simba                                                                         |
+---------------+---------------------+-------------------------------------------------------------------------------+

Installation
------------

.. code:: bash

    $ pip install PyAthenaJDBC

Extra packages:
+---------------+------------------------------------------+-----------------+
| Package | Install command | Version |
+===============+==========================================+=================+
| Pandas | pip install PyAthenaJDBC[Pandas] | >=1.0.0 |
+---------------+------------------------------------------+-----------------+
| SQLAlchemy | pip install PyAthenaJDBC[SQLAlchemy] | >=1.0.0, <2.0.0 |
+---------------+------------------------------------------+-----------------+

Usage
-----

Basic usage
~~~~~~~~~~~

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2")
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM one_row
            """)
            print(cursor.description)
            print(cursor.fetchall())
    finally:
        conn.close()

Cursor iteration
~~~~~~~~~~~~~~~~

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2")
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows LIMIT 10
            """)
            for row in cursor:
                print(row)
    finally:
        conn.close()

Query with parameter
~~~~~~~~~~~~~~~~~~~~

The only supported `DB API paramstyle`_ is ``PyFormat``.
``PyFormat`` only supports `named placeholders`_ in the old ``%`` operator style, and parameters are specified as a dictionary.

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2")
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT col_string FROM one_row_complex
            WHERE col_string = %(param)s
            """, {"param": "a string"})
            print(cursor.fetchall())
    finally:
        conn.close()

If the ``%`` character is contained in your query, it must be escaped as ``%%``, like the following:

.. code:: sql

    SELECT col_string FROM one_row_complex
    WHERE col_string = %(param)s OR col_string LIKE 'a%%'

.. _`DB API paramstyle`: https://www.python.org/dev/peps/pep-0249/#paramstyle
.. _`named placeholders`: https://pyformat.info/#named_placeholders
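
As a minimal sketch, the ``PyFormat`` substitution and the need for ``%%`` can be illustrated with Python's own ``%`` operator. Note this is only an illustration of the escaping rule; the driver applies its own value quoting and escaping.

```python
# Plain-Python illustration of PyFormat substitution: %(param)s is filled
# from a dictionary, and a literal % must be written as %% in the query text.
query = (
    "SELECT col_string FROM one_row_complex "
    "WHERE col_string = %(param)s OR col_string LIKE 'a%%'"
)
# After substitution, %% collapses to a single literal %.
print(query % {"param": "'a string'"})
# -> SELECT col_string FROM one_row_complex WHERE col_string = 'a string' OR col_string LIKE 'a%'
```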
JVM options
~~~~~~~~~~~
In the connect method or connection object, you can specify JVM options as a list of strings.
For example, you can increase the JVM heap size as follows:

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2",
                   jvm_options=["-Xms1024m", "-Xmx4096m"])
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            print(cursor.fetchall())
    finally:
        conn.close()

JDBC 4.1
~~~~~~~~
If you want to use JDBC 4.1, download the corresponding JDBC driver
and specify the path to the downloaded driver as the ``driver_path`` argument of the connect method or connection object.

* The `AthenaJDBC41-2.0.7.jar`_ is compatible with JDBC 4.1 and requires JDK 7.0 or later.
.. _`AthenaJDBC41-2.0.7.jar`: https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.7/AthenaJDBC41_2.0.7.jar

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2",
                   driver_path="/path/to/AthenaJDBC41_2.0.7.jar")

JDBC driver configuration options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The connect method and connection object pass keyword arguments to the JDBC driver as options. To change the behavior of the JDBC driver, specify the corresponding option as a keyword argument.

.. code:: python

    from pyathenajdbc import connect

    conn = connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2",
                   LogPath="/path/to/pyathenajdbc/log/",
                   LogLevel="6")

For details of the JDBC driver options, refer to the official documentation:
`JDBC Driver Installation and Configuration Guide`_.

.. _JDBC Driver Installation and Configuration Guide: https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.7/docs/Simba+Athena+JDBC+Driver+Install+and+Configuration+Guide.pdf

NOTE: Option names and values are case-sensitive. Option values must be specified as strings.

SQLAlchemy
~~~~~~~~~~

Install SQLAlchemy with ``pip install SQLAlchemy>=1.0.0`` or ``pip install PyAthenaJDBC[SQLAlchemy]``.
Supported SQLAlchemy versions are 1.0.0 or higher and less than 2.0.0.

.. code:: python

    import contextlib
    from urllib.parse import quote_plus

    from sqlalchemy.engine import create_engine
    from sqlalchemy.sql.expression import select
    from sqlalchemy.sql.functions import func
    from sqlalchemy.sql.schema import Table, MetaData

    conn_str = "awsathena+jdbc://{User}:{Password}@athena.{AwsRegion}.amazonaws.com:443/"\
               "{Schema}?S3OutputLocation={S3OutputLocation}"
    engine = create_engine(conn_str.format(
        User=quote_plus("YOUR_ACCESS_KEY"),
        Password=quote_plus("YOUR_SECRET_ACCESS_KEY"),
        AwsRegion="us-west-2",
        Schema="default",
        S3OutputLocation=quote_plus("s3://YOUR_S3_BUCKET/path/to/")))
    try:
        with contextlib.closing(engine.connect()) as conn:
            many_rows = Table("many_rows", MetaData(bind=engine), autoload=True)
            print(select([func.count("*")], from_obj=many_rows).scalar())
    finally:
        engine.dispose()

The connection string has the following format:

.. code:: text

    awsathena+jdbc://{User}:{Password}@athena.{AwsRegion}.amazonaws.com:443/{Schema}?S3OutputLocation={S3OutputLocation}&driver_path={driver_path}&...

If you do not specify ``User`` (i.e., AWSAccessKeyID) and ``Password`` (i.e., AWSSecretAccessKey), the driver uses instance profile credentials or a credential profiles file:

.. code:: text

    awsathena+jdbc://:@athena.{Region}.amazonaws.com:443/{Schema}?S3OutputLocation={S3OutputLocation}&driver_path={driver_path}&...

NOTE: ``S3OutputLocation`` must be URL-encoded (e.g., with ``urllib.parse.quote_plus``). If ``User``, ``Password``, or other parameters contain special characters, they must be encoded as well.

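
The encoding requirement can be sketched with a small helper. ``athena_url`` is a hypothetical function (not part of PyAthenaJDBC) that assembles the connection string from the placeholders documented above:

```python
from urllib.parse import quote_plus

# Hypothetical helper: builds the SQLAlchemy connection string with the
# required percent-encoding applied to credentials and S3OutputLocation.
def athena_url(region, schema, s3_output, user="", password=""):
    return (
        f"awsathena+jdbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@athena.{region}.amazonaws.com:443/{schema}"
        f"?S3OutputLocation={quote_plus(s3_output)}"
    )

# Empty user/password fall back to instance profile credentials.
print(athena_url("us-west-2", "default", "s3://YOUR_S3_BUCKET/path/to/"))
# -> awsathena+jdbc://:@athena.us-west-2.amazonaws.com:443/default?S3OutputLocation=s3%3A%2F%2FYOUR_S3_BUCKET%2Fpath%2Fto%2F
```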
Pandas
~~~~~~
As DataFrame
^^^^^^^^^^^^
You can use `pandas.read_sql`_ to handle the query results as a `DataFrame object`_.

.. code:: python

    import pandas as pd

    from pyathenajdbc import connect

    conn = connect(User="YOUR_ACCESS_KEY_ID",
                   Password="YOUR_SECRET_ACCESS_KEY",
                   S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                   AwsRegion="us-west-2",
                   jvm_path="/path/to/jvm")
    df = pd.read_sql("SELECT * FROM many_rows LIMIT 10", conn)

The ``pyathenajdbc.util`` package also has helper methods.

.. code:: python

    import contextlib

    from pyathenajdbc import connect
    from pyathenajdbc.util import as_pandas

    with contextlib.closing(
            connect(S3OutputLocation="s3://YOUR_S3_BUCKET/path/to/",
                    AwsRegion="us-west-2")) as conn:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            df = as_pandas(cursor)
            print(df.describe())

.. _`pandas.read_sql`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
.. _`DataFrame object`: https://pandas.pydata.org/pandas-