Execute SQL queries in Deephaven
Structured Query Language (SQL) is the most popular programming language for database management and access. Its popularity can be attributed to a number of factors, including its simplicity, readability, and interoperability. Open-source options like MySQL and PostgreSQL have also played an important role in its popularity and advancement. SQL can be utilized from Deephaven with the deephaven.dbc
Python submodule.
This guide will show you how to connect to a SQL database, execute a query on it, and read the result directly into a table.
deephaven.dbc
Deephaven's Python submodule, deephaven.dbc
, contains all of Deephaven's functionality for connecting to and using external databases from Deephaven. The submodule contains three pertinent items for using SQL. They are:
read_sql
- Executes a provided SQL query via a supported driver.
adbc.read_cursor
- Supports ingesting data from external databases via the Apache Arrow Database Connectivity (ADBC) interfaces.
odbc.read_cursor
- Supports ingesting data from external relational databases via the Open Database Connectivity (ODBC) interfaces by using the turbodbc module.
Supported drivers
read_sql
can use one of three supported drivers. Each driver can be used by installing one of the following packages.
connectorx
ODBC
ADBC
- adbc-driver-manager
- adbc-driver-flightsql for Arrow Flight SQL database drivers
- adbc-driver-postgresql for ADBC PostgreSQL database drivers
- adbc-driver-sqlite for ADBC SQLite database drivers
Execute a query
With the necessary package(s) installed, a SQL query can create a table via any of the three provided Deephaven methods. Each example below pulls from a Postgres DB.
It's best practice not to store credentials in plaintext. In the following examples, the username, password, URL, and port of the Postgres DB are set via environment variables.
read_sql
from deephaven.dbc import read_sql
import os
my_query = "SELECT t_ts as Timestamp, CAST(t_id AS text) as Id, " +
"CAST(t_instrument as text) as Instrument, " +
"t_exchange as Exchange, t_price as Price, t_size as Size " +
"FROM CRYPTO TRADES"
username = os.environ["POSTGRES_USERNAME"]
password = os.environ["POSTGRES_PASSWORD"]
url = os.environ["POSTGRES_URL"]
port = os.environ["POSTGRES_PORT"]
sql_uri = f"postgresql://{url}:{port}/postgres?user={username}&password={password}"
crypto_trades = read_sql(conn=sql_uri, query=my_query, driver="connectorx")
odbc.read_cursor
from deephaven.dbc import odbc as dhodbc
import turbodbc, os
username = os.environ["POSTGRES_USERNAME"]
password = os.environ["POSTGRES_PASSWORD"]
url = os.environ["POSTGRES_URL"]
port = os.environ["POSTGRES_PORT"]
sql_query = "SELECT t_ts as Timestamp, CAST(t_id AS text) as Id, " +
"CAST(t_instrument as text) as Instrument, " +
"t_exchange as Exchange, t_price as Price, t_size as Size " +
"FROM CRYPTO TRADES"
uri = f"postgresql://{url}:{port}/postgres?user={username}&password={password}"
with turbodbc.connect(connection_string=uri) as conn:
with conn.cursor() as cursor:
cursor.execute(sql_query)
crypto_trades = dhodbc.read_cursor(cursor)
adbc.read_cursor
from deephaven.dbc import adbc as dhadbc
from adbc_driver_postgresql import dbapi
import os
username = os.environ["POSTGRES_USERNAME"]
password = os.environ["POSTGRES_PASSWORD"]
url = os.environ["POSTGRES_URL"]
port = os.environ["POSTGRES_PORT"]
sql_query = "SELECT t_ts as Timestamp, CAST(t_id AS text) as Id, " +
"CAST(t_instrument as text) as Instrument, " +
"t_exchange as Exchange, t_price as Price, t_size as Size " +
"FROM CRYPTO TRADES"
uri = f"postgresql://{url}:{port}/postgres?user={username}&password={password}"
with dbapi.connect(uri) as conn:
with conn.cursor() as cursor:
cursor.execute(sql_query)
crypto_trades = dhadbc.read_cursor(cursor)
Choose the right method
With three different methods that can achieve the same results, which is the best choice? Take into account the following factors when choosing.
High-level vs low-level
read_sql
is a high-level method for executing a query against a SQL database. It is a general purpose method with less flexibility than the other two.
adbc.read_cursor
and odbc.read_cursor
are lower-level methods because they require the use of a cursor to execute queries. If a use case requires more flexibility and options when executing queries, one of these should be used.
ADBC vs ODBC
The two lower-level options can both achieve the same results. When choosing one or the other, consider:
ODBC
is a well-known and universal specification that can work with a wide variety of SQL database types. If flexibility is a more important factor, this is the best method to choose.ADBC
is faster when working with a SQL database compatible with Apache Arrow (e.g., Arrow Flight SQL), so it should be used where this applies. It's not as compatible with as many database types, so some flexibility is lost.