JDBC Schema Inference

JDBC schema inference in Deephaven streamlines the process of importing data from relational databases by automatically generating table schemas from your JDBC data source. This automation saves time and reduces errors, especially when working with new, evolving, or unfamiliar datasets.

When connecting to a JDBC source, Deephaven inspects the metadata provided by the database to infer column names, types, and constraints. The quality and detail of the inferred schema depend on the database's metadata support. Review the generated schema for accuracy and completeness, especially for complex or non-standard database types.

Refer to the JDBC batch data guide for more details on connecting to and importing data from JDBC sources; you will need to provide the appropriate driver files. You can specify any driver class string and connection URL. However, for schema discovery to work effectively, the driver must provide ResultSetMetaData for the query result set; some drivers lack this capability.

By default, the JDBC schema generator requests the data types of the result set columns from the data source. These data types are mapped to Deephaven data types to provide the schema details.
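To illustrate what the generator reads from the source, the standard JDBC ResultSetMetaData interface exposes each result column's name and SQL type, which Deephaven then maps to its own column types. A minimal Groovy sketch (the connection URL, credentials, and query below are placeholders you would replace for your environment):

```groovy
import java.sql.DriverManager

// Placeholder connection details - update for your environment.
def conn = DriverManager.getConnection(
    "jdbc:sqlserver://<host>:1433;databaseName=<db>", "<user>", "<password>")
def rs = conn.createStatement().executeQuery("SELECT * FROM MySourceTable")

// ResultSetMetaData is the information the schema generator consumes:
// one name/type pair per result set column.
def md = rs.getMetaData()
(1..md.columnCount).each { i ->
    println "${md.getColumnName(i)} -> ${md.getColumnTypeName(i)}"
}
conn.close()
```

If this loop prints generic or unexpected type names for your driver, expect the inferred schema to need the same manual review described below.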

For cases where the data source cannot provide column data type information, or where the source database is not ideally configured to store the data (e.g., numbers or date/times stored in String fields), the JDBC schema generator can instead scan the data itself, as the CSV schema generator does, or combine data scanning with metadata. Review data-only scanning results carefully, as they are less reliable than metadata-derived results.

When to use JDBC schema inference

JDBC schema inference is particularly useful in scenarios such as:

  • Integrating new relational databases into Deephaven.
  • Ingesting evolving or unfamiliar JDBC data sources.

It works best when the database provides reliable metadata for tables and queries.

Input requirements

  • JDBC connection details (URL, driver, credentials) must be available.
  • The database should provide metadata for best results; otherwise, data scanning is used.
  • Ensure you have permission to access the database and can run the target query.

GUI usage

JDBC schema inference can be accessed through the Deephaven web interface. See the Schema Editor documentation for more information on how to do this.

Code Studio usage

You can use a Core+ Groovy Code Studio to create a schema from a JDBC source.

The following example demonstrates how to use a JdbcSchemaCreator builder to generate a schema.

  • It uses the namespace test and table name TestSchemaCreation.
  • It uses the JDBC driver com.microsoft.sqlserver.jdbc.SQLServerDriver to connect to a Microsoft SQL Server database.
  • The connectionUrl specifies the database server location and needs to be updated to match your environment.
  • The sqlQuery specifies the SQL query to retrieve data for schema inference and needs to be updated to match your environment.
  • Authentication is specified with user and password, which need to be updated to match your environment.
  • It generates an import source with the name specified by sourceName.
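Putting the options above together, a sketch of the builder call might look like the following. The exact method names (namespace, tableName, driverClass, and so on) are assumptions inferred from the option descriptions and should be checked against the io.deephaven.importers.JdbcSchemaCreator.Builder API docs:

```groovy
import io.deephaven.importers.JdbcSchemaCreator

// Method names below are assumed from the option descriptions above;
// verify them against the JdbcSchemaCreator.Builder API documentation.
schemaCreator = JdbcSchemaCreator.builder()
    .namespace("test")                        // target namespace
    .tableName("TestSchemaCreation")          // target table name
    .driverClass("com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .connectionUrl("jdbc:sqlserver://<host>:1433;databaseName=<db>")  // update for your environment
    .sqlQuery("SELECT * FROM MySourceTable")  // update for your environment
    .user("<user>")                           // update for your environment
    .password("<password>")                   // update for your environment
    .sourceName("JdbcImportSource")           // name of the generated import source
    .build()
```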

More information on schemas is available in the schema documentation.

Once you have generated the schema, you can add it to Deephaven in the usual ways.

Builder options

The builder defines the behavior of the schema inference process. For example, replacement(<replacement string>) can be used to replace illegal characters in column names with the specified string. For full documentation, see the API docs for the io.deephaven.importers.JdbcSchemaCreator.Builder class.
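As a hedged illustration of the replacement option (the surrounding builder usage is assumed, not taken from the API docs), a column name containing characters that are illegal in Deephaven column names would have them substituted during inference:

```groovy
// Assumed usage: with replacement("_"), a source column named
// "Order Date" would be inferred as "Order_Date".
builder.replacement("_")
```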

Troubleshooting

  • JDBC driver errors: Ensure the correct JDBC driver is available (on the classpath) and referenced in your configuration.
  • Permission issues: Verify your database user account has permission to access table metadata and execute queries.
  • Non-standard or ambiguous types: Review the generated schema for columns with generic or unexpected types, and manually adjust if needed.
  • Connection problems: Double-check the connection URL, credentials, and network access to the database server.