deephaven.stream.kafka.cdc

This module provides Change Data Capture (CDC) support for streaming relational database changes into Deephaven tables.

class CDCSpec(j_spec)

Bases: JObjectWrapper

A specification for how to consume a CDC Kafka stream.

j_object_type

alias of CdcTools$CdcSpec

cdc_long_spec(topic, key_schema_name, key_schema_version, value_schema_name, value_schema_version)

Creates a CDCSpec with all the required configuration options.

Parameters:
  • topic (str) – the Kafka topic for the CDC events associated with the desired table data.

  • key_schema_name (str) – the schema name for the Key Kafka field in the CDC events for the topic. This schema should include definitions for the columns forming the PRIMARY KEY of the underlying table. This schema name will be looked up in a schema server.

  • key_schema_version (str) – the version of the Key schema to look up in the schema server. None or “latest” implies using the latest version when Key is not ignored.

  • value_schema_name (str) – the schema name for the Value Kafka field in the CDC events for the topic. This schema should include definitions for all the columns of the underlying table. This schema name will be looked up in a schema server.

  • value_schema_version (str) – the version of the Value schema to look up in the schema server. None or “latest” implies using the latest version.

Return type:

CDCSpec

Returns:

a CDCSpec

Raises:

DHError
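
As a rough sketch of how these parameters fit together, the snippet below collects the five arguments for a hypothetical `dbserver.inventory.customers` topic and passes them to cdc_long_spec. The topic and schema names are illustrative assumptions, not values taken from this reference. The import is deferred inside the function so that the argument dictionary can be inspected without a running Deephaven session.

```python
# Hypothetical names for illustration only; substitute the topic and the
# schema names registered for your own CDC stream.
long_spec_args = {
    "topic": "dbserver.inventory.customers",
    "key_schema_name": "dbserver.inventory.customers-key",
    "key_schema_version": "latest",
    "value_schema_name": "dbserver.inventory.customers-value",
    "value_schema_version": "latest",
}


def make_long_spec(args=long_spec_args):
    """Build a CDCSpec from the arguments above.

    Must be called from within a Deephaven Python session.
    """
    # Imported lazily because the deephaven package needs the server's JVM.
    from deephaven.stream.kafka import cdc

    return cdc.cdc_long_spec(**args)
```

Passing “latest” for both versions matches the documented behavior of None, so either value could be used here.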

cdc_short_spec(server_name, db_name, table_name)

Creates a CDCSpec in the Debezium style from the provided server name, database name, and table name.

The topic name, and key and value schema names are implied by convention:
  • Topic is the concatenation of the arguments using “.” as separator.

  • Key schema name is topic with a “-key” suffix added.

  • Value schema name is topic with a “-value” suffix added.

Parameters:
  • server_name (str) – the server_name configuration value used when the CDC stream was created.

  • db_name (str) – the database name configuration value used when the CDC stream was created.

  • table_name (str) – the table name configuration value used when the CDC stream was created.

Returns:

a CDCSpec

Raises:

DHError
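
The naming convention described above can be illustrated in plain Python. The helper `short_spec_names` is hypothetical and is not part of this module; it only mirrors the stated rules (join with “.”, then append “-key” and “-value”), and the argument values are made up for the example.

```python
def short_spec_names(server_name: str, db_name: str, table_name: str):
    """Return (topic, key_schema_name, value_schema_name) per the stated convention."""
    # Topic is the three arguments joined with "." as separator.
    topic = ".".join([server_name, db_name, table_name])
    # Schema names are the topic with "-key" and "-value" suffixes.
    return topic, f"{topic}-key", f"{topic}-value"


topic, key_schema, value_schema = short_spec_names("dbserver", "inventory", "customers")
print(topic)         # dbserver.inventory.customers
print(key_schema)    # dbserver.inventory.customers-key
print(value_schema)  # dbserver.inventory.customers-value
```

Under this convention, the short form should be equivalent to calling cdc_long_spec with these three names, though which schema versions it uses is not stated here.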

consume(kafka_config, cdc_spec, partitions=None, stream_table=False, cols_to_drop=None)

Consume from a Change Data Capture (CDC) Kafka stream (as produced by, e.g., Debezium), mirroring the underlying database table in a Deephaven table.

Parameters:
  • kafka_config (Dict) – configuration for the associated Kafka consumer and also the resulting table. Passed to the org.apache.kafka.clients.consumer.KafkaConsumer constructor; pass any desired KafkaConsumer-specific configuration here. Note that this should include the relevant property for the URL of the schema server where the necessary key and/or value Avro schemas are stored.

  • cdc_spec (CDCSpec) – a CDCSpec obtained from calling either the cdc_long_spec or the cdc_short_spec function

  • partitions (List[int]) – a list of integer partition numbers; the default is None, indicating all partitions

  • stream_table (bool) – if true, produce a streaming table of changed rows, keeping the CDC ‘op’ column that indicates the type of change; if false, return a Deephaven ticking table that tracks the underlying database table through the CDC stream.

  • cols_to_drop (List[str]) – a list of column names to omit from the resulting Deephaven table. Only columns that are not part of the table's primary key can be dropped at this stage; to drop a primary key column, chain a drop-columns operation after this call.

Return type:

Table

Returns:

a Deephaven live table that will update based on the CDC messages consumed for the given topic

Raises:

DHError
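
A minimal sketch of a call to consume follows. The broker address, the schema registry URL, the `schema.registry.url` property name, the server/database/table names, and the `notes` column are all assumptions made for illustration; use the values and the schema server property that apply to your deployment. The Deephaven import is deferred inside the function because it requires a running Deephaven session.

```python
# Assumed endpoints for a local setup; replace with your own.
cdc_kafka_config = {
    "bootstrap.servers": "localhost:9092",
    # Assumed property name for a Confluent-style schema registry.
    "schema.registry.url": "http://localhost:8081",
}


def track_customers(config=cdc_kafka_config):
    """Return a ticking table that tracks a hypothetical customers table."""
    # Imported lazily because the deephaven package needs the server's JVM.
    from deephaven.stream.kafka import cdc

    spec = cdc.cdc_short_spec("dbserver", "inventory", "customers")
    return cdc.consume(
        config,
        spec,
        partitions=None,         # None means all partitions
        stream_table=False,      # track the table rather than emit change rows
        cols_to_drop=["notes"],  # hypothetical column outside the primary key
    )
```

Setting stream_table to True instead would return the changed rows together with the ‘op’ column, which can be useful when the change history matters more than the current state of the table.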

consume_raw(kafka_config, cdc_spec, partitions=None, table_type=TableType.blink())

Consume the raw events from a Change Data Capture (CDC) Kafka stream to a Deephaven table.

Parameters:
  • kafka_config (Dict) – configuration for the associated Kafka consumer and also the resulting table. Passed to the org.apache.kafka.clients.consumer.KafkaConsumer constructor; pass any desired KafkaConsumer-specific configuration here. Note that this should include the relevant property for the URL of the schema server where the necessary key and/or value Avro schemas are stored.

  • cdc_spec (CDCSpec) – a CDCSpec obtained from calling either the cdc_long_spec or the cdc_short_spec function

  • partitions (List[int]) – a list of integer partition numbers; the default is None, indicating all partitions

  • table_type (TableType) – a TableType enum, default is TableType.blink()

Return type:

Table

Returns:

a Deephaven live table for the raw CDC events

Raises:

DHError
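
For comparison with consume, a sketch of reading the raw events might look like the following. The endpoints, the `schema.registry.url` property name, and the server/database/table names are assumptions for illustration. The example assumes that `TableType.blink()` is available in deephaven.stream.kafka.consumer, as the default in the signature suggests.

```python
# Assumed endpoints for a local setup; replace with your own.
raw_kafka_config = {
    "bootstrap.servers": "localhost:9092",
    # Assumed property name for a Confluent-style schema registry.
    "schema.registry.url": "http://localhost:8081",
}


def raw_customer_events(config=raw_kafka_config, partitions=None):
    """Return a table of raw CDC events for a hypothetical customers table."""
    # Imported lazily because the deephaven package needs the server's JVM.
    from deephaven.stream.kafka import cdc
    from deephaven.stream.kafka.consumer import TableType

    spec = cdc.cdc_short_spec("dbserver", "inventory", "customers")
    return cdc.consume_raw(
        config,
        spec,
        partitions=partitions,         # None means all partitions
        table_type=TableType.blink(),  # same as the documented default
    )
```

The raw table exposes the events as they arrive rather than the reconstructed database table, so it is better suited to inspecting or debugging the stream than to querying current state.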