---
title: Avro & Protobuf Schema Inference
---

[Avro](https://avro.apache.org/) and [Protobuf](https://protobuf.dev/) are widely used serialization formats for structured data, especially in streaming and Kafka-based workflows.

## When to use Avro/Protobuf schema inference

Use Avro or Protobuf schema inference when:

- Ingesting data from Kafka or other streaming platforms.
- Working with complex, nested, or evolving data structures.
- Automating schema creation for event-driven architectures.

## Input requirements

- Avro: Provide a valid Avro schema file (`.avsc`).
- Protobuf: Provide a valid, compiled Protobuf descriptor file (commonly `.desc` or `.pb`), generated from your `.proto` sources with `protoc --descriptor_set_out`.
- Ensure your files are accessible to the process performing the discovery. Avro schema files are JSON text and should be UTF-8 encoded; Protobuf descriptor files are binary.
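For reference, an Avro schema file is a JSON document describing a record type. A minimal, hypothetical `pageviews.avsc` (the field names and namespace here are illustrative, not required by Deephaven) might look like:

```json
{
  "type": "record",
  "name": "PageView",
  "namespace": "io.example.events",
  "fields": [
    { "name": "UserId", "type": "long" },
    { "name": "PageId", "type": "string" },
    { "name": "ViewTime", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
```

A Protobuf descriptor file, by contrast, is not hand-written: compile it from your `.proto` sources with a command such as `protoc --descriptor_set_out=trade.desc --include_imports trade.proto`.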

## Kafka

Deephaven can generate schemas from Avro schemas and Protobuf descriptors. See the examples below.

### Discover a Deephaven schema from an Avro schema

You can discover and generate a Deephaven schema from an Avro schema file programmatically using the Groovy API. This is useful for advanced workflows, such as customizing the namespace or table name, or handling nested Avro schemas.

```groovy
import com.illumon.iris.db.schema.SchemaServiceFactory
import io.deephaven.kafka.ingest.SchemaDiscovery

// Replace "pageviews.avsc" with the path to your Avro schema file
ad = SchemaDiscovery.avroFactory(new File("pageviews.avsc"))
      .columnPartition("Date")
      .namespace("Kafka")
      .tableName("PageViews")

schema = ad.generateDeephavenSchema()
schemaService = SchemaServiceFactory.getDefault()
// Create the namespace if it doesn't already exist
schemaService.createNamespace("System", schema.getNamespace())
// Deploy the generated schema so it is available to ingestion workers
schemaService.addSchema(schema)
```

For more advanced usage, such as handling nested Avro schemas, see the [Deephaven Javadoc](https://docs.deephaven.io/javadoc/20240517/io/deephaven/kafka/ingest/package-summary.html).

### Discover a Deephaven schema from a Protobuf descriptor

You can discover and generate a Deephaven schema from a Protobuf descriptor file programmatically using the Groovy API. This is helpful for advanced use cases, such as customizing the namespace or table name, or handling complex Protobuf messages.

```groovy
import com.illumon.iris.db.schema.SchemaServiceFactory
import io.deephaven.kafka.ingest.SchemaDiscovery

// Replace "trade.desc" with the path to your compiled Protobuf descriptor file
pd = SchemaDiscovery.protobufFactory(new File("trade.desc"))
      .columnPartition("Date")
      .namespace("Kafka")
      .tableName("Trades")

schema = pd.generateDeephavenSchema()
schemaService = SchemaServiceFactory.getDefault()
// Create the namespace if it doesn't already exist
schemaService.createNamespace("System", schema.getNamespace())
// Deploy the generated schema so it is available to ingestion workers
schemaService.addSchema(schema)
```

- `.columnPartition("Date")` specifies the partition column (required for in-worker DIS ingestion).
- `.namespace("Kafka")` and `.tableName("Trades")` let you override the namespace and table name.

## Troubleshooting

- **Invalid schema/descriptor:** Ensure your Avro or Protobuf file is valid and accessible.
- **Missing or unsupported types:** Review the generated schema and manually adjust for any unsupported or custom types.
- **Kafka integration issues:** See the [Kafka streaming guide](../streaming/coreplus-kafka.md).
- **Encoding issues:** Ensure your Avro schema files are UTF-8 encoded. Protobuf descriptor files are binary and must not be re-encoded or edited as text.

## Related documentation

- [Schemas](../tables-and-schemas.md)
- [Kafka streaming guide](../streaming/coreplus-kafka.md)
- [CSV Schema Inference](./csv-schema-inference.md)
- [JDBC Schema Inference](./jdbc-schema-inference.md)
- [JSON Schema Inference](./json-schema-inference.md)
- [XML Schema Inference](./xml-schema-inference.md)
- [Avro documentation](https://avro.apache.org/docs/current/)
- [Protobuf documentation](https://protobuf.dev/)
