---
id: runbook-data-import-server
title: Data Import Server runbook
---

The Data Import Server (DIS) is a critical Deephaven data management service responsible for ingesting streaming data, persisting it to disk in Parquet format, and serving real-time intraday data to clients via the Table Data Protocol (TDP). As the primary ingestion point for real-time data feeds, a healthy DIS is essential to live data availability. See the [Data Import Server overview](../../data-guide/dis.md) for more details.

## Impact of Data Import Server failure

| Level            | Impact                                                                                                                         |
| :--------------- | :----------------------------------------------------------------------------------------------------------------------------- |
| Sev 1 - Critical | Binary log file data will not be written to the database. Binary store imports will fail. Intraday data will not be available. |

> [!CAUTION]
> DIS failure stops new data ingestion immediately. Historical data remains accessible, but live data feeds will be interrupted.

## Data Import Server architecture

The DIS can be deployed in multiple configurations:

**Standalone process:** DIS runs as its own monit-managed process (`db_dis`), typically for high-throughput production environments.

**Embedded mode:** DIS runs inside a worker process (Persistent Query, console session, or Python client) for simpler deployments or testing.

**Sharding:** Multiple DIS instances can handle different tables or partitions for horizontal scaling.

**Hot replicas:** Multiple DIS instances can ingest the same data stream for high availability.

## Data Import Server dependencies

The DIS requires:

1. **Configuration Server** — Must be running to access table schemas and routing configuration.
2. **Authentication Server** — Must be running for token validation.
3. **Data tailer processes** — Must be running to feed binary log data (if using binary log ingestion).
4. **Filesystem access** — Must have read/write access to data storage paths.
5. **etcd cluster** — Must be accessible (via Configuration Server).

**Optional dependencies:**

- **Kafka/Solace/other sources** — If ingesting from streaming platforms.
- **Table Data Cache Proxy** — May be used in routing configuration.
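
Before restarting the DIS, it can help to confirm that its hard dependencies are healthy. The sketch below loops over monit-managed service names; the names used here (`configuration_server`, `authentication_server`, `tailer1`) are assumptions — substitute the names from your own monit configuration.

```shell
# Hypothetical pre-check for DIS dependencies. The service names below
# are assumptions; replace them with the names from your monit config.
DEPS="configuration_server authentication_server tailer1"
for svc in $DEPS; do
    # dh_monit prints the process state; a healthy process reports Running
    if dh_monit status "$svc" 2>/dev/null | grep -q Running; then
        echo "$svc: OK"
    else
        echo "$svc: check required"
    fi
done
```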

## Checking Data Import Server status

Check that the DIS process is running with monit:

```bash
dh_monit status db_dis
```

The expected output shows a status of `Running`.
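
For automation (cron checks, alerting), the status check can be scripted. This is a sketch that assumes the `dh_monit status` output contains the word `Running` for a healthy process:

```shell
# Capture the monit status; tolerate a missing or failed command so the
# script reports "not running" instead of aborting.
STATUS="$(dh_monit status db_dis 2>/dev/null || true)"
case "$STATUS" in
    *Running*) RESULT="running" ;;
    *)         RESULT="not running" ;;
esac
echo "db_dis: $RESULT"
```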

## Viewing Data Import Server logs

View application log:

```bash
cat /var/log/deephaven/dis/DataImportServer.log.current
```

Tail the log to follow it in real time:

```bash
tail -f /var/log/deephaven/dis/DataImportServer.log.current
```

List historical log files:

```bash
ls -ltr /var/log/deephaven/dis/db_dis.log.????-??-??
```

View process stdout/stderr logs:

```bash
cat /var/log/deephaven/dis/db_dis.log.$(date +%Y-%m-%d)
```
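
When diagnosing a problem, it is often quicker to scan the log for error-like lines than to read it end to end. A sketch — the match patterns are illustrative, not exact Deephaven log strings:

```shell
# Show the most recent error-like lines from the current DIS log.
LOG="/var/log/deephaven/dis/DataImportServer.log.current"
# grep exits non-zero when nothing matches; || true keeps scripts going
grep -iE 'error|exception|fatal' "$LOG" 2>/dev/null | tail -n 20 || true
```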

## Restart procedure

Restart the DIS:

```bash
dh_monit restart db_dis
```

> [!WARNING]
> Restarting DIS will temporarily interrupt data ingestion. The DIS will resume from checkpoints when restarted, but there will be a brief gap in real-time data availability.

Verify the restart was successful:

```bash
dh_monit status db_dis
```

Monitor the log during startup:

```bash
tail -f /var/log/deephaven/dis/DataImportServer.log.current
```

**Expected startup messages:**

- Successful connection to [Configuration Server](./runbook-config-server.md).
- Table schema loading.
- Checkpoint recovery.
- Tailer connections established.
- Data ingestion resuming.
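
A quick way to confirm these milestones is to grep the startup log for each one. The search strings below are loose, illustrative patterns, not the exact log messages your DIS version emits:

```shell
LOG="/var/log/deephaven/dis/DataImportServer.log.current"
# Loose, case-insensitive patterns for each startup milestone; adjust
# them to the exact messages your DIS version logs.
for pattern in "configuration" "schema" "checkpoint" "tailer"; do
    if grep -qi "$pattern" "$LOG" 2>/dev/null; then
        echo "found:   $pattern"
    else
        echo "missing: $pattern"
    fi
done
```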

## Cleaning up corrupt intraday data

If intraday ticking data becomes corrupted, you can remove it without stopping the DIS.

**General cleanup command:**

```bash
# As the dbmerge user
rm -r /db/Intraday/[namespace]/[tablename]/[intraday partition]/[date]
```

**Example for Order/Event table:**

```bash
# Remove all intraday partitions for 2018-02-09
rm -r /db/Intraday/Order/Event/*/2018-02-09
```

**After cleanup:**

1. The DIS will detect the missing data.
2. If source data is still available (binary logs, Kafka retention), the DIS can re-ingest.
3. Monitor DIS logs to verify re-ingestion completes successfully.
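
Because `rm -r` against a mistyped path is unrecoverable, consider wrapping the cleanup in a small script that validates its inputs and prints the target before deleting anything. A sketch (run as the `dbmerge` user; the `rm` is deliberately left commented out):

```shell
# Guarded intraday cleanup sketch. Set these to your values.
NAMESPACE=Order
TABLE=Event
DATE=2018-02-09
# Refuse to run with a malformed date, which could expand the glob to
# far more directories than intended.
case "$DATE" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "bad date: $DATE" >&2; exit 1 ;;
esac
TARGET="/db/Intraday/${NAMESPACE}/${TABLE}"
echo "would remove: ${TARGET}/*/${DATE}"
# rm -r "${TARGET}"/*/"${DATE}"   # uncomment once the path above is verified
```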

## Configuring the Data Import Server

**Data storage paths:**

Intraday data is written to paths defined in the schema and routing configuration:

```text
/db/Intraday/[namespace]/[tablename]/[partition]/[date]
```
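
Following that layout, you can inspect what has been written for a table on a given day. A sketch — the namespace and table names are placeholders:

```shell
# List today's intraday partitions for one table and their on-disk size.
NAMESPACE=Order
TABLE=Event
TODAY="$(date +%Y-%m-%d)"
BASE="/db/Intraday/${NAMESPACE}/${TABLE}"
if ls -d "${BASE}"/*/"${TODAY}" >/dev/null 2>&1; then
    # Report the size of each partition directory
    du -sh "${BASE}"/*/"${TODAY}"
else
    echo "no partitions found for ${TODAY} under ${BASE}"
fi
```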

## Binary log ingestion

When using binary log ingestion:

**Data flow:**

1. External system writes binary log files to filesystem.
2. Tailer process monitors log files for changes.
3. Tailer reads new data and streams to DIS via gRPC.
4. DIS writes data to Parquet files.
5. DIS serves data to clients via TDP.
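
When data is not flowing, step 1 is the first thing to verify: are binary log files actually arriving and growing? A sketch — the binary log directory here is an assumption; use the path from your logging configuration:

```shell
# Check whether binary log files are being written and updated.
BINLOG_DIR=/var/log/deephaven/binlogs   # assumed location; adjust as needed
if [ -d "$BINLOG_DIR" ]; then
    # Newest files last; recent timestamps mean the writer side is alive
    ls -ltr "$BINLOG_DIR" | tail -n 5
else
    echo "no binary log directory at $BINLOG_DIR"
fi
```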

**Binary log configuration:**

Binary logs are configured in the data import schema and routing configuration managed through `dhconfig`.

See [dhconfig dis command reference](../configuration/dhconfig/dis.md) for details.

## Checkpoint and recovery

The DIS maintains checkpoints to track ingestion progress:

**Checkpoint files:** Stored in the same directory as the data:

```text
/db/Intraday/[namespace]/[tablename]/[partition]/[date]/.checkpoint
```

**Recovery behavior:**

When DIS restarts:

1. Reads checkpoint files to determine last successfully written position.
2. Resumes ingestion from checkpoint position.
3. Re-processes any data after checkpoint.
4. Depending on configuration, reprocessing may trigger duplicate detection and handling.
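
To see which checkpoints the DIS will resume from, you can list the checkpoint files directly. A sketch, following the path layout above:

```shell
# List checkpoint files under the intraday root.
ROOT=/db/Intraday
# -name '.checkpoint' matches the per-partition checkpoint files
find "$ROOT" -name '.checkpoint' -type f 2>/dev/null | head -n 20
```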

## Data Import Server sharding

For high-throughput systems, DIS can be sharded across multiple instances:

**Sharding strategies:**

- **By table** — Different DIS instances handle different tables.
- **By partition** — Different DIS instances handle different partitions of the same table.
- **By key range** — Partition data by key ranges across DIS instances.

**Configuration:** Sharding is defined in the routing configuration managed through `dhconfig routing`.

## Configuration files and locations

**monit configuration:** `/etc/sysconfig/illumon.d/monit/db_dis.conf`

**Property files:**

- `/etc/sysconfig/illumon.d/resources/iris-common.prop`
- Service-specific properties configured through `dhconfig`

**Data directory:** Typically `/db/Intraday/` (configurable per table).

**Log directory:** `/var/log/deephaven/dis/`.

**Checkpoint files:** Stored in data directories alongside Parquet files.

## Related documentation

- [Data Import Server overview](../../data-guide/dis.md)
- [System processes overview](../architecture/architecture-overview.md)
- [dhconfig dis command reference](../configuration/dhconfig/dis.md)
- [dhconfig routing command reference](../configuration/dhconfig/routing.md)
- [Data tailer runbook](runbook-data-tailer.md)
- [Configuration Server runbook](runbook-config-server.md)
