File handle management

Deephaven reads table data directly from the filesystem without a centralized database process managing file access. This design improves performance and scalability, but it requires operators to understand how Deephaven interacts with files.

This page covers two related topics:

  • File path identity assumptions — Constraints on external file operations while Deephaven references those files.
  • TrackedFileHandleFactory — How Deephaven manages open file descriptors to stay within OS limits.

File path identity assumptions

Deephaven assumes that a file path always refers to the same physical file for as long as Deephaven is referencing that file. On Linux, this means the device ID and inode must remain constant — the path must continue to point to the same underlying file.

Operations that change the underlying inode or remove the file are not safe while Deephaven is referencing the file (see TrackedFileHandleFactory below).

Disallowed operation | Reason
Delete file | Never delete files while referenced
Replace file (delete + create) | Creates new inode; violates identity
Rename/move file | Path no longer resolves to expected file

Consequences of violations

If file identity assumptions are violated:

  • Read errors: Deephaven may attempt to read from a file that no longer contains expected data, resulting in I/O errors or exceptions.
  • Write errors: Deephaven may attempt to write to a file that has been moved or replaced, resulting in data loss or corruption.
  • Silently incorrect data: In the worst case, the replacement file may have a compatible structure but different content, causing Deephaven to return wrong results without any error.

Caution

Deephaven performs best-effort detection of file identity violations using java.nio.file.attribute.BasicFileAttributes#fileKey, but support depends on the OS and filesystem. Do not rely on this detection — always ensure files remain stable while Deephaven processes reference them.
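
To make the identity check concrete, here is a minimal sketch of comparing file keys with the standard java.nio.file API. The class and method names (FileIdentityCheck, fileKeyOf, sameIdentity) are hypothetical and are not part of Deephaven; the sketch only illustrates the general technique, and a null key is treated as "cannot verify", which is the case on platforms or filesystems that do not expose one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, not a Deephaven class.
class FileIdentityCheck {
    /** Returns the platform file key, or null if the filesystem does not provide one. */
    static Object fileKeyOf(Path path) throws IOException {
        return Files.readAttributes(path, BasicFileAttributes.class).fileKey();
    }

    /** True only when the first key is available and equals the second; null means identity is unverifiable. */
    static boolean sameIdentity(Object before, Object after) {
        return before != null && before.equals(after);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("identity-demo", ".dat");
        try {
            Object before = fileKeyOf(path);
            // Writing in place keeps the same underlying file.
            Files.write(path, new byte[] {1, 2, 3});
            Object after = fileKeyOf(path);
            System.out.println("file key available: " + (before != null));
            System.out.println("identity preserved: " + sameIdentity(before, after));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

A check like this cannot prove that a file was never replaced: some filesystems reuse inode numbers, so a delete followed by a create may yield an equal key. That is one reason the detection is best-effort only.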

TrackedFileHandleFactory

To avoid exhausting operating system file descriptor limits (ulimit), Deephaven uses TrackedFileHandleFactory — a least-recently-opened cache for file handles. This factory automatically manages the number of open file handles by closing older handles when capacity is reached, regardless of whether those handles are still in use.

How it works

  1. When Deephaven opens a file, TrackedFileHandleFactory creates a tracked file handle and adds it to a queue ordered by open time.
  2. When the number of open handles reaches capacity, cleanup is triggered synchronously before the new handle is created.
  3. Cleanup first removes handles that were already closed or garbage collected, then reclaims least-recently-opened handles until usage drops below the target threshold (90% of capacity by default) — even if those handles are still strongly referenced.
  4. A background cleanup job runs periodically (every 60 seconds by default) and performs the same cleanup, enforcing the target threshold regardless of whether handles are still in use.
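
The steps above can be sketched as a toy model. This is not Deephaven's implementation: the class LeastRecentlyOpenedTracker and its methods are hypothetical, handles are represented by plain strings, the pass that removes already-closed or garbage-collected handles is omitted, and reclaiming "down to the target size" is an assumption about how the threshold is applied.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of a least-recently-opened handle tracker.
class LeastRecentlyOpenedTracker {
    private final int capacity;
    private final int target;
    // Oldest-opened handle is at the head of the queue.
    private final Deque<String> openOrder = new ArrayDeque<>();

    LeastRecentlyOpenedTracker(int capacity, double targetFraction) {
        this.capacity = capacity;
        this.target = (int) Math.floor(capacity * targetFraction);
    }

    /** Registers a newly opened handle; returns how many handles were reclaimed first. */
    synchronized int open(String name) {
        int reclaimed = 0;
        if (openOrder.size() >= capacity) {
            // Cleanup runs synchronously before the new handle is created.
            reclaimed = cleanup();
        }
        openOrder.addLast(name);
        return reclaimed;
    }

    /** Reclaims oldest-opened handles until the open count is at or below the target. */
    synchronized int cleanup() {
        int reclaimed = 0;
        while (openOrder.size() > target) {
            openOrder.removeFirst(); // a real factory would close the file channel here
            reclaimed++;
        }
        return reclaimed;
    }

    synchronized int openCount() {
        return openOrder.size();
    }

    synchronized boolean isOpen(String name) {
        return openOrder.contains(name);
    }

    public static void main(String[] args) {
        LeastRecentlyOpenedTracker tracker = new LeastRecentlyOpenedTracker(10, 0.9);
        for (int i = 0; i <= 10; i++) {
            int reclaimed = tracker.open("handle-" + i);
            System.out.println("opened handle-" + i + ", reclaimed " + reclaimed);
        }
    }
}
```

Note that the model evicts by open time, not by last use, so a handle that is read constantly is still reclaimed once it becomes the oldest.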

Important

File handles may be closed asynchronously by the factory at any time, even while strongly referenced and in use. Code that holds a FileHandle must tolerate the underlying file channel being closed unexpectedly and handle re-opening if necessary.
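
One way to tolerate an unexpectedly closed channel is to catch ClosedChannelException and re-open before retrying. The sketch below uses only the standard java.nio FileChannel API; ReopeningReader and closeUnderlying are hypothetical names used for illustration and do not describe Deephaven's own FileHandle class.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical reader that re-opens its channel if it was closed underneath it.
class ReopeningReader implements AutoCloseable {
    private final Path path;
    private FileChannel channel;

    ReopeningReader(Path path) throws IOException {
        this.path = path;
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Simulates a cache reclaiming the handle while the reader still holds it. */
    synchronized void closeUnderlying() throws IOException {
        channel.close();
    }

    /** Positional read that re-opens the file once if the channel was closed. */
    synchronized int read(ByteBuffer dst, long position) throws IOException {
        try {
            return channel.read(dst, position);
        } catch (ClosedChannelException e) {
            channel = FileChannel.open(path, StandardOpenOption.READ);
            return channel.read(dst, position);
        }
    }

    @Override
    public synchronized void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("reopen-demo", ".dat");
        Files.write(path, new byte[] {10, 20, 30, 40});
        try (ReopeningReader reader = new ReopeningReader(path)) {
            reader.closeUnderlying(); // the handle is reclaimed behind the reader's back
            ByteBuffer buffer = ByteBuffer.allocate(4);
            System.out.println("bytes read after re-open: " + reader.read(buffer, 0));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Re-opening by path is only safe when the path still refers to the same physical file, which is why the file path identity assumptions matter.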

Configuration

Configure TrackedFileHandleFactory using Deephaven properties:

Property | Type | Default | Description
TrackedFileHandleFactory.maxOpenFiles | Integer | (see below) | Maximum number of file handles to keep open simultaneously. Set this below your system's ulimit -n value to leave headroom for other file descriptors.
TrackedFileHandleFactory.fastCyclingThreshold | Double | 0.20 | Fraction of capacity that, if reclaimed within the cycling interval, triggers a warning log. Indicates the system may be under-provisioned.
TrackedFileHandleFactory.fastCyclingIntervalMillis | Long | 60000 | Interval (in milliseconds) for detecting fast handle cycling.

Default maxOpenFiles by process type:

Process type | Default
Most processes | 4096
Core+ workers | 1024
Data Import Server (DIS) | 512
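
To illustrate how fastCyclingThreshold and fastCyclingIntervalMillis interact, the sketch below counts reclaimed handles in a fixed window and flags the window once the count reaches the configured fraction of capacity. The windowing scheme and the FastCyclingDetector class are assumptions made for illustration; Deephaven's actual bookkeeping may differ.

```java
// Hypothetical fast-cycling detector; the fixed-window logic is an assumption.
class FastCyclingDetector {
    private final int capacity;
    private final double threshold;
    private final long intervalMillis;
    private long windowStartMillis;
    private int reclaimedInWindow;

    FastCyclingDetector(int capacity, double threshold, long intervalMillis, long startMillis) {
        this.capacity = capacity;
        this.threshold = threshold;
        this.intervalMillis = intervalMillis;
        this.windowStartMillis = startMillis;
    }

    /** Records a reclaim; returns true if the current window warrants a warning. */
    synchronized boolean recordReclaim(int count, long nowMillis) {
        if (nowMillis - windowStartMillis >= intervalMillis) {
            // Start a new window once the interval has elapsed.
            windowStartMillis = nowMillis;
            reclaimedInWindow = 0;
        }
        reclaimedInWindow += count;
        return reclaimedInWindow >= threshold * capacity;
    }

    public static void main(String[] args) {
        // Values mirror the documented defaults: 0.20 of capacity within 60000 ms.
        FastCyclingDetector detector = new FastCyclingDetector(4096, 0.20, 60_000L, 0L);
        System.out.println("warn after 500 reclaimed: " + detector.recordReclaim(500, 1_000L));
        System.out.println("warn after 500 more: " + detector.recordReclaim(500, 2_000L));
    }
}
```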

Sizing recommendations

The maxOpenFiles setting depends on your workload and system limits:

  • Check system limits: Run ulimit -n to see your per-process file descriptor limit.
  • Leave headroom: Set maxOpenFiles to 70-80% of your ulimit to reserve descriptors for network connections, logging, and other system needs.
  • Monitor cycling log entries: If you see frequent "reclaimed N file handles" warnings in logs, consider increasing maxOpenFiles or your system's ulimit.
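
The headroom arithmetic can be sketched as follows. MaxOpenFilesSizing and recommendedMaxOpenFiles are hypothetical names, and the 0.75 fraction is just one point in the 70-80% range above. The main method reads the JVM's own descriptor limit through com.sun.management.UnixOperatingSystemMXBean, which is JDK-specific and only available on Unix-like platforms; the value it reports may differ from what ulimit -n prints in your shell, so treat it as a cross-check.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical sizing helper, not part of Deephaven.
class MaxOpenFilesSizing {
    /** Returns the given fraction of the descriptor limit, rounded down. */
    static int recommendedMaxOpenFiles(long descriptorLimit, double fraction) {
        if (descriptorLimit <= 0 || fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("limit must be positive and fraction in (0, 1]");
        }
        long value = (long) Math.floor(descriptorLimit * fraction);
        return (int) Math.min(Integer.MAX_VALUE, value);
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            long limit = unix.getMaxFileDescriptorCount();
            long open = unix.getOpenFileDescriptorCount();
            System.out.println("descriptor limit: " + limit + ", currently open: " + open);
            System.out.println("suggested maxOpenFiles at 75%: "
                    + recommendedMaxOpenFiles(limit, 0.75));
        } else {
            System.out.println("descriptor counts are not available on this platform");
        }
    }
}
```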

Example configuration:
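
The fragment below is an illustrative sketch, not a recommended setting. It assumes a per-process limit (ulimit -n) of 65536 and reserves roughly 25% of it as headroom; the two fast-cycling properties are shown at their documented defaults. Adjust the values to your own limits and workload.

```properties
# Assumed ulimit -n of 65536; 75% of that limit is 49152
TrackedFileHandleFactory.maxOpenFiles=49152

# Documented defaults, shown for completeness
TrackedFileHandleFactory.fastCyclingThreshold=0.20
TrackedFileHandleFactory.fastCyclingIntervalMillis=60000
```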

Limitations

TrackedFileHandleFactory provides limited protection against file identity violations:

  • It tracks file handles, not file identity. If a file is replaced while a handle exists, the handle continues to reference the old (now-deleted) file until the handle is closed or reclaimed.
  • Detection of file identity changes relies on BasicFileAttributes#fileKey, which is OS and filesystem dependent — it may not always work.
  • Errors from deleted files may surface much later than the deletion. Deephaven won't detect the issue until the handle goes through a reclaim/reopen cycle.
  • Relying on the factory to "protect" against file replacement is not safe — always follow the file identity guidelines.

Warning

TrackedFileHandleFactory is a resource management mechanism, not a safety mechanism. It helps avoid file descriptor exhaustion but does not protect against file replacement or deletion issues.

Best practices

  1. Never delete or replace data files that a running Deephaven process might reference. If you must remove data, stop the relevant queries or processes first.

  2. Use the merge process for data lifecycle management. The merge process safely transitions intraday data to historical format with proper coordination.

  3. Size maxOpenFiles appropriately. Setting it too low causes excessive handle cycling (performance degradation); setting it too high risks hitting system limits.

  4. Monitor file handle metrics. Watch for "reclaimed file handles" warnings in logs, which indicate the system is cycling handles frequently.

  5. Coordinate external file operations. If external tools write to or manage files that Deephaven reads, ensure they follow append-only patterns or coordinate with Deephaven's data lifecycle.