Data Buffer Pool configuration

The Data Buffer Pool caches table and index data as it is read from disk. Its configuration is relevant for all query workers and everything in the data pipeline, in particular DataImportServer (DIS) instances. The DataImportServer, LocalTableDataServer, TableDataCacheProxy, and query worker processes (including those used for merge) all operate around an internal pool of 64KB binary buffers used to read, write, and cache binary data.

Buffer size, the size of a single block of data, is technically configurable; however, it must be globally consistent for the entire data pipeline, and Deephaven has found that 64KB strikes a good balance between read/write throughput and read/cache amplification.
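To make the amplification side of that trade-off concrete, here is a minimal sketch of the arithmetic. It assumes a worst-case access pattern in which a whole buffer is loaded to satisfy one small read; the function name and the 1MB comparison size are illustrative only and are not part of any Deephaven API.

```python
# Illustrative arithmetic only; the access pattern below is an assumption.
BUFFER_SIZE_BYTES = 64 * 1024  # 64KB, the buffer size discussed above


def read_amplification(buffer_size_bytes: int, bytes_needed: int) -> float:
    """Bytes fetched per byte used when one full buffer is loaded for the read."""
    if bytes_needed <= 0 or bytes_needed > buffer_size_bytes:
        raise ValueError("bytes_needed must be in (0, buffer_size_bytes]")
    return buffer_size_bytes / bytes_needed


# A random lookup of a single 8-byte value pulls in a full 64KB buffer.
print(read_amplification(BUFFER_SIZE_BYTES, 8))  # 8192.0
# The same lookup with a hypothetical 1MB buffer fetches 16x more data.
print(read_amplification(1024 * 1024, 8))        # 131072.0
```

Larger buffers help sequential throughput but make each random read, and each cached block, more expensive, which is the balance the 64KB default is aiming for.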

The pool size, that is, how many reusable buffers are held at once, is highly configurable using the properties below. In general, increasing the size of your pool allows your caching to be more effective, while decreasing the size of your pool may waste less of your heap on unnecessary data.

Some general tuning recommendations:

  • For merges, follow the merge throughput guidelines in the Merging Data documentation.
  • For specific queries, the QueryPerformanceLog and UpdatePerformanceLog provide information about the number and duration of repeated data reads. If you see a large number of repeated reads, you may want to increase the size of your pool. If there are no repeated reads, you may be able to reduce the size of your pool.
  • For DIS instances, the pool must be big enough to hold one block for each file of each partition. The recommended rule of thumb is nPartitions * nColumns * 1.2 buffers, assuming most columns are not object columns.
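The DIS rule of thumb above can be turned into a pool size with a few lines of arithmetic. The sketch below assumes the default 64KB buffer size; the function names and the example partition and column counts are hypothetical.

```python
BUFFER_SIZE_BYTES = 64 * 1024  # 64KB buffers, as described above


def dis_pool_buffers(n_partitions: int, n_columns: int) -> int:
    """Rule-of-thumb buffer count: nPartitions * nColumns * 1.2, rounded up."""
    files = n_partitions * n_columns  # approximate file count, one per column
    # Multiply by 1.2 and round up using integers to avoid float rounding.
    return (files * 12 + 9) // 10


def dis_pool_bytes(n_partitions: int, n_columns: int) -> int:
    """Pool size in bytes needed to hold the rule-of-thumb buffer count."""
    return dis_pool_buffers(n_partitions, n_columns) * BUFFER_SIZE_BYTES


# Hypothetical DIS handling 200 partitions of 50 columns each.
buffers = dis_pool_buffers(200, 50)
pool_bytes = dis_pool_bytes(200, 50)
print(buffers)                         # 12000
print(pool_bytes)                      # 786432000
print(f"{pool_bytes // 1024 ** 2}m")   # 750m
```

For this example, a pool of at least 750m would satisfy the rule of thumb. The value still has to fit within the pool-to-heap ratio limits described in the properties below.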

Methods

The methods below provide information about the current configuration.

Method: Description
getMaximumPoolToHeapSizeRatio(): Get the configured maximum decimal ratio between the total size of the data buffer pool and the JVM heap size.
getMinimumPoolToHeapSizeRatio(): Get the configured minimum decimal ratio between the total size of the data buffer pool and the JVM heap size.

Properties

Property: Description
DataBufferConfiguration.bufferSize: Specifies the size for each buffer used for storing and communicating Deephaven-format binary data. This property supports "-Xmx"-style units, i.e., <size>[g|G|m|M|k|K], but must be a positive integer less than 2^30 when converted to bytes.
DataBufferConfiguration.useDirectMemory: (optional) Specifies whether the buffers used for storing and communicating Deephaven-format binary data should be allocated in direct memory, rather than heap memory. Defaults to false, meaning heap memory will be used.
DataBufferConfiguration.poolEnabled: Specifies whether a pool should be used to constrain the number of buffers in use for Deephaven-format data.
DataBufferConfiguration.poolSize: The total size of the memory allocated to the data buffer pool. Supports "-Xmx"-style units, i.e., <size>[g|G|m|M|k|K]. The pool size is constrained by the minimum and maximum ratios to heap size; the resulting number of buffers must be less than Integer.MAX_VALUE. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.minPoolToHeapSizeRatio: (optional) Specifies the minimum ratio (as a decimal number) between the data buffer pool size and the JVM heap size. Defaults to 0.1. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.heapMaxPoolToHeapSizeRatio: (optional) Specifies the maximum ratio (as a decimal number) between the data buffer pool size and the JVM heap size, when heap memory is used. Defaults to 0.6. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.directMaxPoolToHeapSizeRatio: (optional) Specifies the maximum ratio (as a decimal number) between the data buffer pool size and the JVM heap size, when direct memory is used. Defaults to 2.0. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.poolCleanupThresholdRatioUsed: (optional) Maximum occupancy of the data buffer pool as a decimal ratio before concurrent cleanup should be performed. Defaults to 0.9. This value must be between 0.0 and 1.0. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.poolCleanupTargetRatioUsed: (optional) Target occupancy of the data buffer pool as a decimal ratio. Defaults to 0.6. This value must be between 0.0 and DataBufferConfiguration.poolCleanupThresholdRatioUsed. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.poolCleanupIntervalMillis: (optional) Interval between checks for concurrent cleanup. Defaults to 60000 (60s), and must be greater than zero. (Note: This property is disregarded if buffer pooling is disabled.)
DataBufferConfiguration.poolClockIntervalMillis: (optional) Interval between clock ticks for the data buffer pool's logical clock, used to stamp an approximate last used time on outstanding buffers, which may be used as input for cleanup processing. Defaults to 10000 (10s) and must be greater than zero. (Note: This property is disregarded if buffer pooling is disabled.)
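The size and ratio rules above can be checked before deploying a configuration. The following is a minimal sketch, not Deephaven's implementation: all function names are hypothetical, the unit suffixes are treated as binary multiples (as the JVM does for -Xmx), and it assumes that a pool size outside the allowed ratio range is clamped to the nearest limit. The actual behavior may instead be to reject the value, so verify this against your installation.

```python
import re

# Binary multipliers, matching how the JVM interprets -Xmx suffixes.
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_SIZE_PATTERN = re.compile(r"(\d+)([gGmMkK]?)")

INTEGER_MAX_VALUE = 2 ** 31 - 1  # Java's Integer.MAX_VALUE


def parse_size(text: str) -> int:
    """Convert '<size>[g|G|m|M|k|K]' to a byte count."""
    match = _SIZE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit.lower()]


def check_buffer_size(text: str) -> int:
    """bufferSize must be a positive integer less than 2^30 bytes."""
    size = parse_size(text)
    if not 0 < size < 2 ** 30:
        raise ValueError(f"bufferSize out of range: {size} bytes")
    return size


def constrain_pool_size(requested_bytes: int, heap_bytes: int,
                        use_direct_memory: bool = False,
                        min_ratio: float = 0.1,
                        heap_max_ratio: float = 0.6,
                        direct_max_ratio: float = 2.0) -> int:
    """ASSUMPTION: out-of-range requests are clamped rather than rejected."""
    max_ratio = direct_max_ratio if use_direct_memory else heap_max_ratio
    lowest = round(heap_bytes * min_ratio)
    highest = round(heap_bytes * max_ratio)
    return max(lowest, min(requested_bytes, highest))


def buffer_count(pool_bytes: int, buffer_bytes: int) -> int:
    """Number of buffers in the pool; must be less than Integer.MAX_VALUE."""
    count = pool_bytes // buffer_bytes
    if count >= INTEGER_MAX_VALUE:
        raise ValueError(f"too many buffers: {count}")
    return count


# Hypothetical worker with a 10g heap asking for an 8g heap-allocated pool.
heap = parse_size("10g")
pool = constrain_pool_size(parse_size("8g"), heap)  # limited by the 0.6 ratio
print(pool, buffer_count(pool, check_buffer_size("64k")))
```

With useDirectMemory enabled, the same 8g request falls inside the default 2.0 maximum ratio and would be used as written.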

The following legacy properties have been deprecated as of Deephaven v1.20200121.

Legacy Property: Description
DataBufferPool.bufferSizeInBytes: Specifies the size in bytes for all buffers used for storing and communicating Deephaven-format binary data. Must be a positive integer less than 2^30.
DataBufferPool.useDirectMemory: Superseded by DataBufferConfiguration.useDirectMemory. Specifies whether the buffers used for storing and communicating Deephaven-format binary data should be allocated in direct memory, rather than heap memory. Defaults to false, meaning that heap memory will be used.
DataBufferPool.Enabled: Superseded by DataBufferConfiguration.poolEnabled. Specifies whether a pool should be used to constrain the number of buffers in use for Deephaven-format data.
DataBufferPool.sizeInBytes: Superseded by DataBufferConfiguration.poolSize. The total size in bytes of the memory allocated to the data buffer pool. Constrained by minimum and maximum ratio to heap size. The resulting number of buffers must be less than Integer.MAX_VALUE. (Note: This property is disregarded if buffer pooling is disabled.)
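When updating an older configuration, the renames listed above can be applied mechanically. The sketch below is a hypothetical helper, not a Deephaven tool: it works on an in-memory dictionary of property names to values, and it assumes that a plain byte count from DataBufferPool.sizeInBytes is also a valid value for DataBufferConfiguration.poolSize (the unit suffix being optional). DataBufferPool.bufferSizeInBytes is left untouched because the table above does not name its replacement.

```python
# Only the renames stated in the legacy property table are included.
LEGACY_TO_CURRENT = {
    "DataBufferPool.useDirectMemory": "DataBufferConfiguration.useDirectMemory",
    "DataBufferPool.Enabled": "DataBufferConfiguration.poolEnabled",
    "DataBufferPool.sizeInBytes": "DataBufferConfiguration.poolSize",
}


def migrate_legacy_properties(props: dict) -> dict:
    """Return a copy of props with legacy names replaced by current names.

    If both a legacy name and its replacement are present, the value set
    under the current name is kept.
    """
    # Start with everything that is not a known legacy name.
    migrated = {k: v for k, v in props.items() if k not in LEGACY_TO_CURRENT}
    # Carry legacy values over only where the current name is not already set.
    for legacy_name, current_name in LEGACY_TO_CURRENT.items():
        if legacy_name in props:
            migrated.setdefault(current_name, props[legacy_name])
    return migrated


old = {
    "DataBufferPool.Enabled": "true",
    "DataBufferPool.sizeInBytes": "786432000",
}
for name, value in sorted(migrate_legacy_properties(old).items()):
    print(f"{name}={value}")
```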