Table Data Cache Proxy (TDCP)
The Table Data Cache Proxy (TDCP) is a core component of Deephaven Enterprise that caches table data locally on each node. It acts as an intermediary between workers and data sources, providing fast local access to frequently used tables while reducing load on data sources and improving query performance.
Terminology
The similar acronyms TDP and TDCP are easy to confuse, but they name two distinct concepts:
- Table Data Protocol (TDP): The communication protocol itself — a custom binary network protocol specifically designed for requesting and subscribing to table locations and block data. Think of this as the "language" that components speak.
- Table Data Cache Proxy (TDCP): The actual Deephaven process/component that runs on each node. This service uses the TDP protocol to communicate with other components (workers, Data Import Servers) and provides caching and connection management.
In this document:
- "TDP" or "Table Data Protocol" refers to the communication protocol, message formats, or how data is transmitted.
- "TDCP" or "Table Data Cache Proxy" refers to the running service, its configuration, operations, or deployment.
For example:
- "Workers connect to the TDCP using TDP" — The worker connects to the proxy service, communicating via the protocol.
- "The TDCP enforces access control before allowing TDP data transfer" — The proxy service makes decisions about what the protocol is allowed to transmit.
What is the Table Data Protocol?
The Table Data Protocol (TDP) is a custom network protocol designed specifically for requesting and subscribing to table locations and block data. TDP is optimized for Deephaven's columnar data architecture, enabling:
- Efficient data transfer: Transfers only the requested columns and rows, minimizing network overhead.
- Low-latency access: Provides fast random access to columnar data without requiring reads of other columns.
- Subscription-based updates: Supports real-time subscriptions to ticking tables for live data applications.
- Block-level caching: Enables intelligent caching of data blocks at the proxy level.
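To make the request/subscribe model above concrete, the following sketch shows the kind of information a TDP request carries. This is purely illustrative: the real protocol is a custom binary format, and the `BlockRequest` class and its field names are invented for this example.

```python
from dataclasses import dataclass


# Hypothetical shape of a TDP data request, for illustration only.
# The real Table Data Protocol is a custom binary wire format.
@dataclass(frozen=True)
class BlockRequest:
    namespace: str    # namespace containing the table
    table: str        # table name
    column: str       # only the requested column is transferred
    block_index: int  # which block of the column to read (random access)
    subscribe: bool   # True = ongoing updates for ticking tables, False = one-shot read
```

Requesting a single column block, rather than whole rows or files, is what enables the efficient, low-latency access described above.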
Role in the Deephaven architecture
The TDCP sits between Deephaven workers and data sources, forming a critical link in the data flow.
Data flow and caching
Workers continuously send TDP requests to their local TDCP, requesting specific table data and columns. The TDCP acts as a caching intermediary, storing data blocks in memory to reduce load on data sources.
Caching benefits
The TDCP provides two key caching advantages:
- Shared access across workers: When multiple workers request the same data, the TDCP retrieves it from the Data Import Server (DIS) once and serves it to all requesting workers. This prevents duplicate requests to the data source.
- Repeated access optimization: When workers request the same data repeatedly, the TDCP can serve it directly from its cache without querying the DIS again, reducing latency and data source load.
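Both caching benefits reduce to the same mechanism: a block is fetched from the DIS at most once and then served from memory. A minimal sketch, assuming a simple dict-backed cache (the real TDCP's cache management is more sophisticated):

```python
# Illustrative sketch of TDCP-style block caching: the first request for a
# block fetches it from the DIS; later requests, whether from the same
# worker or a different one, are served from the in-memory cache.
class BlockCache:
    def __init__(self, fetch_from_dis):
        self._fetch = fetch_from_dis  # callable: (table, column, block_id) -> bytes
        self._cache = {}
        self.dis_fetches = 0          # counts actual round-trips to the DIS

    def get_block(self, table, column, block_id):
        key = (table, column, block_id)
        if key not in self._cache:
            self.dis_fetches += 1
            self._cache[key] = self._fetch(table, column, block_id)
        return self._cache[key]
```

However many workers ask for the same block, `dis_fetches` only advances on the first request, which is exactly the deduplication described above.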
Live data subscriptions
For ticking (live) tables, the TDCP maintains an active subscription to the DIS, receiving updates as new data arrives and pushing those updates to subscribed workers in real time.
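The fan-out pattern described above — one upstream subscription serving many downstream workers — can be sketched as follows. Class and method names are invented for illustration, not the actual Deephaven API:

```python
# Sketch of subscription fan-out: the proxy holds a single upstream
# subscription to the DIS and pushes each update to every subscribed worker.
class SubscriptionFanout:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Each worker registers a callback to receive updates.
        self._subscribers.append(callback)

    def on_upstream_update(self, rows):
        # One update from the DIS is delivered to all subscribed workers.
        for cb in self._subscribers:
            cb(rows)
```

The key property is that adding more subscribed workers adds no extra load on the DIS: the proxy pays for one upstream subscription regardless of the downstream fan-out.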
Authentication and access control
The TDCP enforces a preliminary, table-level access check before allowing data transfer. This two-tier security model works as follows:
Table-level access (enforced by TDCP)
When a client requests data, the TDCP:
- Verifies the user's authentication token.
- Checks the table's Access Control Lists (ACLs) to determine if the user has permission to access any rows.
- Grants access if the user has permission to see any rows in the table.
- Denies access if the user has no permissions to any rows, effectively blocking access to the entire table over TDP.
Row-level access (enforced by worker)
If table-level access is granted:
- The TDCP permits the transfer of block data.
- The Deephaven worker (or downstream proxy) applies fine-grained row-level ACL filtering.
- The worker ensures the client only receives data from rows they are explicitly permitted to see.
This approach provides both security and performance:
- Security: Multiple layers of access control prevent unauthorized data access.
- Performance: Coarse-grained checks at the TDCP reduce unnecessary data transfer for unauthorized users.
For detailed configuration and troubleshooting of authenticated TDP, see How to configure authenticated Table Data Protocol.
Configuration
Default settings
By default, all cluster nodes run a TDCP instance with the process name db_tdcp. Standard configuration is managed through:
- Hostconfig and startup scripts: Control heap size allocation.
- Property files: Control connection limits and performance tuning.
- Data routing configuration: Defines which DIS instances serve which tables and namespaces.
- Service registry: Published by the configuration server, allowing workers to discover TDCP endpoints.
Customization
Common customizations include:
- Heap allocation: Adjust based on cache size requirements and expected data volume. A typical starting point is 4GB, but larger deployments may require 8-16GB or more.
- Connection pooling: Set limits on concurrent connections to data sources.
- Routing rules: Define custom routing for specific tables or namespaces.
For node-specific configuration, settings are stored in /etc/sysconfig/illumon on each node.
Data routing
The TDCP uses the data routing configuration (stored in the configuration server) to determine how to route table requests. This configuration can be updated dynamically without restarting the TDCP, allowing administrators to:
- Add new DIS instances to the cluster.
- Route specific tables to dedicated DIS instances for performance isolation.
For more information, see Data routing overview.
Monitoring and operations
Process monitoring
Check if the TDCP is running:
sudo -u irisadmin monit status db_tdcp
Log files
The TDCP logs to /var/log/deephaven/tdcp/:
- TableDataCacheProxy.log.current — Current application log.
- db_tdcp.log.YYYY-MM-DD — Standard output and error logs.
Key events logged include:
- Connection establishment and termination with workers and data sources.
- Access control decisions (permission grants and denials).
- Data routing changes.
- Errors and warnings.
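When investigating an incident, it is often enough to bucket log lines by the event categories above. A minimal sketch, assuming you have read the log into a list of lines; the keywords are illustrative, not the exact strings Deephaven emits:

```python
# Group log lines by keyword to get a quick view of errors, warnings,
# access-control denials, and routing changes. Keywords are examples only.
def scan_log(lines, keywords=("ERROR", "WARN", "denied", "routing")):
    hits = {k: [] for k in keywords}
    for line in lines:
        for k in keywords:
            if k in line:
                hits[k].append(line)
    return hits
```

Running this over TableDataCacheProxy.log.current gives a fast first pass before reading the full log.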
Performance monitoring
Monitor TDCP performance through:
- Connection counts: Track active connections to workers and data sources.
- Memory usage: Monitor heap usage and garbage collection activity.
- Network throughput: Measure data transfer rates between TDCP and workers/data sources.
Restart procedure
To restart the TDCP:
sudo -u irisadmin monit restart db_tdcp
Impact: Restarting the TDCP will disconnect all active worker connections. Workers will automatically reconnect and resume their table subscriptions, but there may be a brief interruption in data access.
Troubleshooting
Connection issues
Symptom: Workers cannot connect to the TDCP.
Possible causes:
- TDCP process is not running.
- Network connectivity issues between worker and TDCP.
- Firewall blocking TDP port (check data routing configuration for port numbers).
Resolution:
- Verify TDCP is running: sudo -u irisadmin monit status db_tdcp
- Check TDCP logs for errors: cat /var/log/deephaven/tdcp/TableDataCacheProxy.log.current
- Test network connectivity from the worker node.
- Review firewall rules and data routing configuration.
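For the connectivity-test step, a simple TCP probe from the worker node is usually enough to distinguish a down process from a network or firewall problem. A sketch; the actual TDP host and port must come from your data routing configuration:

```python
import socket


# Probe whether a TCP endpoint (e.g., the TDCP's TDP port) is reachable.
# Returns True if a connection can be established within the timeout.
def check_tcp(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the probe fails from the worker node but succeeds locally on the TDCP host, suspect firewall rules or routing configuration rather than the TDCP process itself.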
Access denied errors
Symptom: Users receive "Permission denied" or "User may not access" errors when querying tables.
Possible causes:
- User lacks permissions in the table's ACLs.
- TDCP cannot reach the authentication server.
- Incorrect data routing configuration.
Resolution:
- Check the AuditEventLog table for permission check details.
- Review TDCP logs for TablePermissionCheck entries.
- Verify ACLs for the affected table using the ACL management tools.
- See How to configure authenticated Table Data Protocol for detailed troubleshooting.
Cache performance issues
Symptom: Queries are slower than expected, or TDCP memory usage is very high.
Possible causes:
- Insufficient heap allocated to TDCP.
- Workers repeatedly requesting uncached data.
- Memory pressure causing excessive garbage collection.
Resolution:
- Monitor TDCP heap usage and garbage collection in logs.
- Adjust heap size in /etc/sysconfig/illumon.
- Review cache configuration and tune limits based on workload.
- Analyze query patterns to identify opportunities for better caching.
Data routing changes not applied
Symptom: TDCP continues routing to old data sources after configuration changes.
Resolution: The TDCP automatically reloads routing configuration when changes are detected. If changes don't take effect:
- Check TDCP logs for routing reload messages.
- Verify the routing configuration was correctly updated in the configuration server.
- Restart the TDCP if automatic reload fails.
Resource sizing
When planning TDCP resources:
- Memory: Start with 4GB heap for basic deployments. Increase based on:
- Number of concurrent workers accessing data.
- Size and number of tables being cached.
- Average query complexity and data volume.
- CPU: TDCPs are generally not CPU-intensive, but ensure adequate CPU for handling concurrent connections.
- Network: Ensure sufficient network bandwidth between TDCP and both workers and data sources.
A common sizing formula for total cluster memory:
For example, with 10 users running 2 workers each and 10 Persistent Queries:
Data lifecycle integration
The TDCP plays an important role in the Deephaven data lifecycle:
- Intraday data: TDCP connects to DIS instances to serve live, intraday data.
- Historical data: Workers typically access historical data directly from shared storage (NFS, S3) without TDCP involvement, though some deployments may use TDCP for historical data access as well.