Detailed Version Log Deephaven 1.20240517
Note
For information on changes to Deephaven Community, see the GitHub release page.
Certified versions
Certified Version | Notes |
---|---|
1.20240517.437 | The following caveats apply to this release: |
1.20240517.414 | The following caveats apply to this release: |
1.20240517.379 | The following caveats apply to this release: |
1.20240517.344 | |
1.20240517.298 | |
1.20240517.245 | The following caveats apply to this release: |
1.20240517.222 | |
1.20240517.189 | Core+ Kafka ingesters using consumeToDis incorrectly require the KafkaOffset column (or the column specified by the deephaven.offset.column.name property) to be included in the target schema. If your target schema does not include the KafkaPartition or KafkaTimestamp columns, they must be disabled by setting the deephaven.partition.column.name and deephaven.timestamp.column.name properties to an empty string. This will be corrected in a point release including DH-17561. |
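For releases affected by the caveat on 1.20240517.189 above, the workaround amounts to blanking the partition and timestamp column properties in the Kafka consumer properties passed to the ingester. A minimal Groovy sketch, assuming a hypothetical broker address and the usual consumeToDis wiring:
// Workaround sketch only: the property names come from the caveat above; the broker
// address and the rest of the consumeToDis options are placeholders.
Properties kafkaProps = new Properties()
kafkaProps.setProperty("bootstrap.servers", "kafka-broker:9092")   // assumed broker address
kafkaProps.setProperty("deephaven.partition.column.name", "")      // disable the KafkaPartition column
kafkaProps.setProperty("deephaven.timestamp.column.name", "")      // disable the KafkaTimestamp column
// Pass kafkaProps as the Kafka consumer properties in the options given to consumeToDis.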
Detailed Version Log: Deephaven v1.20240517
Patch | Details |
---|---|
437 | Merge updates from 1.20231218.524 |
436 | DH-18956: Fix eggplant nightly test compilation |
435 | DH-19110: Fix R client methods live_table and historical_table |
434 | DH-18368: Closing a ui.dashboard in a code studio shouldn't prompt |
433 | DH-18694: The gear icon on login screen appears to be broken in web with saml auth plugin |
432 | DH-18279: Fix unstable one click plot titles in web |
431 | DH-18893: Update deephaven-hash to 0.3.0 |
430 | DH-18778: Fix wrong Assert import used. |
429 | DH-18727: Fix unbound variable in podman for envoy admin DH-18729: Podman change image creation to be single-user |
428 | DH-18930: handle SocketException, Connection reset better in LAS client |
427 | DH-18988: Python Core+ auth client should log client UUID to facilitate troubleshooting |
426 | DH-16352: Ability to disable password authentication in web UI |
425 | DH-19061: Update web packages to v0.85.17 DH-18856: Support Median Aggregation in the UI DH-18681: Fix Inability to delete Input Table row with null key |
424 | DH-18778: Javadoc fix. |
423 | DH-19041: Updates to the version log generation script |
422 | DH-19052: Core+ test stage should not cause Jenkins pipeline failures. |
421 | DH-18778: Cannot apply ACLs to Rollups or Trees |
420 | DH-18756: Make sure that internal partitions do not have invalid characters. |
419 | DH-19028: Update Grizzly to Core 0.38.0 DH-18351: Add CountWhere feature to UpdateBy Rolling and Cumulative DH-18491: Implement optimized FirstBy / LastBy + NaturalJoin feature DH-18414: SourcePartitionedTable: Allow deferred existence checking and add partitioning columns DH-18856: Add Median to jsapi Aggregation Options |
418 | DH-18989: Misconfiguring git repositories results in opaque controller crash loop and inability to login |
417 | DH-18754: Web API Service should create workers off-thread, avoid blocking other calls |
416 | DH-18815: Enable TCP KeepAlive for clients downstream of Envoy |
415 | DH-18986: Fix bugs in Py Client's controller module |
414 | DH-18941: NPE in local swing console log when creating a new PQ |
413 | DH-18001: correct consumerBufferSize calculation |
412 | DH-19007: Adjust formatting in Grizzly Release Notes to be Salmon-compatible |
411 | DH-19009: Fix EggplantTask gradle wiring |
410 | DH-19006: Disable DynamicTableWriter Integration test for now. |
409 | DH-18682: Make InputTableUpdater a LivenessReferent, able to materialize input table views |
408 | DH-18001: Tailer reduces buffer size to match DIS DH-18751: report on tailer/DIS buffer configuration errors better |
407 | DH-14692: Stop exporting client from support logs |
406 | DH-18744: Add support for VPlus to Grizzly upgrade of Podman deployments |
405 | DH-18838: port fishlib Client fix to Deephaven |
404 | DH-18981: Update CODEOWNERS for Enterprise. |
403 | DH-18669: Revert scoping changes from DH-18877 |
402 | DH-18946: Make dh_preinstall_check require/install rsync if needed DH-16487: Validate installer host uses rsync>=3.1 |
401 | DH-18669: Improve handling of internal partition column |
400 | DH-18877: UncoalescedBlinkTable must manage its source, Use LivenessScope.computeEnclosed in DatabaseImpl |
399 | DH-18439: Introduce dh_preinstall_check.sh for early installation validation |
398 | DH-18800: Reliability changes for automation and integration tests. |
397 | DH-18867: Include the proto-wrappers dir in the C++ client source tarball |
396 | DH-18754: Add missing JS client details for replica instances |
395 | DH-18897: Check condition for PqStorageEtcdV2Impl delete is wrong |
394 | DH-18794: Allow kubernetes flags to align input params to eggplant manager |
393 | DH-18747: Per-worker pod cleanup delay |
392 | DH-18794: Investigate bhs not contactable by eggplant |
391 | DH-18709: Performance under load of Controller Query Tracker operations |
390 | DH-18691: Automatically migrate sql->etcd acls |
389 | DH-18620: Provide means to change timeout to get properties from etcd and increase default |
388 | DH-18660: Scripts for load testing of Controller subscriptions and PQ restart, stop and modify requests |
387 | DH-18741: Improved Error Message when Shortcuts Fail to Load |
386 | DH-18732: Fix XSS issue in file list drag and drop |
385 | DH-18596: Fix MultiScriptPathLoader#getAvailableScriptPaths caching |
384 | DH-18569: Allow configuration of worker pod toleration on authenticated username |
382 | Update web UI packages to v0.85.15 |
381 | DH-18820: Fix jenkins publish docs script |
380 | DH-18758: Fix C++ Client Docker build |
379 | DH-18639: Python Core+ client ControllerClient API improvements |
378 | DH-18722: Update Core+ to 0.37.6 DH-18539: Fix incorrect snapshot results on historical sorted rollups DH-18632: JS ChartData should write updates to the correct row |
377 | DH-18661: Update README for buildUtils/jenkins, delete dead code |
376 | DH-18702: Update web version to v0.85.14 DH-18645: Fix panel titles using HTML instead of just text |
375 | DH-18480: correction to table filter |
374 | DH-18467: Automate qa server upgrades |
373 | DH-18148: Add new Jenkins stage + publishing script for javadoc/pydoc uploads to docs site |
372 | DH-18558: QA k8s clusters have a calendar problem with Swing UI |
371 | DH-18612: Merge forward Podman functionality from VPlus.509 DH-18422: Update generation of iris-endpoints.prop for Podman so Web ACL Editor will work correctly DH-18175: Modified Podman start_command.sh to support Podman on MacOS and to fix --nohup always being applied DH-17696: Added A to start_command.sh dig call, to ensure IPv4 address is retrieved. DH-17880: Change Podman start_command.sh default behavior to preserve existing properties DH-17977: Add volume options to Podman start_command.sh for illumon.d/java_lib and illumon.d/calendars volumes DH-17999: Fix coreplus_test_query.py nightly test |
370 | DH-18303: Fix python sourcing of Test Automation script |
369 | DH-18668: Update GoWorker |
368 | DH-18595: Fix Kubernetes install affected by change to iris_keygen.sh inputs |
367 | DH-900: Improve jenkins reporting build failures to slack |
366 | DH-17398: Fix Pandas Dataframe Restart on Disconnect |
365 | DH-18567: Core+ integration test for deleted system partitions |
364 | DH-18616: Update Core+ to 0.37.5 DH-18486: JS clients must not release tickets for pending operations (#6618) DH-17486: Support whole range of byte values for booleans (#6624) DH-18588: Allow shortened Name.rootFIle config file specifier (#6621) DH-18174: Delay reading from parquet file when creating table and column location (#6622) DH-18567: Filter locations before removal allowed checks. (#6607) (#6614) DH-18482: EventDrivenUpdateGraph: Allow two concurrent threads to safely requestRefresh(), and ensure clean errors for nested refresh attempts (#6603) (#6615) DH-18300: Improve DataIndex performance. (#6585) (#6593) |
363 | DH-18555: Fix controller_import pydoc |
362 | DH-18614: Cherrypick start_command fixes for podman |
361 | DH-18319: Add publication to jfrog for podman |
360 | DH-18598: Update jenkins images to v6 |
359 | DH-18538: Fix Deephaven Express not respecting WebGL flag |
358 | DH-18097: Add podman publishing to release pipeline |
357 | DH-18097: Add support for building and publishing a Podman images tar.gz DH-18314: Add Podman to CI build image DH-18319: Fix CI build image README and clean up Dockerfile |
356 | DH-18456: Make generation of protobuf descriptor sets ("protosets") part of proto gradle rules |
355 | DH-18540: Fix changelog generation script link generation |
354 | DH-18528: Core+ C++ Client: User documentation and examples |
353 | DH-18526: Core+ C++ Client: Protobuf type pun generator adds more Doxygen comments |
352 | DH-18527: Core+ C++ Client: Add/modify some more C++ documentation comments |
351 | DH-16929: Core+ DataIndex should read RowSets from Index Codec columns directly |
350 | DH-18362: Update org.jboss.resteasy to 5.0.9 to address vulnerability |
349 | DH-18338: Core+ C++ Client: Remove Immer from public exports |
348 | DH-18458: Core+ C++ Client: Upgrade Catch to v3.8.0 |
347 | DH-18278: Add merge KeyFormula option to Core+ Kafka schema helper |
346 | Upgrade deephaven.ui to v0.23.3, deephaven-plugin-plotly-express to v0.12.1 |
345 | DH-18459: Core+ R Client: Typo in demo script |
344 | DH-18401: Update Core+ to 0.37.4 DH-18433: SourcePartitionedTable needs to check the size of pending locations even if there are no added or removed locations |
343 | DH-18401: Update Core+ to 0.37.3 DH-18395: Prefer bulk-unmanage whenever LivenessManager.unmanage is called on multiple referents DH-18385: correct OuterJoinTools.leftOuterJoin/fullOuterJoin() output when RHS initially empty DH-18389: Chart subscriptions shouldn't reverse positions |
342 | DH-18407: Core+ C++ Client: Add more doxygen comments |
341 | DH-18406: Core+ C++ Client: Fix a couple typos in docker-build.sh |
340 | Upgrade Web UI to v0.85.12 |
339 | DH-18340: Add ControllerClientGrpc to javadoc |
338 | DH-18318: DndQueryInitializer must mark script execution thread as systemic |
337 | DH-18337: C++ Core+ Client: GetUserAndPassword() should use std::cin::getline() |
336 | DH-18032: Remove unnecessary dependency on numpy in Core+ Py client |
335 | DH-18372: Update Core+ to 0.37.2 DH-18373: Fix jpy python search paths DH-18331: JS API viewports should throw only for basic data issues DH-18178: Ensure that DataIndexes produced by a RegionedColumnSourceManager are retained by the DataIndexer DH-18194: Fix jsapi type errors from 0.37.1 DH-18364: Add Property to Disable Core DataIndex |
334 | DH-18103: Fix typings in slow login cancel button |
333 | DH-18103: Show a Cancel button on slow login |
332 | DH-17675: Fix Web not displaying PQ error restart count >10 correctly DH-18325: Fix web UI incorrectly sets error restart count to 11 instead of infinite (-1) |
331 | DH-18310: Make getPQ methods public in DnDSessionFactoryBase |
330 | DH-18301: Add some Doxygen comments to the C++ Core+ Client |
329 | DH-18302: Improve Core+ C++ client proto wrapper generator for better autogen doc |
328 | DH-17853: Additional certificates fixes found while running test plan DH-17666: Split container tests and nightly integration tests into separate tasks |
327 | DH-10177: correct typo in readme |
326 | DH-17853: Run validate_certificates.sh during install, add bugfixes and tests for iris_keygen.sh |
325 | DH-18211: Use correct offset in snapshot |
324 | DH-18202: Enable Systemic object tracking by default in Core+ |
323 | DH-18066: Update Core+ to 0.37.1 |
322 | DH-18184: Fix input tables pasting more rows than visible |
321 | DH-18182: Python Core+ RemoteTableBuilder blink method calls the wrong Java RemoteTableBuilder method |
320 | DH-18162: Fix issues where OnDiskDeephavenTableLocation would incorrectly say there was an available Data Index |
319 | DH-18127: Add clearLocationCache() methods to Core+ db. |
318 | DH-18151: Fix Web UI breaking on replicated WebClientData with non-null ReplicaSlot |
317 | DH-18148: Split Enterprise Py Docs into separate ones for worker and client |
316 | DH-18018: Fix Web UI breaking on replicated WebClientData with duplicate serials |
315 | DH-18118: Restore idempotency of user acl in TestAutomation install |
314 | DH-18096: Ignore version mismatch in DataGen test for re-install |
313 | Merge updates from 1.20231218.491 |
312 | DH-17834: Update QA results input table support to core+ python |
311 | DH-17968: Make python notebook import WorkspaceDataSnapshot aware |
310 | DH-17989: Improve Legacy ResolveTools usage |
309 | DH-18040: Fix Panels Menu Error in Safari |
308 | DH-18019: Fix test backup script test file location |
307 | DH-18022: Prefer Deephaven admin user account to cp files in config_packager.sh |
306 | DH-18020: Update QA Results data loads to include junit |
305 | DH-18016: Fix merge/validate queries on QA cluster |
304 | DH-17987: Add dhconfig schema tests WRT missing namespaces |
303 | DH-17980: QA - add new dhconfig dis test coverage |
302 | DH-17982: Add dhconfig props tests for alias names |
301 | DH-17954: Fix Query Monitor Throws Error for Set Difference |
300 | DH-17832: Upgrade Legacy Arrow jars to 17.0.0 |
299 | DH-17935: Put quotes around --exclude="java*,jna*" |
298 | DH-17962: Fix Panels Menu Status not Updating |
297 | DH-17920: Expose challenge-response login through JS API |
296 | DH-17948: Python API upload_key does not work with keys generated with generate_keypair |
295 | DH-17945: Add infrastructure for autogenerating Protobuf documentation |
294 | DH-17905: Fix remaining ubuntu20-infra kafka test failure |
293 | Merge updates from 1.20231218.478 |
292 | DH-17857: mac developers cannot deploy branch due to --transform argument sent to tar |
291 | DH-17762: Add grouping and symbolTable options to Core+ Kafka schema helper |
290 | DH-17754: Do not allow installer to use username as a groupname |
289 | DH-17934: Fix DBAclServiceProviderTest failures |
288 | DH-17917: Improved worker pod lifecycle management |
287 | DH-17765: Fix spares sorting order in the status indicator in QM |
286 | DH-17912: better checking for duplicates in 'dhconfig properties import' |
285 | DH-17930: TimePartitionRotationTest fails because Date is not within current window |
284 | DH-17882: report on invalid namespaces in 'dhconfig schema list/export' |
283 | DH-17847: Release note for config changes needed in k8s environment |
282 | DH-17859, DH-17860, DH-17862: Improve performance overview overnight analysis and error logging |
281 | DH-17633: Update podman deployment for 1.20240517 (Grizzly) |
280 | DH-17916: Legacy Python Session Fails to Start on Kubernetes |
279 | DH-14183: Fix Panels Menu Memory Issue |
278 | DH-10177: make sure 'dhconfig properties delete' won't delete a file that 'sanitizes' to the same name |
277 | DH-17906: Run Helm Upgrade as part of Minikube Nightly Test |
276 | DH-17905: Add jq for Kafka nightly tests |
275 | DH-17616: Reduce duplicates in completed binlog cleanup logs |
274 | DH-17888: Forward merge test automation to align qa-results |
273 | DH-17847: Config changes for easier future kubernetes upgrades DH-17872: Fix broken Core+ merge workers on Kubernetes |
272 | DH-15651: Web Support for Custom Favicon and Browser Tab Title |
271 | DH-17603: Cron tasks adding junit-test readin to qa-results |
270 | DH-17801: Fix typo in iris_keygen.sh |
269 | DH-17416: Prevent Console Connect while Client Disconnected |
268 | DH-10177: add 'dhconfig properties delete' DH-17733: 'dhconfig dis' improvements to handling invalid names DH-17827: dhconfig schema export NPE when exporting single non-existent schema |
267 | DH-17826: Add no-undef eslint rule to catch missing imports |
266 | DH-17616: Log completed binlog filenames for easier maintenance |
265 | DH-17813: Generate C++ protos as part of Core+ C++ client build |
264 | DH-17730: Fix Incorrect Date Formatting on Partitioned Tables |
263 | DH-14183: Fix Panels Menu React Spectrum Bug |
262 | DH-17792: Tailer should use direct buffers. DH-17786: Correct system-tables incorrectly using user allocation properties in tailer. |
261 | DH-17743: Avoid crashing Dispatcher by hung worker |
260 | DH-17537: Fix Advanced Filters broken on tree tables |
259 | DH-17809: Fix exception in the browser console on non-running query updates |
258 | DH-14183: Fix Panel Menu Performance with Many PQs |
257 | DH-17808: Update QA-HOWTO command with sudo |
256 | DH-17749: Add Unit tests for Iceberg Integration |
255 | DH-17773: Cleanup cmake warnings in Core+ cpp-client |
254 | DH-17672: Changed ACL Editor behavior when Managed User Authentication is enabled DH-17721: Changed auth_server_reload_tool to verify ECDSA keys |
253 | DH-17177: Update deprecated JSAPI calls to use .designated |
252 | DH-17723, DH-17747: Rationalize NameResolver configuration related to authority overrides |
251 | DH-17748: Fix project coordinates for republishing |
250 | DH-17764: Fix draft de-selection when not in the viewport |
249 | DH-17751: Push of deephaven_customer docker image to repo not required |
248 | DH-17771: Updates to core+ C++ and R readme files |
247 | DH-17760: KafkaTableWriter NPE with entirely filtered data DH-17761: Unused ZoneId in TimePartitionRotation |
246 | DH-17320: Added support for almalinux (alma8 and alma9 in cluster.cnf) |
245 | DH-17748: Remove jdk8 from republishing |
244 | DH-17598: Add Iceberg table support to Core+ historical tables. |
243 | DH-17744: Remove setuptools.extern from legacy python |
242 | DH-17582: Replica selection for UI panels |
241 | DH-17254: Adjust iris start script for k8s environments |
240 | DH-17681: Release note update. |
239 | DH-17719: Fix R client after Windows changes broke it. |
238 | Merge updates from 1.20231218.446 |
237 | DH-17712: Changes to C++ and R Core+ clients for Core+ Excel Add In |
236 | DH-17700: Fixed a race condition delivering PQ status updates during startup swaps |
235 | DH-17660: Web Schedule Disable Overnight if Stop Time Disabled |
234 | DH-17558: Re-enable merge/validate queries for the ResourceUtilization table DH-17619: Improve qa-results setup instructions |
233 | DH-17681: Tailer should use pool for system tables |
232 | DH-17652: Setup default integration-test to use AutoQuery |
231 | DH-17595: Included input validation in SchemaService methods |
230 | DH-16398: Fixed client details in Audit log for Connection Disconnect Events |
229 | DH-17674: Allow specifying SAML redirect URLs separately |
228 | DH-17456, DH-17432: Add tools for managing DHE schema from Core+ Kafka workers |
227 | DH-17673: Ensure coreplus generate_scheduling generates pydocs |
226 | DH-17620: Update Core+ to 0.36.1 |
225 | DH-17648: Include deephaven-extensions-json-jackson Jar in Core+ Build |
224 | DH-17309: Add major version upgrade test |
223 | DH-17454: Fix PQ Editor display in Code Studio |
222 | DH-17638: Fixed WebClientData query Reconnects to Controller using incorrect UserContext |
221 | DH-17634: Fixed Web API Server Reconnects to Controller using Incorrect Context (cherry-pick from V+) |
220 | DH-17422: Fix deephaven.ui panels not refreshing on permissions change |
219 | DH-17622: DeferredACLTable must copy filters DH-17623: Cherry-pick Core+ Performance Overview has Bad Error Message |
218 | DH-17591: Programmatic core+ python scheduler array creation |
217 | Apply spotless |
216 | Merge updates from 1.20231218.432 |
215 | DH-17555: Additional fix for build testing |
214 | DH-17617: Forward-merge of Test Automation changes |
213 | DH-17133: Fix Concurrent Modification on Kill WebClientData |
212 | DH-17538: Fix merge/validate queries for testing |
211 | DH-17584: Allow KafkaTableWriter Transform to Skip Rows DH-17605: KafkaTableWriter transformation can swallow errors |
210 | DH-17590: Fix Chart Builder |
209 | DH-17555: Automate testing for DH16283 |
208 | DH-17383: Add dependent scheduler unit tests around the stop time |
207 | DH-17585: Remove incorrect columns from ResourceUtilization validator |
206 | DH-17255: Scheduling error restart delay should apply initialization start times as delay |
205 | DH-17572: Edit release notes for clarity. |
204 | DH-17512: ServerSelectionProvider "Administratively Down" command DH-17513: ServerSelectionProvider should back-off on servers that fail to acquire worker. DH-17536: Controller should reload Configuration after leader Election |
203 | DH-17561: KafkaTableWriter.consumeToDis not ignoring default Kafka columns |
202 | DH-17520: dhconfig schema list should return non-zero exit code when no schemas are found |
201 | DH-17501: Fix broken integration tests |
200 | DH-17410: Fix issues with skip-if-unsuccessful scheduling flag |
199 | DH-17527: Active Directory Group Synchronization |
198 | DH-17533: Fix misleading logging on MySQL dhconfig acl deletions |
197 | DH-16937: Fix Data Routing Configuration Errors to be less Verbose |
196 | DH-17501: check-deephaven-cluster should validate each authentication server individually |
195 | DH-16283: handle failure to create AEL better |
194 | DH-16398: Fixed Audit Log Formatting for Client events |
193 | DH-17071: Fix schema service error reporting for addSchema method call |
192 | DH-17477: Minikube test needs additional logging |
191 | DH-17508: dhconfig pq status ignores the --replica flag for name based filtering |
190 | DH-17409: Normalize ACL columns string order and duplicates |
189 | Update deephaven.ui version to 0.19.0 |
188 | DH-17393: Update MatPlotLib plugin to v0.5.0 |
187 | DH-16506: Fix repeat-scheduling issue |
186 | DH-17503: Update Grizzly to Core 0.35.2 |
185 | DH-17504: Fix disabled context menu items for superusers in Query Monitor |
184 | DH-17481: Fix for pydoc |
183 | DH-17438: Core+ Python Client should properly handle gRPC UNAVAILABLE |
182 | DH-17487: Create dhconfig tests for migrated controller_tool functionality |
181 | DH-17477: Minikube test needs additional logging |
180 | DH-17455: Add property to optionally specify addresses to bind to for xDS in config server |
179 | DH-17481: Add Python support for custom codecs in SystemTableLogger |
178 | DH-17436: Fix Range Schedule Modified Time Zone |
177 | DH-17442: correct return code for command line errors |
176 | DH-17473: Add --verbose flags to locations where the installer invokes dhconfig/iris |
175 | DH-17475: Updates to Release Notes |
174 | DH-17472: Fix paths in DhcInDhe/R/rdnd/docker-build.sh |
173 | DH-17442: reduce exception logging |
172 | DH-17468: Add timeout option to check-deephaven-cluster script and use it in tests |
171 | DH-17367: Deprecate Legacy ObjectCodec and use Core ObjectCodec directly. Full support for custom codecs in SystemTableLogger |
170 | DH-17436: Cleanup Web Range Schedule UI |
169 | DH-17464: Update Core to 0.35.1 |
168 | DH-17447: Fix more dhconfig obscure acl error messages |
167 | DH-17457: Remove plaintext password options from dhconfig |
166 | DH-17462: Remove explicit transitive private dependencies of Core C++ client from Core+ C++ client cmakefiles |
165 | DH-16180: Fix QM overlay z-index |
164 | DH-17449: Fix Export exporting multiple copies of the same query |
163 | DH-17436: Fix Web Range Schedule Save Bug |
162 | DH-17436: Web UI for Range Schedule |
161 | DH-17448: Display a warning on query-only operations when selection contains replicas |
160 | DH-17192: Add dhconfig pq leader tool |
159 | DH-17180: Restart Selection should only restart Selected Replica |
158 | DH-16646: Synchronize Multiple Query Monitors |
157 | DH-16506: Add range query scheduling |
156 | Merge updates from 1.20231218.385 |
155 | DH-17434: Fixed race condition on Controller worker recovery and slot reassignment |
154 | DH-17083: Import LivenessScope only under worker context |
153 | DH-17080: Revert workaround to avoid skipping second controller |
152 | DH-16011: Fixed unit test failures |
151 | DH-17165: Add unit test prop |
150 | DH-16011, DH-17271: More usability fixes for dhconfig acl |
149 | DH-17337: Controller should swap in spares when user requests individual slot restarts |
148 | DH-17165: Exclude input_table_snapshotter_tasks.txt from CUS digest |
147 | DH-17402: Fix the load balancing tab not showing on drafts |
146 | DH-17250: Add backup script option, fix possible race condition in backup process |
145 | DH-16180: Add LoadingOverlay on query selection change in Query Monitor |
144 | DH-17402: Fix the load balancing tab not displaying when it should |
143 | DH-17405: Updates to Grizzly release notes |
142 | DH-17083: Allow convenient barrage session functionality from a Core+ worker |
141 | DH-16654: Added more resilient behavior for taking over service registry leases |
140 | DH-17399: Ensure nightly build boxes are not deleted when tests fail |
139 | DH-17315: Allow Restart Delay to be Blank with Validation |
138 | DH-17312: Update Grizzly Core to 0.35.0 |
137 | DH-17385: Update Core+ plugins for Grizzly |
136 | DH-17391: Test fixes. |
135 | DH-16531: documentation updates |
134 | DH-17244: Fix Core+ user table nightly int test failures |
133 | DH-17248: Allow Dashboards to Export with Warning for Missing Queries |
132 | DH-17379: Add helm chart unit tests for k8s development |
131 | DH-12786: Move Remaining Web UI Tests from Enzyme to Testing Library |
130 | DH-17310: Display Version on Safe Mode |
129 | DH-17287: Python support for remote_table blink and filter |
128 | DH-17376: Update Core+ cpp-client docker build script help |
127 | DH-17321: Support multiple CORS origins for ACL Editor |
126 | Merge updates from 1.20231218.381 |
125 | DH-17348: Update gRPC (1.61.0 -> 1.61.1) and netty (2.0.62 -> 2.0.65) |
124 | DH-16011: Lots of quality of life fixes for dhconfig acl |
123 | Merge updates from 1.20231218.380 |
122 | DH-17203: Cluster Validation Client Script bug fixes |
121 | DH-17365: Grizzly Minikube Install Test Fix |
120 | DH-17285: Sort list of servers within Controller |
119 | DH-17364: Remove unused constant in PersistentQueryControllerClient |
118 | DH-16343: Check column partition exclusivity on set not get. |
117 | DH-17362: Update Web Version to v0.85.0 DH-17199: Fix filter by value in the tree table context menu always showing null DH-17095: Fix unable to edit cells if key columns are not first column |
116 | DH-17250: Add support for etcd auto-backup config and fix backup script on k8s |
115 | DH-17361: corrections to merge from vplus to grizzly |
114 | DH-17271: Test property file fix. |
113 | DH-17112: Fix for Core+ worker launch without intermediate script. |
112 | DH-17271: Fix for installation. |
111 | DH-12786: Move Some Web UI Tests from Enzyme to Testing Library |
110 | DH-17271: Add more safety around importing of public keys |
109 | Fix eslint and web unit tests |
108 | Spotless application. |
107 | Merge updates from 1.20231218.374 |
106 | DH-16435: Improve validation of configuration type and replica and spare usage |
105 | DH-17112: Alternate launch option for Core+ workers |
104 | DH-17125: Swing should not be able to exit PQs with more than 1 replicas or more than 0 spares |
103 | DH-17215: Fix unselectable Failed and AcquiringWorker replicas DH-16180: Show spinner while updating selected query details |
102 | DH-17325: Update Core+ C++ Client Protobuf; add session manager example for connecting to existing PQ |
101 | DH-17319: Exclude rules in shadow-jetcd must filter inner classes |
100 | DH-17287, DH-17286: Support blink tables with RemoteTableBuilder, Support pre-subscription filtering |
099 | DH-17249: Python Client Cannot Handle too Many Truststore Entries |
098 | DH-17304: Implement campaign and rest of the etcd Election API client side |
097 | DH-17307: check-deephaven-cluster integration tests fail without "python" on path |
096 | DH-17306: NPE in Controller without ServerSelectionProvider |
095 | DH-17305: Fix MySQL ACLs on Grizzly |
094 | DH-17146: Fix fullStackTrace not displayed on replicas |
093 | DH-17294: Fix npm start |
092 | DH-17233: SimpleServerSelectionProvider prefers replicas on distinct hosts |
091 | DH-17251: Fix Copyright on Web UI |
090 | DH-17247: Add aria-label to Picker in Load Balancing Tab |
089 | DH-17284: Fix improper delayed command queueing in the Controller |
088 | DH-17282: Fix DerivedTableWriter partition selection |
087 | DH-17259: Fix temporary Core+ Python venv dependency leak |
086 | DH-17202: A controller taking over as leader calls observe in a tight loop for a while and then fails |
085 | DH-17261, DH-17256, DH-17227: Fix RRAssignmentPolicy not logging reset row when clearing state, corrupting binary log file. |
084 | DH-17263: ECC Key Problems |
083 | DH-16610: Add user facing cols to Core+ DB partitioned tables |
082 | DH-17265: Fix Replica and spare APIs for web QueryInfo |
081 | DH-17133: Constrain useWebClientDataHandleConfigChange to WebClientdata events |
080 | DH-17141: Remove expired tokens in authentication server |
079 | DH-17133: PQ Status display recover from controller failover |
078 | DH-16050: Fix pending session not cleaning up on dashboard close |
077 | DH-17181: Generate separate certificate for controller |
076 | DH-17214: Controller should eagerly swap non-ready replicas for running spares |
075 | DH-17203: Cluster Validation Client Script |
074 | Merge updates from 1.20231218.344 |
073 | DH-17117 DH-17182: Complete AssignmentPolicy API / UI |
072 | DH-17162: Use dh-mirror for internal VM images |
071 | Revert DH-17141: Remove expired tokens in authentication server |
070 | DH-17166: Support ECDSA Challenge Response Authentication |
069 | DH-17141: Remove expired tokens in authentication server |
068 | DH-17231: Improve ElectionCandidate etcd auth bug workarounds |
067 | DH-17194: PQ Status for Replica Parent Row |
066 | DH-17223: Fix Web UI not building correctly |
065 | DH-17095: Cannot edit input table cells if key columns not first |
064 | DH-17222: Cannot restart failed replicas |
063 | Merge updates from 1.20231218.337 |
062 | DH-17217: Controller Timeout Configuration Unit Mismatch |
061 | Changelog correction. |
060 | DH-17179: Add Deadline for Worker Restoration on Controller Failover |
059 | DH-17197: Fix broken DeploymentGeneratorTest |
058 | DH-17167: ACL Editor - recover from WebClientData restart |
057 | DH-17129: Upgraded React Spectrum to 3.35.1 DH-17129: Upgraded DHC packages to 0.81.2 |
056 | DH-17175: Update Core+ to 0.34.3 |
055 | Fix merge conflict. |
054 | Merge updates from 1.20231218.332 |
053 | DH-17193: DeephavenNameResolver should not block the gRPC Threads DH-17139: Improve state restoration logging and parallelism. |
052 | DH-17189: Fix Query Monitor not loading |
051 | DH-17125, DH-17126: Prevent swing from editing PQs with replicas, make RoundRobin smarter about rebalancing |
050 | DH-17178: Improve Core+ user table API return values |
049 | DH-15582: UI For Selecting Python Packages and VEnvs on Community Workers |
048 | DH-17040: Update RadioGroup and ButtonGroup references |
047 | DH-17173: Minikube Install Test for Cert Manager |
046 | DH-17151: Handle upgrades from versions before nfs reduction |
045 | DH-17135: Controller needs to be more proactive checking worker states on startup |
044 | DH-17133: Refetch query config table on WebClientData restart |
043 | DH-16828: Remove extra kubernetes-only worker-overhead clause from iris-defaults.prop |
042 | DH-17116: Upgrade node to lts 20.13.1 and npm to 10.5.2 |
041 | Merge updates from 1.20231218.330 |
040 | DH-17136: Use container-specific JVM args on Kubernetes |
039 | DH-17150: Kubernetes iris-environment.prop Broken in Forward Merge |
038 | DH-17142: Fix NPE in QueryInfo.close() |
037 | Merge updates from 1.20231218.328 |
036 | DH-17119: Improve error handling in Controller during worker startup and recovery |
035 | DH-16963: Use npm ci for stable builds |
034 | DH-17130: Make republishing include jdk8 source repo for fishlib dependencies |
033 | Merge updates from 1.20231218.325 |
032 | DH-17127: Remove core proc default resources limits on Kubernetes |
031 | DH-16548, DH-17084: Add Legacy to Core+ input table converter |
030 | DH-17132: Add another case of exceptions where ElectionCandidate needs to retry |
029 | DH-17130: Make republishing target jdk11 now that jdk8 is dropped |
028 | DH-17118: Improve cluster.cnf parsing logic Merge updates from 1.20231218.323 and 1.20230511.505 DH-16829: Update worker overhead properties DH-17072: Do not write temporary-state DH_ properties to cluster.cnf DH-17026: Publish EngineTestUtils (backport of DH-15687) DH-17058: Make pull_cnf disregard java version DH-16884: Add configuration for default user table format to be parquet DH-17048: Fix controller crash and shutdown issues DH-17014: Make cluster.cnf preserve original source and prefer environment variable DH-17118: Improve cluster.cnf parsing logic DH-17066: Apply Kubernetes Control to Legacy workers in ConsoleCreator |
027 | DH-16606: Use JIT compiler limit for PQs by default |
026 | Changelog and release note updates. |
025 | Merge updates from 1.20231218.322 |
024 | Merge updates from 1.20231218.320 |
023 | DH-16436: PQ Replicas UI |
022 | DH-16436: Fix for uninitialized replicas. |
021 | DH-17099: Default Web Landing Redirect to Web UI |
020 | DH-17054: Add Assignment Policy setter to QueryInfo |
019 | DH-17091: Integrate SAML into Core Product |
018 | Merge updates from 1.20231218.318 |
017 | DH-17096: remove usage of <ButtonOld> deprecated component |
016 | DH-17097: Fix docker build with pip cache disabled |
015 | DH-16436: Webapi changes for Replica frontend |
014 | DH-17088: Split deephaven_base, Add Argument to Allow apt/pip Caching in Minikube |
013 | DH-16977: dhconfig pq should work as a "regular" user |
012 | Merge updates from 1.20231218.312 |
011 | DH-17078: PersistentQueryStateLog old format listeners do not handle ControllerHost |
010 | DH-16915: Migrate filesytem data from /etc/sysconfig to Kubernetes objects |
009 | DH-16143: Update GWT-RPC to avoid websocket reuse bug |
008 | DH-17012: dhconfig dis list should have an option to show the claimed namespaces |
007 | DH-17027, DH-16789: Fix PQs displaying error on clean shutdown and controller chattyness on disconnected clients |
006 | DH-17062: validate dhconfig actions for the selected data type instead of globally |
005 | Fix merge issue in 004. |
004 | Merge updates from 1.20231218.308 |
003 | DH-17064: Update Grizzly to Core 0.34.2 |
002 | DH-17060: Fix internal VM image deployment task |
001 | Initial release creation |
Detailed Release Candidate Version Log: Deephaven v1.20231219beta
Patch | Details |
---|---|
161 | Apply spotless |
160 | Merge updates from 1.20230511.503 |
159 | Merge updates from 1.20231218.305 |
158 | DH-16912: Handle overflow in timeout calculation. |
157 | DH-16912: Add Option to Request/Limit Additional Memory |
156 | DH-11204: remove db_ltds from default installation |
155 | Merge updates from 1.20231218.301 |
154 | DH-17046: Controller failover/failback issues |
153 | Merge updates from 1.20231218.300 |
152 | DH-17013: ReplayDatabase and notebook imports, other cleanup. |
151 | DH-17025: Grizzly with Cert Manager causes Invalid Hostname for Verification |
150 | DH-10086: Update GWT to 2.9.0 |
149 | Merge updates from 1.20231218.296 |
148 | DH-17019: remove unused kv-tool-cli |
147 | DH-17009: Pass Schema through Core+ tables |
146 | DH-15831: Change default buffer allocation to non-direct in tailer |
145 | DH-16869: Make installer properly filter the list of machines running controllers |
144 | Merge updates from 1.20231218.292 |
143 | DH-16994: Internal Mac Docker Kubernetes Instructions DH-16998: minikube LAS is broken |
142 | DH-16661: Update Maystreet integration |
141 | Merge updates from 1.20231218.289 |
140 | Merge updates from 1.20231218.288 |
139 | DH-16972: Add Minikube Installation Test |
138 | DH-16961: Use ACL Store instead of dsakeys.txt |
137 | DH-16790: Make PersistentQueryControllerTest more reliable |
136 | DH-16981: Non-displayable queries shown in dhconfig pq status DH-16982: Controller host should be in PersistentQueryStateLog |
135 | Compile fix. |
134 | Merge updates from 1.20231218.284 |
133 | DH-16977: dhconfig pq should work as a "regular" user |
132 | Spotless fix. |
131 | Merge updates from 1.20231218.283 |
130 | DH-16978: Multiple auth server private-key validation failures |
129 | DH-16975: Controller client determines wrong controller |
128 | DH-16974: dhconfig pq is too noisy |
127 | Merge updates from 1.20231218.279 |
126 | DH-16967: Fix wrong parameter in groovy session wrapper |
125 | DH-16946: Remove Deprecated In-Worker Service PQ |
124 | DH-16545: Add Python wrapper for Core+ input tables |
123 | DH-16909: Update Core to 0.34.1 |
122 | DH-16919: Fix missing file that resulted in CUS crash. |
121 | DH-16011: Correct tests for ACL tool, allow --ignore-existing for public keys. |
120 | DH-16959: Reduce bcryptcost in etcd configuration |
119 | DH-16954: Use /usr/illumon/coreplus instead of /usr/illumon/dnd |
118 | Merge updates from 1.20231218.272 |
117 | DH-16011: Implement Acl tool within dhconfig |
116 | DH-16919: Remove /db/TempFiles NFS Volume |
115 | DH-16547: Add Core+ input table snapshotter |
114 | Merge updates from 1.20231218.265 |
113 | DH-16930: Fix issues with timeout handling in ResolverReader, DispatcherClient and ElectionCandidateImpl |
112 | DH-16799: Allow customer jars and plugins to be built into images for K8s deployments |
111 | DH-16837: Cleanup classpaths.gradle |
110 | DH-16871: Preexisting PVCs Do not Work with pre-release Hook |
109 | DH-16861: CommunityPersistentQueryHandle can leak channels when machine under load |
108 | DH-16856: Package worker client directory for query servers DH-16857: Suppress grpc exception in configuration server DH-16860: Fix phrasing in example envoy yaml files |
107 | DH-16837: Port fishlib IO to iris repo |
106 | DH-16858: Delete etcd_prop_file in k8s installation. |
105 | DH-16858: Delete etcd_prop_file |
104 | DH-16855: Enforce DH-16852 limits of 1GB (admin) and 2GB (all users). |
103 | DH-16824: Add some Parallelism to Core+ Startup and other Improvements |
102 | Merge updates from 1.20231218.250 |
101 | DH-15578: Build Python wheel with Core+ Worker Extension |
100 | Merge updates from 1.20231218.247 |
099 | DH-16843: Improve Core+ user table int tests and checkInputTableKind |
098 | Fix merge error. |
097 | Merge updates from 1.20231218.244 |
096 | DH-16820: Move dispatcher presence monitoring from job assignment to job scheduling |
095 | DH-16819: Update jpy to 0.15.0 |
094 | DH-16118: Make new dhconfig pq less noisy |
093 | Merge updates from 1.20231218.237 |
092 | DH-16801: Reduce Number of etcd Client Creations |
091 | DH-16810: Remove Legacy C# Open API Client |
090 | DH-16792: Properties changes in k8s to fix swing |
089 | DH-14125: Fix bug with late publishing of details. |
088 | Merge updates from 1.20231218.231 |
087 | DH-16785: Allow the controller to register its address in resolver as a host name instead of just as IP |
086 | Spotless |
085 | Fix javadoc compilation. |
084 | DH-16118: Migrated PersistentQueryControllerTool into dhconfig |
083 | DH-16780: Print Status Details from Python Core+ Client |
082 | DH-16770: Drop Python 3.8 from Grizzly |
081 | DH-16772: Combine PR title and Changelog Check |
080 | DH-16774: Revert spotless upgrade until we support gradle 8.X |
079 | DH-14125: Detailed worker startup logging on k8s |
078 | DH-16771: Support community-only PQ configuration types |
077 | DH-16769: Spotless should remove unused imports. |
076 | DH-16765: Add User Assignment Table to WebClientData |
075 | Merge updates from 1.20231218.228 |
074 | DH-16152: Replay Query for Core+ |
073 | DH-16633: Rebuild VM images with etcd 3.5.12 instead of 3.5.5 |
072 | Merge updates from 1.20231218.227 |
071 | DH-16756: Python Method for Generating and Uploading Keys to Deephaven Server |
070 | DH-16742: Update ci images to rocky9 |
069 | DH-16317: Installer jars should all target jdk8 |
068 | Merge updates from 1.20231218.225 |
067 | DH-16743: Better parallelization of gradle build |
066 | DH-16730: Ensure override authority works for ManagedChannelFactory via resolver |
065 | fix previous merge |
064 | Merge updates from 1.20231218.217 |
063 | DH-16435: Back end support for PQ replicas and spares. |
062 | DH-16496: make TDCP (and other table data services in general) responsive to data routing changes |
061 | DH-16357: Cache ACLs and add watcher. |
060 | Merge updates from 1.20231218.212 |
059 | Merge updates from 1.20231218.210 |
058 | DH-16683: Port fishlib base stats to iris |
057 | DH-16343: Utilities for reliable creation of live derived data DH-16651: improvements to the Core+ Python Controller client |
056 | DH-16684: Fix swing trying to resolve using etcd |
055 | DH-16674: Remove fishlib hash dependency |
054 | DH-16531: make DIS filtering responsive to routing changes |
053 | DH-16673: Do manual auth retries in ElectionCandidateImpl |
052 | Merge updates from 1.20231218.205 |
051 | DH-16268: Port Net and Stats from fishlib and eliminate AnomalyClient dependency |
050 | Merge updates from 1.20231218.202 |
049 | DH-16641: Configuration server error due to missing controller.port property on k8s |
048 | DH-16587: Converge install and upgrade hook scripts in helm/Kubernetes |
047 | DH-16643: Update Grizzly to Core 0.33.2 |
046 | DH-16629: Replace jetcd campaign workarounds with fix in jetcd implementation |
045 | DH-16639: Use Resolver to create connection.json in Internal Worker Python Client |
044 | Merge updates from 1.20231218.197 |
043 | DH-16577: ConfigurationUtils.getGradleVersion() returns "undetermined" on Core+ |
042 | DH-16615: Ensure dh-resolver changes get propagated via config server xDS |
041 | DH-16619: Installer fails with missing controller property |
040 | DH-16607: Installer to support more than one controller host |
039 | DH-16434: Use auto-server-selection by default DH-16526: Use better auto-group dispatcher heap default if none is provided |
038 | DH-16431: Remove Java 8 from Grizzly build. |
037 | DH-16603: Force dh-resolver registrations to use IP address literals, avoid depending on another round of DNS |
036 | Merge updates from 1.20231218.185 |
035 | DH-16376: Store Etcd client creds in secrets and mount only in relevant pods in k8s |
034 | DH-16590: Fix java 11 compilation error in ElectionCandidateImpl |
033 | DH-16433: Controller Leader Election |
032 | DH-16574: Prevent a dependency from configuration server to start on controller having started |
031 | DH-16572: Provide a way to override the default dh-resolver via properties |
030 | DH-16560: Support controller clients resolving without etcd |
029 | DH-14838: Add Core+ input tables |
028 | DH-16558: Fix ResolverReader javadoc |
027 | Merge updates from 1.20231218.165 |
026 | DH-16558: Make ResolverReader more tolerant of registration races |
025 | DH-16473: Implement Etcd based gRPC resolver and switch the controller service to it |
024 | DH-16511: Use round_robin load balancer policy for authentication server |
023 | DH-16510: Use round_robin load balancer policy for etcd clients. |
022 | Merge updates from 1.20231218.146 |
021 | Fix bad merge resolution. |
020 | Merge updates from 1.20231218.144 |
019 | DH-16430: Drop Centos7, Use Rocky9 as Default |
018 | Merge updates from 1.20231218.130 |
017 | DH-16380: Update Envoy to 1.29, gRPC to 1.61, jetcd to 0.7.7 |
016 | Merge updates from 1.20231218.127 |
015 | Merge updates from 1.20231218.126 |
014 | Merge updates from 1.20231218.124 |
013 | Merge updates from 1.20231218.112 |
012 | DH-16050: No prompt to confirm when clicking the "X" to close a running Code Studio |
011 | Merge updates from 1.20231218.102 |
010 | Merge updates from 1.20231218.080 |
009 | DH-16155: Support ACLs for non-existing namespaces and table names |
008 | Merge updates from 1.20231218.034 |
007 | DH-15689: Automate creation of WorkspaceSnapshot query |
006 | Merge updates from 1.20231218.034 |
005 | Add Case for rc/grizzly in CL Check Action |
004 | Fix forward merge Python beta patch version. |
003 | Fix forward merge Python beta patch version. |
002 | Merge updates from 1.20231218.007 |
001 | Initial release creation from 1.20231218 |
Automatic Server Selection
Automated server selection is now turned on by default on new non-Kubernetes installations. To turn it off or change the defaults, edit your iris-environment.prop file and remove or edit the properties in the iris_controller stanza, as described in automated server selection.
Changes to tailer process memory requirements
Tailer now uses constrained and pre-allocated buffer pools for both User and System tables
With default configuration, the tailer now uses constrained and pre-allocated buffer pools for sending data to Data Import Servers. This makes the tailer's memory consumption more predictable, and avoids potential out-of-memory conditions. There are separate pools for User and System tables, governed by the following properties:
Property | Default | Description |
---|---|---|
DataContent.userPoolCapacity | 128 | The maximum number of User table locations that will be processed concurrently. If more locations are created at the same time, the processing is serialized. |
DataContent.producerBufferSize.user | 256 * 1024 | The size in bytes of the buffers used to read and send data for User table locations. |
DataContent.disableUserPool | false | If true, user table locations are processed without a constrained pool, in which case user actions can consume unbounded tailer resources. |
DataContent.systemPoolCapacity | 128 | The maximum number of System table locations that will be processed concurrently. If more locations are created at the same time, the processing is serialized. |
DataContent.producerBufferSize | 256 * 1024 | The size in bytes of the buffers used to read and send data for System table locations. |
DataContent.disableSystemPool | false | Disable the system pool, which results in an unconstrained number of buffers being used in the tailer. |
DataContent.producersUseDirectBuffers | true | Producers (the tailer) use a direct buffer for reading and sending data. |
The tailer allocates two pools of buffers, one for user tables and one for system tables. Each item in that pool requires two buffers for concurrency, so the memory required will be double the buffer size times the pool capacity.
Total direct memory required for the tailer is approximately 2 * (DataContent.producerBufferSize * DataContent.systemPoolCapacity + DataContent.producerBufferSize.user * DataContent.userPoolCapacity).
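For example, with the default values in the table above, that is 2 * (262144 * 128 + 262144 * 128) = 134,217,728 bytes, or 128 MiB of direct memory.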
Tailer now adjusts to DIS buffer size
When the tailer establishes a connection to a DIS, the processes now exchange configuration information. If the tailer is configured with a larger maximum message size than the DIS can accept, it will reduce its maximum message size to match the DIS.
Change configuration to handle older tailers sending data to newer Data Import Servers
This section applies only to environments using default configuration values, and where tailers are outside the Deephaven system, either tailing from external systems or from different Deephaven installations. In this scenario, the tailer may send messages that are too large for the DIS to accept.
Older tailers do not adjust to DIS settings, and may have default configuration settings that allow them to send larger messages than a DIS with default configuration can accept. This typically only happens when the tailer gets behind, or otherwise has a file to process with more than 327,710 bytes.
When this happens, the DIS will reject the tailer with a message like:
WARN - DataImportServer-db_dis-TailerConnection-/127.0.0.1:55638-FullTableLocationKey{TableLookupKey.Immutable[Namespace/table/U],TableLocationLookupKey.Immutable[internal_partition/date]}:Rejecting and closing channel: Protocol error while processing stream: Message size 4400021 exceeds the maximum 327710. The client is configured with a larger maximum message size than this server. Check parameters DataContent.producerBufferSize and DataContent.consumerBufferSize. This most likely applies to the DIS and tailer processes.
A similar message might appear in the tailer log.
When an old tailer sends data to a new DIS, the DIS will log a warning like this:
WARN - Received TableIdentifierDataItem with old version: 4. The tailer will not protect against causing a read buffer overflow.
If your environment includes older tailers that send data to current DISes, you should make one of the following configuration changes:
- update the tailers to the current Deephaven version
- update the tailer configurations to set the maximum message size by adding the following properties to the tailer configuration or properties file:
DataContent.producerBufferSize = 262144
DataContent.producerBufferSize.user = 262144
- update the DIS maximum message size to match the tailer maximum message size by setting the following property in the DIS configuration or properties file:
DataContent.producerBufferSize = 2097160
Note
The default maximum message size from the tailer is reduced as of 20240517.262. Tailers respond to DIS configuration settings as of 20240517.403.
Database.inputTableUpdater improvements
The Database.inputTableUpdater method has been updated to return a new, more general interface, io.deephaven.enterprise.database.inputtables.InputTableUpdater, which allows callers to have more control over table lineage with input tables. The InputTableUpdater can also be explicitly managed with a liveness scope, and is now more efficient in caching intermediate operations.
When sourcing both an InputTableUpdater and an input Table view from db, the resulting objects may not share a common lineage:
def myTableUpdater = db.inputTableUpdater("MyNamespace", "MyTable")
def myTable = db.inputTable("MyNamespace", "MyTable")
myTableUpdater.add(new_data)
// myTable is not guaranteed to have new_data
With the new interface, callers can explicitly derive the corresponding input Table view from the InputTableUpdater:
def myTableUpdater = db.inputTableUpdater("MyNamespace", "MyTable")
def myTable = myTableUpdater.table()
myTableUpdater.add(new_data)
// myTable is guaranteed to have new_data
Note that the Python layer does not currently expose a stand-alone equivalent of InputTableUpdater; an input Table view must be created, and in that context, table lineage is already guaranteed.
my_table = db.input_table("MyNamespace", "MyTable")
my_table.add(new_data)
# my_table is guaranteed to have new_data
Breaking API change for Core+ C++ Client
We have changed the map type used by deephaven_enterprise::controller::Subscription. This map type is defined in the typedef deephaven_enterprise::controller::Subscription::map_type and it is used in an out parameter of deephaven_enterprise::controller::Subscription::Current().
The important user-visible change is that the map's find() operation now returns a const iterator that interoperates with its begin() and end() operations. This makes it behave more like C++'s familiar std::map container. Previously, our find() operation returned a pointer to the mapped type rather than an iterator.
The following example code shows the new usage of find.
Assuming this code:
std::int64_t version;
Subscription::map_type map;
if (!sub.Current(&version, &map)) {
  std::cout << "Subscription closed\n";
  return;
}
Old API:
const auto *pq_info_ptr = map.find(pq_serial);
if (pq_info_ptr == nullptr) {
  std::cout << "pq_serial not found\n";
  return;
}
auto pq_info = *pq_info_ptr;
New API: find() now returns an iterator:
auto it = map.find(pq_serial);
if (it == map.end()) {
  std::cout << "pq_serial not found\n";
  return;
}
auto pq_info = it->second;
Add dhconfig properties delete command
The dhconfig properties tool now supports a delete command, which can be used to delete a properties file from etcd.
usage: dhconfig properties delete [--configfile <arg>] [--diskprops] [--etcd] [-f <arg>] [-h] [-k <arg> | -user <arg>]
[-pf <arg>] [-v]
Delete properties files from the system.
Options are as follows:
--configfile <arg> use the named properties file instead of the default
--diskprops read properties from the classpath, instead of etcd or the configuration service
--etcd use etcd directly, instead of configuration service (when combined with --diskprops, property
files are read from disk but written to etcd)
-f,--file <arg> specify the properties files to delete
-h,--help print help for a properties command
-k,--key <arg> specify a private key file to use for authentication
-pf,--pwfile <arg> specify a file containing the base64 encoded password for the user that is set with --user
-user,--user <arg> specify a user for authentication
-v,--verbose print additional logging, progress messages, and full exception text
Examples:
Delete my-prop-file.prop:
dhconfig properties delete --file my-prop-file.prop
Enable Systemic object tracking for Core+ workers
Systemic object tracking has been enabled by default for Core+ workers. It increases the robustness of Persistent Queries by separating systemically important objects from user-created ones, reducing the chance that misbehaving user requests cause outages. This default aligns Core+ worker behavior with Legacy worker behavior.
Core+ updated to 0.37.0
The Core+ integration has been updated to Core version 0.37.0. As part of this update, the Core+ Java Client libraries no longer support Java 8. Supported Java versions are now 11 and 17.
Python 3.8 is the oldest supported Python version
Although Python 3.8 has already reached end of life, it is the newest version of Python that is built and tested on some versions of Deephaven.
As of Bard version 1.20211129.426, Python 3.8 is the only Python version built, and iris-defaults.prop changes the default from Python 3.6 to 3.8.
If you still have virtual environments set up with Python 3.6 or 3.7, you should replace them with Python 3.8 venvs. To use newer versions of Python, upgrade to a newer version of Deephaven.
For legacy systems, you can change the default back to Python 3.6 by updating your iris-environment.prop to set the various jpy.* props to the values found in iris-defaults.prop, inside the jpy.env=python36 stanza:
# Legacy python3.6 locations:
jpy.programName=/db/VEnvs/python36/bin/python3.6
jpy.pythonLib=/usr/lib64/libpython3.6m.so.1.0
jpy.jpyLib=/db/VEnvs/python36/lib/python3.6/site-packages/jpy.cpython-36m-x86_64-linux-gnu.so
jpy.jdlLib=/db/VEnvs/python36/lib/python3.6/site-packages/jdl.cpython-36m-x86_64-linux-gnu.so
The new iris-defaults.prop python props are now:
# New iris-defaults.prop python3.8 locations:
jpy.programName=/db/VEnvs/python38/bin/python3.8
jpy.pythonLib=/usr/lib/libpython3.8.so
jpy.jpyLib=/db/VEnvs/python38/lib/python3.8/site-packages/jpy.cpython-38-x86_64-linux-gnu.so
jpy.jdlLib=/db/VEnvs/python38/lib/python3.8/site-packages/jdl.cpython-38-x86_64-linux-gnu.so
Changes to Barrage subscriptions in Core+ Python workers
The methods subscribe and snapshotTable inside deephaven_enterprise.remote_table have been changed to return a Python deephaven.table.Table object instead of a Java io.deephaven.engine.table.Table object. This allows users to use the Python methods update_view, rename_columns, etc. as expected without wrapping the returned table.
Existing Python code that manually wrapped the table or directly called the wrapped Java methods must be updated.
Example of previous behavior:
from deephaven_enterprise import remote_table as rt
table = rt.in_local_cluster(query_name="SubscribePQ", table_name="my_table").snapshot()
table = table.updateView("NewCol = random()")
Example of new behavior:
from deephaven_enterprise import remote_table as rt
table = rt.in_local_cluster(query_name="SubscribePQ", table_name="my_table").snapshot()
table = table.update_view("NewCol = random()")
Upgrading from releases before Grizzly 1.20240517.273
If this is your first upgrade from a release older than Grizzly 1.20240517.273, you must make some manual changes to your
iris-environment.prop
file before running helm upgrade
or the dh_helm
script. These changes are not automated
because this file includes changes made by customers that should not be overwritten.
Create a bash session on the management shell. For example:
kubectl exec -it deploy/management-shell -- bash
Export and edit iris-environment.prop:
/usr/illumon/latest/bin/dhconfig properties export -f iris-environment.prop -d /tmp/
vi /tmp/iris-environment.prop
Remove these entries entirely. Note that these entries are not grouped together in the file, and some of them may not be
present depending on what version you are upgrading from. If they are present, the first three are likely
to be within the first 50 lines of your file, the two final
properties are likely to be found within a scoping block
about halfway down, and the remaining entries would be found toward the end of the file:
Kubernetes.chart.name={{ .Chart.Name }}
Kubernetes.release.name={{ .Release.Name }}
Kubernetes.release.namespace={{ .Release.Namespace }}
final configuration.reload.userGroups=superuser
final PersistentQueryController.keyPairFile=/etc/sysconfig/deephaven/syskey/priv-controllerConsole.base64.txt
RemoteQueryDispatcher.workerStartupTimeoutMS=240000
RemoteQueryDispatcher.workerControlType=Kubernetes
EmbeddedDbConsole.remoteDatabaseRequestTimeoutMS=60000
Kubernetes.deployment=true
Kubernetes.start-worker-timeout-seconds=240
Kubernetes.query-worker-k8s-template=/configs/k8s-query-worker-template.yaml
Kubernetes.start-worker-dnd-timeout-seconds=240
Kubernetes.query-worker-dnd-k8s-template=/configs/k8s-query-worker-coreplus-template.yaml
Webapi.server.cus.enabled=true
Webapi.server.cus.home=/etc/sysconfig/deephaven/cus
[service.name=web_api_service] {
BusinessCalendar.storeRawData=true
}
RemoteProcessingRequestProfile.Xms.G1 GC=$RequestedHeap
RemoteQueryDispatcher.JVMParameters=-XX:+AlwaysPreTouch
BinaryStoreWriterV2.allocateDirect=false
authentication.server.localsuperusers.file=/etc/sysconfig/deephaven/superusers.txt
Import the modified file:
/usr/illumon/latest/bin/dhconfig properties import -f /tmp/iris-environment.prop
You are now ready to run the helm upgrade
command or the dh_helm
script to perform the upgrade.
Managed User Authentication flag is required for local password authentication
When the property iris.enableManagedUserAuthentication
is set to true, user passwords may be stored in the ACL store (etcd or MySQL). In previous versions of Deephaven, existing passwords could still be used for authentication even when this property was set to false; the authentication server now rejects password authentication when the property is false. If you have changed this property to false
from the default value of true
, users must have an alternative means of authenticating to the system (e.g., SAML or Active Directory) to use Deephaven.
Added support for Iceberg tables in Core+
Core+ workers now support reading Iceberg tables as historical tables. To configure Iceberg, you must configure an Iceberg Endpoint, then discover and deploy a schema.
Configuring an IcebergEndpoint
The first step to link Iceberg tables into Deephaven is configuring an IcebergEndpoint
, which we refer to as simply an endpoint. The endpoint contains the parameters required to locate and connect to the Iceberg catalog, the data warehouse, and the storage-specific parameters required to read the data.
An endpoint is configured using the example below.
import io.deephaven.enterprise.iceberg.IcebergTools
import io.deephaven.extensions.s3.S3Instructions
// Create a new endpoint
endpoint = IcebergTools.newEndpoint()
.catalogType("rest") // The catalog is a REST catalog
.catalogUri("http://mydata.com:8181") // located at this URI.
.warehouseUri("s3://warehouse/") // The data warehouse is an S3 warehouse at this URI
.putProperties("s3.access-key-id", "my_access_key", // These are the properties required by the Iceberg API.
"client.region" , "us-east-1") // See https://iceberg.apache.org/docs/nightly/configuration/#configuration
.putSecrets("s3.secret-access-key", "s3.key") // Include any named secrets
.dataInstructions(S3Instructions.builder() // Configure the S3 data parameters
.regionName("us-east-1")
.build())
.build("my_company_iceberg"); // Explicitly name the endpoint.
from deephaven_enterprise import iceberg
from deephaven.experimental import s3
# Create the data instructions for reading data.
s3i = s3.S3Instructions(region_name="us-east-1")
# Create a new endpoint
endpoint = iceberg.make_endpoint(
    "rest",                              # The catalog is a REST catalog
    "http://mydata.com:8181",            # located at this URI
    "s3://warehouse/",                   # The data warehouse is an S3 warehouse at this URI
    s3i,                                 # Attach the data instructions
    endpoint_name="my_company_iceberg",  # Explicitly name this endpoint
    properties={"s3.access-key-id": "my_access_key", "client.region": "us-east-1"},  # Set Iceberg configuration properties. See https://iceberg.apache.org/docs/nightly/configuration/#configuration
    secrets={"s3.secret-access-key": "s3.key"},  # Include any named secrets
)
Properties
The properties
component of the endpoint is a key-value map of Iceberg configuration parameters to their values. Valid property keys can be found in the Iceberg documentation at https://iceberg.apache.org/docs/nightly/configuration/#configuration.
Secrets
The secrets
component is also a key, value map where the keys are Iceberg configuration properties, and the values are named references to secrets stored within Deephaven so you do not need to include secrets in script text.
When needed, the secrets are retrieved from Deephaven and merged into the properties
before being used to access Iceberg. Secrets may be stored either in the Deephaven configuration file as a property or as a JSON map in a protected file on disk. More sophisticated secret stores are possible. Contact support for more information.
Secrets providers are visited in ascending priority order until one supplies a value or none can be found.
From Properties
Secrets may be stored in Deephaven configuration files as a simple property; for example, s3.access_key=1234secret4321
. The default priority of the Properties secrets provider is 100 and can be adjusted using the property PropertiesSecretsProvider.priority
.
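For example, the named secret s3.key referenced in the endpoint examples above could be supplied directly as a property (the value is illustrative):
s3.key=very_secret_access_key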
From Files
Secrets may be stored in files on disk containing a simple JSON map. This format is more secure and better supports more complex secret values. You may configure multiple secrets files and their priorities using these properties:
Property | Description |
---|---|
FileSecretsProvider.name.path | The path to the secrets file for the provider name |
FileSecretsProvider.name.priority | The priority of the secrets provider name |
You may provide as many of these as you need, ensuring that each name is unique.
An example file:
{
"s3.key" : "some_secret_key",
"secret_url" : "https://verysecret.com:9001",
"complicated" : "<Secret type=\"important\"><Internal>Secrecy</Internal></Secret>"
}
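For example, to register the file above as a provider named prod (the provider name, path, and priority here are illustrative):
FileSecretsProvider.prod.path=/etc/sysconfig/deephaven/secrets.json
FileSecretsProvider.prod.priority=50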
Deployment
The endpoint can be deployed to Deephaven configuration as long as a name has been provided. Once deployed, you may reference the endpoint by name in the schema for Iceberg tables to avoid duplication.
// Deploy the endpoint to Deephaven configuration, failing if it already exists.
endpoint.deploy(false)
# Deploy the endpoint to Deephaven configuration, overwriting if it already exists.
# The overwrite_existing parameter defaults to False, in which case deployment fails if the endpoint already exists.
endpoint.deploy(overwrite_existing=True)
Discovering an Iceberg table
Once an endpoint has been configured, you can discover an Iceberg table to create and deploy a Deephaven schema. If you have previously configured and deployed an endpoint, you can retrieve it by name as well.
import io.deephaven.enterprise.iceberg.IcebergTools
import io.deephaven.enterprise.iceberg.discovery.DiscoveryConfig
// Load an endpoint that was already configured
endpoint = IcebergTools.getEndpointByName("my_company_iceberg")
// Discover a table, derive the schema, and deploy it, deriving the namespace and table name
// from the table identifier, referencing the endpoint by name.
discovery = IcebergTools.discover(DiscoveryConfig.builder()
.tableIdentifier("market.trades")
.endpoint(endpoint)
.build())
discovery.deployWithEndpointReference()
from deephaven_enterprise import iceberg
# Load an endpoint that was already configured
endpoint = iceberg.get_named_endpoint("my_company_iceberg")
# Discover a table, derive the schema, and deploy it, deriving the namespace and table name
# from the table identifier, referencing the endpoint by name.
result = iceberg.discover("market.trades", endpoint)
result.deploy_named()
In the examples above, the Deephaven namespace and table name are derived directly from the Iceberg table identifier. You may specify your own by setting the namespace
and tableName
properties during discovery.
discovery = IcebergTools.discover(DiscoveryConfig.builder()
.tableIdentifier("market.trades")
.namespace("MarketUS")
.tableName("EqTrades")
.endpoint(endpoint)
.build())
result = iceberg.discover(
    "market.trades", endpoint, namespace="MarketUS", table_name="EqTrades"
)
Complete Examples
Below are complete examples that create an endpoint, discover a table, deploy a schema, and then fetch the table.
import io.deephaven.enterprise.iceberg.IcebergTools
import io.deephaven.enterprise.iceberg.discovery.DiscoveryConfig
import io.deephaven.extensions.s3.S3Instructions
// Create a new endpoint
endpoint = IcebergTools.newEndpoint()
.catalogType("rest")
.catalogUri("http://mydata.com:8181")
.warehouseUri("s3://warehouse/")
.putProperties("s3.access-key-id", "access_key",
"client.region" , "us-east-1")
.putSecrets("s3.secret-access-key", "s3.key")
.dataInstructions(S3Instructions.builder()
.regionName("us-east-1")
.build())
.build("my_company_iceberg");
endpoint.deploy(true)
discovery = IcebergTools.discover(DiscoveryConfig.builder()
.tableIdentifier("market.trades")
.endpoint(endpoint)
.build())
discovery.deployWithEndpointReference()
data = db.historicalTable("market", "trades")
from deephaven_enterprise import iceberg
from deephaven.experimental import s3
s3i = s3.S3Instructions(region_name="us-east-1")
endpoint = iceberg.make_endpoint(
"rest",
"http://mydata.com:8181",
"s3://warehouse/",
s3i,
endpoint_name="my_company_iceberg",
properties={"s3.access-key-id": "access_key", "client.region": "us-east-1"},
secrets={"s3.secret-access-key": "s3.key"},
)
endpoint.deploy(True)
result = iceberg.discover("market.trades", endpoint)
result.deploy_named()
data = db.historical_table("market", "trades")
Tailer Pool for System Tables
The tailer now uses data content pools for system tables, so that system table memory behavior is controlled in a similar fashion to the user table memory parameters. Before this change, a tailer servicing many system table partitions could consume unbounded memory. The configuration properties that control the pool size are:
Property | Default | Meaning |
---|---|---|
DataContent.systemPoolCapacity | 128 | The number of items in the system pool, which is the number of system locations the tailer concurrently processes. |
DataContent.disableSystemPool | false | Disable the system pool, which results in an unconstrained number of buffers being used in the tailer. |
DataContent.producersUseDirectBuffers | true | Producers (the tailer) use a direct buffer for reading and sending data. |
The data buffers are stored in either direct memory or heap memory, depending on the value of the
DataContent.producersUseDirectBuffers
property. Direct memory is recommended, because when passed an
on-heap buffer the JVM allocates a new direct buffer while reading data from the file. If you adjust the
size of the pool, then you should also adjust the tailer's maximum heap size or maximum memory size. The
size of each buffer defaults to 256KB, but can be controlled by the DataContent.producerBufferSize
property.
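For example, a tailer servicing many system partitions might be tuned as follows (illustrative values; remember to adjust the tailer's memory limits accordingly):
DataContent.systemPoolCapacity=256
DataContent.producersUseDirectBuffers=true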
Vermilion+ Core+ updated to 0.35.2
Vermilion+ 1.20231218.440 includes version 0.35.2 of the Deephaven Core engine. This is the same version that ships with Grizzly in 1.20240517.189, enabling customers to have one Core engine version of overlap between major Deephaven Enterprise releases. Although the Core engine functionality is the same in 0.35.2, the Grizzly Core+ worker has several enhancements that are not available in the Vermilion+ Core+ worker. This change also updates gRPC to 1.61.0.
For details on the Core changes, see the following release notes:
Changes to vector support for Core+ user tables
Both the Legacy and Core engines have special database types to represent arrays of values. The Legacy engine uses the DbArray class, while the Core system uses the Vector class. While these implementations represent identical data, they pose challenges for interoperability between workers running different engines.
When a user table is written, the schema is inferred from the source table. Previously, Vectors would be recorded verbatim in the schema. This change explicitly encodes Vector types as their base java array types as follows.
Vector Class | Converted Schema Type |
---|---|
ByteVector | byte[] |
CharVector | char[] |
ShortVector | short[] |
IntVector | int[] |
LongVector | long[] |
FloatVector | float[] |
DoubleVector | double[] |
Vector<T> | T[] |
This makes it possible for the Legacy engine to read User tables written by the Core engine. Note that no conversion is made when the Legacy engine writes DbArray types because the Core+ engine already supports those types.
If you want your User table array columns to be Vector types, use an .update() or .updateView() clause to wrap the native arrays.
staticUserTable = db.historicalTable("MyNamespace", "MyTable")
.update("Longs = (io.deephaven.vector.LongVector)io.deephaven.vector.VectorFactory.Long.vectorWrap(Longs)")
Option to close Tailer-DIS connections early, while continuing to monitor files
A new property is available to customize the behavior of the Tailer.
log.tailer.defaultIdlePauseTime
This property is similar to log.tailer.defaultIdleTime
, but it allows the Tailer to close connections early while continuing to monitor files.
When the idle time specified by log.tailer.defaultIdleTime
has passed without any changes to a monitored file, the Tailer will close the corresponding connection to the DIS and will not process any further changes to the file. The default idle time must therefore be at least as long as the default file rollover interval plus some buffer.
The new property enables a new feature. When the time specified by log.tailer.defaultIdlePauseTime
has passed without any changes to a monitored file, the Tailer will close the corresponding connection to the DIS, but will continue to monitor the file for changes. If a change is detected, the Tailer will reopen the connection and process the changes.
This will reduce or more quickly reclaim resources consumed for certain usage patterns.
Helm Chart Tolerations, Node Selectors and Affinity
You can now add tolerations, node selection, and affinity attributes to pods
created by the Deephaven Helm chart. By default, no tolerations, selectors or
affinity are added. To add tolerations to all created deployments, modify your
values.yaml
file to include a tolerations block, which is then copied into
each pod. For example:
tolerations:
- key: "foo"
operator: "Exists"
effect: "NoSchedule"
- key: "bar"
value: "baz"
operator: "Equal"
effect: "NoSchedule"
This adds the following tolerations to each pod (in addition to the default tolerations provided by the Kubernetes system):
Tolerations: bar=baz:NoSchedule
foo:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Similarly, you can add a nodeSelector
or affinity block
:
nodeSelector:
  key1: "value1"
  key2: "value2"
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: label
operator: In
values:
- value1
This results in pods containing node selectors like:
Node-Selectors: key1=value1
key2=value2
And affinity as follows:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: label
operator: In
values:
- value1
weight: 1
Ability to disable password authentication in front-end
A new property, authentication.client.disablePasswordAuth=true
, may be used to remove the username/password authentication option from the Swing front-end. The property has no effect if there are no other login options available.
This property does not disable username/password authentication at the server level (see Disabling password authentication).
Allow config to override ServiceRegistry hostname
The hostname which Data Import Server (DIS) registers with the service registry may now be defined in the host
tag within the DIS' routing endpoint
of the routing configuration; or using the new ServiceRegistry.overrideHostname
system property. The precedence for the service registry host is, in order:
- The host value within the routing endpoint configuration. Prior to this change, this value was ignored.
- The ServiceRegistry.overrideHostname property.
- On Kubernetes, the worker's service's hostname.
- On bare metal, the result of the Java InetAddress.getLocalHost().getHostName() function.
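For example, to force a DIS to register a specific hostname without editing the routing configuration, set the system property (the hostname value is illustrative):
ServiceRegistry.overrideHostname=dis-1.example.com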
Selection provider marking a query server as down
You can temporarily remove a query server from the list of servers that the
controller uses for the Auto_Query
and Auto_Merge
groups. This allows the
administrator to prevent new queries or consoles from being automatically
assigned to that query server. The query server can still be selected manually
and existing running queries are not evicted. This can be useful, for example,
if a query server is malfunctioning, and you would like to remove it from the
rotation, but still have access to it for debugging. To mark Query_1
as
administratively down, use dhconfig pq selection-provider
:
dhconfig pq selection-provider --command down --node Query_1
To mark the server as up:
dhconfig pq selection-provider --command up --node Query_1
The down
state of a server is not persisted; on controller restart, all servers
are marked as up. To permanently remove a server, you should re-run the
Deephaven installer or edit the iris-endpoints.prop
file and reload the
controller.
Note: these commands are only relevant if you have the default SimpleServerSelectionProvider enabled.
Server Selection Provider Backoff
If an individual server fails to acquire workers, it often becomes the least-loaded server, resulting in the SimpleServerSelectionProvider assigning all new workers to that server. A new backoff policy has been introduced to prevent immediately assigning a worker to a server with an acquisition failure. Before assigning another worker to that server, the selection provider ensures that each of the other servers has had a worker assigned to them. Once a worker is successfully assigned to the query server, the state is cleared.
This new behavior prevents the algorithm from getting "stuck" assigning workers to the failed server but does periodically attempt to assign workers to handle transient failures or misconfigured queries.
To enable the new backoff behavior, set the configuration property:
SimpleServerSelectionProvider.FailureBackoffPolicy=ALL_OTHERS_AFTER_ACQUISITION_FAILURE
The default behavior remains the same as previous versions. To explicitly set
this behavior, set the property to NONE
:
SimpleServerSelectionProvider.FailureBackoffPolicy=NONE
Active Directory Group Synchronization
As with SAML, you can now synchronize groups from an Active Directory server to your Deephaven installation.
After a user authenticates with the Active Directory server, if group synchronization is enabled, the Authentication Server examines the memberOf
attribute of the user's record. If the groups from Active Directory do not match the user's Deephaven groups, the relevant group names are added to or removed from the ACL store using the ACL write server.
The following properties control the behavior of group synchronization:
Property | Default | Description |
---|---|---|
authentication.server.ldap.synchronizeGroups | false | If true, a user's groups are synchronized on user login. |
authentication.server.ldap.sync.ignoregroups.directory | Groups to ignore from the directory server. | |
authentication.server.ldap.sync.ignoregroups.dh | Deephaven groups to ignore when removing excess groups. | |
authentication.server.ldap.sync.mapgroup.<directory_group> | Each directory group can be mapped to one or more Deephaven groups by adding additional properties. | |
authentication.server.ldap.sync.aclwriteuser | iris | The username for ACL writes. |
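A minimal sketch of enabling synchronization and mapping a hypothetical directory group named DH-Analysts to a Deephaven group named analysts (the group names, and the assumption that the mapped Deephaven group is supplied as the property value, are illustrative):
authentication.server.ldap.synchronizeGroups=true
authentication.server.ldap.sync.mapgroup.DH-Analysts=analysts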
Column ACL Normalization
In previous versions, column ACLs were not normalized, meaning a system could contain column ACL keys that only differed by column order or duplicates, e.g.:
Group| Namespace| TableName| Columns| Filter
----------+----------+---------------+----------+------------------------------------------------
test |DbInternal|ProcessEventLog|b1 |new UsernameFilterGenerator("EffectiveUser")
test |DbInternal|ProcessEventLog|b1,b1,b1 |new UsernameFilterGenerator("AuthenticatedUser")
test |DbInternal|ProcessEventLog|b1,b2,b3 |new UsernameFilterGenerator("EffectiveUser")
test |DbInternal|ProcessEventLog|b3,b2,b1 |new UsernameFilterGenerator("AuthenticatedUser")
Column specifications "b1" and "b1,b1,b1" refer to the same columns, as do "b1,b2,b3" and "b3,b2,b1". This is problematic, particularly when effectively equivalent column ACL keys have different filter values, as shown above. In this case, behavior is undefined.
The ACL write server attempts to normalize column ACLs on startup automatically. However, if there are effectively equivalent column ACLs with different filter values, automatic normalization fails with a report of the problematic column ACLs in the ACL write server logs. At this point, the Deephaven administrator must:
- Set property DbAclWriteServer.startup.normalizeColumnAcls=false.
- Restart the ACL write server.
- Remove problematic column ACLs.
- Set property DbAclWriteServer.startup.normalizeColumnAcls=true.
- Restart the ACL write server.
Continuing to run with de-normalized column ACLs is not recommended.
As always, before a software upgrade, you should back up your ACL database.
New Persistent Query range scheduling
A new Range
scheduling option allows date/time ranges. A single start date/time and a single stop date/time can be specified.
- If the start date/time is not specified, the Persistent Query is started immediately.
- If the stop date/time is not specified, the Persistent Query is not stopped once it is started.
Removal of dhconfig base64 password options
The dhconfig tool no longer accepts the --password option for plaintext passwords to help reduce the chance of leaking secrets into shell history or log files. Interactive password authentication is still supported. Private key authentication should be used for non-interactive operations.
Managing users, groups, and ACLs with dhconfig
The dhconfig
tool can now manage users, groups, and ACLs using the subcommand acls
. For example, you can add users to groups with:
dhconfig acl groups add-member --name user1 user2 --group groupA groupB
Complete help is available by running dhconfig acls --help
. This example demonstrates one of the improvements enabled by using dhconfig
; the old iris_db_user_mod
tool cannot add multiple users to multiple groups in a single command.
All existing operations that were performed by /usr/illumon/latest/bin/iris
iris_db_user_mod
have been integrated into dhconfig
, and the old iris_db_user_mod
tool will be removed in a future release.
Run dhconfig acls --help
for more detailed help. Each operation provides additional help information for its specific arguments; for example: dhconfig acls user --help
provides help for user management commands.
DataImportServers respond to data routing changes
DataImportServers now respond to data routing changes. If the data routing configuration is changed (via the dhconfig routing
or dhconfig dis
commands), the DIS receives a notification of the change. The only change it can respond to is a change in the filters, either explicit changes to the filter or changes implied by claims
made by this or other DISes. If any other configuration change is detected, the DIS prints error messages indicating that the configuration is out of date until it is restarted.
To disable responding to these changes (other than logging them), set the property:
DataImportServer.ignoreRoutingConfigChanges=true
DIS interface changes may require script updates
The DataImportServer interface changes may require script changes. Please see the sections below.
Deprecated methods
The following deprecated methods in com.illumon.iris.db.tables.dataimport.logtailer.DataImportServer
have been removed:
DataImportServer(
com.fishlib.io.logger.Logger,
com.illumon.iris.db.v2.routing.DataImportServiceConfig,
com.fishlib.configuration.Configuration)
DataImportServer.getDataImportServer(
com.fishlib.io.logger.Logger,
com.illumon.iris.db.v2.routing.DataImportServiceConfig,
com.fishlib.configuration.Configuration,
com.illumon.iris.db.schema.SchemaService)
DataImportServer.start()
Instead, use DataImportServer.startInWorker()
or DataImportServer.start(StatsIntradayLogger.NULL)
The following methods are newly deprecated to support this functionality change and will be removed in the next release:
DataImportServer(
com.illumon.iris.db.util.logging.DataImportServerLogFactory,
com.illumon.iris.db.v2.routing.DataImportServiceConfig,
com.fishlib.configuration.Configuration,
com.illumon.iris.db.schema.SchemaService)
DataImportServer.getDataImportServer(
com.illumon.iris.db.util.logging.DataImportServerLogFactory,
com.illumon.iris.db.v2.routing.DataImportServiceConfig,
com.fishlib.configuration.Configuration,
com.illumon.iris.db.schema.SchemaService)
The replacement methods for constructing a DataImportServer are:
/**
* Construct a {@link DataImportServer}.
*
* @param logFactory optional IrisLogCreator for access to loggers
* @param disName the name of the DIS configuration in the data routing service
* @param configuration the configuration to use when fetching settings that aren't included in disConfig
* @param schemaService the SchemaService to use
* @param routingService the DataRoutingService to use for configuration
* @param storageRoot optional storage root to use if the DIS configuration specifies "private"
*/
public DataImportServer(@Nullable DataImportServerLogFactory logFactory,
@NotNull final String disName,
@NotNull final Configuration configuration,
@NotNull final SchemaService schemaService,
@NotNull final DataRoutingService routingService,
@Nullable final String storageRoot);
/**
* Create a new DataImportServer instance, according to the configuration passed in.
*
* @param logCreatorParam optional IrisLogCreator, will be used to create logger. If null, a global instance will be used, if available.
* @param disName the name of the DIS configuration to use
* @param configuration the configuration to use when fetching settings that aren't included in disConfig
* @param schemaService the SchemaService to use
* @param routingService the DataRoutingService to use for configuration
* @param storageRoot optional storage root to use when configured storage is "private"
* @return a new DataImportServer instance
*/
public static DataImportServer getDataImportServer(@Nullable final DataImportServerLogFactory logCreatorParam,
@NotNull final String disName,
@NotNull final Configuration configuration,
@NotNull final SchemaService schemaService,
@NotNull final DataRoutingService routingService,
@Nullable final String storageRoot);
Script changes
The new methods are more explicit about the data storage location and need a DataRoutingService to respond dynamically to data routing configuration changes. Make the following changes to your scripts:
- Instead of getting a DataImportServiceConfig from the DataRoutingService, pass the DataRoutingService and configuration name as parameters.
- Specify a storage location or leave it as null if the storage is specified in the DIS configuration.
This example shows the required changes from a previously documented example.
Before:
routingService = DataRoutingServiceFactory.getDefault()
// if the dis configuration specifies "private" storage, then you must use getDataImportServiceConfigWithStorage
disConfig = routingService.getDataImportServiceConfigWithStorage("Ingester1", "/db/dataImportServers/Ingester1")
// this call assumes named storage configured in the routing file
//disConfig = routingService.getDataImportServiceConfig("Ingester1")
dis = DataImportServer.getDataImportServer(ProcessEnvironment.getDefaultLog(), disConfig, Configuration.getInstance(), SchemaServiceFactory.getDefault())
dis.startInWorker()
After:
routingService = DataRoutingServiceFactory.getDefault()
// if the dis configuration specifies "private" storage, then you must provide a storage root, otherwise set this to null
storage = "/db/dataImportServers/Ingester1"
disName = "Ingester1"
dis = DataImportServer.getDataImportServer(
null,
disName,
Configuration.getInstance(),
SchemaServiceFactory.getDefault(),
routingService,
storage)
dis.startInWorker()
Alternate launch option for Deephaven Core+ local workers
Deephaven launches local worker processes by building a command from various system configurations and user-specified inputs, such as additional classpath entries or memory settings, and executing that command directly. With this release, it is possible to configure Core+ workers so that the command is supplied to another process that launches it. As an example, this allows configuring the system to run workers that are authenticated with a Kerberos keytab.
The default Core+ worker is launched with this command:
$ /usr/illumon/latest/bin/deephaven-coreplus ..
If your configuration provides a commandPrefix
for your Core+ worker kind, then the default command
will be provided to the command you specify as the prefix.
As an example, this configuration setting:
WorkerKind.DeephavenCommunity.commandPrefix=k5start -f my-krb5.keytab -U -- /bin/sh -c
Results in the workers being launched like this:
$ k5start -f my-krb5.keytab -U -- /bin/sh -c '/usr/illumon/latest/bin/deephaven-coreplus ..'
By default, the commandPrefix
is tokenized by spaces. If you require a different delimiter, you can
define it with this property:
# Use a comma instead of the default space
WorkerKind.DeephavenCommunity.commandPrefixDelimiter=,
Optional lenient IOJobImpl to avoid write queue overflow
New behavior is available to avoid write queue overflow errors in the TDCP process. When a write queue overflow condition is detected, the process can be configured to delay briefly - giving the queue a chance to drain.
The following properties govern the feature:
IOJobImpl.lenientWriteQueue
IOJobImpl.lenientWriteQueue.retryDelay
IOJobImpl.lenientWriteQueue.maxDelay
Set IOJobImpl.lenientWriteQueue=true
to enable this behavior.
By default, the writer will wait up to IOJobImpl.lenientWriteQueue.maxDelay=60_000
ms in increments of IOJobImpl.lenientWriteQueue.retryDelay=100
ms.
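For example, to enable the lenient behavior with the default delays spelled out explicitly (the values shown are the documented defaults):
IOJobImpl.lenientWriteQueue=true
IOJobImpl.lenientWriteQueue.retryDelay=100
IOJobImpl.lenientWriteQueue.maxDelay=60000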
This should address the following fatal error in the TDCP process:
ERROR - job:1424444133/RemoteTableDataService/10.128.1.75:37440->10.128.1.75:22015 write queue overflow: r=true, w=true, p=false, s=false, u=false, h=0, rcap=69632, rbyt=0, rmax=4259840, wbyt=315407, wspc=1048832, wbuf=4097, wmax=1048576, fc=0, allowFlush=true
Persistent Query Replicas and Spares
Persistent Queries now support redundancy and automatic failover with the replicas and spares feature. Any Live Persistent Query can now specify a number of Replica and Spare workers to start. Replica workers are identical copies of the query that the Controller distributes user load across. Spares are identical copies used by the controller to replace failed replicas. When more than one replica is running, the replica workers are independent from each other. Replica queries are suitable for UI interactions, but care must be taken that the replicas do not perform actions that would be unsafe to concurrently perform without coordination.
Administrators of a query (including members of the iris-querymanagers or iris-superusers group) see the state of all replicas in the Query Monitor. A viewer only sees the replica that they are assigned to in the Query Monitor.
Load Balancing
When a query is configured with more than one replica, the controller uses a configurable policy called an assignment policy to distribute users across all available workers. Each Persistent Query Configuration has a number of "slots" equal to the number of replicas. An assignment policy, which is sticky to prevent a single user's UI-driven actions from negatively impacting more than one replica, assigns a user to a specific slot.
By default, Deephaven provides a single "Round Robin" assignment policy that distributes users evenly across the available workers.
Customers may implement the io.deephaven.enterprise.controller.assignment.AssignmentPolicy
interface to provide custom assignment policies.
Assignment policies are configured by adding values to your property file as follows:
PersistentQueryController.AssignmentPolicy.PolicyName.class=com.company.MyPolicyClass
PersistentQueryController.AssignmentPolicy.PolicyName.displayName=A Display Name
PersistentQueryController.AssignmentPolicy.PolicyName.description=Describes the policy
The assignment policy receives two callbacks whenever the number of replicas change. The first callback occurs before any currently running replicas are shut down so that users can be reassigned. The second callback occurs after the changes are applied so that user load can be redistributed.
Automatic Failover
When a query is configured with spares and a replica crashes for any reason, the controller will immediately replace the crashed worker with a running spare if it is available. Each replica slot independently respects the "Error Restart Attempts" scheduling parameter of the query configuration. This can prevent a single slot from continually claiming a spare query to the exclusion of other replicas.
Additional Changes
A new DbInternal table QueryUserAssignmentLog
has been added so that Assignment Policies can log how users are assigned to available replicas.
-add_acl 'new UsernameFilterGenerator("QueryOwner")' -group allusers -namespace DbInternal -table QueryUserAssignmentLog -overwrite_existing
-add_acl * -group iris-superusers -namespace DbInternal -table QueryUserAssignmentLog -overwrite_existing
Server Selection Provider Replica Distribution
The default SimpleServerSelectionProvider attempts to more evenly spread replicas and spares of a single persistent query across the cluster so that a single server failure is less likely to impact all replicas of a running query.
The IServerSelectionProvider
interface now requires three additional methods to be implemented:
- addWorkerForSerial is invoked when a worker for a persistent query is assigned to a server.
- removeWorkerForSerial is invoked when a worker for a persistent query is terminated.
- generateStatusInformation generates internal state suitable for display by dhconfig pq selection-provider.
Option to default all user tables to Parquet
Set the configuration property db.LegacyDirectUserTableStorageFormat=Parquet
to default all direct user table operations, such as db.addTable
, to the Parquet storage format. The default if the property is not set is DeephavenV1
.
Default Landing Page
The default landing page for a Deephaven installation now automatically
redirects to the web UI. To download the Swing Launcher, go to the /launcher
path (e.g., https://deephaven.example.com:8123/launcher/
or
https://deephaven.example.com:8000/launcher/
).
Disabling Password Authentication
To disable password authentication within the authentication server, set the
configuration property authentication.passwordsEnabled
to false
. When the
property is set to false, the authentication server rejects all password logins
and you must use SAML or private key authentication to access Deephaven.
Note that even if the UI presents a password prompt, the authentication backend rejects all passwords.
Deephaven processes log their heap usage
The db_dis
, web_api_service
, log_aggregator_service
, iris_controller
, db_tdcp
, and configuration_server
processes now periodically log their heap usage.
PersistentQueryController.log.current:[2024-05-10T15:00:32.365219-0400] - INFO - Jvm Heap: 3,972,537,856 Free / 4,291,624,960 Total (4,291,624,960 Max)
PersistentQueryController.log.current:[2024-05-10T15:01:32.365404-0400] - INFO - Jvm Heap: 3,972,310,192 Free / 4,291,624,960 Total (4,291,624,960 Max)
The logging interval can be configured using the property RuntimeMemory.logIntervalMillis
. The default is one minute.
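For example, to log heap usage every five minutes instead of every minute:
RuntimeMemory.logIntervalMillis=300000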
Worker Additional Memory Request
When creating a Code Studio or Persistent Query, Deephaven allows you to configure the heap size of the Java virtual machine running your process. The RemoteQueryDispatcher uses this parameter not only to pass appropriate arguments to the Java process, but also to account for the RAM used by workers and, when running in Kubernetes, to set the resource requests and limits.
A Deephaven worker can use off-heap memory in various circumstances. For example, Java libraries may use direct buffers to reduce the overhead on the garbage collector or to improve I/O performance. When executing a Python worker, Python objects are not part of the Java heap, but are instead allocated and garbage collected by the Python interpreter's memory management subsystem. Similarly, any native libraries also use off-heap memory.
If your worker process uses such off-heap memory, you can request that memory from the dispatcher using the "Additional Memory" field under "Advanced Settings" in the Code Studio start screen and the Persistent Query Settings tab. This is especially important on Kubernetes, where the OS kernel forcibly kills a process whose memory usage exceeds its container's limit.
Authorized Keys and Local Users Text Files
Deephaven no longer uses the dsakeys.txt
or authusers.txt
files by default.
The Deephaven 1.20221001 release added support for storing public keys in the
ACL store, and password hashes can be stored in the ACL store when
iris.enableManagedUserAuthentication
is set to true (which has been the
default in iris-environment.prop
in all supported software versions).
Passwords and public keys can be manipulated with dhconfig acl user set-password
and dhconfig acl publickey
, respectively.
On upgrade, any keys stored in dsakeys.txt
are migrated to the ACL store.
On installation or upgrade, system keys for the controller, merge server, and
TDCP are written to the ACL store.
To re-enable text file based keys, the following properties must be set in your
iris-environment.prop
file:
authentication.server.localusers.enabled=true
authentication.server.localusers.file=<name>
authentication.server.authorizedkeys.enabled=true
authentication.server.authorizedkeys.file=<name>
In-Worker Service Persistent Query Type Removed
The previously deprecated In-Worker Service Persistent Query type has been
removed. Any existing persistent queries that are an In-Worker Service should
be converted to use an appropriate script type before upgrading Deephaven.
Most existing In-Worker Service queries are likely Data Import Servers, which
can be converted with a script to a type of Live Query - Merge Server
.
etcd configuration parameter 'bcrypt-cost: 4' added
When a system is installed, the etcd configuration file config.yaml
adds the configuration parameter bcrypt-cost: 4
.
This affects creation of new etcd users used internally by the Deephaven software. No changes are needed for Deephaven usernames or passwords stored in etcd. For a newly installed system this change applies immediately.
On a system that is being updated from an existing version, the installer does not modify the etcd configuration files; this must be done manually. On each node running etcd, modify the file /etc/etcd/dh/ETCD_CLUSTER_ID/config.yaml (replace ETCD_CLUSTER_ID with your etcd cluster id, a string that looks something like 'cfa6c474b'). In each config.yaml file, add a line containing:
bcrypt-cost: 4
On a system where the config file has been modified to add the new bcrypt cost value, this new value will only apply for new users created after the change, or for old users if their passwords are updated after the change; for existing users that do not update their password nothing changes.
A script, redo-etcd-passwords, is provided under the bin directory that updates the passwords for all etcd users except root. The password value itself is not changed, but re-setting each password to the same value has the side effect that the new configuration parameter is used the next time the password is validated. Run the script as the irisadmin user while the etcd cluster is up but the Deephaven Enterprise system is down.
Background
etcd uses go's bcrypt
function to hash passwords. Passwords are
stored on disk hashed, to avoid compromising them immediately if an
attacker is able to observe the data on disk (this is similar to how
Unix/Linux systems store passwords hashed in the /etc/passwd
file).
When a user tries to authenticate with etcd and provides a password,
etcd hashes the password again and compares the hash with the value
stored on disk. The bcrypt function is computationally intensive; this is intentional, so that an attacker trying to guess passwords cannot probe them quickly. This, however, has an impact on a system like Deephaven Enterprise:
-
Deephaven Enterprise creates workers on demand, sometimes starting them as scheduled persistent queries. When many workers are starting at the same time, an etcd server machine can become very busy validating credentials via bcrypt for many new clients.
-
When an etcd server that is leader fails, and a new server takes over as leader, many clients switch to the new server, all of them requiring the server to execute the bcrypt function to validate credentials; this can load a machine at the worst possible moment from a fault tolerance perspective.
For these reasons, it is desirable to reduce the cost of bcrypt to the minimum configurable. The default value used by bcrypt's go library (and etcd) is 10. The minimum value is 4. This configuration change sets the value explicitly to 4, where we previously were not configuring it and thus using the default.
etcd stores the bcrypt value used when a password is created or modified, to know at what value of bcrypt cost to validate the hash. Therefore, when the bcrypt cost value is changed it only applies for new users or after updating an existing user's password.
/usr/illumon/dnd changed to /usr/illumon/coreplus
The installer now uses /usr/illumon/coreplus
for packages instead of
/usr/illumon/dnd
to reflect the updated Core+ name.
Kubernetes Heap Overhead Parameters
When running Deephaven installations in Kubernetes, the originally-implemented JVM overhead properties do not prevent some workers from being killed with out-of-memory exceptions.
- Adding the BinaryStoreWriterV2.allocateDirect=false JVM parameter reduces direct memory usage, which is not counted towards dispatcher heap usage and can result in Kubernetes out-of-memory failures.
- Adding the -Xms JVM parameter allocates all requested heap at worker creation time, reducing the likelihood of after-startup worker out-of-memory failures from later memory requests.
- Adding the -XX:+AlwaysPreTouch JVM parameter to workers ensures that all worker heap is touched during startup, avoiding later page-faulting.
The following properties are being added to iris-environment.prop
for new installations. Deephaven strongly suggests adding them manually to existing installations.
RemoteProcessingRequestProfile.Xms.G1 GC=$RequestedHeap
RemoteQueryDispatcher.JVMParameters=-XX:+AlwaysPreTouch
BinaryStoreWriterV2.allocateDirect=false
In addition, the property RemoteQueryDispatcher.memoryOverheadMB=500
is being updated in iris-defaults.prop
, and this will automatically be picked up when the Kubernetes installation is upgraded.
dhconfig replaces etcd_prop_file
The etcd_prop_file tool has been removed. You must use dhconfig properties import
instead.
For recovery or bootstrapping purposes, the --diskprops
option can be used in
conjunction with the --etcd
option to read property files from disk and
import new files into etcd.
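A hypothetical recovery invocation, combining the options described above with the -f file option used elsewhere in this guide (confirm the exact flags with dhconfig properties import --help):
/usr/illumon/latest/bin/dhconfig properties import --etcd --diskprops -f iris-environment.prop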
Dispatcher Memory Reservation
The Remote Query Dispatcher (either db_query_server
or db_merge_server
) has a configurable amount of heap that can be dispatched to workers, which is controlled by setting the RemoteQueryDispatcher.maxTotalQueryProcessorHeapMB
property. Setting this property requires accounting for the other processes that may be running on the machine. If set too high, then workers may fail to allocate memory after being dispatched, or the kernel OOM killer may terminate processes. If set too low, then the machine may be underutilized.
As an additional safety check, the Remote Query Dispatcher can query the /proc/meminfo
file for available heap. If a user requests more heap than the MemAvailable
field indicates can be allocated to a new process, then the remote query dispatcher can reject scheduling the worker.
There are two new properties that control this behavior:
- RemoteQueryDispatcher.adminReservedAvailableMemoryMB: for users that are members of RemoteQueryDispatcher.adminGroups. By default, this is set to 1024MiB.
- RemoteQueryDispatcher.reservedAvailableMemoryMB: for all other users. By default, this is set to 2048MiB.
When set to -1
, the additional check is disabled. When set to a non-negative value the dispatcher subtracts the property's value from the available memory, and verifies that the worker heap is less than this value before creating the worker.
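For example, to reserve more headroom before dispatching non-admin workers while disabling the check for admin users (illustrative values):
RemoteQueryDispatcher.reservedAvailableMemoryMB=4096
RemoteQueryDispatcher.adminReservedAvailableMemoryMB=-1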
You can examine the current status of these properties using the /config endpoint if RemoteQueryDispatcher.webserver.enabled is set to true. For example, navigate to https://query-host.example.com:8084/config. The available memory and the property values are displayed as an HTML table.
This property does not guarantee that workers or other processes are not terminated by the OOM killer. Running workers and processes may not have allocated their maximum heap size, and therefore can use system memory beyond what is available at dispatch time.
These properties have no effect on Kubernetes deployments. The dispatcher does not have a complete view of the available cluster resources.
ILLUMON_JAVA is deprecated. Use DH_JAVA instead.
In the past, specifying which version of java to use with Deephaven was done with the ILLUMON_JAVA
environment variable, and it was applied inconsistently.
In this release, you can set DH_JAVA=/path/to/java_to_use/bin/java
in your cluster.cnf
to tell all Deephaven processes where to find the correct java executable regardless of your PATH.
DH_JAVA
works correctly whether you point to a java executable or a java installation directory (like "JAVA_HOME").
Both DH_JAVA=/path/to/java_to_use
and DH_JAVA=/path/to/java_to_use/bin/java
operate identically.
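For example, in /etc/sysconfig/deephaven/cluster.cnf (the JDK path is illustrative):
DH_JAVA=/usr/lib/jvm/java-17-openjdk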
If different machines in your cluster have java executables located in different locations,
it is your responsibility to set DH_JAVA correctly in /etc/sysconfig/deephaven/cluster.cnf
on each machine, or (preferably) to use a symlink so you have a consistent DH_JAVA location on all machines.
Managing Persistent Queries with dhconfig
The dhconfig
tool can now manage Persistent Queries using the subcommand pq
. For example, you can restart a query with:
dhconfig pq restart --name "Persistent Query Name"
Complete help is available by running dhconfig pq --help
. This example demonstrates one of the improvements enabled by using dhconfig
: the old controller tool does not handle query names with spaces.
All existing operations that were performed by /usr/illumon/latest/bin/iris controller_tool
have been integrated into dhconfig
, and the old controller tool will be removed in a future release.
See the help pages for more details: dhctl controller --help.
Legacy C# Open API Client Removed
Deephaven no longer builds or maintains the Legacy C# Open API client. C# Binary Loggers and the C# SBE client are still supported.
The Excel plugin for Legacy workers which depended on this client is also no longer supported.
Core+ Controller Python Imports
From Core+ Python workers, you may now import Python modules from repositories stored in the controller. To evaluate a single Python file:
import deephaven_enterprise.controller_import
deephaven_enterprise.controller_import.exec_script("script_to_execute.py")
To import a script as a module, you must establish a meta-import with a module prefix for the controller. The following example uses the default value of "controller" to load a module of the form "package1/package2.py" or "package1/package2/__init__.py":
import deephaven_enterprise.controller_import
deephaven_enterprise.controller_import.meta_import()
import controller.package1.package2
Refreshing Local Script Repositories
The Persistent Query Controller defines a set of script repositories that can be used from Persistent Queries or Code Studios. The repositories may be configured to use a remote Git repository or just a path on the local file system. The controller scans the repository on startup for the list of scripts that are available. Previously, only Git repositories could have updates enabled (once per minute); and local repositories would never be rescanned.
You can now set the property PersistentQueryController.scriptUpdateEnabled
to true to enable script updates. If this property is not set, then the old
PersistentQueryController.useLocalGit
property is used (the old property has
an inverse sense, meaning PersistentQueryController.useLocalGit=true
stops
updates and PersistentQueryController.useLocalGit=false
permits updates) .
To mark a repository as local, the "uri" parameter must be set to empty. For example, if the repository was referred to as "irisrepo" in the iris.scripts.repos property, then to mark the repository as local you would include a property like the following in your iris-environment.prop file:
iris.scripts.repo.irisrepo.uri=
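Putting the two properties together, an iris-environment.prop entry that enables rescans and marks the example irisrepo repository as local would look like:
PersistentQueryController.scriptUpdateEnabled=true
iris.scripts.repo.irisrepo.uri=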
Fixing etcd ACLs that broke after upgrading to URL encodings
Note that the following is only applicable to etcd ACLs.
In 1.20231218.116 and 1.20231218.132, Deephaven began URL encoding ACL keys to prevent special characters like '/' in keys from corrupting the ACL database. Although not all special characters corrupted the database, all of them are encoded, causing the unencoded database to be incompatible with the new version. A common occurrence of this pattern is the "@" character in usernames.
These ACL entries can be fixed using the EtcdAclEncodingTool.
First, back up your etcd database by reading our backup and restore instructions.
To rewrite these ACLs with proper encodings, run the following command as irisadmin:
sudo -u irisadmin /usr/illumon/latest/bin/iris_exec com.illumon.iris.db.v2.permissions.EtcdAclEncodingTool
To see what changes would occur without actually modifying the ACLs, run:
sudo -u irisadmin /usr/illumon/latest/bin/iris_exec com.illumon.iris.db.v2.permissions.EtcdAclEncodingTool -a --dry-run
Setting JVM JIT Compiler Options for Workers
The ability to set the maximum number of allowed JVM JIT compiler threads through the -XX:CICompilerCount
JVM option has been added to JVM profiles using properties of the form RemoteProcessingRequestProfile.JitCompilerCount
. See the remote processing profiles documentation for further information.
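A sketch of such a property, assuming it follows the same per-profile suffix pattern as the RemoteProcessingRequestProfile.Xms property shown elsewhere in these notes (the profile name and thread count are illustrative):
RemoteProcessingRequestProfile.JitCompilerCount.G1 GC=4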
Upgrade etcd to 3.5.12
In past releases, we recommended upgrading etcd to 3.5.5.
It was later discovered that 3.5.5 has a known bug which can break your etcd cluster if you perform an etcdctl password reset.
As such, when upgrading etcd, you should prefer the Deephaven-tested 3.5.12 point release, which is the new default as of version 1.20231218.190
.
All newly created systems will have 3.5.12 installed, but for existing systems, you must unpack new etcd binaries yourself.
You can find manual etcd installation instructions in the third-party dependencies guide.
Configurable gRPC Retries
The configuration service now supports using a gRPC service configuration file to configure retries, and one is provided by default for the system.
{
"methodConfig": [
{
"name": [
{
"service": "io.deephaven.proto.config.grpc.ConfigApi"
},
{
"service": "io.deephaven.proto.registry.grpc.RegistryApi"
},
{
"service": "io.deephaven.proto.routing.grpc.RoutingApi"
},
{
"service": "io.deephaven.proto.schema.grpc.SchemaApi"
},
{
"service": "io.deephaven.proto.processregistry.grpc.ProcessRegistryApi"
},
{
"service": "io.deephaven.proto.unified.grpc.UnifiedApi"
}
],
"retryPolicy": {
"maxAttempts": 60,
"initialBackoff": "0.5s",
"maxBackoff": "2s",
"backoffMultiplier": 2,
"retryableStatusCodes": [
"UNAVAILABLE"
]
},
"waitForReady": true,
"timeout": "120s"
}
]
}
methodConfig
has one or more entries. Each entry has a name section
with one or more service/method sections that filter whether the
retryPolicy section applies.
If the method is empty or not present, then it applies to all methods of the service. If service is empty, then method must be empty, and this is the default policy.
The retryPolicy section defines how a failing gRPC call will be
retried. In this example, gRPC will retry for just over 1 minute while
the status code is UNAVAILABLE
(e.g. the service is down). Note this
applies only if the server is up but the individual RPCs are being
failed as UNAVAILABLE
by the server itself. It the server is down,
the status returned is UNAVAILABLE
but the retryPolicy defined here
for the method does not apply; gRPC manages reconnection retries for a
channel separately/independently as described here:
https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md
There is no way to configure the parameters for reconnection; see https://github.com/grpc/grpc-java/issues/9353
If the service config file specifies waitForReady, then an RPC executed
when the channel is not ready (server is down) will not fail right
away but will wait for the channel to be connected. This, combined with a timeout definition, makes the RPC call hold on for as long as the timeout, giving the reconnection policy a chance to get the channel to the ready state.
For Deephaven processes, customization of service config can be done
by (a) copying configuration_service_config.json to
/etc/sysconfig/illumon.d/resources and modifying it there, or (b)
renaming it and setting property
configuration.server.service.config.json
.
Note that the property needs to be set as a launching JVM argument because this is used in the gRPC connection to get the initial properties.
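For example, a hypothetical launch argument for option (b), assuming the property value names the renamed file placed in the resources directory:
-Dconfiguration.server.service.config.json=my_service_config.json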
Note: The relevant service names are:
io.deephaven.proto.routing.grpc.RoutingApi
io.deephaven.proto.config.grpc.ConfigApi
io.deephaven.proto.registry.grpc.RegistryApi
io.deephaven.proto.schema.grpc.SchemaApi
io.deephaven.proto.unified.grpc.UnifiedApi
Add Core+ Calendar support and allow Java ZoneId strings in Legacy Calendars
Core+ workers can use the Calendars.resourcePath
property to load customer-provided business calendars from disk. To use calendars in Core+ workers, any custom calendars on your resource path must be updated to use a standard TimeZone value.
Legacy workers also support using ZoneId strings instead of DBTimeZone values.
Dynamic management of Data Import Server configurations
Creating a new Data Import Server configuration and integrating it into the Deephaven system requires several steps, including required adjustments to the data routing configuration. This final step can now be performed with a few simple commands, and no longer requires editing the data routing configuration file.
dhconfig dis
The dhconfig
command has a new action: dis
, which supports the import, add, export, list, delete, and validate actions. The commands themselves provide help, and more information can be found in the dhconfig documentation.
dhconfig dis import
Import one or more configurations from one or more files. For example:
/usr/illumon/latest/bin/dhconfig dis import /path/to/kafka.yml
kafka.yml
kafka:
  name: kafka
  endpoint:
    serviceRegistry: registry
    tailerPortDisabled: 'false'
    tableDataPortDisabled: 'false'
  claims:
    - {namespace: Kafka}
  storage: private
dhconfig dis add
Define and import a single configuration on the command line. For example (equivalent to the import example above):
/usr/illumon/latest/bin/dhconfig dis add --name kafka --claim Kafka
dhconfig dis export
Export one or more configurations to one or more files. These files are suitable for the import command. For example, to export all configured Data Import Servers:
/usr/illumon/latest/bin/dhconfig dis export --file /tmp/import_servers.yml
dhconfig dis list
List all configured Data Import Servers. For example:
/usr/illumon/latest/bin/dhconfig dis list
Data import server configurations:
kafka
kafka3
dhconfig dis delete
Delete one or more configurations. For example:
/usr/illumon/latest/bin/dhconfig dis delete kafka --force
dhconfig dis validate
Validate one or more configurations. This can validate proposed changes before committing them with the import command. This process verifies that the configuration as a whole will be valid after applying the new changes.
Caveats
"Data routing configuration" comprises the "main" configuration file (managed with dhconfig routing
) and additional DIS configurations. The main routing configuration may contain DIS configurations in the dataImportServers
section. These two sources of DIS configurations are managed separately and are not permitted to contain duplicates. If you want to manage an existing DIS configuration with the new commands, you must remove it from the main routing configuration.
This functionality will only be useful for querying data if the routing configuration includes "all data import servers" using the dataImportServers keyword. This is usually a source under the db_tdcp table data service:
db_tdcp:
  host: *localhost
  port: *default-tableDataCacheProxyPort
  sources:
    - name: dataImportServers
A DIS configuration requires storage. The special value private indicates that the server will supply its own storage location. Any other value must be present in the storage section of the routing configuration.
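As an illustration, a named storage entry in the routing configuration might look like the following; the name and dbRoot path are placeholders for your environment:
storage:
  - name: kafka_storage
    dbRoot: /db/dis/kafka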
Update jgit SshSessionFactory to a more modern/supported version
For our Git integration, we have been using the org.eclipse.jgit package. GitHub discontinued support for SHA-1 RSA SSH keys, but jgit's SSH implementation (com.jcraft:jsch) does not support rsa-sha2 signatures and will not be updated. To enable stronger SSH keys and provide GitHub compatibility, we have configured jgit to use an external SSH executable by setting the GIT_SSH environment variable. The /usr/bin/ssh executable must be present for Git updates.
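A quick sanity check to run on hosts that perform Git updates (the key type and comment in the second command are illustrative):
# Confirm the external SSH executable that jgit invokes via GIT_SSH is present
ls -l /usr/bin/ssh
# If the existing deploy key is SHA-1 RSA, generate a GitHub-compatible replacement such as ed25519
ssh-keygen -t ed25519 -C "deephaven-git-integration"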
Restartable Controller
If the iris_controller process restarts quickly enough, Core+ workers that were already initialized and running normally when the controller went down continue running without interruption. Legacy workers still terminate on controller restart.
- The duration that workers can survive without the controller is defined by the property PersistentQueryController.etcdPresenceLeaseTtlSeconds, which defaults to 60 (seconds).
- Only workers that had completed initialization and were in the Running state before the controller died, and that should still be running at the time of the restart according to their query configuration, continue running.
If the iris_controller is stopped normally (e.g., via monit stop or a regular UNIX TERM signal), the value of the property PersistentQueryController.stopWorkersOnShutdown determines the desired behavior for workers.
- When set to true, all controller-managed workers are stopped alongside the controller. This is consistent with the traditional behavior.
- When set to false (the new default), workers do not stop alongside the controller, and have the time defined in the property PersistentQueryController.etcdPresenceLeaseTtlSeconds (defaults to 60 seconds) as a grace period during which they wait for the controller to come back.
If the controller crashes (e.g., the iris_controller process is stopped unexpectedly by an exception, a machine reboot, or a UNIX KILL signal), workers are not proactively stopped even if PersistentQueryController.stopWorkersOnShutdown is true. In this case, the dispatcher terminates those workers after the PersistentQueryController.etcdPresenceLeaseTtlSeconds timeout.
Note: irrespective of the value of the PersistentQueryController.stopWorkersOnShutdown property, if the dispatcher associated with a worker is shut down, the worker stops.
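A minimal sketch of the two properties with their documented defaults; where they are set depends on how your installation manages property files:
# Grace period (seconds) during which workers wait for a restarted controller
PersistentQueryController.etcdPresenceLeaseTtlSeconds=60
# New default: workers are not stopped when the controller shuts down normally
PersistentQueryController.stopWorkersOnShutdown=false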
Renamed Swing Launcher Archives
The downloadable Swing launcher has been renamed as follows:
- DeephavenLauncherSetup_123.exe is now deephaven-launcher-123.exe
- DeephavenLauncher_123.tar is now deephaven-launcher-123.tgz
Reliable Barrage table connections
We have added a new library to provide reliable Barrage subscriptions within a Deephaven Core+ cluster. The new tables monitor the state of the source query and gracefully handle disconnection and reconnections without user intervention. This can be used to create reliable meshes of Core+ workers that are fault tolerant to the loss of other queries.
When using ResolveTools, PQ URLs (pq://MyQuery/scope/MyTable?columns=MyFirstColumn,SomeOtherColumn
) use these new reliable tables.
To use this library, see the following examples:
import io.deephaven.enterprise.remote.RemoteTableBuilder
import io.deephaven.enterprise.remote.SubscriptionOptions
// Subscribe to the columns `MyFirstColumn` and `SomeOtherColumn` of the table `MyTable` from the query `MyQuery`
table = RemoteTableBuilder.forLocalCluster()
        .queryName("MyQuery")
        .tableName("MyTable")
        .subscribe(SubscriptionOptions.builder()
                .addIncludedColumns("MyFirstColumn", "SomeOtherColumn").build())
from deephaven_enterprise import remote_table as rt
# Subscribe to the columns `MyFirstColumn` and `SomeOtherColumn` of the table `MyTable` from the query `MyQuery`
table = rt.in_local_cluster(query_name="MyQuery", table_name="MyTable").subscribe(
included_columns=["MyFirstColumn", "SomeOtherColumn"]
)
Connecting to remote clusters
It is also possible to connect to queries on a different Deephaven cluster.
import io.deephaven.enterprise.remote.RemoteTableBuilder
table = RemoteTableBuilder.forRemoteCluster("https://other-server.mycompany.com:8000/iris/connection.json")
        .password("user", "password")
        .queryName("MyQuery")
        .tableName("MyTable")
        .subscribe(SubscriptionOptions.builder()
                .addIncludedColumns("MyFirstColumn", "SomeOtherColumn").build())
from deephaven_enterprise import remote_table as rt
# Subscribe to the columns `MyFirstColumn` and `SomeOtherColumn` of the table `MyTable` from the query `MyQuery`
table = rt.for_remote_cluster("https://other-server.mycompany.com:8000/iris/connection.json") \
    .password("username", "password") \
    .query_name("MyQuery") \
    .table_name("MyTable") \
    .subscribe(included_columns=["MyFirstColumn", "SomeOtherColumn"])