How do I solve a `hash table exceeds maximum size` error?
If you see a `hash table exceeds maximum size` error, it means your table has too many unique keys for Deephaven to handle without first reducing the key cardinality. Fortunately, this is easy to do.
To get around this problem, use `partition_by` to create subtables, each with a lower key cardinality. Take this example with 100 million rows:
```python
from deephaven import empty_table

t = empty_table(100_000_000).update("key = randomInt(0, 10000000)")

# will cause a "hash table exceeds maximum size" error on very large tables
rst = t.count_by("Count", "key")
```
This can be reformulated using partitioned tables so that each partition sees a lower key cardinality, which avoids the error. Because the partition column is a function of the key (`key % 100`), every key lands in exactly one partition, so the merged per-key counts remain correct:
```python
from deephaven import empty_table

t = empty_table(100_000_000).update("key = randomInt(0, 10000000)")

# partition the table to reduce key cardinality
rst = (
    t.update_view("Partition = key % 100")
    .partition_by(["Partition"])
    .proxy()
    .count_by("Count", "key")
    .target.merge()
)
```
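As an optional sanity check (not part of the original example), you can confirm on a smaller table, where the direct aggregation still succeeds, that the partitioned formulation produces the same per-key counts. The table sizes and the `mismatches` name below are illustrative assumptions:

```python
from deephaven import empty_table

# small enough that the direct count_by does not hit the hash table limit
small = empty_table(1_000_000).update("key = randomInt(0, 10000)")

direct = small.count_by("Count", "key")

partitioned = (
    small.update_view("Partition = key % 100")
    .partition_by(["Partition"])
    .proxy()
    .count_by("Count", "key")
    .target.merge()
)

# because "Partition" is a function of "key", each key appears in exactly one
# partition, so the per-key counts from both approaches should match
mismatches = direct.natural_join(partitioned, on="key", joins="PartCount = Count").where(
    "Count != PartCount"
)
```

An empty `mismatches` table confirms that the two formulations agree.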
Additionally, working with large data amplifies any inefficiency in your code, so make sure you're using the correct join and selection methods for your use case.
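For instance, here is a minimal sketch of that idea, using illustrative table names and sizes rather than anything from the example above: `update_view` keeps formula columns unmaterialized, and `natural_join` is the appropriate choice when the right-hand table has unique keys.

```python
from deephaven import empty_table

# illustrative tables; the names and sizes here are assumptions for this sketch
left = empty_table(50_000_000).update("key = randomInt(0, 1000)")
right = empty_table(1_000).update(["key = i", "Label = i * 10"])

# update_view evaluates the formula on demand instead of materializing
# 50 million values in memory, unlike update
scaled = left.update_view("Scaled = key * 100")

# natural_join matches each left row with at most one right row, which is
# far cheaper than a cross join when the right-hand keys are unique
joined = scaled.natural_join(right, on="key", joins="Label")
```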
These FAQ pages contain answers to questions about Deephaven Community Core that our users have asked in our Community Slack. If you have a question that is not in our documentation, join our Community and we'll be happy to help!