Null, infinity, and NaN values

Null, infinity, and not-a-number (NaN) values can unexpectedly appear in your datasets and disrupt calculations if not handled properly. This guide shows you how to detect, filter, and replace these special values to keep your data analysis running smoothly. Not all numeric types support each of these special values.

In computing, null, infinity, and NaN are special values that represent missing, undefined, or indeterminate data. Nulls exist for all Java primitive types, while NaN and infinity exist only for floating-point types.

Nulls represent missing data. Each data type reserves a single, specific value to represent null. For example, Deephaven reserves -32768 as the null value for the primitive short data type, resulting in an adjusted minimum value of -32767 instead of the traditional -32768.

NaNs represent undefined mathematical results. For instance, the logarithm of a negative number is undefined, so that result is represented as NaN.

Infinities represent mathematically infinite values. Unlike NaNs, infinities are well-defined mathematically. For example, any positive number divided by 0 is undefined numerically but mathematically is defined as positive infinity.

Available constants

Deephaven provides constant values representing null, infinity, and NaN values for each numeric type. These values are available in Python as well as built into the query language.

Nulls

The following null constants are available in Deephaven:

from deephaven.constants import (
    NULL_BOOLEAN,
    NULL_CHAR,
    NULL_BYTE,
    NULL_SHORT,
    NULL_INT,
    NULL_LONG,
    NULL_FLOAT,
    NULL_DOUBLE,
)

nulls = {
    "boolean": NULL_BOOLEAN,
    "char": NULL_CHAR,
    "byte": NULL_BYTE,
    "short": NULL_SHORT,
    "int": NULL_INT,
    "long": NULL_LONG,
    "float": NULL_FLOAT,
    "double": NULL_DOUBLE,
}
for key in nulls:
    print(f"Null value for {key}: {nulls[key]}")
from deephaven import empty_table

source = empty_table(1).update(
    [
        "NullBoolean = NULL_BOOLEAN",
        "NullChar = NULL_CHAR",
        "NullByte = NULL_BYTE",
        "NullShort = NULL_SHORT",
        "NullInt = NULL_INT",
        "NullLong = NULL_LONG",
        "NullFloat = NULL_FLOAT",
        "NullDouble = NULL_DOUBLE",
    ]
)
source_meta = source.meta_table

NaNs

The following NaN constants are available in Deephaven:

from deephaven.constants import NAN_DOUBLE, NAN_FLOAT

nan_values = {
    "float": NAN_FLOAT,
    "double": NAN_DOUBLE,
}
for key in nan_values:
    print(f"NaN value for {key}: {nan_values[key]}")
from deephaven import empty_table

source = empty_table(1).update(
    [
        "NanFloat = NAN_FLOAT",
        "NanDouble = NAN_DOUBLE",
    ]
)
source_meta = source.meta_table

Infinities

The following infinity constants are available in Deephaven:

from deephaven.constants import (
    POS_INFINITY_DOUBLE,
    POS_INFINITY_FLOAT,
    NEG_INFINITY_DOUBLE,
    NEG_INFINITY_FLOAT,
)

infinity_values = {
    "pos_float": POS_INFINITY_FLOAT,
    "pos_double": POS_INFINITY_DOUBLE,
    "neg_float": NEG_INFINITY_FLOAT,
    "neg_double": NEG_INFINITY_DOUBLE,
}
for key in infinity_values:
    print(f"Infinity value for {key}: {infinity_values[key]}")
from deephaven import empty_table

source = empty_table(1).update(
    [
        "PosInfinityFloat = POS_INFINITY_FLOAT",
        "PosInfinityDouble = POS_INFINITY_DOUBLE",
        "NegInfinityFloat = NEG_INFINITY_FLOAT",
        "NegInfinityDouble = NEG_INFINITY_DOUBLE",
    ]
)
source_meta = source.meta_table

Built-in vs Python constants

As shown in the code blocks above, the same constants are made available in both the Python API as well as the query language. As with all other operations done in query strings, the built-in constants are the recommended way to use these constants. The Python API constants are provided for convenience and times where the constants are needed in Python code.

Use null, infinity, and NaN values

Nulls, NaNs, and infinities are handled in similar ways in table operations.

Built-in functions

The functions built into the query language gracefully handle null, NaN, and infinity values. For instance, the built-in sin function handles them with no additional steps:

from deephaven import empty_table

source = empty_table(40).update(
    [
        "X1 = (ii % 4 == 0) ? NULL_DOUBLE : 0.1 * ii",
        "Y1 = sin(X1)",
        "X2 = (ii % 4 == 0) ? NAN_DOUBLE : 0.1 * ii",
        "Y2 = sin(X2)",
        "X3 = (ii % 4 == 0) ? POS_INFINITY_DOUBLE : 0.1 * ii",
        "Y3 = sin(X3)",
        "X4 = (ii % 4 == 0) ? NEG_INFINITY_DOUBLE : 0.1 * ii",
        "Y4 = sin(X4)",
    ]
)

Replace values

Replacing null, NaN, and infinity values is a common practice when a dataset should have a default value. The following code block replaces null values with -1 using built-in methods:

from deephaven import empty_table

source = empty_table(10).update(
    [
        "HasNulls = (ii % 3 == 0) ? NULL_DOUBLE : randomDouble(0, 1)",
        "HasNaNs = (ii % 3 == 0) ? NAN_DOUBLE : randomDouble(0, 1)",
        "HasPosInfs = (ii % 3 == 0) ? POS_INFINITY_DOUBLE : randomDouble(0, 1)",
        "HasNegInfs = (ii % 3 == 0) ? NEG_INFINITY_DOUBLE : randomDouble(0, 1)",
        "ReplacedNulls = replaceIfNull(HasNulls, -1)",
        "ReplacedNaNs = replaceIfNaN(HasNaNs, -1)",
        "ReplacedPosInfs = replaceIfNonFinite(HasPosInfs, -1)",
        "ReplacedNegInfs = replaceIfNonFinite(HasNegInfs, -1)",
    ]
)

Remove values

Built-in methods can also be used to filter out null and NaN values. There is no built-in method specifically to remove infinity values.

from deephaven import empty_table

source = empty_table(10).update(
    [
        "HasNulls = (ii % 3 == 0) ? NULL_DOUBLE : randomDouble(0, 1)",
        "HasNaNs = (ii % 4 == 2) ? NAN_DOUBLE : randomDouble(5, 10)",
    ]
)

result_no_nulls = source.where("!isNull(HasNulls)")
result_no_nans = source.where("!isNaN(HasNaNs)")
result_no_nulls_nans = source.where(["!isNull(HasNulls)", "!isNaN(HasNaNs)"])

User-defined functions

When passing null, NaN, or infinity values from tables into Python functions, the deephaven.constants module is useful. You can check input values as follows:

from deephaven import empty_table
from deephaven.constants import (
    NULL_DOUBLE,
    NAN_DOUBLE,
    POS_INFINITY_DOUBLE,
    NEG_INFINITY_DOUBLE,
)


def my_func(input_value) -> float:
    if input_value == NULL_DOUBLE:
        return -1
    elif input_value == NAN_DOUBLE:
        return -2
    elif input_value == POS_INFINITY_DOUBLE:
        return -3
    else:
        return input_value * 2


source = empty_table(10).update(
    [
        "HasNulls = (ii % 3 == 0) ? NULL_DOUBLE : randomDouble(0, 1)",
        "HasNaNs = (ii % 4 == 2) ? NAN_DOUBLE : randomDouble(5, 10)",
        "HasPosInfs = (ii % 5 == 4) ? POS_INFINITY_DOUBLE : randomDouble(10, 15)",
        "HasNegInfs = (ii % 6 == 5) ? NEG_INFINITY_DOUBLE : randomDouble(15, 20)",
    ]
)

result = source.update(
    [
        "ResultNulls = my_func(HasNulls)",
        "ResultNaNs = my_func(HasNaNs)",
        "ResultPosInfs = my_func(HasPosInfs)",
        "ResultNegInfs = my_func(HasNegInfs)",
    ]
)