Towards real-time epileptic seizures forecasts

Machine learning techniques have an ever-increasing importance in healthcare. Some key applications include medical image classification, treatment recommendations, disease detection, and prediction. This blog discusses predicting seizures in epileptic patients through binary classification.

You don't need to be a neuroscience expert to develop a basic working prototype of a seizure prediction model. To classify electroencephalogram (EEG) signals we will use a long short-term memory (LSTM) recurrent neural network (RNN) with Deephaven.

Deephaven is ideally suited for this task - its images for AI/ML in Python make using TensorFlow easy (for more information, please see our guide). Besides, in real-world applications, EEG data is generated in the form of a stream, such as neural activity records from brain implants, sensors, or wearable devices. Deephaven's streaming tables are a natural choice to make real-time predictions.

Background

Seizures are like storms in the brain — sudden bursts of abnormal electrical activity that can cause disturbances in movements, behavior, feelings, and awareness. There is no regularity in their occurrence, so doctors have no way of telling people with epilepsy when the next seizure might happen - in 20 hours, in 20 days, or 20 weeks after a previous one. 25% of the patients with epilepsy are drug-resistant and have to live with the threat of a sudden seizure at any time.

For many years, neuroscientists thought seizures began abruptly, just a few seconds before clinical attacks. Recent research has shown that seizures are not random events and develop minutes to hours before clinical onset. There are 4 states of brain activity: interictal (between seizures), preictal (before seizure), ictal (seizure), and post-ictal (after seizures). Over the last few years, significant research has demonstrated the existence and accurate classification of the preictal brain state.

Dataset

The latest studies show that seizures can be forecast 24 hours in advance — and in some patients, up to three days prior. In this work, we will not be so ambitious. Instead, we will try to predict the risk of a seizure within 10-minute intervals. We will be using a dataset from the American Epilepsy Society Seizure Prediction Challenge on Kaggle. It is EEG data from the NeuroVista seizure advisory system implant.

Each epilepsy patient has their own specific pre-seizure signatures, so we will be using records of brain electrical activity only for one patient from the dataset for the sake of time. The goal of our experiment is to distinguish between 10-minute-long data clips covering an hour before a seizure (i.e., preictal clips), and 10-minute EEG clips with no oncoming seizures (interictal clips).

Load the data

Let's start by loading the EEG data:

import glob

CLIP_PATH = "/data/Patient_1/"

def get_clips(data_folder):

    # Get all clips
    clips = os.listdir(data_folder)

    # Preictal recordings - time-series segments of the measurement before a seizure occura
    clips_preictal = glob.glob(os.path.join(data_folder, "*preictal*"))

    # Interictial segments - segments with no oncoming seizures
    clips_interictial = glob.glob(os.path.join(data_folder, "*interictal*"))

    return clips_interictial, clips_preictal


# Get EEG recordings
clips_interictal, clips_preictal = get_clips(data_folder = CLIP_PATH)

For our patient, EEG data was recorded with 15 channels (15 electrodes) and a sampling rate of 5000 Hz. In each channel, a sampling frequency (5000 Hz) determines how many data samples represent 1 second of EEG data. The sampling frequency multiplied by the total measurement time per clip (~600 seconds in our example) determines the length of each time series (around 3,000,000).

Features

There are various signal processing methods to engineer features from the raw EEG data. The Kaggle competition winners used the power spectral band, the signal correlation between EEG channels and eigenvalue of the correlation matrix, Shannon's entropy, and many more.

In this blog, we want to keep it simple - we won't dive deep into complex signal processing theories and neuroscience interpretations; instead, we will only perform a 1d convolution on the raw measurements.

For our LSTM network, we want to use TensorFlow, which requires input as a tensor with the shape (N, seq_len, n_channels) where:

N is the number of data points.
seq_len is the sequence length for time-series.
n_channels is the number of channels.

The problem we face here is that the raw data sequence is very long for the LSTM network. In our example, there are approximately 3,000,000 points in time. This is very long, and typical LSTM cells cannot be trained for such a long series. Therefore, we are going to use 1d convolutions with averages to reduce the number of points. This results in a shorter time series that we can use as an input to an LSTM network. Our approach is based on the code available on this GitHub repository:

Click to see the code!

import os
import numpy as np
from scipy.io import loadmat

# Construct LSTM sequences from one segment
def lstm_sequence(input_segment, target, sampling_freq, window, stride, block_s = 60):
    """ Function for generating blocks of LSTM input tensors
        input_segment : The EEG segment
        target        : 1/0 (preictal/interictial); None for test
        sampling_freq : Samplig frequency
        window        : Window size for 1d convolutions on each block
        stride        : Stride size of the 1d convolution
        block_s       : Size of the block in seconds (default = 60)
    """

    # Dimensions
    n_channels, T_segment = input_segment.shape

    # Determine block dimensions
    block_len = sampling_freq * block_s   # Length of each block
    n_blocks = (T_segment-1) // block_len # Number of blocks
    blocks = [block for block in range(0,(n_blocks+1)*block_len,block_len)]

    # Determine the sequence length for LSTM
    div = (block_len - window)%stride
    if (div != 0):
        pad = stride - div # Size of padding neded
    else:
        pad = 0

    seq_len = (block_len + pad - window) // stride

    # Initiate tensor
    X = np.zeros((n_blocks, seq_len, n_channels))

    # Loop over blocks and fill X
    for ib in range(n_blocks):
        # Get block
        data_block = input_segment[:, blocks[ib]:blocks[ib+1]]

        # Pad if necessary
        if (pad !=0):
            data_block = np.concatenate((data_block, np.zeros((n_channels, pad))), axis=1)

        # 1d convolution by mean
        index = 0
        for j in range(seq_len):
            X[ib, j, :] = np.mean(data_block[:, (index+j):(index+j+seq_len)], axis = 1)

    # Fill in the target
    if (target == 1):
        Y = np.ones(n_blocks)
    elif(target == 0):
        Y = np.zeros(n_blocks)
    else:
        Y = None

    return X, Y, n_blocks

# Collect all the segments to build a tesnsor input for LSTM
def lstm_build_input(clips, target, window, stride, block_s = 60):
    """ Collect all the data and build sequences for LSTM
        clips              : List of clips
        target             : 1/0 (preictal/interictial); None for test set
        window             : Window size for 1d convolutions
        stride             : Length of the stride in 1d convolution
        block_s            : Size of the block in seconds (default = 60)
    """

    # Number of clips
    n_clips = len(clips)

    # Loop over all clips and store data
    iclip = 0
    for file in clips:
        clip = loadmat(file)
        segment_name = list(clip.keys())[3] # Get segment name
        input_segment = clip[segment_name][0][0][0] # Get electrode data
        sampling_freq = np.squeeze(clip[segment_name][0][0][2]) # Sampling frequency

        # Get number of channels
        n_channels = clip[segment_name][0][0][0].shape[0]

        # Get tensor input and targets from blocks
        X, Y, n_blocks = lstm_sequence(input_segment, target, sampling_freq, window, stride, block_s)

        # Concatenate the tensor and target vector
        if (iclip == 0):
            X_train = X
            Y_train = Y[:,None] if Y is not None else None
        else:
            X_train = np.vstack((X_train,X))
            Y_train = np.vstack((Y_train,Y[:,None])) if Y is not None else None

        iclip +=1

    return X_train, Y_train


# Window, stride and block_s
window = 16000
stride = 100
block_s = 60

X_1, Y_1 = lstm_build_input(clips_preictal, 1, window, stride)
X_0, Y_0 = lstm_build_input(clips_interictal, 0, window, stride)

# Scale the data
X_1 = X_1 / np.max(np.abs(X_1), axis=1)[:,None,:]
X_0 = X_0 / np.max(np.abs(X_0), axis=1)[:,None,:]

# Combine the data
X = np.concatenate((X_0, X_1), axis = 0)
Y = np.concatenate((Y_0, Y_1), axis = 0)
Y = np.squeeze(Y)

print("Data shape = ", X.shape)

After averaging, our data shape is (612, 2840, 15), which is an acceptable value a typical LSTM network can handle.

It is always a good idea to normalize the data:

# Normalize
X = X / np.max(np.abs(X), axis=1)[:,None,:]

# Shuffle
np.random.seed(1)
shuffle = np.random.choice(np.arange(len(Y)), size=len(Y), replace=False)
X = X[shuffle]
Y = Y[shuffle]

Build the model

Finally, we are ready to build our RNN model for predictions:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Input(shape=(2840, 15)))
model.add(layers.LSTM(64))
model.add(layers.BatchNormalization())
model.add(layers.Dense(1, activation='sigmoid'))

Train the model

Now let's train our model with Deephaven tables. This requires a few additional functions:

Click to see the code!

from deephaven import learn
from deephaven.learn import gather
from deephaven import numpy
from keras.callbacks import Callback

from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score


class RocCallback(Callback):
    def __init__(self,training_data,validation_data):
        self.x = training_data[0]
        self.y = training_data[1]
        self.x_val = validation_data[0]
        self.y_val = validation_data[1]

    def on_train_begin(self, logs={}):
        return

    def on_train_end(self, logs={}):
        return

    def on_epoch_begin(self, epoch, logs={}):
        return

    def on_epoch_end(self, epoch, logs={}):
        y_pred_train = model.predict(self.x)
        roc_train = roc_auc_score(self.y, y_pred_train)
        y_pred_val = model.predict(self.x_val)
        roc_val = roc_auc_score(self.y_val, y_pred_val)
        print('roc-auc_train: ', roc_train)
        print('roc-auc_val: ', roc_val)
        return

    def on_batch_begin(self, batch, logs={}):
        return

    def on_batch_end(self, batch, logs={}):
        return


# Function that trains the model
def train_model(X, Y):
    X = X.reshape(X.shape[0], -1, n_channels)   # reshape DH table to 3d numpy array
    X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, stratify=Y, test_size = 0.1)
    roc = RocCallback(training_data=(X_train, Y_train), validation_data=(X_valid, Y_valid))
    model.compile(loss='binary_crossentropy', optimizer="adam", metrics=["accuracy"])
    model.fit(X_train, Y_train, validation_data=(X_valid, Y_valid), callbacks=[roc], batch_size = 200, epochs=100)

# Function that gets the model's predictions on input data
def predict_with_model(X):
    X = X.reshape(X.shape[0], -1, n_channels)  # reshape DH table to 3d numpy array
    Y_pred = model.predict(X, batch_size=200)
    return Y_pred

# Function to gather data from table columns into a NumPy array of doubles
def table_to_array_double(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

# Function to gather data from table columns into a NumPy array of integers
def table_to_array_int(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.intc)

# Function to extract a list element at a given index
def get_predicted_class(data, idx):
    return data[idx]

# Split the data into training and test datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, stratify=Y, test_size = 0.2)

# Convert numpy arrays X_train and Y_train to DH table X_table
n_rows = X_train.shape[0]
n_cols = X_train.shape[1] * X_train.shape[2]
column_names = ['Col_'+str(i) for i in range(n_cols)]
X_reshaped = X_train.reshape(n_rows, n_cols)
X_table = numpy.to_table(X_reshaped, cols=column_names)

def add_class_col(index):
    y_class = [int(i) for i in Y_train.tolist()]
    return y_class[index]

X_table = X_table.update(["Class = (int)add_class_col(i)"])


# Train the model
learn.learn(
    table=X_table,
    model_func=train_model,
    inputs=[learn.Input(column_names, table_to_array_double), learn.Input(["Class"], table_to_array_int)],
    outputs=None,
    batch_size=200
)


# Convert numpy array X_test to DH table X_table_test
X_reshaped_test = X_test.reshape(X_test.shape[0], n_cols)
X_table_test = numpy.to_table(X_reshaped_test, cols=column_names)

# Use the learn function to create a new table that contains predicted values
predicted = learn.learn(
    table=X_table_test,
    model_func=predict_with_model,
    inputs=[learn.Input(column_names, table_to_array_double)],
    outputs=[learn.Output("PredictedClass", get_predicted_class, "int")],
    batch_size=200
)

To evaluate our model, we calculated the area under the ROC curve (AUC) - the same metric that was used to judge submissions in the Kaggle Seizure Prediction Challenge. For our validation dataset, we got AUC = 0.8. Of course, it should ideally be closer to 1 for a good classifier. But our model is just a toy example we built with limited domain knowledge in neuroscience and without using complex signal processing procedures and feature engineering.

Real-time predictions

As mentioned before, one of Deephaven's biggest advantages is the ability to deal with numerous real-time data feeds. To simulate the real-time feed, we can use a TableReplayer:

from deephaven.replay import TableReplayer
from deephaven import time as dtu
from deephaven.time import to_datetime
from deephaven import numpy

# use our test data to simulate real-time feed
X_live = X_test
n_rows = X_live.shape[0]
n_cols = X_live.shape[1] * X_live.shape[2]
X_live = X_live.reshape(n_rows, n_cols)

# convert numpy array to DH table
X_live = numpy.to_table(X_live, cols=column_names)

start_time = dtu.to_datetime("2022-01-01T00:00:00 NY")
def add_datetime_col(index):
    return dtu.plus_period(start_time, dtu.to_period(f"T{index}S"))

X_live = X_live.update(["Timestamp = (DateTime)add_datetime_col(i)"])

# replay historical data
start_time = to_datetime("2022-01-01T00:00:00 NY")
end_time = to_datetime("2022-01-01T00:02:30 NY")

replayer = TableReplayer(start_time, end_time)
replayed_table = replayer.add_table(X_live, "Timestamp")
replayer.start()

predicted = learn.learn(
    table=replayed_table,
    model_func=predict_with_model,
    inputs=[learn.Input(column_names, table_to_array_double)],
    outputs=[learn.Output("PredictedClass", get_predicted_class, "int")],
    batch_size=200
)

Though we trained our model on the static dataset, Deephaven can use the streaming data source to perform real-time classification:

Making predictions with an LSTM network with Deephaven