Deephaven is a query engine that excels at working with real-time data. Data scientists and developers use Deephaven to analyze capital markets, blockchains, cryptocurrency, gaming, sports, and e-commerce. Why not use it for addressing ethical issues and improving an organization's climate as well?
According to the MIT Sloan Management Review, toxic work culture is the biggest reason why people quit their jobs. Their research estimates it’s 10 times more important than salary.
Today, we'll demonstrate how to create a working prototype of a solution that checks if a new message posted to a Slack channel reads as toxic. If so, a bot sends a warning message to the channel.
The process is simple and requires only three steps:
- Receive and store real-time Slack chat messages in a Deephaven table.
- Calculate the probability of toxicity for each message.
- Send a notification if a message is classified as toxic.
If you just want to look at some code, this GitHub repository has everything. For further details, keep reading!
Pull live data
To get messages from Slack, we'll use Socket Mode. To set up Socket Mode, we need to create an app and generate an app-level token.
After that, we're ready to request a private WebSocket URL:
import os

import requests

SLACK_ENDPOINT = 'https://slack.com/api/apps.connections.open'
APP_TOKEN = os.environ["APP_TOKEN"]

# call the apps.connections.open endpoint with the app-level token to get a WebSocket URL
headers = {'Authorization': f'Bearer {APP_TOKEN}', 'Content-type': 'application/x-www-form-urlencoded'}
response = requests.post(SLACK_ENDPOINT, headers=headers)
url = response.json()["url"]
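Before using the URL, it's worth checking that the call actually succeeded. Per Slack's Web API conventions, the response body carries an `ok` flag and, on success, a `url` field. Here's a minimal sketch of that check, factored into a small helper (the function name `extract_ws_url` is ours, not part of any SDK):

```python
# Validate an apps.connections.open response before connecting.
# Slack Web API responses carry an "ok" flag; failures carry an "error" string instead of "url".

def extract_ws_url(payload: dict) -> str:
    """Return the WebSocket URL from an apps.connections.open response, or raise."""
    if not payload.get("ok"):
        raise RuntimeError(f"Slack API error: {payload.get('error', 'unknown')}")
    return payload["url"]

# example with a stubbed response body
sample = {"ok": True, "url": "wss://wss-primary.slack.com/link/?ticket=abc"}
print(extract_ws_url(sample))  # wss://wss-primary.slack.com/link/?ticket=abc
```

In the snippet above, you would then write `url = extract_ws_url(response.json())` to fail fast on an invalid token.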
Let's connect to it! For our example, we want the WebSocket to deliver events only about new messages in a Slack channel:
import json

from websocket import create_connection

# connect to the WebSocket
ws = create_connection(url)

# install the app in your workspace to get the Bot User OAuth token
BOT_OAUTH_TOKEN = os.environ["BOT_OAUTH_TOKEN"]

# we don't want data about every activity in Slack, so we subscribe only to events of type "message"
ws.send(
    json.dumps(
        {
            "type": "subscribe",
            "token": BOT_OAUTH_TOKEN,
            "event": {
                "type": "message",
                "subtype": None
            }
        }
    )
)
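One practical detail of Socket Mode: Slack wraps each event in an "envelope" carrying an `envelope_id`, and expects the client to acknowledge it promptly, otherwise the event is redelivered (which would make our bot process the same message twice). A minimal sketch of that acknowledgment, with `make_ack` being our own hypothetical helper:

```python
import json

# Socket Mode delivers events wrapped in envelopes; each must be acknowledged
# by echoing back its envelope_id, or Slack will retry delivery.

def make_ack(envelope: dict) -> str:
    """Build the JSON acknowledgment for a Socket Mode envelope."""
    return json.dumps({"envelope_id": envelope["envelope_id"]})

# in the receive loop, the ack goes right back over the same socket, e.g.:
# envelope = json.loads(ws.recv())
# ws.send(make_ack(envelope))
```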
DynamicTableWriter
Deephaven's DynamicTableWriter can help us create a live table to store incoming messages and their integer representations that will be used as features for our ML model:
Click to see the code!
from deephaven import DynamicTableWriter
from deephaven import dtypes as dht

# use Deephaven's DynamicTableWriter to create a table for the features (integer representations of words)
# and the original messages
columns = ["Index_" + str(num) for num in range(MAX_NUMBER)]
column_definitions = {col: dht.int32 for col in columns}
column_definitions["message"] = dht.string
dtw = DynamicTableWriter(column_definitions)
table = dtw.table
from threading import Thread

# receive real-time messages from the WebSocket and write them to the table
def thread_function():
    while True:
        try:
            data = json.loads(ws.recv())
            event = data["payload"]["event"]
            message = event["text"]
            if data["retry_attempt"] == 0 and "bot_id" not in event:
                # convert the message into an integer sequence encoding the words, using a pre-trained tokenizer
                list_tokenized = tokenizer.texts_to_sequences([message])
                row_to_write = pad_sequences(list_tokenized, maxlen=MAX_NUMBER)[0].tolist()
                row_to_write.append(message)
                # add the integers representing words and the original text to the Deephaven table
                dtw.write_row(*row_to_write)
        except Exception as e:
            print(e)

thread = Thread(target=thread_function)
thread.start()
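To make the tokenization step concrete: `texts_to_sequences` maps each known word to an integer ID, and `pad_sequences` left-pads (or truncates) the result to a fixed length so every row has the same shape. Here's a pure-Python sketch of that behavior, with a made-up toy vocabulary and length just for illustration:

```python
# A pure-Python sketch of tokenizer.texts_to_sequences + pad_sequences:
# map each known word to an integer id, then left-pad (or truncate) to a fixed length.
# The vocabulary here is invented for illustration; unknown words are simply dropped.

vocab = {"you": 1, "are": 2, "great": 3, "awful": 4}

def encode(message: str, maxlen: int) -> list[int]:
    ids = [vocab[w] for w in message.lower().split() if w in vocab]
    ids = ids[-maxlen:]                      # keep the last maxlen tokens (Keras's default truncation)
    return [0] * (maxlen - len(ids)) + ids   # pre-pad with zeros (Keras's default padding)

print(encode("you are great", 5))  # [0, 0, 1, 2, 3]
```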
Predicting
To recognize toxic patterns in incoming Slack messages, we'll use a basic LSTM model trained on a Kaggle dataset:
Click to see the code!
import numpy as np
from deephaven import learn
from deephaven.learn import gather
from tensorflow.keras.models import load_model

# load our pre-trained model
model = load_model("/data/model.h5")
print(model.summary())

# function that gets the model's predictions on input data
def predict_with_model(features):
    return model.predict(features)

# helper function to gather data from table columns into a NumPy array of integers
def table_to_array_int(rows, cols):
    return gather.table_to_numpy_2d(rows, cols, np_type=np.intc)

# create a list of learn.Output() objects, one per toxicity type
# note: bind i as a default argument so each lambda keeps its own column index
outputs = []
for i in range(len(TOXICITY_TYPES)):
    toxicity_type = TOXICITY_TYPES[i]
    get_predicted_class = lambda data, idx, i=i: data[idx][i]
    outputs.append(learn.Output(toxicity_type, get_predicted_class, "double"))

# use the learn function to create a new table that contains the predicted values
predicted = learn.learn(
    table=table,
    model_func=predict_with_model,
    inputs=[learn.Input(columns, table_to_array_int)],
    outputs=outputs,
    batch_size=100
)
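One subtlety in the loop that builds the `outputs` list: Python closures capture loop variables by reference, so a lambda written as `lambda data, idx: data[idx][i]` would read the *final* value of `i` for every toxicity type, extracting the same column each time. Binding the index as a default argument freezes it per iteration. A toy illustration:

```python
# Python closures capture loop variables by reference: by the time these lambdas run,
# the loop is over and i holds its final value, so all three extract the same column.
buggy = [lambda row: row[i] for i in range(3)]
print([f(["a", "b", "c"]) for f in buggy])  # ['c', 'c', 'c']

# Binding the index as a default argument freezes its value per iteration.
fixed = [lambda row, i=i: row[i] for i in range(3)]
print([f(["a", "b", "c"]) for f in fixed])  # ['a', 'b', 'c']
```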
Here is the table with our live predictions:
Now let's use the Slack Web Client to send a message containing the prediction results back to the channel. An alert triggers if the probability of toxic content exceeds a threshold for at least one of the toxicity types:
Click to see the code!
from slack_sdk import WebClient
from deephaven.table_listener import listen

# use the Slack Web Client to send messages to a channel
client = WebClient(token=BOT_OAUTH_TOKEN)
threshold = 0.5

# create a listener on our table of predictions
# whenever the predicted table updates, we post a warning to our Slack channel if the probability
# of toxicity exceeds the threshold for at least one of the indicators
def predicted_listener(update, is_replay):
    added_dict = update.added()
    warning = ""
    warning_types = [(t, added_dict[t][0]) for t in TOXICITY_TYPES if added_dict[t][0] > threshold]
    for toxicity_type, probability in warning_types:
        warning += f'Detected {toxicity_type} with probability {probability:.1f}. '
    if warning != "":
        client.chat_postMessage(channel=CHANNEL, text=warning)

predicted_handler = listen(predicted, predicted_listener)
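The thresholding inside the listener is easy to reason about (and test) when factored into a pure function. Here's a sketch; the name `build_warning` and the example types are ours, not part of the Deephaven or Slack APIs:

```python
# The listener's thresholding logic, factored into a pure function:
# given per-type probabilities, build the warning text for everything above the threshold.

def build_warning(probabilities: dict, threshold: float = 0.5) -> str:
    parts = [f"Detected {name} with probability {p:.1f}. "
             for name, p in probabilities.items() if p > threshold]
    return "".join(parts)

print(build_warning({"toxic": 0.9, "insult": 0.2, "threat": 0.7}))
# Detected toxic with probability 0.9. Detected threat with probability 0.7.
```

The listener body then reduces to building the probabilities dict from `update.added()` and posting whatever `build_warning` returns, if anything.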
Let's test our bot:
This starter program just scratches the surface of integrating artificial intelligence (AI) into the workplace. But we hope it'll inspire you to use Deephaven to solve real-life problems!
Talk to us
If you have any questions, comments, or concerns, you can reach out to us on Slack - no toxicity allowed, of course. We'd love to hear from you!