Since its founding in 2009, Slack has become a powerhouse for communication within professional environments. Chances are you work at a company that uses Slack for messages among co-workers. Companies also frequently use Slack's freemium offering to create public workspaces to communicate with customers. This has been a huge success because it provides a known medium of communication, and companies don't need to allocate a budget for it.
However, the freemium product is not without its limitations. It only stores a maximum of 10,000 messages. Once the limit is reached, old messages start getting deleted. This becomes a problem if important messages get lost over time.
Using the Slack API, I was able to archive messages before the limit was reached, and preserve them for future analysis. Read on to learn how to set up this solution for yourself.
Using the Slack API
Fortunately, Slack provides a Web API that developers can use. The conversations.history method pulls the messages from a single channel, and the conversations.list method lists the channels in a workspace. Combining the two provides an easy way to preserve a workspace's messages: grab all the channels, then grab all the messages for each channel.
After installing the Slack Python SDK (the slack_sdk package) and setting up a Slack application with an API token, you can run the following Python code to grab your messages.
import os
import time

from slack_sdk import WebClient

# Read the token from the environment so it stays out of the source code.
SLACK_API_TOKEN = os.environ.get("SLACK_API_TOKEN")
slack_client = WebClient(token=SLACK_API_TOKEN)


def get_public_channels():
    """Return the ID of every channel listed by conversations.list."""
    cursor = None
    channels = []
    while True:
        response = slack_client.conversations_list(cursor=cursor)
        for channel in response["channels"]:
            channels.append(channel["id"])
        # An empty next_cursor means there are no more pages.
        cursor = response["response_metadata"]["next_cursor"]
        if len(cursor) == 0:
            break
        else:
            print("Pagination found, getting next entries")
            time.sleep(3)  # pause between pages to respect rate limits
    return channels


def get_channel_messages(slack_channels):
    """Return a list of (channel ID, message text) tuples."""
    messages = []
    for slack_channel in slack_channels:
        cursor = None
        while True:
            channel_history = slack_client.conversations_history(channel=slack_channel, cursor=cursor)
            for message in channel_history["messages"]:
                if message["type"] == "message":
                    messages.append((slack_channel, message["text"]))
            # Only read the next cursor when Slack reports more pages.
            if channel_history["has_more"]:
                cursor = channel_history["response_metadata"]["next_cursor"]
            else:
                cursor = None
            if cursor is None:
                break
            else:
                print("Pagination found, getting next entries")
                time.sleep(1.2)  # pause between pages to respect rate limits
    return messages


slack_channels = get_public_channels()
messages = get_channel_messages(slack_channels)
print(messages)
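Both functions above repeat the same cursor loop: request a page, collect its items, then follow next_cursor until it comes back empty. As a minimal sketch of that pattern on its own, the generator below takes any function that fetches one page. The names paginate, fake_fetch, and FAKE_PAGES are hypothetical and exist only for this illustration, and the hard-coded pages stand in for real API responses.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict], items_key: str) -> Iterator[dict]:
    """Yield every item across all pages of a cursor-paginated response.

    fetch_page is a hypothetical callable: it takes a cursor (None for the
    first page) and returns a dict shaped like a Slack Web API response.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page[items_key]
        # A missing or empty next_cursor signals the last page.
        cursor = (page.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            break


# Stand-in for the real API: two hard-coded pages keyed by cursor.
FAKE_PAGES = {
    None: {
        "channels": [{"id": "C1"}, {"id": "C2"}],
        "response_metadata": {"next_cursor": "page2"},
    },
    "page2": {
        "channels": [{"id": "C3"}],
        "response_metadata": {"next_cursor": ""},
    },
}


def fake_fetch(cursor: Optional[str]) -> dict:
    """Return the canned page for a cursor instead of calling Slack."""
    return FAKE_PAGES[cursor]


channel_ids = [channel["id"] for channel in paginate(fake_fetch, "channels")]
print(channel_ids)  # ['C1', 'C2', 'C3']
```

With the real client, something like paginate(lambda c: slack_client.conversations_list(cursor=c), "channels") should behave the same way, assuming the response object supports dict-style get. The sketch leaves out the sleep between pages, so a real run would still need one to stay within Slack's rate limits.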
This looks good, but there's a small issue. If your channels contain threads (which is extremely common), you may notice that the replies in those threads are missing, because conversations.history returns only the top-level messages of a channel. Thankfully, the conversations.replies method lets you pull the messages from a thread. Let's redefine get_channel_messages so that it also pulls the messages from these threads.
def get_thread_messages(slack_channel, ts):
    """Return the text of every message in the thread that starts at ts."""
    messages = []
    cursor = None
    while True:
        # conversations.replies returns the parent message along with its replies.
        thread_replies = slack_client.conversations_replies(channel=slack_channel, ts=ts, cursor=cursor)
        for message in thread_replies["messages"]:
            if message["type"] == "message":
                messages.append(message["text"])
        if thread_replies["has_more"]:
            cursor = thread_replies["response_metadata"]["next_cursor"]
        else:
            cursor = None
        if cursor is None:
            break
        else:
            print("Pagination found, getting next entries")
            time.sleep(1.2)  # pause between pages to respect rate limits
    return messages


def get_channel_messages(slack_channels):
    """Return (channel ID, message text) tuples, including thread replies."""
    messages = []
    for slack_channel in slack_channels:
        cursor = None
        while True:
            channel_history = slack_client.conversations_history(channel=slack_channel, cursor=cursor)
            for message in channel_history["messages"]:
                if message["type"] == "message":
                    if "thread_ts" in message:
                        # Threaded message: fetch the whole thread, parent included.
                        for text in get_thread_messages(slack_channel, message["ts"]):
                            messages.append((slack_channel, text))
                    else:
                        messages.append((slack_channel, message["text"]))
            if channel_history["has_more"]:
                cursor = channel_history["response_metadata"]["next_cursor"]
            else:
                cursor = None
            if cursor is None:
                break
            else:
                print("Pagination found, getting next entries")
                time.sleep(1.2)  # pause between pages to respect rate limits
    return messages
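To see why the thread branch does not append the parent's own text, it helps to look at the shape of the data. In channel history, a message that started a thread carries a thread_ts equal to its own ts, and the replies call returns that parent first, followed by its replies. The sketch below replays this with hand-written sample payloads. The payloads and the names is_thread_parent and flatten are hypothetical, made up for this illustration, and no Slack call is made.

```python
def is_thread_parent(message: dict) -> bool:
    """True when a message started a thread (its thread_ts equals its own ts)."""
    return "ts" in message and message.get("thread_ts") == message["ts"]


# Hypothetical channel history: one standalone message and one thread parent.
history = [
    {"type": "message", "ts": "1700000300.000100", "text": "Standalone note"},
    {"type": "message", "ts": "1700000200.000100",
     "thread_ts": "1700000200.000100", "reply_count": 2,
     "text": "Anyone seen the deploy fail?"},
]

# Hypothetical replies payload for that thread: parent first, then replies.
replies_by_thread = {
    "1700000200.000100": [
        {"type": "message", "ts": "1700000200.000100",
         "thread_ts": "1700000200.000100", "text": "Anyone seen the deploy fail?"},
        {"type": "message", "ts": "1700000210.000200",
         "thread_ts": "1700000200.000100", "text": "Yes, looking now"},
        {"type": "message", "ts": "1700000220.000300",
         "thread_ts": "1700000200.000100", "text": "Fixed"},
    ],
}


def flatten(history: list, replies_by_thread: dict) -> list:
    """Collect message texts, expanding each thread parent into its full thread."""
    texts = []
    for message in history:
        if is_thread_parent(message):
            # The thread payload already contains the parent, so the parent
            # is not appended separately; doing so would duplicate it.
            texts.extend(m["text"] for m in replies_by_thread[message["thread_ts"]])
        else:
            texts.append(message["text"])
    return texts


print(flatten(history, replies_by_thread))
# ['Standalone note', 'Anyone seen the deploy fail?', 'Yes, looking now', 'Fixed']
```

The parent text appears exactly once in the result, which is the same outcome the if/else in get_channel_messages produces.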
Using the messages within Deephaven
Now that we know how to pull our message information from Slack, let's put this data into Deephaven. Using the DynamicTableWriter, we can easily write our Slack data to Deephaven tables.
from deephaven import DynamicTableWriter
import deephaven.dtypes as dht

column_definitions = {
    "Channel": dht.string,
    "Message": dht.string,
}
table_writer = DynamicTableWriter(column_definitions)

for slack_channel, message in messages:
    table_writer.write_row(slack_channel, message)

table = table_writer.table
We can now use all of Deephaven's table operations and tools on our Slack messages! If we want to write our data to disk, we can use the Parquet write method to save our table.
from deephaven.parquet import write
write(table, "/data/slack_messages.parquet")
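Because the free plan drops old messages once the limit is reached, an archive like this only helps if it runs repeatedly. One way to avoid pulling the same messages on every run is to remember the newest timestamp seen per channel. The following is a minimal sketch of such a checkpoint using only the standard library. The file name slack_checkpoint.json and the helper names are hypothetical, and the sketch works on raw message dictionaries (which carry a ts field) rather than the (channel, text) tuples built earlier.

```python
import json
import os
import tempfile
from decimal import Decimal


def load_checkpoint(path: str) -> dict:
    """Return the saved {channel ID: newest ts} mapping, or {} on first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def update_checkpoint(checkpoint: dict, channel: str, messages: list) -> dict:
    """Return a copy of checkpoint holding the newest ts seen for channel."""
    updated = dict(checkpoint)
    if messages:
        # Slack timestamps are strings such as "1700000210.000200"; Decimal
        # compares them numerically without floating-point rounding.
        newest = max(messages, key=lambda m: Decimal(m["ts"]))["ts"]
        previous = updated.get(channel)
        if previous is None or Decimal(newest) > Decimal(previous):
            updated[channel] = newest
    return updated


def save_checkpoint(path: str, checkpoint: dict) -> None:
    """Write the checkpoint mapping to disk as JSON."""
    with open(path, "w") as f:
        json.dump(checkpoint, f)


# Demonstration with made-up messages and a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "slack_checkpoint.json")
checkpoint = load_checkpoint(path)
checkpoint = update_checkpoint(
    checkpoint,
    "C1",
    [{"ts": "1700000200.000100"}, {"ts": "1700000210.000200"}],
)
save_checkpoint(path, checkpoint)
restored = load_checkpoint(path)
print(restored)  # {'C1': '1700000210.000200'}
```

On the next run, the stored value could be passed as the oldest argument of conversations_history, which asks Slack for only the messages after that timestamp.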
Make it your own
As more and more data is generated by various data sources, it's important to know how to retrieve and store this data for future needs. This blog post shows just one of many examples of how you can work with data using Deephaven. The code in this project comes from Deephaven's social data collector, so feel free to check out that project and use it for your own needs. Tell us what other data sources you're working with by reaching out on - where else? - Slack.