
Create your own historical crypto database from scratch

DALL·E prompt: A data server rack being squeezed in a vice, digital painting
JJ Brosnan

Do you have big ambitions for cryptocurrency analysis, but need a better way to store and access data? Historical data is a critical piece of any analysis infrastructure. It is required for processes like validation, integration with real-time data, proofs of concept, and root cause analysis. Here's how I obtain, store, and access crypto data with Deephaven Community Core.

Using Parquet files, you can create a historical crypto database with 14 coins, 4 exchanges, 2 currencies, and 14 granularities in just a few minutes with the Deephaven plus CryptoWatch repository. After all that, you'll still be left with most of your daily CryptoWatch credits. Use those to expand the database to 100+ coins and dozens of exchanges.

Creating a historical crypto ecosystem from scratch

The CryptoWatch Public Data API

In building my local database of crypto data, I spent quite some time exploring different options for accessing publicly available crypto data. After considering and trying a few, I went with the CryptoWatch public data API. Their REST API is free to use, doesn't require a key, has a ton of data, and reports your credit usage with every single request.

The free API gives you 10 credits per day for use without an API key. Each request to the API costs between 0.002 and 0.015 credits, meaning you can perform anywhere from roughly 660 to 5,000 requests per day, depending on the request type. For more information on the credit cost of each request type, see their rate limits page. If you need more API usage, that page also lists the available options.
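The request range follows directly from the credit figures quoted above. Here is a quick sketch of that arithmetic; the constant and function names are my own and are not part of the CryptoWatch API or the repository.

```python
import math

DAILY_CREDITS = 10.0      # free daily allowance, as quoted in this post
CHEAPEST_REQUEST = 0.002  # credits for the cheapest request type
PRICIEST_REQUEST = 0.015  # credits for the most expensive request type


def max_requests(credits: float, cost: float) -> int:
    """Whole number of requests affordable at a fixed per-request cost."""
    # Round before flooring to sidestep floating-point noise in the division.
    return math.floor(round(credits / cost, 6))


most = max_requests(DAILY_CREDITS, CHEAPEST_REQUEST)
fewest = max_requests(DAILY_CREDITS, PRICIEST_REQUEST)
print(f"Between {fewest} and {most} requests per day")
```

A real day will land somewhere in between, since you'll typically mix cheap and expensive request types.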

Deephaven plus CryptoWatch

Everything discussed in this blog can be done via the Deephaven plus CryptoWatch repository. There, you'll find all of the requirements and code to integrate the CryptoWatch REST API with Deephaven to create your own historical Parquet crypto database in minutes.

Available assets and exchanges

Obtaining all assets and exchanges available on CryptoWatch is simple. The following URLs have all of the data. Accessing each URL costs only 0.002 credits.

Historical OHLC data

Historical OHLC data for a pair (coin + currency) on a given exchange is obtained via the following link:

Every GET request to the OHLC interface returns historical data at 14 different granularities, ranging from one minute to one week. Each historical OHLC pull costs 0.015 CryptoWatch credits.
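To see why the database described at the top of this post leaves most of the daily allowance untouched, here is a rough credit estimate. It is a sketch under two assumptions of mine: every coin/currency pair is listed on every exchange, and each market needs exactly one OHLC pull (granularities don't multiply the cost, since one pull returns all 14). The variable names are illustrative only.

```python
COINS = 14
EXCHANGES = 4
CURRENCIES = 2
OHLC_PULL_COST = 0.015  # credits per historical OHLC pull, as quoted above
DAILY_CREDITS = 10.0    # free daily allowance, as quoted above

# Assumption: one pull per (coin, exchange, currency) market.
pulls = COINS * EXCHANGES * CURRENCIES
total_cost = pulls * OHLC_PULL_COST
remaining = DAILY_CREDITS - total_cost

print(f"{pulls} pulls cost {total_cost:.2f} credits, leaving {remaining:.2f}")
```

In practice some markets won't exist on some exchanges, so the actual number of pulls, and the cost, can only be lower than this estimate.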

Organize your crypto data

I recently blogged about Parquet for crypto data storage. One major advantage of the Parquet format is its reduced file size. For crypto data, that means we can keep significantly more history on the same disk. When pulling historical data using the script Pull_Coins_Currencies_From_Exchanges_CryptoWatch.py, the data is written to a single table. If write_flag is kept as True, the data from the table is written to Parquet in two ways:

  • Bulk
    • The bulk data is written to a file called /data/CryptoWatch/historical_crypto_data_{TODAYS_DATE}.parquet. This file will contain all of the data you pulled from CryptoWatch for the given day.
  • Nested
    • The data is also written in a nested fashion using partitioning. The partitioning is done on the following columns (in order): Coin, Exchange, Currency, Granularity.

Here's how you could read Parquet data for a given coin, exchange, currency, and granularity:

from deephaven import parquet as dhpq

# Read hourly BTC/USD OHLC data from Kraken out of the nested (partitioned) layout
kraken_btc_usd_1h = dhpq.read("/data/CryptoWatch/Coin/btc/kraken/usd/1h/data.parquet")
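If you read many partitions, it can help to build the path instead of typing it out each time. The helper below is hypothetical (it is not part of the repository), and it assumes every partition follows the same layout as the single example path above: root, then coin, exchange, currency, and granularity, ending in data.parquet.

```python
from pathlib import PurePosixPath

# Assumption: all partitions live under this root, as in the example path above.
PARTITION_ROOT = PurePosixPath("/data/CryptoWatch/Coin")


def partition_file(coin: str, exchange: str, currency: str, granularity: str) -> str:
    """Build the Parquet path for one coin/exchange/currency/granularity partition."""
    return str(PARTITION_ROOT.joinpath(coin, exchange, currency, granularity, "data.parquet"))


path = partition_file("btc", "kraken", "usd", "1h")
print(path)
```

The returned string can be passed to dhpq.read in place of a hard-coded path.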


Try it for yourself

Head on over to the Deephaven plus CryptoWatch repo to start using CryptoWatch and Deephaven for your crypto analysis. The repository contains code for pulling:

  • Assets
  • Exchanges
  • Historical OHLC data

The README has directions on how to do all of this.

Got any questions/comments/concerns? Reach out on Slack.
