Import from Quandl
Quandl is a popular source for various types of data, mostly financial. Quandl provides an API that allows access to data, metadata, and some catalog information for these data sets. Files are available in CSV, JSON, and XML formats. Some of the data sets are free, while others are paid subscription items.
At this time, all of Quandl's data is snapshot-updated. Much of the data is updated daily. However, depending on the dataset, update frequency ranges from hourly to annually.
Quandl Data Formats
Time-Series
Time-series metrics account for the majority of data available through Quandl. Time-series data sets have a date or date/time value for each row, and only numeric values for the other fields. For the most part, they are vertically partitioned, with a single metric per data set. Horizontally partitioned sets such as per-symbol instrument data also exist. The API catalog provides information about time-series data sets, the publishers that exist, and data sets available from each publisher. Metadata for time-series is a bit limited, often providing only the names of columns and not the data types contained therein.
Datatables
Datatables are standard looking tables. Like time-series data, these have a date or date/time value for each row, but also contain text values in their fields, such as ticker symbols. The datatables API supports different filtering and downsampling options that are defined per table. The API also provides fairly rich metadata that includes the data type of each column. However, datatables are not listed in the Quandl catalog API.
Connecting to Quandl
Quandl's API is web-hosted; you can interact with it using a regular web browser. Requests are submitted as HTTP GET requests that include the specific data set and options being requested, along with an API key parameter for authorization. An API key can be obtained by registering at www.quandl.com, and is a prerequisite to using Quandl data import features in Deephaven. The API key parameter must always be included in API requests; if it is invalid or empty, the Quandl API will reject the request with an HTTP status code of 400. Even sample data from premium content channels requires a valid API key, which is then used to associate subscriptions with paid content.
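As an illustration, a request URL of the kind described above can be assembled as in the sketch below. This is a hedged example: the /api/v3/datasets path and the api_key parameter name are based on Quandl's public REST API, and YOUR_API_KEY is a placeholder, not a real key.

```java
public class QuandlUrl {
    // Builds a dataset request URL of the general shape described above.
    // The /api/v3/datasets path and api_key parameter are assumptions based
    // on Quandl's public REST API; YOUR_API_KEY is a placeholder.
    static String datasetUrl(String database, String dataset, String apiKey) {
        return "https://www.quandl.com/api/v3/datasets/"
                + database + "/" + dataset + ".csv?api_key=" + apiKey;
    }

    public static void main(String[] args) {
        System.out.println(datasetUrl("WIKI", "AAPL", "YOUR_API_KEY"));
        // → https://www.quandl.com/api/v3/datasets/WIKI/AAPL.csv?api_key=YOUR_API_KEY
    }
}
```

Submitting such a URL with an invalid or empty key is what produces the HTTP 400 response mentioned above.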
Quickstart
The following is a sample sequence of command line commands to import end-of-day data for U.S. stocks starting in 2018. These commands assume a typical Deephaven installation, and that iris_exec is on the user's PATH.
Create the table schema file
This can be run in the user's home directory:
iris_exec quandl_schema_creator -- -ns Quandl -qd WIKI -qt PRICES -gc ticker -pc data_set -ak <Quandl API Key>
The schema file will be generated in the Quandl directory.
ls -l Quandl/*.schema
Deploy the schema
sudo -u irisadmin /usr/illumon/latest/bin/dhconfig schema import --file <fully qualified schema file name>
Run the Quandl importer to download and import data to the new table
sudo -u dbmerge iris_exec quandl_import -- -dp localhost/2018-09-26 -ns Quandl -qd WIKI -qt PRICES -ak <Quandl API Key> -om REPLACE -aa "date.gte=20180101 qopts.export=true"
Caution
Depending on the size of the requested data set, the Quandl import process may take considerably longer than expected; generation of large result files on Quandl's servers alone can take as long as 20 minutes.
Schema Creator
The schema creator for Quandl generates a table schema from a Quandl data table or data set(s) using the metadata provided by Quandl.
Command Reference
iris_exec quandl_schema_creator <launch args> -- <schema creator args>
The following arguments are available when running the Quandl schema creator:
Argument | Description |
---|---|
-ns or --namespace <namespace> | (Required) The namespace in which to find the target table. |
-tn or --tableName <name> | (Optional) The name of the target table. If not specified, it will be derived from the Quandl database and the dataset or datatable name; e.g., WIKI and PRICES would result in a table called WikiPrices. |
-qd or --quandlDatabase | (Required) The name of the Quandl database from which to download. |
-qt or --quandlTable | (Required) The name of the Quandl datatable or dataset from which to download. |
-sp or --schemaPath | An optional path to which the schema file will be written, if a schema file is being generated. If not specified, this will default to the current working directory and will create or use a sub-directory that matches the namespace. |
-ak or --apiKey | The API key to use when connecting to the Quandl API. If no key is specified with the -ak argument, a key from the Quandl.APIKey configuration setting will be used, if present. |
-gc or --groupingColumn | Optional column to mark as the grouping column if schema is being generated. |
-pc or --partitionColumn | Optional name for the partitioning column if schema is being generated. If this is not provided, the importer will default to Date for the partitioning column. Any existing column from the source that matches the name of the partitioning column will be renamed to source_[original column name]. (See: Tables & Schemas > Partitions) |
-dc or --datasetColumn | Optional name to use for the column that will designate the dataset name from which rows were imported. This applies only to Quandl time-series data sources. If no name is provided, the importer will add a column called "QuandlDataset" to store this value. |
-lp or --logProgress | Enables additional progress logging during the import operation, which may be useful for monitoring the progress of large, long-running imports. |
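The derived table-name convention mentioned for -tn (WIKI and PRICES becoming WikiPrices) can be sketched as follows. This helper is a hypothetical illustration of the naming rule, not part of the schema creator itself.

```java
public class DerivedTableName {
    // Title-case each Quandl name: "WIKI" -> "Wiki", "PRICES" -> "Prices".
    static String titleCase(String s) {
        return s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
    }

    // Concatenate the title-cased database and dataset/datatable names,
    // mirroring the convention described in the -tn argument above.
    static String tableName(String database, String datasetOrTable) {
        return titleCase(database) + titleCase(datasetOrTable);
    }

    public static void main(String[] args) {
        System.out.println(tableName("WIKI", "PRICES")); // → WikiPrices
    }
}
```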
Import Query
There is presently no support for Quandl imports via an Import Query.
Importing to In-Memory Tables
Importing from Quandl by script differs somewhat from the file-based imports. Quandl data can be pulled into in-memory tables using the techniques described here.
Deephaven Quandl functions are contained in the QuandlHelpers class. This class is not imported into the Deephaven Groovy session by default, so use of these methods requires either explicit imports, or full qualification of method names.
Explicit import
import static com.illumon.iris.quandl.QuandlHelpers.*
quandlAPIKey("4qxnjGszorJKZ565sJP7")
Warning
Note: "4qxnjGszorJKZ565sJP7" is not a valid Quandl API Key. Replace with your own.
Fully-qualified
com.illumon.iris.quandl.QuandlHelpers.quandlAPIKey("4qxnjGszorJKZ565sJP7")
Warning
Note: "4qxnjGszorJKZ565sJP7" is not a valid Quandl API Key. Replace with your own.
Methods
There are three main methods for creating in-memory tables from Quandl data:
- getQuandlTimeSeries – connects to the time-series API, and creates a table from the resultant data retrieved.
- getQuandlTable – connects to the datatables API, and creates a table from the resultant data retrieved.
- getQuandl – tries both APIs – datatables first, then time-series if datatables doesn't have the requested data set.
Each of the methods throws an IOException, which is normally caused by a problem parsing CSV data downloaded from Quandl. If the method is used through the Deephaven console, the console itself will set up a handler for the IOException. If the method is used in Java code, the method will need to be called from within a try/catch block to ensure the IOException is caught if thrown. RuntimeExceptions normally result from other problems that can occur during the download and the parsing of Quandl data (e.g., a timeout, or an incorrect property value).
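In Java code, the call pattern might look like the sketch below. The fetchTable method here is a hypothetical stand-in for a QuandlHelpers call such as getQuandlTable; only the exception-handling structure is the point.

```java
import java.io.IOException;

public class QuandlFetchExample {
    // Hypothetical stand-in for a QuandlHelpers call that declares
    // IOException for problems parsing the CSV data downloaded from Quandl.
    static String fetchTable(String db, String table) throws IOException {
        if (db == null || table == null) {
            throw new IOException("could not parse downloaded CSV");
        }
        return db + "/" + table;
    }

    // Catch IOException (CSV parse/download problems) separately from
    // RuntimeException (timeouts, incorrect property values, etc.).
    static String fetchOrEmpty(String db, String table) {
        try {
            return fetchTable(db, table);
        } catch (IOException e) {
            System.err.println("CSV problem: " + e.getMessage());
            return "";
        } catch (RuntimeException e) {
            System.err.println("Other download problem: " + e.getMessage());
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchOrEmpty("WIKI", "PRICES")); // → WIKI/PRICES
    }
}
```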
Each of the methods noted above has several forms in which it can be used. The simplest form takes two arguments: a Quandl database name, and a Quandl dataset or datatable name. (Note: These names are shown in Quandl's online data catalog.) For instance, using the database "WIKI" and the datatable "PRICES" provides free end-of-day stock data for 3,000 U.S. stocks.
A slightly more complex form for calling these functions takes three or more arguments. Like the simplest form, the first two arguments are the Quandl database name and the Quandl dataset name. However, the last argument is a variable argument (vararg), and it can be used to include one or more Quandl API arguments (separated by commas) to limit the scope of the data requested from Quandl.
For example:
myTable=getQuandlTable("WIKI", "PRICES", "date.gte=20100101", "qopts.export=true")
In this case, "date.gte=20100101" tells the Quandl API to filter the WIKI/PRICES datatable so only those dates that are greater than or equal to January 1, 2010 are included. The next argument, "qopts.export=true", tells the Quandl API to package the results in a zip file for download. (Note: Deephaven automatically handles checking for the zipped results to be ready for download, downloading them, and unpacking them.) The option to zip results is only available for datatables, not for time series. Also, for large datatables, if the results are not zipped, only the first 10,000 rows will be returned.
Another argument that can be passed to these methods is a StatusCallback object. This progress parameter allows the getQuandl methods to update process status as data is downloaded and parsed. This can be used to poll status, or to register a callback that can be used to update a status message and/or a progress bar. If the version of a method that includes a StatusCallback parameter is called with the parameter value explicitly set to null, progress update info messages will be logged to the console log. If a version of the method that does not include a StatusCallback parameter is used, then no progress messages will be logged. A custom StatusCallback object can also be created by the calling application and passed in, to allow logging and/or other progress tracking.
getQuandl
When using getQuandl commands, the Quandl API key can be provided using the following methods:
- Set it in a configuration (.prop) file: The setting Quandl.APIKey can be included in a configuration file.
- Set it before calling getQuandl: The quandlAPIKey property of the QuandlHelpers class can be used to set or retrieve the current API key being used by the class instance.
- Specify it as part of the optional arguments to a getQuandl call: Including api_key=[your_api_key] as one of the arguments will result in this API key being used for the call.
The order of priority for an API key is:
- (Highest) API key set by optional argument.
- API key set through quandlAPIKey property.
- API key set in configuration file.
- If none of these is set, an empty API key will be used, and the request will most likely fail with an HTTP 400 status code.
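The priority order above can be expressed as a small resolver. This is an illustrative sketch of the documented precedence, not the actual QuandlHelpers logic.

```java
public class ApiKeyPriority {
    // Resolve an API key in the documented order: optional call argument,
    // then the quandlAPIKey property, then the Quandl.APIKey config setting.
    // An empty result means the request will most likely fail with HTTP 400.
    static String resolve(String argKey, String propertyKey, String configKey) {
        if (argKey != null && !argKey.isEmpty()) return argKey;
        if (propertyKey != null && !propertyKey.isEmpty()) return propertyKey;
        if (configKey != null && !configKey.isEmpty()) return configKey;
        return "";
    }

    public static void main(String[] args) {
        System.out.println(resolve("argKey", "propKey", "confKey")); // → argKey
        System.out.println(resolve(null, "propKey", "confKey"));     // → propKey
    }
}
```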
getQuandlTimeSeries
For time-series data, the dataset (second) argument to the commands also accepts a comma-separated list of dataset names, as well as wildcard patterns (using * as a wildcard character).
For example, the following query will download the time-series datasets for Apple, IBM, and Google end-of-day data, and merge them into a single table. When providing a list or pattern of datasets, a QuandlDataset column will be added to the table, and populated with the name of the dataset that provided the rows. Rows that came from the WIKI/IBM dataset will have IBM in their QuandlDataset column. Note: This column is only added when a list or pattern is used for datasets. A single dataset is returned exactly as seen on Quandl's site, with no extra column.
myTable=getQuandlTimeSeries("WIKI", "AAPL, IBM, GOOG")
Similarly, the following query will download and concatenate all time-series WIKI datasets that start with AT:
myTable=getQuandlTimeSeries("WIKI", "AT*")
For the Quandl WIKI time-series datasets, all of the datasets contain the same columns in the same order, so there are no compatibility concerns between the different datasets. However, other Quandl databases offer datasets with disparate column sets. If a selection includes datasets whose schema differs from that of the first dataset downloaded, the command will fail.
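A wildcard selection such as "AT*" amounts to filtering the database's catalog of dataset names. The sketch below shows one way such filtering could work; it is an illustration under that assumption, not the importer's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class DatasetPatternFilter {
    // Treat * as "match anything" by translating the pattern into a regex.
    // Literal portions are quoted so characters like '.' match themselves.
    static List<String> matching(List<String> catalog, String pattern) {
        String regex = java.util.regex.Pattern.quote(pattern)
                .replace("*", "\\E.*\\Q");
        List<String> out = new ArrayList<>();
        for (String name : catalog) {
            if (name.matches(regex)) { // String.matches anchors the whole name
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> catalog = List.of("AAPL", "ATT", "ATX", "IBM");
        System.out.println(matching(catalog, "AT*")); // → [ATT, ATX]
    }
}
```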
getQuandlDatasets
One other command available to make it easier to use time-series data is getQuandlDatasets. For example, the following command returns a table of dataset IDs and their respective descriptions for the YALE database (Yale University Department of Economics):
yaleCatalog=getQuandlDatasets("YALE")
Or, with an API Key:
yaleCatalog=getQuandlDatasets("YALE","*","api_key=4qxnjGszorJKZ565sJP7")
Warning
Note: "4qxnjGszorJKZ565sJP7" is not a valid Quandl API Key. Replace with your own.
Import from Command Line
QuandlImporter assumes (like other importers) that the target namespace and table already exist, and can then:
- Request and download zipped or non-zipped Quandl data files.
- Extract zipped data files.
- Import data to the Intraday space for the Deephaven table.
Command Reference
iris_exec quandl_import <launch args> -- <quandl import args>
Quandl Import Arguments
Argument | Description |
---|---|
-qd or --quandlDatabase | (Required) The name of the Quandl database from which to download. |
-qt or --quandlTable | (Required) The name of the Quandl datatable or dataset from which to download. |
-ak or --apiKey | The API key to use when connecting to the Quandl API. Like the getQuandl commands described earlier, the QuandlImporter class will use a key from the Quandl.APIKey configuration setting, if present and if no key is specified using the -ak argument. |
-aa or --apiArgs | Additional optional arguments to pass to the Quandl API - a space-delimited, double-quoted, list of name=value arguments. |
-dd or --destinationDirectory <path>, or -dp or --destinationPartition <internal partition name/partitioning value> | Either a destination directory or partition must be provided. A directory can be used to write a new set of table files to a specific location on disk, where they can later be read with TableTools. A destination partition is used to write to intraday locations for existing tables. The internal partition value is used to separate data on disk; it does not need to be unique for a table. The name of the import server is a common value for this. The partitioning value is a string data value used to populate the partitioning column in the table during the import. This value must be unique within the table. |
-ns or --namespace <namespace> | (Required) The namespace in which to find the target table. |
-tn or --tableName <name> | (Optional) The name of the target table will be derived from the Quandl database, and dataset or datatable name, if not specified; e.g., WIKI and PRICES would result in a table called WikiPrices. Of course, the derived name must still match the name of an existing table definition. |
-om or --outputMode <import behavior> | (Optional) SAFE (default), REPLACE, or APPEND. SAFE checks whether the target table already contains data matching the provided partitioning value; if it does, the import is aborted. When REPLACE is used, any existing data matching the partitioning value is deleted before importing. APPEND adds new data to any existing data that matches the partitioning value. |
-wd or --workingDirectory <path> | (Optional) Path in which to store data files downloaded from Quandl. If this option is not used, data files will be downloaded and processed in memory. This option can be helpful to conserve memory when downloading large amounts of data, or on systems with small amounts of memory available for the import process. |
-kt or --keepTempFiles | When used with -wd, instructs the import process to leave temp files in the working directory after importing them. By default, these temp files are deleted after import. |
-dt | (Optional) Download timeout; more precisely, the amount of time (in seconds) to wait for the Quandl server to generate a zip file when qopts.export=true has been specified. The default is five minutes. Generation of some large table result files can take as long as 20 minutes, while simpler results often take only a few seconds. This setting only applies to datatables imports, since dataset files are never zipped. |
-lp or --logProgress | Enables additional progress logging during the import operation. Useful to monitor progress for large, long-running imports. |