If you’re looking for a database to manage accounting, HR and CRM systems, relational databases running SQL offer tremendous flexibility and extensibility. However, if you want to analyze massive time series datasets, like those essential for today’s capital markets, you need a different tool.
Capital markets produce and consume massive amounts of time series data. For example, an options market maker might need to analyze many terabytes of data to identify a strategy that appears to generate alpha. This escalation in scale fundamentally changes the requirements for speed. In high-frequency, low-latency trading models, microseconds define the difference between capturing edge and losing it. The combination of massive data volumes and extreme time-sensitivity creates formidable challenges for relational database systems in capital market computations.
Relational databases are well suited to transactional systems, where data are structured in rows that each represent an aspect of a transaction. Most business queries against these transactions touch only a small number of complete rows and can be processed efficiently. By contrast, consider the capital market quant, who needs only a few columns from vast quantities of rows — perhaps tens of millions, or even billions of rows. Using a relational database and its row-based architecture to access and analyze this data would be slow and inefficient.
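To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python (the data and field names are hypothetical, not from any particular database). A query that needs only the price column must still step over every other field in a row-oriented layout, while a columnar layout hands it exactly the bytes it needs:

```python
# Row-oriented layout: every row carries all fields, so computing an
# average price still walks past the fields the query does not need.
rows = [
    {"sym": "AAPL", "price": 189.5, "size": 100, "venue": "XNAS"},
    {"sym": "AAPL", "price": 189.7, "size": 250, "venue": "ARCA"},
    {"sym": "MSFT", "price": 411.2, "size": 300, "venue": "XNAS"},
]
avg_row = sum(r["price"] for r in rows) / len(rows)

# Column-oriented layout: each column is a contiguous array, so the same
# query reads only the one column it needs -- a fraction of the bytes.
cols = {
    "sym":   ["AAPL", "AAPL", "MSFT"],
    "price": [189.5, 189.7, 411.2],
    "size":  [100, 250, 300],
    "venue": ["XNAS", "ARCA", "XNAS"],
}
avg_col = sum(cols["price"]) / len(cols["price"])

assert avg_row == avg_col  # same answer; very different I/O profile at scale
```

At three rows the difference is invisible; at billions of rows, scanning one contiguous column instead of entire rows is the difference between an interactive query and an overnight batch job.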
Teasing insights from capital market data is often supported by live, streaming data, making it an always-on endeavor. Any number of factors can change in the blink of an eye, so the ability to react promptly to the nuances in streaming data provides tremendous advantages. However, relational databases simply can’t deliver in this regard. Queries run on relational databases return static results, and nothing changes until the query is re-run. Repeatedly re-running queries against constantly updating source data becomes a perpetual burden on system processing and efficiency.
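The cost asymmetry is easy to see in a toy sketch (hypothetical tick data, plain Python). Re-running a static query recomputes over everything seen so far on each refresh, while a streaming-friendly design carries its state forward and pays a constant cost per new tick:

```python
ticks = [189.5, 189.7, 189.4, 189.9, 190.1]  # hypothetical price ticks

# Re-run model: each time the source grows, recompute the running average
# over everything seen so far -- O(n) work per refresh, O(n^2) in total.
rerun_results = []
seen = []
for t in ticks:
    seen.append(t)
    rerun_results.append(sum(seen) / len(seen))  # full recompute

# Incremental model: carry a running total so each new tick costs O(1).
inc_results = []
total, count = 0.0, 0
for t in ticks:
    total += t
    count += 1
    inc_results.append(total / count)

assert rerun_results == inc_results  # identical answers, very different cost
```

Both approaches produce the same answers; the difference is that the incremental model's per-update cost stays flat as the day's data grows, which is what makes always-on computation over streaming data tractable.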
So, if a relational database system is not the right tool for the job, what is? Let’s look again at the requirements. Analyzing massive time series datasets requires a system with tremendous capacity and throughput combined with incredible speed and efficiency. Using a column-based data architecture enhances speed still further. Finally, the ability to consume live, streaming data provides a distinct advantage. There may be systems that perform well when considering only one or two of those requirements, but achieving them all is elusive.
You could build your own system, but the challenges are steep. You would first need to source highly qualified developers, programmers, system architects and data scientists. Even with an elite team with capital markets experience, it would take years to architect, code and produce the software, during which time your enterprise would be missing out on the many benefits and the competitive edge this “future” platform might provide. Moreover, there is no guarantee of success.
What about commercial solutions?
There is one well-known and capable option among capital market customers. It has high capacity and great speed, uses a column-based data architecture, and can handle live data processing. However, this database relies on a proprietary and non-intuitive query language that is extremely difficult to learn and use.
This means quants and traders depend on advanced programmers to write their queries, creating inevitable backlogs and effectively keeping the raw data inaccessible to the knowledge experts. New ideas — even slight tweaks to existing ideas — are unnecessarily delayed as they are filtered through multiple people and teams. Long cycle times between strategy development, testing, adjustment, and implementation, often weeks or months, are unproductive and expensive. More importantly, alpha ticks away.
Enter Deephaven. Fast. Easy. Integrated.
Deephaven was designed from the ground up to find and exploit the insights hidden in capital market data. Intelligent system architecture emphasizes high-speed throughput while column-oriented data stores provide additional efficiencies in processing massive volumes of streaming data and/or historical data. The underlying database engine is paired with an easy-to-learn, human-readable query language, which enables all users to develop, test, and adjust their own strategies in near-real time.
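As a flavor of what "human-readable" means here, the sketch below is illustrative pseudocode in the spirit of Deephaven's publicly documented table operations (`where`, `view`, `update`); the namespace, table, and column names are hypothetical, not a verbatim session:

```
// Hypothetical namespace, table, and column names.
// Pull one symbol's trades for a date and derive a new column inline.
trades = db.t("Market", "Trades")
           .where("Date=`2018-06-01`", "Sym=`AAPL`")
           .view("Timestamp", "Price", "Size")
           .update("Notional = Price * Size")
```

The point is that a quant who can read a spreadsheet formula can read and modify this, without waiting on a specialist programmer.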
Deephaven also integrates seamlessly with Java, Python, R, C#, and C++ applications and libraries, as well as open source data and machine learning technologies. This means quants, traders and compliance personnel can code their own algos, signaling models, and surveillance routines using the tools and programming languages they already know and use.
This single, shared platform encourages collaboration and ensures each domain expert in the enterprise can be self-sufficient, performing their own research and analyses without the delays and costs inherent in relying on intermediaries. No data bottlenecks. Quick iteration and evolution become a competitive edge when blazing-fast research cycles are measured in hours and days instead of weeks and months.
Deephaven has been battle-tested in the capital markets for more than five years with outstanding success and is now commercially available from Deephaven Data Labs. Clients are currently using Deephaven to develop and deliver full-featured order management systems, vol-surface fitters, real-time surveillance systems, risk-control scenario analyses, greybox trading platforms, stat-arb strategies, and slippage and execution-quality analytics.
To learn how Deephaven can empower your enterprise, please contact email@example.com.