Import HTML files
While Deephaven does not have its own methods for reading HTML tables, it's easy to do with pandas or BeautifulSoup. This guide will demonstrate multiple ways to pull data from online HTML tables into Deephaven tables.
HTML table
We'll use this HTML table in the examples below:
| Color | Shades | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| primary | #e5faff | #d7f7ff | #c8f4ff | #b9f1ff | #a9edff | #99eaff | #7fc6d9 | #65a4b4 | #4d8391 | #35636f | #1f454f |
| secondary | #6a81f2 | #596fe9 | #495ee0 | #384cd6 | #2439cb | #0625c0 | #0121aa | #001e94 | #001a7e | #011569 | #031155 |
| negative | #f883a0 | #f87694 | #f76989 | #f65a7d | #f54a72 | #f33666 | #da2e59 | #c1254c | #a91d40 | #911534 | #7a0d28 |
| positive | #9dfc7e | #93fa72 | #89f966 | #7ef758 | #72f549 | #65f336 | #58d92e | #4cc025 | #3fa71d | #348f15 | #28780d |
| warn | #ffe590 | #ffe183 | #fedd75 | #fed966 | #fdd455 | #fdd041 | #e8be38 | #d4ac2f | #c09b27 | #ac8a1e | #997915 |
| info | #f08df9 | #ee7ff8 | #eb70f7 | #e960f6 | #e64df4 | #e336f3 | #cd34dd | #b732c7 | #a22fb2 | #8d2c9d | #792989 |
| fg | #f3f7fa | #eef2f5 | #e9edf0 | #e4e9ec | #dfe4e7 | #dadfe2 | #c6cbcf | #b2b8bc | #9fa5aa | #8c9298 | #798086 |
| bg | #434f56 | #3c474d | #353e44 | #2f363c | #282f33 | #22272b | #1f2327 | #1b2023 | #181c1f | #15181b | #121517 |
pandas.read_html
pandas.read_html is a convenient method for reading HTML tables into a list of pandas DataFrames. The method can be used to read tables from a URL or a local file.
We'll start by using pandas.read_html to read an HTML table from a URL and convert it to a Deephaven table. This approach is simple, but it has limitations. For instance, it does not handle the non-numerical values in this table gracefully:
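A minimal sketch of this approach, assuming an HTML parser that pandas supports (such as lxml) is installed. The inline `html` string is a hypothetical, cut-down stand-in for the table above (the column names `Shade1` and `Shade2` are invented for illustration), and `to_deephaven` is an illustrative helper name. The Deephaven import is done lazily because it is only available inside a Deephaven session.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page; pd.read_html also accepts a URL string.
html = """
<table>
  <thead>
    <tr><th>Color</th><th>Shade1</th><th>Shade2</th></tr>
  </thead>
  <tbody>
    <tr><td>primary</td><td>#e5faff</td><td>#d7f7ff</td></tr>
    <tr><td>secondary</td><td>#6a81f2</td><td>#596fe9</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found.
dataframes = pd.read_html(StringIO(html))
df = dataframes[0]


def to_deephaven(frame):
    """Convert a DataFrame to a Deephaven table (requires a Deephaven session)."""
    from deephaven.pandas import to_table  # lazy import: only available in Deephaven

    return to_table(frame)


# Inside a Deephaven session:
# colors = to_deephaven(df)
```

Because `read_html` returns a list, a page with several tables needs an index (or the `match` argument) to pick the right one.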
The sections below properly handle data types, leading to tables with actual data rather than just NaNs.
BeautifulSoup
BeautifulSoup is a Python library for pulling data out of HTML and XML files.
In this example, we'll use BeautifulSoup to read the same HTML table as in the example above and convert it to a Deephaven table:
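One way this could look, sketched under a few assumptions: the `html` string is a hypothetical, cut-down stand-in for the page (in practice the markup would be downloaded first), the column names `Shade1` and `Shade2` are invented for illustration, and `to_deephaven` is an illustrative helper whose Deephaven import only works inside a Deephaven session.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page source.
html = """
<table>
  <tr><th>Color</th><th>Shade1</th><th>Shade2</th></tr>
  <tr><td>primary</td><td>#e5faff</td><td>#d7f7ff</td></tr>
  <tr><td>secondary</td><td>#6a81f2</td><td>#596fe9</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Header cells are <th> elements; data cells are <td> elements.
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")  # skip the header row, which has no <td> cells
]

# Every cell is plain text at this point, so all columns hold strings.
df = pd.DataFrame(rows, columns=headers)


def to_deephaven(frame):
    """Convert a DataFrame to a Deephaven table (requires a Deephaven session)."""
    from deephaven.pandas import to_table  # lazy import: only available in Deephaven

    return to_table(frame)


# Inside a Deephaven session:
# colors = to_deephaven(df)
```

Extracting the cell text by hand is more verbose than `pandas.read_html`, but it gives full control over how each cell is cleaned before the DataFrame is built.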
Data types
HTML tables store all data as plain text and have no concept of data types, so some care is needed when importing them into Deephaven to ensure that each column ends up with the correct type. Deephaven's to_table function infers types automatically when infer_objects=True (the default), but manual specification is recommended to guarantee that the types are correct.
Whether you use pandas or BeautifulSoup, you can set each column's data type at either of two stages: in the DataFrame, using the pandas astype method, or after the Deephaven table has been created, by typecasting columns with update or another of Deephaven's selection methods.
Typing with selection methods
In this example, we use the same HTML table as in the examples above as a source. We read the table with BeautifulSoup and convert it to a Deephaven table with to_table. Finally, we restore the correct type for each column with Deephaven's update method:
Note that the pandas DataFrame.astype method can also be used to restore typing. However, it does not handle Deephaven's datetime types effectively, so the update method is recommended in those cases.
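A hedged sketch of the typecasting step. It starts from a small DataFrame that stands in for the output of the BeautifulSoup parsing step, with hypothetical column names `Shade1` and `Shade2`. The helpers `cast_formulas` and `to_typed_table` are illustrative names, and the Java-style cast formula `Color = (String)Color` is an assumption about how the columns should be typed here; other columns might need `(int)` or `(double)` casts instead.

```python
import pandas as pd

# Stand-in for the DataFrame produced by the BeautifulSoup parsing step:
# every cell arrives as plain text.
df = pd.DataFrame(
    [["primary", "#e5faff", "#d7f7ff"], ["secondary", "#6a81f2", "#596fe9"]],
    columns=["Color", "Shade1", "Shade2"],
)


def cast_formulas(columns, type_name="String"):
    """Build one update formula per column, e.g. 'Color = (String)Color'."""
    return [f"{col} = ({type_name}){col}" for col in columns]


formulas = cast_formulas(df.columns)


def to_typed_table(frame, update_formulas):
    """Convert to a Deephaven table, then typecast its columns with update."""
    from deephaven.pandas import to_table  # lazy import: only available in Deephaven

    return to_table(frame).update(update_formulas)


# Inside a Deephaven session:
# deephaven_theme_colors = to_typed_table(df, formulas)
```

Generating the formulas from the column list avoids writing one cast per column by hand, which helps with wide tables like this one.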
Typing with astype
This example demonstrates how to add typing to the deephaven_theme_colors table created above, using astype:
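A minimal sketch, assuming the types are set on the pandas DataFrame before it is converted with to_table. The DataFrame below is a hypothetical stand-in for the parsed color table, with invented column names, and `to_deephaven` is an illustrative helper that only works inside a Deephaven session.

```python
import pandas as pd

# Stand-in for the parsed HTML table: all columns start as generic objects.
df = pd.DataFrame(
    [["primary", "#e5faff", "#d7f7ff"], ["secondary", "#6a81f2", "#596fe9"]],
    columns=["Color", "Shade1", "Shade2"],
)

# astype accepts a mapping of column name to target dtype.
typed_df = df.astype({"Color": "string", "Shade1": "string", "Shade2": "string"})


def to_deephaven(frame):
    """Convert a DataFrame to a Deephaven table (requires a Deephaven session)."""
    from deephaven.pandas import to_table  # lazy import: only available in Deephaven

    return to_table(frame)


# Inside a Deephaven session:
# deephaven_theme_colors = to_deephaven(typed_df)
```

Numeric columns would use dtypes such as `"int64"` or `"float64"` in the same mapping.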