Box Plot
A box plot, also known as a box-and-whisker plot, is a data visualization that presents a summary of a dataset’s distribution. It displays key statistics such as the median, quartiles, and potential outliers, making it a useful tool for visually representing the central tendency and variability of data. To learn more about the mathematics involved in creating box plots, check out this article.
Box plots are appropriate when the data have a continuous variable of interest. If that variable depends on an additional categorical variable, side-by-side box plots may be appropriate; create them with the `by` argument.
What are box plots useful for?
- Visualizing overall distribution: Box plots reveal the distribution of the variable of interest. They are good first-line tools for assessing whether a variable’s distribution is symmetric, right-skewed, or left-skewed.
- Assessing center and spread: A box plot marks the center (median) of a dataset with the line inside the box, and shows the spread (interquartile range, IQR) as the length of the box.
- Identifying potential outliers: The dots displayed in a box plot are considered candidates for being outliers. These should be examined closely, and their frequency can help determine whether the data come from a heavy-tailed distribution.
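The quantities named in the list above can be computed by hand. The following is a minimal sketch, not the plotting library's implementation: `box_plot_summary` is a hypothetical helper, the sample values are made up, and it assumes two common conventions that this page does not specify, namely linear-interpolation quartiles (`method="inclusive"`) and Tukey's rule that points more than 1.5 × IQR beyond the box are outlier candidates.

```python
import statistics


def box_plot_summary(values):
    """Compute the statistics a box plot typically draws (hypothetical helper)."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # length of the box
    # Assumption: Tukey's 1.5 * IQR fences decide which points are candidates.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Made-up sample with one unusually large value.
sample = [3.1, 10.3, 12.5, 13.4, 15.0, 16.9, 18.2, 21.0, 24.6, 50.8]
summary = box_plot_summary(sample)
print(summary)
```

Different tools interpolate quartiles slightly differently, so the numbers from this sketch may not match a rendered plot exactly.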
Examples
A basic box plot
Visualize the distribution of a single variable by passing the column name to `x` or `y`.
import deephaven.plot.express as dx
tips = dx.data.tips()
# control the plot orientation using `x` or `y`
box_plot_x = dx.box(tips, x="TotalBill")
box_plot_y = dx.box(tips, y="TotalBill")
Distributions for multiple groups
Box plots are useful for comparing the distributions of two or more groups of data. Pass the name of the grouping column(s) to the `by` argument.
import deephaven.plot.express as dx
tips = dx.data.tips()
# total bill distribution by Smoker / non-Smoker
box_plot_group_1 = dx.box(tips, y="TotalBill", by="Smoker")
# total bill distribution by male / female
box_plot_group_2 = dx.box(tips, y="TotalBill", by="Sex")
API Reference
Returns a box chart
Returns: DeephavenFigure
A DeephavenFigure that contains the box chart
Parameters | Type | Default | Description |
---|---|---|---|
table | PartitionedTable \| Table \| DataFrame | | A table to pull data from. |
x | str \| list[str] \| None | None | A column or list of columns that contain x-axis values. If both x and y are specified, one should be numerical and the other categorical. If x is numerical, the boxes are drawn horizontally. |
y | str \| list[str] \| None | None | A column or list of columns that contain y-axis values. If both x and y are specified, one should be numerical and the other categorical. If y is numerical, the boxes are drawn vertically. |
by | str \| list[str] \| None | None | A column or list of columns that contain values to plot the figure traces by. Each value or combination of values maps to a unique design. The variable by_vars specifies which design elements are used. This is overridden if any specialized design variables such as color are specified. |
by_vars | str \| list[str] | 'color' | A string or list of strings that contain design elements to plot by. Can contain color. If associated maps or sequences are specified, they are used to map by column values to designs. Otherwise, default values are used. |
color | str \| list[str] \| None | None | A column or list of columns that contain color values. The value is used for a plot by on color. See color_discrete_map for additional behaviors. |
hover_name | str \| None | None | A column that contains names to bold in the hover tooltip. |
labels | dict[str, str] \| None | None | A dictionary of labels mapping columns to new labels. |
color_discrete_sequence | list[str] \| None | None | A list of colors to sequentially apply to the series. The colors loop, so if there are more series than colors, colors will be reused. |
color_discrete_map | dict[str \| tuple[str], str] \| None | None | If dict, the keys should be strings of the column values (or a tuple of combinations of column values) which map to colors. |
boxmode | str | 'group' | Either 'group' (the default), which draws the boxes next to each other, or 'overlay', which draws them on top of each other. |
log_x | bool | False | Whether the x-axis is a log axis. |
log_y | bool | False | Whether the y-axis is a log axis. |
range_x | list[int] \| None | None | A list of two numbers that specify the range of the x-axis. |
range_y | list[int] \| None | None | A list of two numbers that specify the range of the y-axis. |
points | bool \| str | 'outliers' | Default 'outliers', which draws points outside the whiskers. 'suspectedoutliers' draws points below `4*Q1-3*Q3` and above `4*Q3-3*Q1`. 'all' draws all points and False draws no points. |
notched | bool | False | If True, boxes are drawn with notches. |
title | str \| None | None | The title of the chart. |
template | str \| None | None | The template for the chart. |
unsafe_update_figure | Callable | <function default_callback> | An update function that takes a plotly figure as an argument and optionally returns a plotly figure. If a figure is not returned, the plotly figure passed in is assumed to be the return value. Used to add any custom changes to the underlying plotly figure. Note that the existing data traces should not be removed. Modifying traces in a way that breaks data mappings may lead to unexpected behavior. |
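The 'suspectedoutliers' thresholds in the points description are easier to read once rewritten relative to the box: 4*Q1-3*Q3 equals Q1 - 3*IQR, and 4*Q3-3*Q1 equals Q3 + 3*IQR, where IQR = Q3 - Q1. The sketch below checks that arithmetic on made-up data. `suspected_outlier_thresholds` is a hypothetical helper, and the quartile method (`method="inclusive"`) is an assumption; the plotting library may compute quartiles differently.

```python
import statistics


def suspected_outlier_thresholds(values):
    """Return (low, high) thresholds from the `points` description (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    # Thresholds exactly as written in the parameter table.
    low = 4 * q1 - 3 * q3
    high = 4 * q3 - 3 * q1
    return q1, q3, low, high


# Made-up values with one point far above the rest.
bills = [8, 10, 12, 14, 16, 18, 20, 22, 60]
q1, q3, low, high = suspected_outlier_thresholds(bills)
iqr = q3 - q1

# The same thresholds expressed as three box lengths beyond each box edge.
same_low = q1 - 3 * iqr
same_high = q3 + 3 * iqr

flagged = [v for v in bills if v < low or v > high]
print(low, high, same_low, same_high, flagged)
```

In other words, these thresholds sit three box lengths beyond each edge of the box, which is a stricter cutoff than the 1.5 × IQR convention commonly used for whiskers.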