Violin Plots in Matplotlib

Overview

Violin plot basic image

Have you ever wondered what the best way is to visualize a data set so that the interquartile range, median, and probability density are all captured at once?

Violin plots make this task easy by bringing these things together in a single visualization. A violin plot combines a box plot with a smoothed, histogram-like density curve.

A violin plot carries the information a box plot has, namely a marker showing the median and a bar representing the interquartile range, and adds the distribution of all the data points in the set.

What are Violin Plots in Matplotlib?

Matplotlib is one of the most popular Python libraries used for data visualization. Its popularity comes from its simple syntax and ease of programming. Matplotlib can create simple as well as complex graphs with a minimal amount of code.

Violin plots are used to visualize a numeric data set. They can be thought of as a box plot with a kernel density estimate drawn around it.

Consider the following violin plot:

Violin plot image

One can easily notice that the violin plot has all the data a box plot contains. It has both the upper and lower adjacent values, median, and mean of the data represented. In addition, the kernel density is also depicted in the figure using the blue bubble around the horizontal line.

The kernel density bubble shows the density of the data at each point on the Y-axis. Where the bubble is wider at some Y value, the density is higher at that point. Where the bubble is narrow or has no width, the density is very low or zero, respectively.

What does a Violin Plot Signify?

A violin plot is a combination of a box plot and a histogram-like density curve. A histogram shows the density distribution of data. A box plot shows the median, upper adjacent value, lower adjacent value, and interquartile range. As seen in the figure below, the violin plot combines all these representations in one figure.

Violin plot image with bar

Here we can see

  1. The median is represented by the white dot
  2. The first quartile is represented by the lower end of the bold bar
  3. The third quartile is represented by the upper end of the bold bar
  4. The lower adjacent and upper adjacent values are represented by the lower and upper ends of the thin line
  5. The tubular bubble represents the density distribution of the data points
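To make the quantities in the list above concrete, here is a minimal sketch that computes them with NumPy. The sample values are hypothetical, and the adjacent values are computed with the common 1.5 × IQR convention, which is an assumption since the article does not define them.

```python
import numpy as np

# Hypothetical sample, chosen only so the numbers are easy to verify by hand
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

# Quartiles and median (NumPy's default linear interpolation)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed convention: adjacent values are the most extreme data points
# that still lie within 1.5 * IQR of the quartiles
lower_adjacent = values[values >= q1 - 1.5 * iqr].min()
upper_adjacent = values[values <= q3 + 1.5 * iqr].max()

print(q1, median, q3, lower_adjacent, upper_adjacent)
```

In this sample the value 30 lies beyond the upper fence, so the upper adjacent value is 9 rather than the maximum.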

Syntax of a Matplotlib Violin Plot

A violin plot in Matplotlib is drawn using the violinplot() function. The function is available in the pyplot module as matplotlib.pyplot.violinplot().

The violinplot() function has the following syntax.
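As a rough sketch of that syntax, the comment below lists the parameters and defaults described in this article, followed by the smallest possible call. The exact signature can differ between Matplotlib releases, so treat the comment as a summary rather than the authoritative definition. The sample data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# General form, with the defaults listed in this article
# (newer Matplotlib releases may add or rename parameters):
# plt.violinplot(dataset, positions=None, vert=True, widths=0.5,
#                showmeans=False, showextrema=True, showmedians=False,
#                quantiles=None, points=100, bw_method=None, data=None)

dataset = [[1, 2, 2, 3, 3, 3, 4, 4, 5]]  # hypothetical data: one vector -> one violin
parts = plt.violinplot(dataset)

# The call returns a dictionary of the artists that were drawn
print(sorted(parts.keys()))
```

The returned dictionary is what later customization (colors, line styles) works on; its "bodies" entry holds one filled shape per violin.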

Parameters of a Matplotlib Violin Plot

The violinplot() function takes only one mandatory parameter, which is the dataset.

The dataset is either a 2-D array or a sequence of vectors. One violin will be plotted for each column (or each vector) in the dataset.

The optional parameters of violinplot() are:

  • positions: array-like, default [1, 2, ..., n]. The items in the array give the positions of the violins to be plotted. Axis limits and ticks are set to match these positions.
  • vert: bool, default True. If this is set to False, a horizontal violin plot is created.
  • widths: a scalar or array-like, default 0.5. It sets the maximum width of each violin. The default of 0.5 means each violin uses about half of the available horizontal space.
  • showmeans: bool, default False. When True, the plot contains a line marking the mean of each violin.
  • showextrema: bool, default True. When True, the plot contains lines marking the extrema (minimum and maximum) of each violin.
  • showmedians: bool, default False. When True, the plot contains a line marking the median of each violin.
  • quantiles: array-like, default None. If given, it holds floats in the interval [0, 1] for each violin. These floats are the quantiles that will be drawn for that violin.
  • points: int, default 100. The number of points at which each Gaussian kernel density estimate is evaluated.
  • bw_method: a string, a scalar, or a callable. The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant, or a callable. If a scalar, it is used directly as kde.factor. If a callable, it should take a GaussianKDE instance as its only parameter and return a scalar. If None (default), 'scott' is used.
  • data: an indexable object, optional. If given, dataset may also be passed as a string s, which is then interpreted as data[s].
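The following sketch shows how a few of these optional parameters fit together in one call. The data, positions, quantile levels, and bandwidth value are arbitrary choices made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical samples -> two violins
samples = [rng.normal(0, 1, 200), rng.normal(2, 0.5, 200)]

fig, ax = plt.subplots()
parts = ax.violinplot(
    samples,
    positions=[1, 3],                            # where each violin sits on the x-axis
    widths=0.8,                                  # maximum width of every violin
    quantiles=[[0.25, 0.75], [0.1, 0.5, 0.9]],   # one list of quantiles per violin
    bw_method=0.3,                               # scalar bandwidth factor for the KDE
)
```

Note that quantiles takes one list per violin here, so the two violins can show different sets of quantile lines.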

Examples of Violin Plot in Matplotlib

  1. We generate two violins using random numeric data generated by NumPy. The data is then plotted using the matplotlib violinplot() function.
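A minimal sketch of this example might look like the following. The seed, sample size, and distribution are assumptions, since the original listing is not available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# 100 rows and 2 columns: violinplot() draws one violin per column
data = rng.normal(size=(100, 2))

fig, ax = plt.subplots()
parts = ax.violinplot(data)
ax.set_title("Violin plot from random data")
```

To view the result, save it with fig.savefig("violin.png"), or remove the matplotlib.use("Agg") line and call plt.show().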

Output: Violin plot from random data

  2. We plot a violin plot to compare two data sets, one with a normal distribution (random values) and another with a uniform distribution (consecutive values).
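One possible way to build this comparison is sketched below. The mean, spread, and value range are hypothetical choices that keep both data sets on a similar scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
normal = rng.normal(loc=50, scale=15, size=100)  # random, normally distributed values
uniform = np.arange(100)                          # consecutive values 0..99, evenly spread

fig, ax = plt.subplots()
parts = ax.violinplot([normal, uniform])  # a sequence of two vectors -> two violins
```

The normal sample should produce a violin that bulges around its center, while the consecutive values give a body of roughly constant width.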

Output: Violin plot comparing uniform and normal data sets

Customizing Violin Plots in Matplotlib

Following are a few ways to add customization to Violin Plots in Matplotlib.

Adding X and Y Ticks

Graphs are easier to interpret when one knows what kind of data each graph represents. Labels on the x-axis are a simple way to add categorical information to a violin plot.

The labels on the x-axis can be added using the set_xticks() and set_xticklabels() functions.
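A short sketch of the labeling step, assuming two violins at the default positions 1 and 2. The label names and data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
normal = rng.normal(loc=50, scale=15, size=100)
uniform = np.arange(100)

fig, ax = plt.subplots()
ax.violinplot([normal, uniform])

# Violins sit at positions 1 and 2 by default, so the ticks go there
ax.set_xticks([1, 2])
ax.set_xticklabels(["Normal", "Uniform"])
```

If custom positions are passed to violinplot(), the same values should be passed to set_xticks() so the labels line up with the violins.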

Output:

Violin plot with xticks

Plotting Horizontal Violin Plot in Matplotlib

To make the violin plot horizontal, we set the vert parameter of the violinplot() function to False. The violins are then drawn horizontally.
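Here is a minimal sketch using vert=False as described above. The data is hypothetical. Recent Matplotlib releases may emit a deprecation warning for vert and suggest an orientation parameter instead, so check the documentation of the installed version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(100, 2))  # two columns -> two violins

fig, ax = plt.subplots()
# vert=False lays the violins along the x-axis;
# newer releases may warn and prefer orientation="horizontal"
parts = ax.violinplot(data, vert=False)
```

With horizontal violins the data values run along the x-axis, so any category labels belong on the y-axis (set_yticks() and set_yticklabels()).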

Output: Horizontal Violin plots

Showing Dataset Means in Violin Plots

The median is the statistic usually associated with a violin plot, and Matplotlib draws it when showmedians is set to True. Matplotlib also provides an option to calculate and show the mean of the dataset. To show the mean on the violin plot, we set the parameter showmeans to True.
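A small sketch of this option follows, with hypothetical skewed data so that the mean and median lines do not coincide. Enabling showmedians alongside showmeans is an addition made here for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
data = rng.exponential(scale=2.0, size=200)  # skewed sample: mean and median differ

fig, ax = plt.subplots()
parts = ax.violinplot([data], showmeans=True, showmedians=True)

# The mean and median lines are returned as separate artists
mean_line = parts["cmeans"]
median_line = parts["cmedians"]
```

Because both lines use the same default style, it can help to recolor one of them, for example with mean_line.set_color("red").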

Output: Violin plots with mean

Customizing Kernel Density Estimation for Violin Plots

While generating the violin plot, Matplotlib evaluates each kernel density estimate at 100 points by default. We can change the number of points by changing the value of the points parameter in the violinplot() function.
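To see the effect, the sketch below draws the same hypothetical data twice, once with a small points value and once with a large one. The values 10 and 200 are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=300)

fig, (ax_coarse, ax_fine) = plt.subplots(1, 2, sharey=True)

# Few evaluation points: the outline is built from a handful of segments
coarse = ax_coarse.violinplot([data], points=10)
ax_coarse.set_title("points=10")

# Many evaluation points: the outline looks smooth
fine = ax_fine.violinplot([data], points=200)
ax_fine.set_title("points=200")
```

The underlying density estimate is the same in both panels; only the resolution at which its outline is sampled changes.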

Output:

Violin plots with custom kernel density estimation size

Note: a smaller value of points makes the representation of the density distribution less accurate.

Conclusion

In this article, we covered:

  • What violin plots in matplotlib are
  • The syntax and parameters of the violinplot() function in matplotlib
  • Why violin plots are more informative than box plots: they also show the kernel density distribution of the data
  • How to generate a violin plot from random NumPy data in matplotlib
  • The customizations provided by the violinplot() function
    • Adding labels to each violin on the x-axis
    • Drawing a horizontal violin plot
    • Showing the mean value in a violin plot
    • Customizing the number of points used in kernel density estimation