← 返回首页
Mirror barplot with data points

Mirror barplot with data points

We will create a mirror barplot with individual observations using the Matplotlib library to visualize data about the Erasmus Program in European countries.
It was originally produced by Benjamin Nowak in R. Here we will reproduce one of its charts in Python.

About

This plot is a mirror barplot. It shows the number of outgoing and incoming students in different countries. Each bar is the average of the country, and the points the values for each year.

The chart was originally made with R. This post is a translation to Python by Joseph B..

Thanks to him for accepting sharing his work here!

Let's see what the final picture will look like:

Libraries

First, you need to install the following librairies:

  • matplotlib is used for creating the chart and add customization features
  • pandas is used to put the data into a dataframe and data manipulation
  • numpy is used for adding noise to the positions of each marker

And that's it!

# Libraries import pandas as pd import numpy as np import matplotlib.pyplot as plt

Dataset

For this reproduction, we're going to retrieve the data directly from the gallery's Github repo. This means we just need to give the right url as an argument to pandas' read_csv() function to retrieve the data.

# URLs resume_url = 'https://raw.githubusercontent.com/holtzy/the-python-graph-gallery/master/static/data/resume.csv' erasmus_url = 'https://raw.githubusercontent.com/holtzy/the-python-graph-gallery/master/static/data/erasmus.csv' # load datasets resume = pd.read_csv(resume_url) data = pd.read_csv(erasmus_url)

Ordered barplot

First, let's create the barplots without so much customization.

In order to have the barplots side by side, we just have to specify than the second one will be in negative (-resume['mean_send'] when using the barh() function). Moreover, since we ordered before our dataset by the mean_send column, it will automatically be in the right order (decreasing).

The alpha argument defines the opacity of the bars (between 0 and 1).

# Create a figure and axis with a specific size fig, ax = plt.subplots(figsize=(6, 6)) # Create both barplots ax.barh(resume['country_name'], resume['mean_rec'], color='blue', alpha=0.3) ax.barh(resume['country_name'], -resume['mean_send'], color='darkorange', alpha=0.3) # Add a title ax.set_title('Number of Student', weight='bold') # Display the plot plt.show()

Remove spines and shift country names

In this step, we will remove the spines (border of the graph) and put the country names on each bar

  • remove label using ax.set_xticks([])
  • remove spines using ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)
  • add labels on the center using the text() function
# Create a figure and axis with a specific size fig, ax = plt.subplots(figsize=(6, 6)) # Create both barplots ax.barh(resume['country_name'], resume['mean_rec'], color='blue', alpha=0.3) ax.barh(resume['country_name'], -resume['mean_send'], color='darkorange', alpha=0.3) # Remove axis labels ax.set_xticks([]) ax.set_yticks([]) # Removes spines ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False) # Put country names on the center of the chart for i, country_name in enumerate(resume['country_name']): ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6) # Add a title ax.set_title('Number of Student', weight='bold', fontsize=9) # Display the plot plt.show()

Add individual points

The peculiarity of the points in this graph is linked to 2 things:

  • their position on the y-axis, which is different for each country
  • their opacity, which depends on the year concerned

In practice, we iterate over the rows of our data dataframe (thanks to iterrows()) that we haven't used until now, retrieving positions and opacity according to country name and year.

For visual purpose, we add a very small noise to the y_position so that the points are slightly burst.

# Create a figure and axis with a specific size fig, ax = plt.subplots(figsize=(6, 6)) # Create both barplots ax.barh(resume['country_name'], resume['mean_rec'], color='blue', alpha=0.3) ax.barh(resume['country_name'], -resume['mean_send'], color='darkorange', alpha=0.3) # Remove axis labels ax.set_xticks([]) ax.set_yticks([]) # Removes spines ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False) # Put country names on the center of the chart for i, country_name in enumerate(resume['country_name']): ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6) # Add each observations, for each year and country y_position = 0 for i, row in data.iterrows(): # Get values sending = -row['participants_x'] receiving = row['participants_y'] y_position = row['y_position'] years = row['academic_year'] # Change alpha parameter according to the year concerned year_alpha_mapping = {'2014-2015': 0.3, '2015-2016': 0.4, '2016-2017': 0.5, '2017-2018': 0.6, '2018-2019': 0.7, '2019-2020': 0.9} alpha = year_alpha_mapping[years]*0.6 # adjust as needed # Add small noise to the y_position y_position += np.random.normal(0, 0.1, 1) # Add ax.scatter(sending, y_position, c='darkorange', alpha=alpha, s=3) ax.scatter(receiving, y_position, c='darkblue', alpha=alpha, s=3) # Add a title ax.set_title('Number of Student', weight='bold', fontsize=9) # Display the plot plt.show()

Add annotations for final chart

All that's missing is a few annotations, but the hard part's over!

Thanks to the text() function, you can easily add annotations of different sizes and styles.

# Create a figure and axis with a specific size fig, ax = plt.subplots(figsize=(8, 6)) # Create both barplots ax.barh(resume['country_name'], resume['mean_rec'], color='blue', alpha=0.2) ax.barh(resume['country_name'], -resume['mean_send'], color='darkorange', alpha=0.2) # Remove axis labels ax.set_xticks([]) ax.set_yticks([]) # Removes spines ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False) # Put country names on the center of the chart for i, country_name in enumerate(resume['country_name']): ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6) # Add each observations, for each year and country y_position = 0 for i, row in data.iterrows(): # Get values sending = -row['participants_x'] receiving = row['participants_y'] y_position = row['y_position'] years = row['academic_year'] # Change alpha parameter according to the year concerned year_alpha_mapping = {'2014-2015': 0.3, '2015-2016': 0.4, '2016-2017': 0.5, '2017-2018': 0.6, '2018-2019': 0.7, '2019-2020': 0.9} alpha = year_alpha_mapping[years]*0.6 # Add small noise to the y_position y_position += np.random.normal(0, 0.2, 1) # Add ax.scatter(sending, y_position, c='darkorange', alpha=alpha, s=3) ax.scatter(receiving, y_position, c='darkblue', alpha=alpha, s=3) # Label of Outgoing and Incoming students ax.text(-6000, 24, 'Outgoing\nstudents', color='darkorange', ha='center', va='center', weight='bold') ax.text(6000, 24, 'Incoming\nstudents', color='darkblue', ha='center', va='center', weight='bold') # big title ax.text(-7000, 9, 'Students\nexchanges\nin Europe', ha='left', va='center', weight='bold', fontsize=14) # description text = '''Country ranking based on a sample Erasmus programs. Bars show the annual average for the period, points show the values for each year.''' ax.text(-7000, 4.5, text, ha='left', va='center', fontsize=7) # credits text = '''Data: Data.Europa | Plot: @BjnNowak''' ax.text(-7000, 1, text, ha='left', va='center', fontsize=6) # Academic year legend ax.text(x=4200, y=11, s='Academic Year', fontsize=7, weight='bold') y_position = 10 # start at the 10th bar for year, alpha in year_alpha_mapping.items(): # Add the point ax.scatter(4000, y_position, alpha=alpha, s=5, c='black') ax.text(x=4200, y=y_position-0.2, s=year, fontsize=7) y_position -= 1 # decrease of one bar for the next iteration # Add a title at the top ax.set_title('Number of Student', weight='bold', fontsize=9) # Display the plot plt.show()

Going further

This article explains how to reproduce a mirror barplot with annotations, individual observations, custom style and nice features.

For more examples of advanced customization in barplot, check out this circular barplot. Also, you might be interested in creating advanced annotations.

Barplot

Spider / Radar

Wordcloud

Parallel

Lollipop

Circular Barplot

Table

🚨 Grab the Data To Viz poster!


Do you know all the chart types? Do you know which one you should pick? I made a decision tree that answers those questions. You can download it for free!

    Get Poster