Libraries
First, we need to load a few libraries:
- matplotlib: for creating the chart
- pandas: for data manipulation
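As a minimal sketch, the two imports could look like this; the aliases `plt` and `pd` are the usual community conventions, assumed here rather than taken from the post:

```python
# matplotlib's pyplot interface, used to draw the chart
import matplotlib.pyplot as plt

# pandas, used to load and reshape the data
import pandas as pd
```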
Dataset
Our dataset gives, for each year, the number of people with a given name, so we can follow how that number evolves over time. Let's load it with pandas:
| Year | Sex | Name | Count | Proportion |
|---|---|---|---|---|
| 1880 | F | Helen | 636 | 0.006516 |
| 1880 | F | Amanda | 241 | 0.002469 |
| 1880 | F | Betty | 117 | 0.001199 |
| 1880 | F | Dorothy | 112 | 0.001147 |
| 1880 | F | Linda | 27 | 0.000277 |
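The loading step might look like the sketch below. The real file location is not given here, so the five sample rows shown above are read from an in-memory CSV instead; the column names `year`, `sex`, `name`, `n` and `prop` are assumptions made for this example.

```python
import io

import pandas as pd

# The five sample rows shown above, written as CSV text.
# The header names (year, sex, name, n, prop) are assumed for this sketch.
csv_text = """year,sex,name,n,prop
1880,F,Helen,636,0.006516
1880,F,Amanda,241,0.002469
1880,F,Betty,117,0.001199
1880,F,Dorothy,112,0.001147
1880,F,Linda,27,0.000277
"""

# read_csv accepts a file path, a URL, or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Preview the first rows
print(df.head())
```

With the actual dataset, the `io.StringIO(...)` argument would simply be replaced by the path or URL of the CSV file.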
Our goal here is to explore the evolution of the number of people named "Ashley" and "Amanda" over the years in a single scatter plot.
For this, we need to reshape the dataset a bit:
- filter on those 2 names
- keep only the years after 1970
- pivot the table to have the names as columns and the years as rows, using the pivot_table() function
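The three steps above can be sketched as follows. The rows are made up for illustration, and the column names `year`, `name` and `n` are assumptions, as is the strict `> 1970` comparison.

```python
import pandas as pd

# Illustrative rows only: the column names (year, name, n) and all counts
# are made up for this sketch, not taken from the real dataset.
raw = pd.DataFrame({
    "year": [1969, 1969, 1971, 1971, 1971, 1972, 1972],
    "name": ["Amanda", "Ashley", "Amanda", "Ashley", "Helen", "Amanda", "Ashley"],
    "n": [100, 10, 200, 20, 999, 300, 30],
})

# 1. Keep only the two names of interest
subset = raw[raw["name"].isin(["Ashley", "Amanda"])]

# 2. Keep only the years after 1970
subset = subset[subset["year"] > 1970]

# 3. One row per year, one column per name
df = subset.pivot_table(index="year", columns="name", values="n")
print(df)
```

By default, `pivot_table()` aggregates the values for each (year, name) pair with their mean, so the resulting counts are floats even when the input counts are integers.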
| Amanda | Ashley |
|---|---|
| 4133.0 | 1164.0 |
| 4181.0 | 1176.0 |
| 5627.0 | 1253.0 |
| 7476.0 | 1626.0 |
| 12653.0 | 1988.0 |
Connected scatterplot for evolution
In practice, we just have to call the plot() function with our two columns as arguments, one for the x-axis and one for the y-axis. We ask for a marker at each point with marker='o' and connect consecutive points with linestyle='-'.
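A minimal sketch of that call is shown below. The small `df` is a made-up stand-in for the pivoted table (years as index, one column per name), and putting Amanda on the x-axis and Ashley on the y-axis is an arbitrary choice for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for the pivoted table
df = pd.DataFrame(
    {"Amanda": [200.0, 300.0, 450.0, 400.0], "Ashley": [20.0, 30.0, 80.0, 150.0]},
    index=pd.Index([1971, 1972, 1973, 1974], name="year"),
)

fig, ax = plt.subplots(figsize=(8, 6))

# One point per year: x is the Amanda count, y is the Ashley count.
# Points are joined in row order, which is chronological here.
(line,) = ax.plot(df["Amanda"], df["Ashley"], marker="o", linestyle="-")

ax.set_xlabel("Amanda")
ax.set_ylabel("Ashley")
```

Calling `plt.show()` afterwards displays the figure in an interactive session.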
Annotations
Our last chart has a major issue: we don't know which year is represented by each point.
To fix this, we can use the annotate() function to add a text label next to each point. However, labeling every year is not necessary, so we only annotate 1 out of 3 years.
In practice, we loop over the row positions with a step of 3 (thanks to range(0, len(df), 3)) and add a label to each selected point with the annotate() function.
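The loop could be sketched as below. The data are made up, and the label offset and font size are illustrative choices rather than values from the post.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for the pivoted table (7 years)
df = pd.DataFrame(
    {
        "Amanda": [200.0, 300.0, 450.0, 400.0, 520.0, 610.0, 700.0],
        "Ashley": [20.0, 30.0, 80.0, 150.0, 210.0, 260.0, 400.0],
    },
    index=pd.Index(range(1971, 1978), name="year"),
)

fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(df["Amanda"], df["Ashley"], marker="o", linestyle="-")

# Label every third point with its year
for i in range(0, len(df), 3):
    ax.annotate(
        str(df.index[i]),                              # label text: the year
        (df["Amanda"].iloc[i], df["Ashley"].iloc[i]),  # point being labeled
        xytext=(5, 5),                                 # shift the label slightly
        textcoords="offset points",                    # offset is given in points
        fontsize=8,
    )
```

With 7 rows, `range(0, len(df), 3)` yields positions 0, 3 and 6, so three points receive a label.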
What a nice way to visualize the evolution of the number of people named Ashley and Amanda over the years!
Going further
This post explains how to create a connected scatterplot with matplotlib.
You might be interested in how to reproduce this beautiful connected scatter plot and how to create multiple connected scatter plots on the same chart.