🎨 Seaborn Plotting Tutorial¶
1. What is Seaborn?¶
Seaborn is a data visualization library in Python that's built on top of Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. It comes with several built-in themes and color palettes to make creating visually appealing plots simpler.
2. How is Seaborn Related to Matplotlib?¶
Built on Top: Seaborn is essentially a higher-level API based on Matplotlib. It leverages Matplotlib's capabilities, adding its own functionalities to make plotting easier and more aesthetically pleasing.
Customization: While Seaborn simplifies many tasks, you can still fine-tune any Seaborn plot with Matplotlib commands. Axes-level functions such as scatterplot return a Matplotlib Axes, and figure-level functions such as pairplot return a grid object that wraps a Matplotlib Figure.
Integration: Seaborn functions accept Pandas data structures such as DataFrames directly and refer to their columns by name. The combination of Matplotlib, Seaborn, and Pandas makes data visualization in Python both powerful and user-friendly.
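To illustrate the customization point above, here is a minimal sketch in which a Seaborn call returns a Matplotlib Axes that is then adjusted with ordinary Matplotlib methods. The DataFrame and its column names are hypothetical toy data, not part of the tutorial's dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.1, 5.9, 8.2]})

# Axes-level Seaborn functions return the Matplotlib Axes they drew on
ax = sns.scatterplot(data=df, x="x", y="y")

# From here on, plain Matplotlib methods apply
ax.set_title("Toy scatter")
ax.set_xlabel("x value")
ax.axhline(5, linestyle="--", color="gray")  # horizontal reference line

# In a notebook or interactive session, finish with plt.show()
```

Because the return value is a regular Axes, anything Matplotlib offers (annotations, extra artists, axis limits) remains available.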
Advantages of Seaborn Over Matplotlib:
Simpler Syntax: Seaborn generally requires less code for complex plots compared to Matplotlib.
Built-in Themes: Seaborn offers built-in themes that give your plots a polished look without requiring extra customization.
Advanced Plots: Some complex plots like pair plots, violin plots, and cluster maps, which require significant customization in Matplotlib, can be done in Seaborn with just a line or two of code.
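As a small sketch of the built-in themes mentioned above, set_theme applies a style and palette to all subsequent plots by updating Matplotlib's rcParams. The style and palette names used here ("whitegrid", "muted") are two of Seaborn's built-in options, chosen only as an example.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply a built-in style and color palette globally (affects all later plots)
sns.set_theme(style="whitegrid", palette="muted")

# The theme works by updating Matplotlib's rcParams; "whitegrid" turns the grid on
grid_enabled = plt.rcParams["axes.grid"]
print(grid_enabled)

# Restore Matplotlib's defaults when the theme is no longer wanted
sns.reset_defaults()
```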
3. Installing and Importing Seaborn¶
Before using Seaborn, you need to install it. This can be done via pip or poetry:
pip install seaborn
poetry add seaborn
Once installed, you can import it as follows:
import seaborn as sns
import matplotlib.pyplot as plt
4. Seaborn Plots and Their Uses¶
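For this tutorial, we will use the tips dataset, one of Seaborn's example datasets (load_dataset downloads it on first use, so an internet connection is required). This dataset contains information about the tips given by customers in a restaurant. It has the following columns: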
- total_bill: the total bill (cost of the meal), including tax, in US dollars
- tip: the tip (gratuity) in US dollars
- sex: the sex of the person giving the tip
- smoker: whether the customer was a smoker or not
- day: the day of the week
- time: whether the customer ate during lunch or dinner
- size: the size of the party
tips = sns.load_dataset("tips")
tips.head()
|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
4.1 Lineplot¶
- Description: A line plot is used to display data points in a time sequence. It connects individual data points using line segments, making it ideal for visualizing data over a continuous interval or time period.
- Useful For: Observing trends over intervals or time. It's commonly used for time series data, but can be used for any sequential data points.
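# Note: in tips, "time" is a categorical column (Lunch/Dinner), not a timestamp,
# so each line connects the mean total_bill of the two meal times
sns.lineplot(x="time", y="total_bill", hue="sex", data=tips)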
plt.show()
Points to Note:
- When multiple lines are displayed (for example, via the hue parameter), Seaborn automatically adds a legend to distinguish them.
- By default, lineplot sorts observations by their x values before drawing (sort=True). When several observations share an x value, it plots their mean with a 95% confidence interval band. Pass sort=False to keep the original row order.
- lineplot also accepts wide-form Pandas DataFrames: the index (for example, a DatetimeIndex) is used for the x-axis and each column becomes a line, which is convenient for time series.
4.2 Scatterplot¶
- Description: Uses dots to represent values for two numerical variables.
- Useful For: Observing relationships between two variables.
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
plt.show()
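4.3 Histplot (Distribution Plot)¶
- Description: Displays the distribution of a univariate set of observations as a histogram, optionally overlaid with a kernel density estimate (kde=True). histplot replaces the deprecated distplot function.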
- Useful For: Examining the distribution of a variable.
sns.histplot(tips["total_bill"], kde=True)
plt.show()
4.4 Boxplot¶
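- Description: Summarizes data by its quartiles. The box spans the first to third quartile with a line at the median, and by default the whiskers extend to the most extreme points within 1.5 times the interquartile range; points beyond them are drawn individually as outliers.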
- Useful For: Identifying outliers and understanding the distribution's spread and skewness.
sns.boxplot(x="day", y="total_bill", data=tips)
plt.show()
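The numbers behind a box plot can be reproduced by hand. The following is a sketch of the conventional 1.5 × IQR outlier rule (the default whis=1.5 setting), applied to hypothetical values with NumPy; it is meant to show the idea rather than Seaborn's internal code.

```python
import numpy as np

# Hypothetical bill amounts with one unusually large value
bills = np.array([10, 12, 13, 15, 16, 18, 20, 50], dtype=float)

# Quartiles define the box; the median is the line inside it
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(q1, median, q3, outliers)
```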
4.4 Violinplot¶
- Description: Combines a box plot with a kernel density estimate, showing the full shape of the distribution rather than only its quartiles.
- Useful For: Visualizing the distribution and density of data.
sns.violinplot(data=tips, x="day", y="total_bill")
plt.show()
4.5 Heatmap¶
- Description: Represents data as colors in a matrix.
- Useful For: Visualizing correlations or for displaying data in a 2D format.
# use heatmap to show correlation
num_vars_df = tips[['total_bill', 'tip', 'size']]
sns.heatmap(num_vars_df.corr(), annot=True)
plt.show()
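The example above selects the numeric columns first because correlations are only defined for numeric data. A variation on the same idea, sketched here with a hypothetical DataFrame, uses select_dtypes so the column list does not have to be typed out, and pins the color scale to the range -1 to 1 so that colors are comparable across plots.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data mixing numeric and categorical columns
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.5, 3.0, 5.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Keep only numeric columns before computing pairwise Pearson correlations
corr = df.select_dtypes("number").corr()

# vmin/vmax fix the color scale to the full correlation range
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```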
4.6 Pairplot¶
- Description: Plots pairwise relationships across an entire dataframe.
- Useful For: Quickly visualizing relationships between multiple variables.
sns.pairplot(tips, hue="sex")
plt.show()
4.7 Barplot¶
- Description: Displays the mean of a numerical variable per category, with error bars showing a 95% confidence interval by default.
- Useful For: Comparing averages across different categories.
sns.barplot(x="day", y="total_bill", data=tips)
plt.show()
4.8 Countplot¶
- Description: Displays counts of categories.
- Useful For: Visualizing the number of occurrences in different categories.
sns.countplot(x="day", data=tips)
plt.show()
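A count plot shows the same numbers as value_counts in Pandas. The sketch below, again with hypothetical data, compares the two.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical observations
df = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Fri", "Sat", "Sun"]})

# Occurrences per category, computed directly
counts = df["day"].value_counts()

# countplot draws one bar per category with the count as its height
ax = sns.countplot(data=df, x="day")
bar_heights = sorted(patch.get_height() for patch in ax.patches)
print(counts.to_dict(), bar_heights)
```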
Conclusion
Seaborn provides an extensive set of tools for both simple and advanced data visualizations. The key to effective data visualization is to choose the plot that best represents the structure and nature of your data. Always consider the story you wish to tell or the questions you're trying to answer when selecting a plot type. Experiment, and as you become more familiar with Seaborn, you'll intuitively understand which plots to use in different scenarios.