Data Visualization with Python: Using Matplotlib and Seaborn

Data visualization is central to data analysis and data science. It helps you identify patterns, trends, and relationships in data, which supports well-informed decisions. Python packages such as matplotlib and seaborn make this kind of visualization straightforward.

This article introduces matplotlib and seaborn and explains why data visualization matters. You'll learn how to extract insights, build informative visualizations, and communicate data-driven conclusions effectively.

Getting Started with matplotlib

Matplotlib is a widely used Python package for creating static, animated, and interactive visualizations.

Installing with pip

Use pip to install matplotlib by running the following command.

pip install matplotlib

After installation, run the following to check the installed matplotlib version:

import matplotlib
print(matplotlib.__version__)

Tip: Use virtual environments to keep each project's matplotlib installation and other dependencies isolated from one another.

1. Basic Plotting:

Matplotlib offers various plot types, including:

  • Line plots: plot()
  • Scatter plots: scatter()
  • Bar plots: bar()

A. Line plot:

import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [2, 4, 6]
plt.plot(x, y)
plt.show()

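B. Scatter plot:

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Draw a connecting line first, without markers, so it does not hide the scatter points
plt.plot(x, y, color='blue', linestyle='-', label='Line Plot')
# zorder=3 draws the scatter points on top of the line
plt.scatter(x, y, color='red', marker='o', zorder=3, label='Scatter Plot')
# The label= arguments are only displayed once legend() is called
plt.legend()
plt.show()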
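
C. Bar plot:

The bar() function listed above has no example of its own, so here is a minimal sketch. The category names and values are invented for illustration, and the Agg backend line is only there so the snippet also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt

# Hypothetical category totals, invented for illustration
categories = ["A", "B", "C"]
values = [5, 3, 8]

fig, ax = plt.subplots()
# bar() draws one rectangle per category and returns them as a container
bars = ax.bar(categories, values, color="steelblue")
ax.set_xlabel("Category")
ax.set_ylabel("Value")
```

To display the chart interactively, drop the matplotlib.use("Agg") line and call plt.show() at the end.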

2. Customizing Plot Appearance:

Customize your plots with axis labels (xlabel(), ylabel()), titles (title()), colours (the color argument), and line styles (the linestyle argument).

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
plt.xlabel('X-axis', fontsize=12, fontweight='bold', color='blue')
plt.ylabel('Y-axis', fontsize=12, fontweight='bold', color='blue')
plt.xticks(fontsize=10, rotation=45)
plt.yticks(fontsize=10)
plt.legend(['Data'], loc='upper left', fontsize=10)
plt.text(3, 6, 'Important point', fontsize=10, color='red')
plt.show()
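
The title, colour, and line-style options mentioned above can be sketched as follows. This version uses the Axes methods (set_title(), set_xlabel(), and so on) rather than the pyplot functions; the colour, style, and label values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

fig, ax = plt.subplots()
# color and linestyle are keyword arguments of plot(); label feeds the legend
(line,) = ax.plot(x, y, color="green", linestyle="--", marker="s", label="Data")
ax.set_title("Customized line plot", fontsize=14)
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.grid(True, alpha=0.3)  # faint grid lines behind the data
ax.legend(loc="upper left")
```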

Exploring Advanced Features of matplotlib

This section looks at two features that go beyond basic plotting: arranging several subplots in one figure and saving plots to files. Each comes with a description and example code.

Multiple Axes and Subplots

Matplotlib lets you build complex figures that contain several subplots (axes) within a single figure.

plt.subplots(): This function creates a figure with subplots arranged in a grid. It returns a figure object and an array of Axes objects (or a single Axes object when only one subplot is requested). Each Axes object represents one subplot and is used to plot data on it or to customize it.

axes.plot(): After creating subplots with plt.subplots(), you plot data on a particular subplot by calling the plot() method of its Axes object. This lets you set each subplot's line styles, colours, markers, and labels individually.

import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(2, 2)
x = np.linspace(0, 2*np.pi, 400)
axs[0, 0].plot(x, np.sin(x))
axs[0, 1].plot(x, np.cos(x))
axs[1, 0].plot(x, np.sin(x**2))
axs[1, 1].plot(x, np.cos(x**2))
plt.show()
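
To make the return value of plt.subplots() more concrete, the sketch below prints the shape of the Axes array and then loops over it with axs.flat to give every subplot its own title. The panel titles and the sharex, figsize, and tight_layout() choices are illustrative assumptions, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)
# Panel titles (chosen for illustration) mapped to the curve each panel shows
panels = {
    "sin(x)": np.sin(x),
    "cos(x)": np.cos(x),
    "sin(x**2)": np.sin(x**2),
    "cos(x**2)": np.cos(x**2),
}

fig, axs = plt.subplots(2, 2, sharex=True, figsize=(8, 6))
print(axs.shape)  # → (2, 2): a 2-D NumPy array of Axes objects

# axs.flat iterates over the grid row by row, so no double indexing is needed
for ax, (title, y) in zip(axs.flat, panels.items()):
    ax.plot(x, y)
    ax.set_title(title)

fig.tight_layout()  # adjust spacing so titles and tick labels do not overlap
```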

Saving Plots:

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y)
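# Save the plot as a PDF file and as an SVG file (the format is inferred from the file extension)
plt.savefig('plot.pdf')
plt.savefig('plot.svg')
# Call savefig() before show(): once the window opened by show() is closed,
# the figure may be discarded and a later savefig() can write an empty image
plt.show()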
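
savefig() also accepts options that control the output quality. The sketch below writes a PNG with a higher resolution and trimmed margins. The temporary directory, the file name, and the dpi value are assumptions made for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; saving files needs no display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 3, 5, 7, 11])

# Write into a temporary directory so the sketch does not clutter the working directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "plot.png")

# dpi sets the raster resolution; bbox_inches="tight" trims surrounding whitespace
fig.savefig(png_path, dpi=300, bbox_inches="tight")
plt.close(fig)  # release the figure once it has been written
```

Vector formats such as PDF and SVG scale without loss, so dpi mainly matters for raster formats like PNG.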

Introduction to seaborn

Built on top of matplotlib, seaborn is a Python data visualization package that offers a high-level interface for creating attractive and informative statistical graphics. Its focus on statistical visualization makes it simple to produce clear, engaging plots.

Installing with pip

Run the following command to install seaborn using pip.

pip install seaborn

After installation, run the following to check the installed seaborn version:

import seaborn
print(seaborn.__version__)

Advantages of seaborn over matplotlib

  • High-level interface: To create intricate visualizations, seaborn offers an easier-to-use API.
  • Statistical graphics: Seaborn is perfect for data exploration and visualization because it was created especially for statistical analysis.
  • Aesthetics: Seaborn's default colour schemes and style selections produce plots that are aesthetically pleasing and appear well-designed.
  • Integration with Pandas: Seaborn's smooth integration with Pandas DataFrames facilitates data processing and visualization.

Note: Seaborn's high-level statistical plots can help you streamline your data visualization workflow.

Exploring seaborn's Capabilities

Seaborn offers a variety of features to improve your visualizations, such as:

  1. Default themes for a polished appearance
  2. Colour palettes for consistent and eye-catching hues
  3. Categorical plots for exploring categorical data
  4. Relational plots to illustrate the relationships between variables

1. Using Seaborn's Default Themes

Seaborn comes with several built-in themes. Calling sns.set_theme() with no arguments applies the default one.

import matplotlib.pyplot as plt
import seaborn as sns
# Set the default theme
sns.set_theme()
sns.stripplot(x=[1, 2, 3, 4])
plt.title("Default Theme")
plt.show()
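
To compare the built-in styles side by side, one option is to create each subplot inside sns.axes_style(), which applies a style only within its with block. This is a sketch using seaborn's five preset style names; the figure size and the reuse of the strip plot are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

# The five preset style names that seaborn provides
styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
fig = plt.figure(figsize=(15, 3))

for i, style in enumerate(styles, start=1):
    # The style is picked up when the Axes is created, so create it inside the block
    with sns.axes_style(style):
        ax = fig.add_subplot(1, len(styles), i)
    sns.stripplot(x=[1, 2, 3, 4], ax=ax)
    ax.set_title(style)
```

To apply one of these styles globally instead, pass it to sns.set_theme(style="whitegrid").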

2. Using Colour Palettes

Seaborn provides several colour palettes.

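import seaborn as sns
# color_palette() returns a list of colours; on its own it does not change the active palette
husl_palette = sns.color_palette("husl", 8)  # 8 colours from the husl palette
rdbu_palette = sns.color_palette("RdBu", 8)  # 8 colours from the RdBu palette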
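
A palette only affects plots once it is applied. The sketch below inspects the palette returned by sns.color_palette() and then makes it the default colour cycle with sns.set_palette(). The three plotted lines are placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

palette = sns.color_palette("husl", 8)
print(len(palette))          # number of colours in the palette
print(palette.as_hex()[:2])  # first two colours as hex strings

# Make the palette the default colour cycle for subsequent plots
sns.set_palette(palette)

fig, ax = plt.subplots()
for offset in range(3):
    # Each new line takes the next colour from the active palette
    ax.plot([0, 1, 2], [offset, offset + 1, offset + 2])
```

Most seaborn plotting functions also accept a palette= argument, which is useful when a palette should apply to a single plot rather than to the whole session.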

3. Categorical Plots

Seaborn's sns.catplot() function produces categorical plots such as swarm plots, box plots, and violin plots. The x and y arguments name the columns to place on each axis, the data argument supplies the dataset, and the kind argument selects the type of categorical plot (for example, "strip", "swarm", "box", or "violin").

Swarm Plot:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.catplot(x="sex", y="total_bill", data=tips, kind="swarm")
plt.show()

Box Plot:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.catplot(x="sex", y="total_bill", data=tips, kind="box")
plt.show()

Violin Plot:

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
sns.catplot(x="sex", y="total_bill", data=tips, kind="violin")
plt.show()

4. Relationship Plots

Seaborn's sns.relplot() function visualizes relationships between variables. As with catplot(), the dataset is passed through the data parameter. The hue, col, and row parameters distinguish subsets of the data, and the kind parameter selects the plot type ("scatter" or "line").

Scatter Plot:

# df is assumed to be a pandas DataFrame with numeric columns "x" and "y"
sns.relplot(x="x", y="y", data=df)

Line Plot:

sns.relplot(x="x", y="y", data=df, kind="line")

Advanced Data Visualization Techniques

1. Pair Plots

Pair plots are a great way to visualize relationships between multiple variables. Seaborn's sns.pairplot() function creates a matrix of pairwise plots, showing the relationship between each pair of variables.

import seaborn as sns
import matplotlib.pyplot as plt
# Load the iris dataset
iris = sns.load_dataset("iris")
# Create a pair plot
sns.pairplot(iris)
# Show the plot
plt.show()

2. Facet Grids

Facet grids are a useful tool for building multi-plot grids that make comparison easier. Seaborn's sns.FacetGrid() class produces a grid of plots, each of which displays a distinct subset of the data.

import seaborn as sns
import matplotlib.pyplot as plt
# Load the tips dataset
tips = sns.load_dataset("tips")
# Create a facet grid
g = sns.FacetGrid(tips, col="sex", row="smoker")
# Map a plot to each subset
g.map(sns.histplot, "total_bill", alpha=0.7)
# Show the plot
plt.show()

3. Heatmaps

A heatmap is a useful tool for visualizing correlation matrices. Seaborn's sns.heatmap() function draws one from any rectangular dataset, such as a correlation matrix.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Load the flights dataset
flights = sns.load_dataset("flights")
# Select only numerical columns
flights_num = flights.select_dtypes(include=[np.number])
# Create a heatmap of the correlation matrix
sns.heatmap(flights_num.corr(), annot=True, cmap="YlGnBu")
# Show the plot
plt.show()
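
A heatmap can also be drawn directly from a dataset once it has been reshaped into a matrix. The sketch below pivots a long-form table into a month-by-year grid and passes it to sns.heatmap(). The table is synthetic, with invented values, so it runs without downloading anything.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-form data, invented for illustration: one row per (year, month)
rng = np.random.default_rng(3)
years = [2020, 2021, 2022]
months = ["Jan", "Feb", "Mar", "Apr"]
long_df = pd.DataFrame(
    [(y, m, int(rng.integers(100, 500))) for y in years for m in months],
    columns=["year", "month", "value"],
)

# Reshape to a matrix: one row per month, one column per year
matrix = long_df.pivot(index="month", columns="year", values="value")
matrix = matrix.reindex(months)  # pivot sorts the index alphabetically; restore calendar order

fig, ax = plt.subplots()
# annot=True writes each value into its cell; fmt="d" formats the values as integers
sns.heatmap(matrix, annot=True, fmt="d", cmap="YlGnBu", ax=ax)
```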

Real-World Examples

Let's analyse and display some data from a CSV file that contains employee details.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Read the CSV file into a DataFrame
data = pd.read_csv('data.csv')
# Display the first few rows of the DataFrame
print(data.head())
# Print information about the DataFrame (info() prints directly and returns None)
data.info()
# Display descriptive statistics of the DataFrame
print(data.describe())
# Plot a histogram of the 'age' column
plt.hist(data['age'])
plt.show()
# Plot a count of employees per department
sns.countplot(x='department', data=data)
plt.show()
# Plot a scatterplot of age vs salary
sns.scatterplot(x='age', y='salary', data=data)
plt.show()
# Calculate the average salary per department
dept_avg_salary = data.groupby('department')['salary'].mean()
print(dept_avg_salary)
# Select employees with salaries greater than $100,000
high_salary_employees = data[data['salary'] > 100000]
print(high_salary_employees)
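
The file data.csv is not supplied with the article. To try the analysis without it, one option is to generate a stand-in DataFrame with the same three columns (age, department, salary) and plot the per-department averages. Everything about the synthetic data below, including the department names and the salary formula, is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv, invented for illustration
rng = np.random.default_rng(4)
n = 200
data = pd.DataFrame({
    "age": rng.integers(22, 65, size=n),
    "department": rng.choice(["Engineering", "Sales", "HR"], size=n),
})
data["salary"] = 30000 + data["age"] * 1500 + rng.normal(0, 8000, size=n)

# Average salary per department, sorted so the bars appear in ascending order
dept_avg_salary = data.groupby("department")["salary"].mean().sort_values()

fig, ax = plt.subplots()
dept_avg_salary.plot(kind="barh", ax=ax)  # horizontal bar chart, one bar per department
ax.set_xlabel("Average salary")
ax.set_ylabel("Department")
```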

This basic example shows how seaborn and matplotlib can be used to visualize a real-world dataset. There are many other ways to explore and visualize your data, depending on the questions you want to answer.

Additional visualization libraries

1. Plotly

Plotly is a flexible Python library for interactive plotting that suits web dashboards and data exploration. Its wide range of chart types, which support both static and interactive output, makes it a popular choice for many applications.

import plotly.express as px
import pandas as pd
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10]
})
fig = px.line(df, x="x", y="y", title="Plotly Line Chart")
fig.show()

2. Bokeh

Bokeh is an interactive visualization library for Python that works well for web-based plots and can be embedded in web applications built with frameworks such as Flask or Django. It includes tools for hovering, panning, and zooming.

from bokeh.plotting import figure, output_notebook, show
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Plotting with Bokeh
# output_notebook() renders the plot inline and is intended for Jupyter notebooks
output_notebook()
p = figure(title="Bokeh Line Plot")
p.line(x, y)
show(p)

FAQ

Q: What are data visualization techniques?
A: Data visualization techniques include charts (line, bar, or pie), plots (bubble or scatter), diagrams, maps (heat maps, geographic maps, etc.), and matrices.

Conclusion

In this article, we've covered the fundamentals of data visualization with matplotlib and seaborn. We looked at creating and customizing plots, visualizing numerical and categorical data, building interactive plots with Plotly and Bokeh, and applying data visualization to a real dataset.

Data visualization makes it easier to understand and share insights. With matplotlib and seaborn you can uncover trends, patterns, and correlations hidden in your data. Don't be afraid to experiment and explore; you might be surprised by what you find!

Reference

For more information on matplotlib and seaborn:

Matplotlib Official Website:
Matplotlib Official Website

Seaborn Official Website:
Seaborn: statistical data visualization
