19. Visualization Libraries
1. Seaborn — Overview
What is Seaborn?
- Python library for EDA and statistical visualization
- Built on top of Matplotlib
- Higher-level, less code for better-looking plots
- ✅ Primary use = exploratory data analysis and statistical visualization — exam answer (TDS_(1) Q5)
- ❌ NOT for web development (Django/Flask)
- ❌ NOT for numerical computing (NumPy)
- ❌ NOT for HTTP requests (requests)
2. Seaborn — Key Plots:
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True) # correlation heatmap ✅ (numeric_only skips text columns)
sns.histplot(df['col']) # histogram
sns.boxplot(x='cat', y='val', data=df) # box plot
sns.scatterplot(x='a', y='b', data=df) # scatter plot
sns.barplot(x='cat', y='val', data=df) # bar plot
sns.pairplot(df) # pairwise relationships
sns.violinplot(x='cat', y='val', data=df) # violin plot
sns.lineplot(x='date', y='val', data=df) # line plot
3. Seaborn — When to Use Which Plot:
| Plot | Use Case |
|---|---|
| heatmap | Correlation matrix visualization ✅ |
| histplot | Distribution of a single variable |
| boxplot | Distribution + outliers across groups |
| scatterplot | Relationship between two variables |
| barplot | Compare means across categories |
| pairplot | All pairwise relationships in a dataset |
| violinplot | Distribution shape across groups |
| lineplot | Trends over time |
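As a rough illustration of the heatmap row in the table above, the following sketch builds a small synthetic DataFrame and draws its correlation matrix. The column names, the random data, and the `Agg` backend choice are assumptions made only so the example runs on its own; they are not part of the original notes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data, made up for illustration: 'sales' depends on 'ads', 'noise' does not.
rng = np.random.default_rng(0)
ads = rng.normal(100, 20, size=200)
df = pd.DataFrame({
    "ads": ads,
    "sales": 3 * ads + rng.normal(0, 10, size=200),
    "noise": rng.normal(0, 1, size=200),
    "region": rng.choice(["north", "south"], size=200),  # non-numeric column
})

# numeric_only=True leaves the text column out of the correlation matrix.
corr = df.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
```

Fixing `vmin` and `vmax` to -1 and 1 keeps the colour scale comparable between datasets; that is a design choice of this sketch, not a requirement of `sns.heatmap`.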
4. Matplotlib — Overview
What is Matplotlib?
- Low-level Python plotting library
- Foundation for Seaborn and Pandas plotting
- More control, more code required
- Best for: custom plots, fine-tuned control
import matplotlib.pyplot as plt
plt.plot(x, y) # line plot
plt.scatter(x, y) # scatter plot
plt.bar(categories, values) # bar chart
plt.hist(data, bins=30) # histogram
plt.pie(values) # pie chart
plt.boxplot(data) # box plot
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Title')
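plt.legend() # only shows entries for artists drawn with label=...
plt.savefig('plot.png', dpi=300) # save before show(); the figure may be blank afterwards
plt.show()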
5. Matplotlib — Subplots:
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0,0].hist(data, bins=30)
axes[0,0].set_title('Histogram')
axes[0,1].boxplot(data)
axes[0,1].set_title('Box Plot')
axes[1,0].scatter(x, y)
axes[1,0].set_title('Scatter')
axes[1,1].plot(x, y)
axes[1,1].set_title('Line')
plt.tight_layout()
plt.show()
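The subplot snippet above relies on `data`, `x`, and `y` already existing. A self-contained version might look like the sketch below, where the random data, the `Agg` backend, and the in-memory PNG buffer are assumptions added only to make it runnable anywhere.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic inputs, made up for illustration.
rng = np.random.default_rng(42)
data = rng.normal(50, 10, size=500)
x = np.linspace(0, 10, 100)
y = 2 * x + rng.normal(0, 2, size=100)

# A 2x2 grid: axes is a 2-D NumPy array indexed as axes[row, col].
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0, 0].hist(data, bins=30)
axes[0, 0].set_title("Histogram")
axes[0, 1].boxplot(data)
axes[0, 1].set_title("Box Plot")
axes[1, 0].scatter(x, y)
axes[1, 0].set_title("Scatter")
axes[1, 1].plot(x, y)
axes[1, 1].set_title("Line")
fig.tight_layout()

# Save to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
png_bytes = buffer.getvalue()
```

Calling methods on `fig` and on the individual `axes` objects, rather than on `plt`, makes it explicit which panel each call affects.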
6. Plotly — Interactive Charts
What is Plotly?
- Creates interactive charts in browser
- Zoom, pan, hover tooltips built-in
- Works in Jupyter notebooks and web apps
import plotly.express as px
px.line(df, x='date', y='sales', title='Sales Over Time')
px.bar(df, x='category', y='revenue')
px.scatter(df, x='advertising', y='sales', color='region')
px.histogram(df, x='age', nbins=30)
px.box(df, x='category', y='price')
px.choropleth(df, locations='country', color='value') # map; locations must hold ISO-3 codes unless locationmode is set
7. Folium — Interactive Maps
- Covered fully in Section 14 (Geospatial)
- Creates interactive maps in browser
- Best for: geographic data visualization
- ✅ Use for store location maps, coverage analysis
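The coverage analysis mentioned above comes down to a distance check before anything is drawn. The sketch below shows that check only, using the haversine formula from the standard library. The coordinates, store names, 5 km radius, and colour names are hypothetical, and the map drawing itself (covered in Section 14) is left out.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical centre point and store locations (not from the original notes).
center = (12.9716, 77.5946)
stores = {
    "A": (12.9750, 77.6000),
    "B": (13.0500, 77.6500),
    "C": (12.9500, 77.5800),
}
radius_km = 5.0

# Colour each store by whether it falls inside the coverage radius.
colors = {
    name: "green" if haversine_km(*center, lat, lon) <= radius_km else "red"
    for name, (lat, lon) in stores.items()
}
covered = sum(color == "green" for color in colors.values())
```

The resulting `colors` mapping could then drive the marker colours on a Folium map, with a circle of the same radius drawn around the centre.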
8. Streamlit — Data Dashboards
What is Streamlit?
- Python library for building data dashboards
- No HTML/CSS/JavaScript needed
- Commonly packaged in a Docker container for deployment (exam-relevant)
import streamlit as st
import pandas as pd
import plotly.express as px
st.title('Sales Dashboard')
# df is assumed to be a DataFrame with 'region', 'date' and 'sales' columns, loaded earlier
# Sidebar filter
region = st.sidebar.selectbox('Region', df['region'].unique())
filtered = df[df['region'] == region]
# Metrics
col1, col2 = st.columns(2)
col1.metric("Total Sales", f"${filtered['sales'].sum():,.0f}")
col2.metric("Orders", len(filtered))
# Chart
fig = px.line(filtered, x='date', y='sales')
st.plotly_chart(fig)
# Table
st.dataframe(filtered)
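The filtering and metric formatting in the dashboard above can be checked without launching Streamlit if they live in a plain function. The following is a sketch under that assumption: `summarize_region` is a hypothetical helper name and the sample rows are invented.

```python
import pandas as pd

def summarize_region(df: pd.DataFrame, region: str) -> dict:
    """Return the rows for one region plus the two values shown as metrics."""
    filtered = df[df["region"] == region]
    return {
        "filtered": filtered,
        "total_sales": f"${filtered['sales'].sum():,.0f}",  # same format as the dashboard
        "orders": len(filtered),
    }

# Invented sample data with the columns the dashboard expects.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
    ),
    "sales": [1200.0, 800.0, 1500.0, 950.0],
})

summary = summarize_region(df, "north")
```

In the app, the returned values would be passed to `col1.metric`, `col2.metric`, and `st.dataframe`, which keeps the Streamlit script itself down to layout calls.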
Visualization Library Comparison — Complete:
| Library | Type | Interactive | Best For |
|---|---|---|---|
| Seaborn | Statistical | ❌ | EDA, statistical plots ✅ |
| Matplotlib | General | ❌ | Custom, fine-tuned plots |
| Plotly | General | ✅ | Interactive charts |
| Folium | Maps | ✅ | Geographic visualization |
| Streamlit | Dashboard | ✅ | Data apps, dashboards |
| Bokeh | General | ✅ | Web-based interactive |
Choosing the Right Visualization — Exam Scenarios:
"EDA and statistical visualization"
→ Seaborn ✅
"Interactive map with markers for stakeholders"
→ Folium ✅
"Deploy Python dashboard as web app"
→ Streamlit ✅
"Correlation matrix visualization"
→ sns.heatmap(df.corr()) ✅
"Coverage analysis: 8 stores within 5km"
→ Folium colored markers + radius circle ✅
"Statistical distribution + outliers"
→ sns.boxplot() ✅
"Relationship between two variables"
→ sns.scatterplot() ✅
"Trends over time"
→ px.line() or sns.lineplot() ✅
pandas Built-in Plotting:
# Quick plots directly from DataFrame
df['col'].plot() # line plot
df['col'].plot(kind='hist') # histogram
df['col'].plot(kind='bar') # bar chart
df.plot(x='date', y='sales') # line from DataFrame
df.boxplot(column='sales', by='region') # box plot
df.hist(bins=30, figsize=(12,8)) # one histogram per numeric column
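pandas plotting methods draw through Matplotlib and accept an `ax` argument, so quick plots can be placed on specific panels. The sketch below assumes a small invented DataFrame and the headless `Agg` backend so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data, for illustration only.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "sales": [100, 120, 90, 150, 130, 170],
})

# Passing ax= places each pandas plot on a chosen Matplotlib panel.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(x="date", y="sales", ax=ax_line, title="Daily sales")
df["sales"].plot(kind="hist", bins=5, ax=ax_hist, title="Sales distribution")
fig.tight_layout()
```

Without `ax=`, a Series plot is drawn on the current axes if a figure is already open, so passing the target explicitly avoids plots landing on top of each other.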
Quick Reference
Library Selection:
EDA + statistical plots → Seaborn ✅
Custom/fine-tuned plots → Matplotlib
Interactive charts → Plotly
Geographic/map → Folium ✅
Data dashboards → Streamlit
Seaborn key plots:
sns.heatmap(df.corr()) → correlation ✅
sns.boxplot() → distribution + outliers
sns.scatterplot() → two variable relationship
sns.pairplot() → all pairwise relationships
sns.histplot() → single variable distribution
Exam answers:
"EDA and statistical visualization" → Seaborn ✅
"Interactive map for stakeholders" → Folium ✅
"Deploy Python dashboard" → Streamlit ✅
"Correlation heatmap" → sns.heatmap(df.corr()) ✅