19. Visualization Libraries

1. Seaborn — Overview

What is Seaborn?

  • Python library for EDA and statistical visualization
  • Built on top of Matplotlib
  • Higher-level, less code for better-looking plots
  • ✅ Primary use = exploratory data analysis and statistical visualization — exam answer (TDS_(1) Q5)
  • ❌ NOT for web development (Django/Flask)
  • ❌ NOT for numerical computing (NumPy)
  • ❌ NOT for HTTP requests (requests)
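
Because Seaborn is built on Matplotlib, its plotting functions hand back ordinary Matplotlib Axes objects that can be customized with Matplotlib calls. A minimal sketch, assuming a small made-up DataFrame (the column names `group` and `score` are illustrative only, and the Agg backend is chosen just so the code runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Made-up data for illustration only
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# One Seaborn call draws the whole box plot ...
ax = sns.boxplot(x="group", y="score", data=df)

# ... and the result is a plain Matplotlib Axes, so Matplotlib methods apply
ax.set_title("Score by group")
ax.set_ylabel("Score")
is_matplotlib_axes = isinstance(ax, Axes)
```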

2. Seaborn — Key Plots:

import seaborn as sns

# df is assumed to be a pandas DataFrame with the named columns
sns.heatmap(df.corr(numeric_only=True), annot=True)  # correlation heatmap ✅ (numeric columns only)
sns.histplot(df['col'])                    # histogram
sns.boxplot(x='cat', y='val', data=df)     # box plot
sns.scatterplot(x='a', y='b', data=df)     # scatter plot
sns.barplot(x='cat', y='val', data=df)     # bar plot (mean per category)
sns.pairplot(df)                           # pairwise relationships
sns.violinplot(x='cat', y='val', data=df)  # violin plot
sns.lineplot(x='date', y='val', data=df)   # line plot

3. Seaborn — When to Use Which Plot:

Plot          Use Case
heatmap       Correlation matrix visualization ✅
histplot      Distribution of a single variable
boxplot       Distribution + outliers across groups
scatterplot   Relationship between two variables
barplot       Compare means across categories
pairplot      All pairwise relationships in a dataset
violinplot    Distribution shape across groups
lineplot      Trends over time
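
To make the heatmap use case concrete, the sketch below shows what the correlation matrix contains before `sns.heatmap` colors it. The data and column names are invented for illustration. `numeric_only=True` is used so the text column is skipped, which recent pandas versions require.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented data: sales rises exactly with ads, returns falls exactly
df = pd.DataFrame({
    "ads":     [1, 2, 3, 4, 5],
    "sales":   [2, 4, 6, 8, 10],
    "returns": [5, 4, 3, 2, 1],
    "region":  ["n", "s", "n", "s", "n"],  # non-numeric, excluded below
})

# Pearson correlation between every pair of numeric columns
corr = df.corr(numeric_only=True)

# Fix the color scale to [-1, 1] so colors are comparable across datasets
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Each cell of `corr` lies between -1 and 1; the diagonal is always 1 because every column correlates perfectly with itself.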

4. Matplotlib — Overview

What is Matplotlib?

  • Low-level Python plotting library
  • Foundation for Seaborn and Pandas plotting
  • More control, more code required
  • Best for: custom plots, fine-tuned control

import matplotlib.pyplot as plt

plt.plot(x, y)              # line plot
plt.scatter(x, y)           # scatter plot
plt.bar(categories, values) # bar chart
plt.hist(data, bins=30)     # histogram
plt.pie(values)             # pie chart
plt.boxplot(data)           # box plot

plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Title')
plt.legend()                      # needs label=... on the plotted artists
plt.savefig('plot.png', dpi=300)  # save before show(); afterwards the figure may be blank
plt.show()
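
The same steps can also be written with Matplotlib's object-oriented interface, which makes the "more control" point explicit. A minimal sketch with made-up data; the output file name and the temp-directory location are arbitrary choices for this example:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, label="y = x^2")  # the label is what legend() displays
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Quadratic growth")
ax.legend()

# Save while the figure still exists, then release it
out_path = os.path.join(tempfile.gettempdir(), "plot_example.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```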

5. Matplotlib — Subplots:

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0,0].hist(data, bins=30)
axes[0,0].set_title('Histogram')

axes[0,1].boxplot(data)
axes[0,1].set_title('Box Plot')

axes[1,0].scatter(x, y)
axes[1,0].set_title('Scatter')

axes[1,1].plot(x, y)
axes[1,1].set_title('Line')

plt.tight_layout()
plt.show()
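
When every panel gets the same kind of plot, looping over the grid avoids repeating the index pairs. A sketch under the assumption of randomly generated sample data (the distribution names are just labels for this example):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random sample data, seeded so the example is repeatable
rng = np.random.default_rng(0)
samples = {
    "Normal": rng.normal(size=200),
    "Uniform": rng.uniform(size=200),
    "Exponential": rng.exponential(size=200),
    "Lognormal": rng.lognormal(size=200),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# axes is a 2x2 NumPy array of Axes; .flat walks it row by row
for ax, (name, values) in zip(axes.flat, samples.items()):
    ax.hist(values, bins=20)
    ax.set_title(name)

fig.tight_layout()
titles = [ax.get_title() for ax in axes.flat]
```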

6. Plotly — Interactive Charts

What is Plotly?

  • Creates interactive charts in browser
  • Zoom, pan, hover tooltips built-in
  • Works in Jupyter notebooks and web apps

import plotly.express as px

px.line(df, x='date', y='sales', title='Sales Over Time')
px.bar(df, x='category', y='revenue')
px.scatter(df, x='advertising', y='sales', color='region')
px.histogram(df, x='age', nbins=30)
px.box(df, x='category', y='price')
px.choropleth(df, locations='country', color='value')  # map; 'country' must hold ISO-3 codes by default

7. Folium — Interactive Maps

  • Covered fully in Section 14 (Geospatial)
  • Creates interactive maps in browser
  • Best for: geographic data visualization
  • ✅ Use for store location maps, coverage analysis
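
A coverage analysis of this kind comes down to a distance check per store plus a map that shows the result. The sketch below is one possible way to do it, not a reference implementation: the coordinates, store names, and the helper names `haversine_km`, `classify`, and `build_coverage_map` are all made up for illustration. Folium is imported inside the map function so the distance logic can run even where Folium is not installed.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def classify(stores, center, radius_km=5.0):
    """Return (name, lat, lon, inside) with inside=True when within radius_km."""
    return [
        (name, lat, lon, haversine_km(center[0], center[1], lat, lon) <= radius_km)
        for name, lat, lon in stores
    ]


def build_coverage_map(stores, center, radius_km=5.0):
    """Draw the radius circle and one colored marker per store."""
    import folium  # imported here so the distance logic runs without Folium

    m = folium.Map(location=list(center), zoom_start=11)
    folium.Circle(
        location=list(center),
        radius=radius_km * 1000,  # folium.Circle expects meters
        color="blue",
        fill=True,
        fill_opacity=0.1,
    ).add_to(m)
    for name, lat, lon, inside in classify(stores, center, radius_km):
        folium.Marker(
            location=[lat, lon],
            popup=name,
            icon=folium.Icon(color="green" if inside else "red"),
        ).add_to(m)
    return m


# Hypothetical center point and stores: (name, latitude, longitude)
center = (12.9716, 77.5946)
stores = [
    ("Store A", 12.9716, 77.5946),  # at the center
    ("Store B", 12.9900, 77.6000),  # about 2 km away
    ("Store C", 13.2000, 77.5946),  # about 25 km away
]

covered = [name for name, _, _, inside in classify(stores, center) if inside]
```

Calling `build_coverage_map(stores, center).save("coverage.html")` would then write the interactive map to a file (the file name is arbitrary).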

8. Streamlit — Data Dashboards

What is Streamlit?

  • Python library for building data dashboards
  • No HTML/CSS/JavaScript needed
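  • Used with Docker for deployment (exam-relevant)

import streamlit as st
import pandas as pd
import plotly.express as px

# Load the data; 'sales.csv' is a placeholder with region, date, and sales columns
df = pd.read_csv('sales.csv', parse_dates=['date'])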

st.title('Sales Dashboard')

# Sidebar filter
region = st.sidebar.selectbox('Region', df['region'].unique())
filtered = df[df['region'] == region]

# Metrics
col1, col2 = st.columns(2)
col1.metric("Total Sales", f"${filtered['sales'].sum():,.0f}")
col2.metric("Orders", len(filtered))

# Chart
fig = px.line(filtered, x='date', y='sales')
st.plotly_chart(fig)

# Table
st.dataframe(filtered)
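
The filter and metric steps in the dashboard are ordinary pandas, so they can be checked without starting a Streamlit server. A sketch that pulls them into a plain function; the name `summarize_region` and the sample rows are hypothetical:

```python
import pandas as pd


def summarize_region(df, region):
    """Return (filtered rows, formatted total sales, order count) for one region."""
    filtered = df[df["region"] == region]
    total = f"${filtered['sales'].sum():,.0f}"  # same format string as the dashboard
    return filtered, total, len(filtered)


# Invented sample rows with the columns the dashboard expects
df = pd.DataFrame({
    "region": ["North", "North", "South"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"]),
    "sales": [1200.0, 800.0, 500.0],
})

filtered, total, orders = summarize_region(df, "North")
```

In the app, `total` and `orders` would be passed to `col1.metric("Total Sales", total)` and `col2.metric("Orders", orders)`.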

Visualization Library Comparison — Complete:

Library      Type          Interactive     Best For
Seaborn      Statistical   Mostly static   EDA, statistical plots ✅
Matplotlib   General       Mostly static   Custom, fine-tuned plots
Plotly       General       Yes             Interactive charts
Folium       Maps          Yes             Geographic visualization
Streamlit    Dashboard     Yes             Data apps, dashboards
Bokeh        General       Yes             Web-based interactive

Choosing the Right Visualization — Exam Scenarios:

"EDA and statistical visualization"
→ Seaborn ✅

"Interactive map with markers for stakeholders"
→ Folium ✅

"Deploy Python dashboard as web app"
→ Streamlit ✅

"Correlation matrix visualization"
→ sns.heatmap(df.corr()) ✅

"Coverage analysis: 8 stores within 5km"
→ Folium colored markers + radius circle ✅

"Statistical distribution + outliers"
→ sns.boxplot() ✅

"Relationship between two variables"
→ sns.scatterplot() ✅

"Trends over time"
→ px.line() or sns.lineplot() ✅

pandas Built-in Plotting:

# Quick plots directly from DataFrame
df['col'].plot()                    # line plot
df['col'].plot(kind='hist')         # histogram
df['col'].plot(kind='bar')          # bar chart
df.plot(x='date', y='sales')        # line from DataFrame
df.boxplot(column='sales', by='region')  # box plot
df.hist(bins=30, figsize=(12,8))    # one histogram per numeric column
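
pandas draws these plots through Matplotlib, so `df.plot(...)` returns a Matplotlib Axes that can be refined afterwards. A minimal sketch with invented daily sales figures:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from matplotlib.axes import Axes

# Invented data for illustration
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "sales": [10, 12, 9, 15, 14],
})

ax = df.plot(x="date", y="sales")  # quick line plot straight from the DataFrame
ax.set_title("Daily sales")        # then fine-tune with Matplotlib methods
ax.set_ylabel("Sales")
returned_axes = isinstance(ax, Axes)
```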

Quick Reference

Library Selection:
  EDA + statistical plots    → Seaborn ✅
  Custom/fine-tuned plots    → Matplotlib
  Interactive charts         → Plotly
  Geographic/map             → Folium ✅
  Data dashboards            → Streamlit

Seaborn key plots:
  sns.heatmap(df.corr())     → correlation ✅
  sns.boxplot()              → distribution + outliers
  sns.scatterplot()          → two variable relationship
  sns.pairplot()             → all pairwise relationships
  sns.histplot()             → single variable distribution

Exam answers:
  "EDA and statistical visualization" → Seaborn ✅
  "Interactive map for stakeholders"  → Folium ✅
  "Deploy Python dashboard"           → Streamlit ✅
  "Correlation heatmap"               → sns.heatmap(df.corr()) ✅