12. Geospatial Analysis

1. GeoPandas — Overview

  • Extends pandas to support spatial/geographic data
  • Handles: points, lines, polygons, spatial joins, buffers, distance calculations
  • ❌ pandas alone cannot do spatial operations
  • ❌ matplotlib alone cannot do interactive maps

2. Creating GeoDataFrame from CSV:

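import pandas as pd
import geopandas as gpd

# Load the CSV first ('stores.csv' is a placeholder filename)
df = pd.read_csv('stores.csv')

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['longitude'], df['latitude']),  # x=lon, y=lat
    crs='EPSG:4326'    # always set CRS ✅
)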

3. gpd.points_from_xy():

  • Creates Point geometries from longitude and latitude columns
  • Note: longitude first, latitude second (x, y order)
  • ❌ Common mistake: swapping lat/lon order

4. Reprojecting CRS:

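# ✅ Reproject to a metric (projected) CRS before distance calculation
gdf_metric = gdf.to_crs('EPSG:3857')

# Distances are now in meters; city_center_point must be a Shapely
# geometry in the same CRS (EPSG:3857). Web Mercator inflates distances
# away from the equator, so prefer a local projected CRS (e.g. a UTM zone)
# when accuracy matters.
gdf_metric.geometry.distance(city_center_point)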

5. CRS — Coordinate Reference Systems

Two Main Types:

CRS                       Code        Unit     Use For
WGS84 (Geographic)        EPSG:4326   Degrees  Storing GPS coordinates
Web Mercator (Projected)  EPSG:3857   Meters   Distance calculations ✅
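One caveat on EPSG:3857: its coordinates are in meters, but Web Mercator stretches lengths by roughly 1/cos(latitude) away from the equator. The sketch below uses the standard spherical Mercator formula in plain Python to project a 1 km north-south ground segment and measure it in projected meters. The function names and the sample latitude are illustrative assumptions, not part of GeoPandas.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857), in meters

def mercator_y(lat_deg):
    """Spherical Mercator northing in meters for a latitude in degrees."""
    lat = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + lat / 2))

def projected_length_of_1km_north(lat_deg):
    """Projected length (m) of a 1 km north-south ground segment starting at lat_deg."""
    dlat_deg = math.degrees(1000.0 / R)  # 1 km along a meridian, in degrees
    return mercator_y(lat_deg + dlat_deg) - mercator_y(lat_deg)

at_equator = projected_length_of_1km_north(0.0)
at_delhi = projected_length_of_1km_north(28.6139)
print(round(at_equator), round(at_delhi))
```

At the equator the projected length stays close to 1000 m, while at the latitude used in these notes it comes out roughly 14% longer. For accurate distances over a small region, a local projected CRS such as the matching UTM zone is usually the safer choice.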

The Distance Problem — Exam Scenario:

Symptom: "Stores 1km apart show as 0.01 units apart"
Cause:   Using degree-based CRS (EPSG:4326)
Fix:     Reproject to metric CRS (EPSG:3857) ✅
  • ✅ Coordinates in degree-based CRS → reproject to metric for correct distances — exam answer
  • ❌ Data is corrupted → wrong diagnosis
  • ❌ GeoPandas doesn’t work with lat/lon → false
  • ❌ Multiply all distances by 100 → wrong fix
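The symptom is simple arithmetic. A rough sketch, assuming a spherical Earth with the same 6371 km radius used in the Haversine section: one degree of latitude spans about 111 km, so a 1 km gap is roughly 0.009 degrees, which is the "0.01 units" in the symptom.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

# Length of one degree of latitude along a meridian, in km
km_per_degree = math.radians(1.0) * EARTH_RADIUS_KM

# A 1 km north-south separation expressed in degrees
one_km_in_degrees = 1.0 / km_per_degree

print(f"{km_per_degree:.1f} km per degree")
print(f"1 km is about {one_km_in_degrees:.4f} degrees")
```

This is also why "multiply by 100" is the wrong fix: the degree-to-distance factor is not 100, and for longitude it changes with latitude.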

6. Distance Filtering:

# ✅ Correct approach — exam answer (TDS Q19)
# Step 1: Create city center point
from shapely.geometry import Point
city_center = gpd.GeoDataFrame(
    geometry=[Point(77.2090, 28.6139)],  # lon, lat
    crs='EPSG:4326'
).to_crs('EPSG:3857')

# Step 2: Reproject stores to metric
gdf_metric = gdf.to_crs('EPSG:3857')

# Step 3: Calculate distances in meters
gdf_metric['distance_m'] = gdf_metric.geometry.distance(
    city_center.geometry[0]
)

# Step 4: Filter stores within 5km
within_5km = gdf_metric[gdf_metric['distance_m'] < 5000]

  • ✅ Create Point → calculate distances → filter where distance < 5000m — exam answer
  • ❌ Manually calculate latitude difference and compare to 5 → wrong
  • ❌ Sort by latitude and pick first 5 → wrong
  • ❌ Use pandas string matching on store names → wrong

7. Shapely — Geometric Objects

What is Shapely?

  • Creates and manipulates geometric objects
  • Works alongside GeoPandas for spatial calculations

Core Objects:

from shapely.geometry import Point, LineString, Polygon

Point(77.2090, 28.6139)          # (longitude, latitude)
LineString([(0,0), (1,1), (2,0)]) # line
Polygon([(0,0),(1,0),(1,1),(0,1)])# polygon

Geometric Relationships:

point.within(polygon)       # is point inside polygon?
polygon.contains(point)     # does polygon contain point?
geom1.intersects(geom2)    # do they intersect?
geom1.distance(geom2)      # distance between geometries
geom1.buffer(distance)     # create buffer zone
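To make the `within` / `contains` idea concrete, here is a minimal plain-Python sketch of a point-in-polygon test using ray casting. It illustrates the concept only and is not how Shapely is implemented; `point_in_polygon` is a hypothetical helper name, and points exactly on the boundary are not handled specially.

```python
def point_in_polygon(x, y, vertices):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside

square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # same unit square as the Polygon example
inside_result = point_in_polygon(0.5, 0.5, square)   # point in the middle
outside_result = point_in_polygon(1.5, 0.5, square)  # point to the right of the square
```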

8. Point(lon, lat) — Shapely Coordinate Order

  • Shapely uses (longitude, latitude) — x, y order
  • ❌ Common mistake: putting latitude first
  • Folium uses (latitude, longitude) — opposite order
  • Always double-check which library you’re using
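Because the two orders are easy to mix up, one option is to convert explicitly at the boundary between libraries. A small sketch using plain numbers; the helper names are hypothetical and not part of Shapely or Folium.

```python
def lonlat_to_folium(lon, lat):
    """Convert an (x, y) = (lon, lat) pair into Folium's [lat, lon] order."""
    return [lat, lon]

def check_lonlat(lon, lat):
    """Raise if the values cannot be a valid (lon, lat) pair.

    Catches some swaps (a latitude beyond +/-90), but not all of them:
    (28.6, 77.2) and (77.2, 28.6) are both in range.
    """
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat

delhi_lon, delhi_lat = check_lonlat(77.2090, 28.6139)
folium_location = lonlat_to_folium(delhi_lon, delhi_lat)
```

A range check like this is only a partial safeguard, so plotting a few points on a map remains the most reliable way to spot swapped coordinates.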

9. GeoPandas Spatial Operations:

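gdf.area              # area of each geometry (in squared CRS units)
gdf.length            # length of lines / perimeter of polygons (in CRS units)
gdf.centroid          # center point of each geometry
gdf.buffer(500)       # buffer zone (500 meters if metric CRS)
gpd.sjoin(gdf1, gdf2) # spatial join (matches intersecting geometries by default)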
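Area and length come out in the units of the CRS, so on EPSG:4326 data they are expressed in degrees (or square degrees), which is rarely meaningful. The shoelace formula below is a plain-Python illustration of planar polygon area; it is a sketch of the idea rather than GeoPandas' implementation, and `shoelace_area` is a hypothetical name.

```python
def shoelace_area(vertices):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# 1000 m x 500 m rectangle in a metric CRS: the result is in square meters
rect_m = [(0, 0), (1000, 0), (1000, 500), (0, 500)]
area_m2 = shoelace_area(rect_m)

# The same code on degree coordinates yields "square degrees"
rect_deg = [(77.20, 28.60), (77.21, 28.60), (77.21, 28.61), (77.20, 28.61)]
area_deg2 = shoelace_area(rect_deg)
```

The second result is a tiny number with no physical meaning, which is the same failure mode as the distance problem above: reproject first, then measure.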

10. Folium — Interactive Maps

What is Folium?

  • Creates interactive maps rendered in a web browser
  • Based on Leaflet.js
  • Output: HTML file viewable in browser
  • ✅ Best for: stakeholder-facing interactive visualization
  • ❌ NOT for data manipulation
  • ❌ NOT for numerical calculations
  • ❌ NOT for HTTP requests

11. Adding Elements to Folium Map:

import folium

# Create base map
m = folium.Map(
    location=[28.6139, 77.2090],  # [lat, lon] — note order!
    zoom_start=12
)

# Add marker
folium.Marker(
    location=[28.6289, 77.2167],  # [lat, lon]
    popup='Downtown Store',
    tooltip='S001'
).add_to(m)

# Add circle (radius in meters)
folium.Circle(
    location=[28.6139, 77.2090],
    radius=5000,            # 5km
    color='red',
    fill=True,
    fill_opacity=0.2
).add_to(m)

m.save('map.html')

12. Colored Markers — Coverage Analysis:

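# ✅ Exam-relevant: different colors for within/outside zone
# Assumes gdf has a boolean 'within_5km' column plus 'latitude',
# 'longitude' and 'store_name' columns, and that m is an existing folium.Map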
for _, store in gdf.iterrows():
    color = 'green' if store['within_5km'] else 'red'
    folium.Marker(
        location=[store['latitude'], store['longitude']],
        popup=store['store_name'],
        icon=folium.Icon(color=color)
    ).add_to(m)

13. Saving Folium Map:

m.save('store_map.html')   # save as HTML file
# Open in any web browser — interactive ✅

14. Library Combination — Exam Answer

Library    Role
pandas     Load CSV, data manipulation
GeoPandas  Spatial operations, distance, buffers
Shapely    Geometric object creation
Folium     Interactive map visualization
  • ✅ GeoPandas + Shapely + Folium = most comprehensive — exam answer (May_FN Q364)
  • ❌ Matplotlib + NumPy + Pandas → no spatial operations
  • ❌ Only pandas → no spatial distance
  • ❌ Only matplotlib → no interactive maps

15. Haversine Formula — Overview

What is Haversine?

  • Computes great-circle distance between two GPS points on Earth
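  • Treats Earth as a sphere, so results differ slightly from ellipsoidal (geodesic) distances
  • Result is in the units of the Earth radius used (km for R = 6371, miles for R ≈ 3959)

import math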

def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat/2)**2 + math.cos(lat1)*math.cos(lat2)*math.sin(dlon/2)**2
    return R * 2 * math.asin(math.sqrt(a))

dist = haversine(28.6139, 77.2090, 28.7041, 77.1025)

16. Haversine — What It Does and Doesn’t Do

  • ✅ Provides accurate great-circle distances for GPS coordinates — exam answer
  • ✅ Works directly with lat/lon
  • ✅ Does NOT require road network data
  • ❌ Does NOT calculate exact road distances
  • ❌ Does NOT calculate travel time
  • ❌ Does NOT determine elevation changes
  • ❌ Does NOT measure road surface quality
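Because Haversine works directly on latitude/longitude, it can also serve as a quick cross-check of a radius filter without any reprojection. The sketch below reuses the formula from the previous section on three sample stores; the store IDs and coordinates are hypothetical sample data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    R = 6371
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return R * 2 * math.asin(math.sqrt(a))

center_lat, center_lon = 28.6139, 77.2090

# Hypothetical sample stores: (store_id, lat, lon)
stores = [
    ("S001", 28.6289, 77.2167),  # a couple of km from the center
    ("S002", 28.7041, 77.1025),  # well outside a 5 km radius
    ("S003", 28.6139, 77.2090),  # exactly at the center
]

# Keep only the stores whose great-circle distance is under 5 km
within_5km = [
    store_id for store_id, lat, lon in stores
    if haversine_km(center_lat, center_lon, lat, lon) < 5.0
]
```

This is straight-line distance over the Earth's surface, so a store that passes the 5 km test may still be further away by road.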

17. NetworkX — Route Optimization

What is NetworkX?

  • Python library for graph analysis and shortest paths
  • Models locations as nodes, roads as edges

import networkx as nx

G = nx.Graph()
G.add_edge('Hospital', 'CommunityA', weight=15.2)
G.add_edge('Hospital', 'CommunityB', weight=8.7)

# Shortest path
path = nx.shortest_path(G, 'Hospital', 'CommunityA', weight='weight')
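With `weight='weight'`, the shortest path minimizes total edge weight rather than the number of hops. The plain-Python Dijkstra sketch below shows that idea on the same two edges plus one hypothetical extra road (CommunityB to CommunityA, weight 4.0), so that the indirect route wins. It illustrates the technique only; in practice `nx.shortest_path` handles this for you.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (total_weight, path) for the lightest path in an undirected weighted graph."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

def add_edge(graph, u, v, weight):
    """Add an undirected edge to an adjacency-dict graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight

graph = {}
add_edge(graph, "Hospital", "CommunityA", 15.2)
add_edge(graph, "Hospital", "CommunityB", 8.7)
add_edge(graph, "CommunityB", "CommunityA", 4.0)  # hypothetical extra road

cost, path = dijkstra(graph, "Hospital", "CommunityA")
```

Here the two-hop route through CommunityB (8.7 + 4.0) beats the direct 15.2 edge, which is the behavior the `weight` argument enables.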

18. OR-Tools — Vehicle Routing

What is OR-Tools?

  • Google’s open-source library for complex optimization problems
  • Handles: vehicle routing, scheduling, multi-point optimization
  • Best for: optimizing routes for multiple vehicles/ambulances
  • ✅ NetworkX/OR-Tools → complex multi-point route optimization — exam answer
  • ❌ Microsoft Excel → manual, simple calculations only
  • ❌ Google Maps → route lookup and visualization in the consumer app, not an optimization library
  • ❌ QGIS → desktop mapping tool, not designed for dynamic routing
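To show what "multi-point route optimization" means, here is a brute-force sketch that tries every visiting order for a handful of stops and keeps the shortest round trip. This is plain enumeration, not OR-Tools, and it only scales to a few stops; the stop names and planar coordinates (in km) are hypothetical.

```python
import itertools
import math

# Hypothetical stops on a flat plane, coordinates in km
depot = ("Hospital", (0.0, 0.0))
stops = {
    "CommunityA": (0.0, 3.0),
    "CommunityB": (4.0, 3.0),
    "CommunityC": (4.0, 0.0),
}

def route_length(order):
    """Length of the round trip depot -> stops in the given order -> depot."""
    points = [depot[1]] + [stops[name] for name in order] + [depot[1]]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Enumerate every visiting order and keep the shortest round trip
best_order = min(itertools.permutations(stops), key=route_length)
best_length = route_length(best_order)
```

The number of orders grows factorially with the number of stops, which is why dedicated solvers are the practical choice once there are many stops or several vehicles.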


Geospatial — Visualization Choice

Which Visualization for Coverage Finding?

Finding: 8 stores within 5km, 17 stores outside

✅ Interactive map with colored markers + 5km radius circle
   → Spatial context immediately clear
   → Green = within, Red = outside
   → Stakeholders can explore interactively

❌ Text file listing store IDs → no spatial context
❌ Pie chart of store revenue → wrong metric
❌ Table of raw coordinates → hard to interpret

Quick Reference

Library Selection:
  Spatial operations + distance → GeoPandas ✅
  Geometric objects → Shapely ✅
  Interactive maps → Folium ✅
  Route optimization → NetworkX / OR-Tools ✅
  GPS distance only → Haversine formula ✅

CRS:
  EPSG:4326 → degrees (store GPS coords)
  EPSG:3857 → meters (distance calculation) ✅
  Symptom: tiny distances → wrong CRS → reproject ✅

Distance workflow:
  set_crs(4326) → to_crs(3857) → .distance() → filter ✅

Coordinate order:
  Shapely  → Point(lon, lat)  ← x,y order
  Folium   → [lat, lon]       ← opposite!

Haversine:
  ✅ great-circle distance from GPS coords
  ❌ NOT road distance, travel time, elevation

Coverage visualization:
  ✅ Colored markers + radius circle on interactive map