12. Geospatial Analysis
1. GeoPandas: Overview
- Extends pandas to support spatial/geographic data
- Handles: points, lines, polygons, spatial joins, buffers, distance calculations
- ❌ pandas alone cannot do spatial operations
- ❌ matplotlib alone cannot do interactive maps
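2. Creating a GeoDataFrame from a CSV:
```python
import geopandas as gpd

# df: a pandas DataFrame loaded from the CSV, with 'longitude' and 'latitude' columns
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['longitude'], df['latitude']),  # x = lon, y = lat
    crs='EPSG:4326'  # always set the CRS ✅
)
```
3. gpd.points_from_xy():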
- Creates Point geometries from longitude and latitude columns
- Note: longitude first, latitude second (x, y order)
- ❌ Common mistake: swapping lat/lon order
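A cheap guard against the swapped-order mistake is to check that points land in the region you expect before building geometries. This is a plain-Python sketch; `in_bbox`, `check_lon_lat` and `DELHI_BBOX` are hypothetical names, and the bounding box is a rough approximation chosen for illustration, not an official boundary.

```python
# Rough, hypothetical bounding box around Delhi: (min_lon, min_lat, max_lon, max_lat)
DELHI_BBOX = (76.8, 28.4, 77.4, 28.9)

def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def check_lon_lat(lon, lat, bbox):
    """Raise ValueError if the values are not valid degrees or look swapped for the region."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"not valid lon/lat degrees: ({lon}, {lat})")
    if not in_bbox(lon, lat, bbox):
        # If swapping the two values puts the point in the region, the order was probably wrong
        if in_bbox(lat, lon, bbox):
            raise ValueError(f"({lon}, {lat}) looks swapped: pass longitude first")
        raise ValueError(f"({lon}, {lat}) is outside the expected region")
    return lon, lat

check_lon_lat(77.2090, 28.6139, DELHI_BBOX)  # OK: x = lon, y = lat
```

In a GeoPandas workflow you could run this over the `longitude` and `latitude` columns before calling `gpd.points_from_xy()`. A range check alone would not catch the Delhi swap, because both 28.6 and 77.2 are valid latitudes, which is why the sketch also compares against the expected region.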
4. Reprojecting CRS:
```python
# ✅ Reproject to a metric CRS before calculating distances
gdf_metric = gdf.to_crs('EPSG:3857')

# Distances are now in meters; city_center_point must also be in EPSG:3857 (see section 6)
gdf_metric.geometry.distance(city_center_point)
```
5. CRS (Coordinate Reference Systems)
Two Main Types:
| CRS | Code | Unit | Use For |
|---|---|---|---|
| WGS84 (Geographic) | EPSG:4326 | Degrees | Storing GPS coordinates |
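| Web Mercator (Projected) | EPSG:3857 | Meters | Distance calculations ✅ |

Note: EPSG:3857 is in meters but stretches lengths by roughly 1/cos(latitude) (about 1.14× at Delhi, 28.6°N). It is the expected exam answer; for precise distances a local projected CRS such as UTM zone 43N (EPSG:32643, which covers Delhi) is more accurate.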
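To see what reprojecting to EPSG:3857 actually does, here is a dependency-free sketch of the spherical Web Mercator forward formulas. It is for intuition only; real conversions should still go through `gdf.to_crs()`, which delegates to pyproj. `to_web_mercator` and `mercator_scale` are hypothetical helper names.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857, in meters

def to_web_mercator(lon, lat):
    """Project (lon, lat) in degrees to Web Mercator (x, y) in meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def mercator_scale(lat):
    """Factor by which EPSG:3857 stretches short lengths at this latitude."""
    return 1 / math.cos(math.radians(lat))

x, y = to_web_mercator(77.2090, 28.6139)  # meters; should closely match gdf.to_crs('EPSG:3857')
print(round(mercator_scale(28.6139), 3))  # 1.139
```

The scale factor is why EPSG:3857 distances are approximate away from the equator: a 5 km threshold measured in EPSG:3857 around Delhi corresponds to only about 5 / 1.139 ≈ 4.4 km on the ground.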
The Distance Problem (Exam Scenario):
Symptom: "Stores 1km apart show as 0.01 units apart"
Cause: Using a degree-based CRS (EPSG:4326), so the "units" are degrees (1° of latitude ≈ 111 km, so 1 km ≈ 0.009°)
Fix: Reproject to a metric CRS (EPSG:3857) ✅
- ✅ Coordinates in a degree-based CRS → reproject to metric for correct distances (exam answer)
- ❌ Data is corrupted → wrong diagnosis
- ❌ GeoPandas doesn’t work with lat/lon → false
- ❌ Multiply all distances by 100 → wrong fix
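Why no single multiplier can fix degree-based distances: a degree of latitude has a roughly constant length, but a degree of longitude shrinks with cos(latitude). The sketch below assumes a spherical Earth with mean radius 6,371 km, so its figures are approximations; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)

def meters_per_degree_lat():
    """Approximate length of one degree of latitude, in meters."""
    return EARTH_RADIUS_M * math.pi / 180

def meters_per_degree_lon(lat):
    """Approximate length of one degree of longitude at the given latitude, in meters."""
    return meters_per_degree_lat() * math.cos(math.radians(lat))

for lat in (0, 28.6139, 60):
    print(lat, round(meters_per_degree_lon(lat)))
```

One degree of longitude is about 111 km at the equator, about 97.6 km at Delhi and about 55.6 km at 60°N, so the correct "conversion factor" depends on both direction and latitude. Reprojecting handles this; a constant multiplier cannot.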
6. Distance Filtering:
```python
# ✅ Correct approach: exam answer (TDS Q19)
import geopandas as gpd
from shapely.geometry import Point

# Step 1: Create the city center point (lon, lat) and reproject it to meters
city_center = gpd.GeoDataFrame(
    geometry=[Point(77.2090, 28.6139)],  # lon, lat
    crs='EPSG:4326'
).to_crs('EPSG:3857')

# Step 2: Reproject stores to the same metric CRS
gdf_metric = gdf.to_crs('EPSG:3857')

# Step 3: Calculate distances in meters
gdf_metric['distance_m'] = gdf_metric.geometry.distance(
    city_center.geometry.iloc[0]
)

# Step 4: Keep stores within 5 km
within_5km = gdf_metric[gdf_metric['distance_m'] < 5000]
```
- ✅ Create Point → calculate distances → filter where distance < 5000 m (exam answer)
- ❌ Manually calculate latitude difference and compare to 5 → wrong
- ❌ Sort by latitude and pick first 5 → wrong
- ❌ Use pandas string matching on store names → wrong
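To sanity-check the GeoPandas result without any GIS dependency, the same 5 km filter can be run with the haversine formula (section 15) directly on lat/lon. This is a minimal sketch; `haversine_m`, `stores_within` and the store records are hypothetical, with coordinates reused from elsewhere in these notes.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def stores_within(stores, center_lat, center_lon, radius_m):
    """Return the stores whose great-circle distance to the center is below radius_m."""
    return [
        s for s in stores
        if haversine_m(center_lat, center_lon, s["latitude"], s["longitude"]) < radius_m
    ]

# Hypothetical store records
stores = [
    {"store_id": "S001", "latitude": 28.6289, "longitude": 77.2167},
    {"store_id": "S002", "latitude": 28.7041, "longitude": 77.1025},
]
nearby = stores_within(stores, 28.6139, 77.2090, 5000)
print([s["store_id"] for s in nearby])  # ['S001']
```

Expect small disagreements with the EPSG:3857 version for stores near the 5 km edge, because Web Mercator overstates distances at Delhi's latitude.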
7. Shapely: Geometric Objects
What is Shapely?
- Creates and manipulates geometric objects
- Works alongside GeoPandas for spatial calculations
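Core Objects:
```python
from shapely.geometry import Point, LineString, Polygon

Point(77.2090, 28.6139)                    # point: (longitude, latitude)
LineString([(0, 0), (1, 1), (2, 0)])       # line
Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # polygon (unit square)
```
Geometric Relationships:
```python
point = Point(0.5, 0.5)
polygon = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
geom1, geom2 = point, polygon

point.within(polygon)    # is the point inside the polygon? → True
polygon.contains(point)  # does the polygon contain the point? → True
geom1.intersects(geom2)  # do the geometries intersect? → True
geom1.distance(geom2)    # distance between geometries, in CRS units → 0.0
geom1.buffer(1.0)        # buffer zone: polygon around geom1 at distance 1.0 (CRS units)
```
8. Point(lon, lat): Shapely Coordinate Order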
- Shapely uses (longitude, latitude) — x, y order
- ❌ Common mistake: putting latitude first
- Folium uses (latitude, longitude) — opposite order
- Always double-check which library you’re using
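For intuition about what `point.within(polygon)` from section 7 decides, here is a plain-Python ray-casting point-in-polygon test. Like Shapely, it takes coordinates in (x, y) = (lon, lat) order. It is an illustrative sketch with a hypothetical function name: unlike Shapely it ignores holes, and points exactly on an edge may be classified either way.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses.

    ring is a list of (x, y) vertices, e.g. [(lon, lat), ...]; an odd count means inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(point_in_polygon(0.5, 0.5, square))  # True
print(point_in_polygon(1.5, 0.5, square))  # False
```

Passing (lat, lon) instead of (lon, lat) silently tests a different point, which is exactly the order mistake described above.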
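9. GeoPandas Spatial Operations:
```python
gdf.area               # area of each geometry (CRS units squared, so use a projected CRS)
gdf.length             # perimeter/length (CRS units)
gdf.centroid           # center point of each geometry
gdf.buffer(500)        # buffer zone (500 meters if the CRS is metric)
gpd.sjoin(gdf1, gdf2)  # spatial join (matches rows whose geometries intersect, by default)
```
10. Folium: Interactive Maps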
What is Folium?
- Creates interactive maps rendered in web browser
- Based on Leaflet.js
- Output: HTML file viewable in browser
- ✅ Best for: stakeholder-facing interactive visualization
- ❌ NOT for data manipulation
- ❌ NOT for numerical calculations
- ❌ NOT for HTTP requests
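Since Folium only draws, the inputs it needs (a map center, a bounding box) are usually prepared in plain Python first, and that is where the lon/lat order flip happens. A minimal sketch, assuming points arrive in Shapely/GeoPandas (lon, lat) order; `folium_view` is a hypothetical helper, while `m.fit_bounds()` is Folium's method for zooming a map to a `[[south, west], [north, east]]` box.

```python
def folium_view(points_lonlat):
    """Return (center, bounds) in Folium's [lat, lon] order from (lon, lat) pairs."""
    lons = [lon for lon, _ in points_lonlat]
    lats = [lat for _, lat in points_lonlat]
    center = [sum(lats) / len(lats), sum(lons) / len(lons)]    # [lat, lon]
    bounds = [[min(lats), min(lons)], [max(lats), max(lons)]]  # [[south, west], [north, east]]
    return center, bounds

points = [(77.2090, 28.6139), (77.2167, 28.6289), (77.1025, 28.7041)]  # (lon, lat)
center, bounds = folium_view(points)
print(bounds)  # [[28.6139, 77.1025], [28.7041, 77.2167]]
# With Folium installed: m = folium.Map(location=center); m.fit_bounds(bounds)
```

The simple mean and min/max are fine for a city-scale dataset but would misbehave for points that straddle the ±180° meridian.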
11. Adding Elements to a Folium Map:
```python
import folium

# Create the base map
m = folium.Map(
    location=[28.6139, 77.2090],  # [lat, lon]: note the order!
    zoom_start=12
)

# Add a marker
folium.Marker(
    location=[28.6289, 77.2167],  # [lat, lon]
    popup='Downtown Store',
    tooltip='S001'
).add_to(m)

# Add a circle (radius in meters)
folium.Circle(
    location=[28.6139, 77.2090],
    radius=5000,  # 5 km
    color='red',
    fill=True,
    fill_opacity=0.2
).add_to(m)

m.save('map.html')
```
12. Colored Markers: Coverage Analysis:
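```python
# ✅ Exam-relevant: different colors for stores inside/outside the zone
# Assumes m is the map from section 11 and gdf has 'latitude', 'longitude', 'store_name' columns
gdf['within_5km'] = gdf_metric['distance_m'] < 5000  # flag from section 6 (same index)

for _, store in gdf.iterrows():
    color = 'green' if store['within_5km'] else 'red'
    folium.Marker(
        location=[store['latitude'], store['longitude']],  # [lat, lon]
        popup=store['store_name'],
        icon=folium.Icon(color=color)
    ).add_to(m)
```
13. Saving a Folium Map: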
```python
m.save('store_map.html')  # save as an HTML file
# Open in any web browser: interactive ✅
```
14. Library Combination: Exam Answer
| Library | Role |
|---|---|
| pandas | Load CSV, data manipulation |
| GeoPandas | Spatial operations, distance, buffers |
| Shapely | Geometric object creation |
| Folium | Interactive map visualization |
- ✅ GeoPandas + Shapely + Folium = most comprehensive — exam answer (May_FN Q364)
- ❌ Matplotlib + NumPy + Pandas → no spatial operations
- ❌ Only pandas → no spatial distance
- ❌ Only matplotlib → no interactive maps
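15. Haversine Formula: Overview
What is Haversine?
- Computes the great-circle distance between two GPS points on Earth
- Treats Earth as a sphere (mean radius ≈ 6371 km), which is accurate to within a fraction of a percent
- Result is in the units of the radius used: kilometers with R = 6371, miles with R ≈ 3959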
```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    R = 6371  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    return R * 2 * math.asin(math.sqrt(a))

dist = haversine(28.6139, 77.2090, 28.7041, 77.1025)  # ≈ 14.4 km
```
16. Haversine: What It Does and Doesn’t Do
- ✅ Provides accurate great-circle distances for GPS coordinates — exam answer
- ✅ Works directly with lat/lon
- ✅ Does NOT require road network data
- ❌ Does NOT calculate exact road distances
- ❌ Does NOT calculate travel time
- ❌ Does NOT determine elevation changes
- ❌ Does NOT measure road surface quality
17. NetworkX: Route Optimization
What is NetworkX?
- Python library for graph analysis and shortest paths
- Models locations as nodes, roads as edges
```python
import networkx as nx

G = nx.Graph()
G.add_edge('Hospital', 'CommunityA', weight=15.2)
G.add_edge('Hospital', 'CommunityB', weight=8.7)

# Shortest path by total edge weight (Dijkstra by default when weight is given)
path = nx.shortest_path(G, 'Hospital', 'CommunityA', weight='weight')
```
18. OR-Tools: Vehicle Routing
What is OR-Tools?
Google’s library for complex optimization problems
Handles: vehicle routing, scheduling, multi-point optimization
Best for: optimizing routes for multiple vehicles/ambulances
✅ NetworkX/OR-Tools → complex multi-point route optimization — exam answer
❌ Microsoft Excel → only manual, simple calculations
❌ Google Maps → visualization only, not programmable
❌ QGIS → static mapping, not dynamic routing
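For intuition about what `nx.shortest_path(G, ..., weight='weight')` in section 17 computes, here is a plain-Python Dijkstra sketch (NetworkX uses Dijkstra by default for weighted shortest paths). `dijkstra_path` is a hypothetical helper, and the `CommunityB`–`CommunityA` edge is a hypothetical road added so that an indirect route beats the direct one.

```python
import heapq

def dijkstra_path(edges, source, target):
    """Shortest path by total weight in an undirected graph with non-negative weights.

    edges: list of (u, v, weight). Returns (path, total_weight), or (None, inf) if unreachable.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a shorter route was already settled
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    if target not in visited:
        return None, float("inf")
    # Walk predecessors back from the target to rebuild the route
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

edges = [
    ("Hospital", "CommunityA", 15.2),
    ("Hospital", "CommunityB", 8.7),
    ("CommunityB", "CommunityA", 4.1),  # hypothetical extra road
]
path, cost = dijkstra_path(edges, "Hospital", "CommunityA")
print(path)  # ['Hospital', 'CommunityB', 'CommunityA']
```

Dijkstra requires non-negative weights, which road lengths and travel times satisfy. Multi-vehicle problems (several ambulances, many stops) go beyond single shortest paths, which is where OR-Tools comes in.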
Geospatial: Visualization Choice
Which Visualization for Coverage Finding?
Finding: 8 stores within 5km, 17 stores outside
✅ Interactive map with colored markers + 5km radius circle
→ Spatial context immediately clear
→ Green = within, Red = outside
→ Stakeholders can explore interactively
❌ Text file listing store IDs → no spatial context
❌ Pie chart of store revenue → wrong metric
❌ Table of raw coordinates → hard to interpret
Quick Reference
Library Selection:
Spatial operations + distance → GeoPandas ✅
Geometric objects → Shapely ✅
Interactive maps → Folium ✅
Route optimization → NetworkX / OR-Tools ✅
GPS distance only → Haversine formula ✅
CRS:
EPSG:4326 → degrees (store GPS coords)
EPSG:3857 → meters (distance calculation) ✅
Symptom: tiny distances → wrong CRS → reproject ✅
Distance workflow:
set_crs(4326) → to_crs(3857) → .distance() → filter ✅
Coordinate order:
Shapely → Point(lon, lat) ← x,y order
Folium → [lat, lon] ← opposite!
Haversine:
✅ great-circle distance from GPS coords
❌ NOT road distance, travel time, elevation
Coverage visualization:
✅ Colored markers + radius circle on interactive map