Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

L2 - Geopandas


10 total points

Due: January 25th, 2026 at 11:59 p.m.


Directions:

  1. Please rename the file by clicking on “LX-First-Last.ipynb” where X is the lab number, and replace First and Last with your first and last name.

  2. Click File -> Save to make sure your most recent edits are saved.

  3. In the upper right hand corner of the screen, click on “Share”. Click on “Restricted” and change it to “Anyone with the link”.

  4. Copy the link and submit it on Blackboard. Make sure you follow these steps completely, or I will be unable to grade your work.


Overview

This lab will help you understand Geopandas and its capabilities. We will walk through some examples of how geopandas can help solve Geoscience problems. Periodically, I will 1) ask you to either repeat the demonstrated code in a slightly different way; or 2) ask you to combine two or more techniques to solve a problem.

Please do not use generative AI to answer these problems. In one example, I provide generative AI code that “solves” a problem and ask you to tell me where it went wrong. It is crucial that you understand the code well enough to effectively use generative AI tools that are likely to be widely available and recommended for use at many organizations. Although they are improving at an incredible rate, they still produce bugs, especially with domain-specific and complex problems.

Download

Make sure that you run the following commands to download the data used for this lab. This dataset includes all tornado warning and severe thunderstorm warning attribute, location, and metadata information for 2020 in the USA:

!mkdir warning_data
!cd warning_data && wget -nc https://mesonet.agron.iastate.edu/pickup/wwa/2020_tsmf_sbw.zip
!cd warning_data && unzip 2020_tsmf_sbw.zip
--2026-01-19 20:26:51--  https://mesonet.agron.iastate.edu/pickup/wwa/2020_tsmf_sbw.zip
Resolving mesonet.agron.iastate.edu (mesonet.agron.iastate.edu)... 129.186.90.1, 2610:130:108:480::81ba:5a01
Connecting to mesonet.agron.iastate.edu (mesonet.agron.iastate.edu)|129.186.90.1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5261747 (5.0M) [application/zip]
Saving to: ‘2020_tsmf_sbw.zip’

2020_tsmf_sbw.zip   100%[===================>]   5.02M  5.48MB/s    in 0.9s    

2026-01-19 20:26:52 (5.48 MB/s) - ‘2020_tsmf_sbw.zip’ saved [5261747/5261747]

Archive:  2020_tsmf_sbw.zip
  inflating: wwa_202001010000_202012312359.shp  
  inflating: wwa_202001010000_202012312359.shx  
  inflating: wwa_202001010000_202012312359.dbf  
  inflating: wwa_202001010000_202012312359.prj  
  inflating: wwa_202001010000_202012312359.cpg  
  inflating: wwa_202001010000_202012312359.csv  

Install the required packages:

!pip install cartopy
!pip install geopandas
Collecting cartopy
  Downloading cartopy-0.25.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.12/dist-packages (from cartopy) (2.0.2)
Requirement already satisfied: matplotlib>=3.6 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.10.0)
Requirement already satisfied: shapely>=2.0 in /usr/local/lib/python3.12/dist-packages (from cartopy) (2.1.2)
Requirement already satisfied: packaging>=21 in /usr/local/lib/python3.12/dist-packages (from cartopy) (25.0)
Requirement already satisfied: pyshp>=2.3 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.0.3)
Requirement already satisfied: pyproj>=3.3.1 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.7.2)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (4.61.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (1.4.9)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (11.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (3.3.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (2.9.0.post0)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from pyproj>=3.3.1->cartopy) (2026.1.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.7->matplotlib>=3.6->cartopy) (1.17.0)
Downloading cartopy-0.25.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (11.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 76.3 MB/s eta 0:00:00
Installing collected packages: cartopy
Successfully installed cartopy-0.25.0
Requirement already satisfied: geopandas in /usr/local/lib/python3.12/dist-packages (1.1.2)
Requirement already satisfied: numpy>=1.24 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.0.2)
Requirement already satisfied: pyogrio>=0.7.2 in /usr/local/lib/python3.12/dist-packages (from geopandas) (0.12.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from geopandas) (25.0)
Requirement already satisfied: pandas>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.2.2)
Requirement already satisfied: pyproj>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (3.7.2)
Requirement already satisfied: shapely>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2025.3)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from pyogrio>=0.7.2->geopandas) (2026.1.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas>=2.0.0->geopandas) (1.17.0)

Geopandas can directly read shapefiles. To read in the data we just downloaded, run the following code:

import geopandas as gpd

warnings = gpd.read_file("warning_data/wwa_202001010000_202012312359.shp")

warnings.head()
Loading...

Problem 1

You are tasked with visualizing the warning polygons associated with the derecho event of 2020 in Illinois. The first step is to subset the ~28,000 rows so that you are only working with data that overlaps your period of interest.

Use the .columns property to examine what columns are available for filtering purposes:

warnings.columns
Index(['WFO', 'ISSUED', 'EXPIRED', 'INIT_ISS', 'INIT_EXP', 'PHENOM', 'GTYPE', 'SIG', 'ETN', 'STATUS', 'NWS_UGC', 'AREA_KM2', 'UPDATED', 'HV_NWSLI', 'HV_SEV', 'HV_CAUSE', 'HV_REC', 'EMERGENC', 'POLY_BEG', 'POLY_END', 'WINDTAG', 'HAILTAG', 'TORNTAG', 'DAMAGTAG', 'PROD_ID', 'geometry'], dtype='object')

The ISSUED column is the time that the warning was first produced and sent by the local NWS station. We can examine just this column using column header indexing. We see that the date and time is formatted as such:

202005072219 -> YYYYMMDDHHmm

We could use string indexing to extract dates and times from this string, but it is easier just to turn it into a datetime object.

warnings['ISSUED'].head()
Loading...

One way to do this is to use the pandas method to_datetime which can handle many datetime string formats withoput much effort. If you scroll all the way to the right below, you can see a new column named ISSUED_DT that contains a formated date:

import pandas as pd

warnings['ISSUED_DT'] = pd.to_datetime(warnings['ISSUED'])

warnings.head()
Loading...

This is very useful for us because we can access datetime methods by accessing the column via column indexing, accessing the dt (datetime) property for each row, and then access the properties: year, month, day, hour, minute, etc.

Below is an example of using this approach to print out all of the months for each row:

print(warnings['ISSUED_DT'].dt.month)
0        5
1        7
2        6
3        8
4        8
        ..
28026    2
28027    5
28028    4
28029    5
28030    5
Name: ISSUED_DT, Length: 28031, dtype: int32

We can even do statistics on these properties. As we discussed in Chapter 6, value_counts is a very useful method that counts how often a unique value occurs (e.g., a month int) and then provides a sorted, descending list of the frequencies.

7 showed up the most (nearly 6000 times), which suggests that July was the most active month from a warning issuance perspective:

warnings['ISSUED_DT'].dt.month.value_counts()
Loading...

We can examine the breakdown of each type of warning throughout the year. We are most interested in:

  1. SV (severe thunderstorm warning); and

  2. TO (tornado warning):

warnings['PHENOM'].value_counts()
Loading...

Problem 1a

Use pandas filtering on the warning GeoDataFrame to compare monthly counts of tornado warnings and severe thunderstorm warnings.

Problem 1b

  1. Use external sources to determine generally when the derecho moved through Illinois (date and hour range).

  2. Use the image archive tool to examine radar loops from that time. If you correctly fill in the variables below, the resulting link should bring you to an animation on the Iowa State Mesonet page. NOTE: These are UTC times (add 5 hours to central time in August).

### Change these to match the time you want to examine
year = 2020
month = 12
day = 31
hour = 23
minute = 45

### Do not change below
prefix = "https://mesonet.agron.iastate.edu/current/mcview.phtml?prod=lotrad&java=script&mode=archive&frames=13&interval=5"
time_str = f"&year={year}&month={month}&day={day}&hour={hour}&minute={minute}"

print(prefix + time_str)
https://mesonet.agron.iastate.edu/current/mcview.phtml?prod=lotrad&java=script&mode=archive&frames=13&interval=5&year=2020&month=12&day=31&hour=23&minute=45

Problem 1c

Based on you manual assessment of the radar animation, use ISSUED_DT to filter warnings for rows that are within the time range of the derecho passing through Illinois and are tornado or severe thunderstorm warnings. Set the result to a new variable named warn_derecho.

datetime.datetime takes the arguments year, month, day, hour, and minute in order.

import datetime

start_date = datetime.datetime(2020, 1, 1, 0, 1)
end_date = datetime.datetime(2020, 12, 31, 23, 45)

warn_derecho = warnings

warn_derecho
Loading...

Problem 1d

Verify that your date range is correct by printing out the max and min times within the subset’s ISSUED_DT.

Why might these not match the bounds you specified above (ANSWER IN MARKDOWN)?

Answer:

### change variable from warnings to warn_derecho
print("minimum time is: ", warn_derecho['ISSUED_DT'].min())

### change variable from warnings to warn_derecho
print("maximum time is: ", warn_derecho['ISSUED_DT'].max())
minimum time is:  2020-01-02 20:41:00
maximum time is:  2020-12-31 21:52:00

Problem 2

Considering your map project is an important part of any analysis that includes spatial data types. We can examine the projection information associated with the shapefile we read in earlier by using the crs property:

warn_derecho.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich

The coordinate system and mapping information associated with the warnings is the basic “latitude / longitude”. This is very convenient for sharing data and long-term file storage, but it can cause inconsistencies in your analysis.

For example, let’s look at the area of the qualifying warnings by using the area property in the GeoDataFrame. This is based on the shapely property that is built-in to spatial data types that have areas (e.g., polygons, etc.).

You will notice that you have a warning that tells you, basically, to not trust the area that is returned! These areas are very small because they are “square degrees”.

warn_derecho.area
/tmp/ipython-input-1819352967.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  warn_derecho.area
Loading...

geopandas also has a solution to this issue. There is a method called to_crs that will automatically convert your geometry between coordinate systems.

The easiest way to perform this task is to find an appropriate projection code. This website is useful for getting many of the most common, non-custom, projection codes: https://epsg.io/

Typically, if we are trying to calculate area, we want what is called an “equal area” projection. For the US, there are two popular choices:

  1. USGS Contiguous USA Albers Equal Area (EPSG:5070; units = meters)

  2. UTM 16 N (if examining Illinois or states directly North or South; EPSG:26916; units = meters)

A “USA Albers Equal Area” is a good general use projection for the USA. UTM 16 N is only good for specific states, however, it is more accurate than the general USA equal area (2.0 m accuracy).

Let’s choose the USA version and perform the projection below.

NOTE: this will have no impact on your attribute columns, just the geometry columns and the spatial properties of the GeoDataFrame.

# make sure your GeoDataFrame has a projection
print("initial proj code:", warn_derecho.crs.to_epsg())

# use to_crs to convert to "ESRI:102039"
warn_derecho_proj = warn_derecho.to_crs("EPSG:5070")

# examine new code to make sure the projection worked:
print("new proj code:", warn_derecho_proj.crs.to_epsg())

# convert to sq. km (1 sq. km = 1e6 sq. m)
warn_derecho_proj['area_albers_sqm'] = warn_derecho_proj.area
warn_derecho_proj['area_albers_sqkm'] = warn_derecho_proj['area_albers_sqm'] / 1e6

# print out the new areas
display(warn_derecho_proj[['area_albers_sqm', 'area_albers_sqkm']])
initial proj code: 4326
new proj code: 5070
Loading...

Problem 2a

Compare the area differences in sq. km between albers and UTM by projecting to UTM and creating a new column. Display both columns as demonstrated above.

Problem 2b

Find the mean size (in sq. km) of the warnings during the period you are examining. What is the size of DeKalb County? Do your results make sense?

Problem 2c

Compare the mean size of tornado warnings and severe thunderstorm warnings. Which are bigger on average?

Problem 2d

Run another area analysis that you find interesting or are curious about. Explain your approach in the markdown and interpret the results.

Problem 3

Plotting geometry on a cartopy map is relatively straightforward. The only tricky part can be handling projections.

For simplicity, we are going to use the filtered but lat/lon version of the GeoDataFrame warn_derecho.

We will use cartopy to map the warning data.

Making a simple cartopy map only takes a few lines of code. We will talk more about mapping in future labs, but here are the basics:

import cartopy.crs as ccrs: ccrs handles all of the projection duties

import cartopy.feature as cfeature: cfeature handles convenient methods to plot common boundaries (US States, etc.).

These two imports will be included in most, if not all, of your code that involves cartopy mapping.

The code below:

  1. Sets the map projection to lat/lon (projection=ccrs.PlateCarree())

  2. Adds US states to the map ax.add_feature(cfeature.STATES)

  3. Sets the map extent to areas centered on Illinois.

import matplotlib.pyplot as plt

import cartopy.crs as ccrs
import cartopy.feature as cfeature

ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())

ax.set_extent([-92, -87, 36, 43])

ax.add_feature(cfeature.STATES)
<cartopy.mpl.feature_artist.FeatureArtist at 0x782fd701f500>
<Figure size 640x480 with 1 Axes>

Problem 3a

The add_geometries method can add any geometry with geographic information to the map. This function takes a list or numpy array of geometries. Create a polygon that would be within Illinois based on its lat/lon coordinates.

from shapely.geometry import Polygon

poly = Polygon([(1,1), (1,1), (1,1), (1,1)])

ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())

ax.set_extent([-92, -87, 36, 43])

ax.add_feature(cfeature.STATES)

ax.add_geometries([poly], crs=ccrs.PlateCarree(), color='red')
<cartopy.mpl.feature_artist.FeatureArtist at 0x782fd4949d30>
<Figure size 640x480 with 1 Axes>

Problem 3b

Remove two instances of .sample(100) from the code below once you determine the correct subset for warn_derecho in Problem 1c.

Modify the following code so the tornado warnings are red and severe thunderstorm warnings are yellow in your 2020 derecho subset.

ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())

ax.set_extent([-92, -87, 36, 43])

ax.add_feature(cfeature.STATES)

warn_derecho.sample(100).plot(ax=ax, alpha=0.5, facecolor='red', edgecolor='black')

warn_derecho.sample(100).plot(ax=ax, alpha=0.5, facecolor='yellow', edgecolor='black')
<GeoAxes: >
<Figure size 640x480 with 1 Axes>