Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Chapter 7.1 - Spatial Data Types

In Chapter 2, we learned about primitive and composite data types in Python. One of the comparisons we made to the “real world” was chess. Each chess piece has a set of rules that allow them to move in certain ways (with advantages and disadvantages). The pawn, for example, has a very limited set of moves, whereas the queen has almost all of the moves that other chess pieces have.

Spatial data types are similar. At the highest level, there are two major spatial data types:

  1. Raster Data - Representations of spatial data using an n-dimensional array representing equally-spaced geographic/geometric regions (i.e., cells, grid cells, pixels, etc.). Typically used for continuous spatial data.

  1. Vector Data - Representations of spatial data using geometric / geographic coordinates (i.e., points, lines, polygons, etc.). Typically used to describe a specific geographic coordinate or set of coordinates.

In the Geosciences, these are examples of each data type:

Vector data:

  1. Hourly weather observation for DeKalb Airport (point)

  2. Earthquake epicenter in California (point)

  3. Analyzed frontal boundaries (line)

  4. San Andres Fault (line)

  5. Kansas Tornado watch in April 2012 (polygon)

  6. Tectonic Plates (polygon)

Raster data:

  1. USA Precipitation Climatology

  2. Earthquake Frequency Map

In this chapter, we will discuss geographic information systems (GIS) and their applications in the Geosciences. Applicable GIS processes can include simple mapping and cartography, vector transforms, and even complex raster analyses.

The following packages are required to run this notebook (you might have to restart your notebook):

!pip install shapely
!pip install geopandas
!pip install cartopy
!pip install xarray
Requirement already satisfied: shapely in /usr/local/lib/python3.12/dist-packages (2.1.2)
Requirement already satisfied: numpy>=1.21 in /usr/local/lib/python3.12/dist-packages (from shapely) (2.0.2)
Requirement already satisfied: geopandas in /usr/local/lib/python3.12/dist-packages (1.1.2)
Requirement already satisfied: numpy>=1.24 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.0.2)
Requirement already satisfied: pyogrio>=0.7.2 in /usr/local/lib/python3.12/dist-packages (from geopandas) (0.12.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from geopandas) (25.0)
Requirement already satisfied: pandas>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.2.2)
Requirement already satisfied: pyproj>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (3.7.2)
Requirement already satisfied: shapely>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from geopandas) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.0.0->geopandas) (2025.3)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from pyogrio>=0.7.2->geopandas) (2026.1.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas>=2.0.0->geopandas) (1.17.0)
Requirement already satisfied: cartopy in /usr/local/lib/python3.12/dist-packages (0.25.0)
Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.12/dist-packages (from cartopy) (2.0.2)
Requirement already satisfied: matplotlib>=3.6 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.10.0)
Requirement already satisfied: shapely>=2.0 in /usr/local/lib/python3.12/dist-packages (from cartopy) (2.1.2)
Requirement already satisfied: packaging>=21 in /usr/local/lib/python3.12/dist-packages (from cartopy) (25.0)
Requirement already satisfied: pyshp>=2.3 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.0.3)
Requirement already satisfied: pyproj>=3.3.1 in /usr/local/lib/python3.12/dist-packages (from cartopy) (3.7.2)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (4.61.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (1.4.9)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (11.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (3.3.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.12/dist-packages (from matplotlib>=3.6->cartopy) (2.9.0.post0)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from pyproj>=3.3.1->cartopy) (2026.1.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.7->matplotlib>=3.6->cartopy) (1.17.0)
Requirement already satisfied: xarray in /usr/local/lib/python3.12/dist-packages (2025.12.0)
Requirement already satisfied: numpy>=1.26 in /usr/local/lib/python3.12/dist-packages (from xarray) (2.0.2)
Requirement already satisfied: packaging>=24.1 in /usr/local/lib/python3.12/dist-packages (from xarray) (25.0)
Requirement already satisfied: pandas>=2.2 in /usr/local/lib/python3.12/dist-packages (from xarray) (2.2.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.2->xarray) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.2->xarray) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas>=2.2->xarray) (2025.3)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas>=2.2->xarray) (1.17.0)

Vector Data in Geopandas

In Chapter 6, we learned about the Python package named pandas. It was very useful for reading in and analyzing tabular data. geopandas can be understood as an extension of pandas--namely, it allows geometric/geographic data to be attached to tabular data.

In the examples above (earthquake information, rainfall observations, etc.), three types of information are stored:

  1. Attributes -- descriptive variables that can be used either in spatial or non-spatial analyses. Often these include date and time information, as well as variables for a particular problem. For example, precipitation totals can be added up to find a sum over a date period using non-spatial variables;

  2. Location - variables that encode a geographic location. The most basic coordinate system used is latitude/longitude degrees, but some projected coordinate systems use meters or even feet.

  3. Metadata - descriptions of the data that add more context. For example, the coordinate system, units, etc.

In geopandas the attributes of the event/observation are handled by the basic pandas methods, whereas the location and metadata information are handled by geopandas methods. As a result, you can transfer your pandas knowledge directly to geopandas.

In the example below, I use cartopy to quickly download a shapefile that describes the location and boundaries of countries using vector information.

import cartopy.io.shapereader as shpreader imports the specific “shapefile reader” (shpreader) tool. This tool has a method called natural_earth that automatically downloads common shapefiles (like state, country, etc., boundaries).

What columns are attribute data and what columns are location data?

import geopandas as gpd
import cartopy.io.shapereader as shpreader

shpfilename = shpreader.natural_earth(
    resolution="110m",
    category="cultural",
    name="admin_0_countries",
)

gdf = gpd.read_file(shpfilename)

# .head() shows the first 5 rows by default
gdf.head()
Loading...

Notice how the output looks extremely similar to a pandas DataFrame. The only difference is the final column geometry. This is where your spatial data types (vector data) are stored. The geometry field is handled by a package named shapely. This package encodes all of the “chess moves” a particular spatial data type can make. For example, a point does not represent an area, so checking to see if something is “within” a point does not make sense. Alternatively, a Polygon describes a geographic boundary, so it makes sense to check if something is “within” it.

The following spatial data types are supported by Shapely:

TypeWhat it representsTypical usesDocs
PointA single coordinate (x, y)Station/location observations; centroids; sampling points; nearest-neighbor queriesPoint[1]
LineStringOne or more connected line segments (an ordered vertex path)Roads/streams/tracks; transects; flow paths; distance/overlay operationsLineString[1]
LinearRingA closed LineString (first point == last point) used as a boundaryPolygon shells/holes; validating/constructing polygon boundariesLinearRing [1]
PolygonAn area enclosed by a ring (optionally with interior holes)Administrative boundaries; catchments; masks/clipping; areal stats (intersection/union)Polygon[1]
MultiPointA collection of PointsEvent catalogs; clustered point sets; multi-feature outputs from opsMultiPoint[1]
MultiLineStringA collection of LineStringsRiver networks/segments; multi-part lines from splits/mergesMultiLineString[1]
MultiPolygonA collection of PolygonsMulti-part admin units; islands/archipelagos; dissolved outputsMultiPolyon[1]
GeometryCollectionA heterogeneous collection of geometries (mixed types)generic collection; holding mixed outputs in one objectGeometryCollection[1]

[1]: https://shapely.readthedocs.io/en/2.1.2/geometry.html" Geometry — Shapely 2.1.2 documentation - Read the Docs"

Chapter 7.1.1 - Point

Creating a shapely Point requires an x and a y coordinate. This can be in simple “cartesian” coordinates (e.g., [1, 2]), or more commonly in geographic coordinates (e.g., -104.3, 45.1).

To create a point, you must import the class from Shapely.

Note, the typical format that you find latitude/longitude data on, for example, google maps is “YY.YY North” and “XX.XX West”. Shapely and other programs require float data and typically have the “x” coordinate (e.g., longitude) first and the “y” coordinate (e.g., latitude) second.

If you have a longitude “east” or a latitude “north”, the float remains the same, just without the “E” or “N”. If you have a longitude “west” (as is the case in the U.S.) or a latitude “south”, the “W” and “S” are removed, and you include a negative sign. For example:

41.9295° N, 88.7499° W (DeKalb, IL) -> -88,7499, 41.9295

37.8136° S, 144.9631° E (Melbourne, Australia) -> 144.9631, -37.8136

from shapely.geometry import Point

point = Point(41.9295, -88.7499)

point
Loading...

You can access the following Point properties using “dot” notation:

  1. p.x: x-coordinate value

  2. p.y: y-coordinate value

  3. p.z: z-coordinate value (i.e., height, elevation, etc., if included)

  4. p.has_z: bool flag if a z-coordinate is available or not

Think about: How might you modify the constructor above to add a z-coordinate?

print(point.x, "x coordinate")
print(point.y, "y coordinate")

if point.has_z:
    print(point.z, "z coordinate")
else:
    print("point has no z coordinate")
41.9295 x coordinate
-88.7499 y coordinate
point has no z coordinate

Buffer: a common point-based method.

A buffer method used with a Point will create a circle centered on the point that has a given radius r. This turns the Point into a Polygon.

This can be used to find other points or geometry within a certain distance. For example, if you have an epicenter of an earthquake, you might want to know how many houses are within x miles. For a hail report, you might want to know if any car dealerships are within x meters of the report.

Here is a simple example of a buffer that creates a circle of radius 5 around the point we defined above:

Think about: when you are defining a radius distance, what is the first unit of measurement that comes to mind? Foot? Meter? Degree? The buffer will always be in the same unit as the coordinates. So, if you have a latitude/longitude degree coordinate, and want a buffer of 10 kilometers, providing an r of 10 will give you a very unexpected result. We will talk about how to avoid this later in this chapter.

buffer = point.buffer(distance=5)

buffer
Loading...

Compare the printed point and printed buffer attributes below. What do you notice?

The original variable point has a spatial data type of Point. After applying the buffer method to point, the new variable now has a type of Polygon. Since you set the result equal to a different variable, the original point data is not modified.

Think about: Why would such a simple shape (circle) have so many coordinates in the Shapely data model?

print(point)

print(buffer)
POINT (41.9295 -88.7499)
POLYGON ((46.9295 -88.7499, 46.90542363336098 -89.2399857016478, 46.83342640201615 -89.72535161008064, 46.71420167866104 -90.2013233862723, 46.54889766255643 -90.66331716182545, 46.339106321741774 -91.10688368412998, 46.086848061512725 -91.52775116509801, 45.79455226681368 -91.92186642081822, 45.465033905932735 -92.28543390593273, 45.10146642081823 -92.61495226681369, 44.70735116509801 -92.90724806151272, 44.28648368412998 -93.15950632174177, 43.84291716182545 -93.36929766255643, 43.38092338627231 -93.53460167866105, 42.90495161008064 -93.65382640201615, 42.419585701647804 -93.72582363336097, 41.9295 -93.7499, 41.4394142983522 -93.72582363336099, 40.95404838991936 -93.65382640201615, 40.47807661372769 -93.53460167866105, 40.01608283817455 -93.36929766255643, 39.57251631587001 -93.15950632174177, 39.15164883490199 -92.90724806151272, 38.75753357918177 -92.61495226681369, 38.39396609406726 -92.28543390593273, 38.06444773318631 -91.92186642081822, 37.77215193848727 -91.52775116509801, 37.51989367825822 -91.10688368412998, 37.31010233744357 -90.66331716182545, 37.14479832133895 -90.2013233862723, 37.02557359798384 -89.72535161008064, 36.95357636663901 -89.2399857016478, 36.9295 -88.7499, 36.95357636663901 -88.2598142983522, 37.02557359798384 -87.77444838991936, 37.14479832133895 -87.29847661372769, 37.31010233744357 -86.83648283817455, 37.51989367825822 -86.39291631587001, 37.77215193848727 -85.97204883490198, 38.06444773318631 -85.57793357918177, 38.39396609406726 -85.21436609406726, 38.75753357918177 -84.88484773318632, 39.151648834901984 -84.59255193848728, 39.57251631587001 -84.34029367825822, 40.01608283817455 -84.13050233744356, 40.47807661372769 -83.96519832133896, 40.95404838991936 -83.84597359798384, 41.4394142983522 -83.773976366639, 41.9295 -83.7499, 42.4195857016478 -83.773976366639, 42.90495161008064 -83.84597359798384, 43.38092338627231 -83.96519832133895, 43.84291716182545 -84.13050233744356, 44.28648368412998 -84.34029367825822, 44.707351165098004 -84.59255193848728, 45.10146642081823 -84.88484773318632, 45.465033905932735 -85.21436609406726, 45.79455226681368 -85.57793357918177, 46.086848061512725 -85.97204883490198, 46.339106321741774 -86.39291631587001, 46.54889766255643 -86.83648283817455, 46.71420167866104 -87.29847661372769, 46.83342640201615 -87.77444838991936, 46.90542363336098 -88.2598142983522, 46.9295 -88.7499))

Chapter 7.1.2 - LineString

A line string requires at least two geographic points to define the starting vertex and the ending vertex (i.e., the vertices of the LineString). The data type automatically creates an edge (line) between the points. LineString expects a list of points, so make sure you are passing in the correct Python data type. Typically, coordinates are written as tuples and can be stored in a list.

For example, we might want to describe the starting and ending points of a Tornado using a LineString.

Let’s say it started in DeKalb, IL (41.9295° N, 88.7499° W) and ended in St. Charles, IL (41.9142° N, 88.3087° W). Modify the example below to create that track:

from shapely.geometry import LineString

vertices = [(1, 2), (3, 4)]

line = LineString(vertices)

line
Loading...

Let’s say the tornado was 1000 meters wide. Run a buffer method on the line to show the true width of the tornado:

LineString methods

LineStrings have some useful methods that can help when running an analysis or displaying data.

  1. length: find the length of the line (in the geographic units provided)

  2. xy: get the vertices in list form (makes it easier to plot on a map)

Print the length of the line you created above. Does it seem reasonable?

Think about: what is the equation to calculate the distance between two points? You can use this to validate your output and detect any errors or assumption violations.

In the weeds: Measuring distance becomes more complicated with latitude/longitude and even projected coordinates when the two points are far away from one another. For latitude/longitude, you have to use great circle distance to accurately calculate the distance. This rotates the sphere (i.e., Earth’s representation) so that the two points are on the “equator” of the sphere ( Geographical distance ). For relatively small distances, especially within projected coordinates, it is reasonable to treat the coordinates as Cartesian coordinates , as long as there is not a need for extremely accurate measurements. For this course, we are most interested in understanding and using projections correctly. An advanced GIS course may cover the other topics in detail.

print(line.length)
2.8284271247461903

Chapter 7.1.3 - Polygon

Respect the polygon!

A Polygon is created in a similar way to LineString, except it requires at least 3 points (why 3?). A Polygon represents the boundaries of a geographic area. This can be very simple (3 vertices) or extremely complex.

In the weeds: The Coastline Paradox . This is the unintuitive finding that the “actual” distance of a coastline (or other geographic feature) can only be approximated. To increase the detail of the Polygon (see below), you have to increase the number of vertices to better approximate the “real” boundaries. However, coastline features become increasingly fractal at smaller scales, resulting in an exponential increase in vertex count and complexity (and processing times!). As you may have noticed, even something as simple as measuring distance or the length of a coastline requires assumptions and/or compromises. In this case, you have to pick a spatial scale that is “good enough” for your application.

cparadox

Create a tornado warning polygon that includes DeKalb, IL by modifying the Polygon definition below.

Note: Shapely will close your Polygon if you do not include the first coordinate as the last coordinate.

from shapely.geometry import Polygon

# will automatically add (1, 2) to close the polygon
vertices = [(1, 2), (1, 4), (3, 4), (3, 2)]

poly = Polygon(vertices)

poly
Loading...

Polygon Methods

  1. area: the area of the polygon in the given coordinates

  2. length: the perimeter of the polygon

  3. contains: does the polygon contain a given geometry object?

Print out the area and perimeter of your polygon below. Does this make sense?

Think about: What are the equations for area and perimeter? You can use these to validate the results and find bugs or assumption violations.

print(poly.area, "area")
print(poly.length, "perimeter")
4.0 area
8.0 perimeter

Chapter 7.1.4 - General Geometry Methods

Shapely (and by extension, geopandas) provides multiple types of methods that can manipulate, measure, and analyze geometric objects. The general types of processes include:

  1. Affine transformations: Methods that can translate (move), rotate (modify the orientation), and/or scale (change size) geometric objects. The full list of methods can be found in the Shapely Documentation - Affine.

  2. Spatial Relationships: Methods that test the existence of certain relationships for a geometry or between geometry. For example, is one object inside the other object? Do the two objects intersect in space? The full list of methods can be found in the Shapely Documentation - Spatial Relationships.

  3. Measurements: geometry-aware measurements of object length, area, perimeter, etc.

  4. Constructive: geometry-aware manipulations that create a new polygon from the original polygon. Examples include buffer, bounding box, geometry simplifications, etc.

  5. Overlay: Widely-used methods (sometimes called set operations) that overlay multiple geometric objects to create a new object with geometric features derived from the original objects. Examples include union, intersect, etc. The full list of methods can be found in the Shapely Documentation - Set Operations

Affine Transformations

First, define a polygon (make it a rectangle so the transforms are easier to see):

from shapely.geometry import Polygon

vertices = [(1, 2), (1, 10), (3, 10), (3, 2)]

poly = Polygon(vertices)

poly
Loading...

import the affinity module from Shapely.

This module has several methods, including

  1. rotate: modify the orientation of the geometry by a given degree

  2. scale: modify the size of the geometry in the x and/or y dimensions by a given factor

  3. translate: move the geometry in the x and/or y dimension by a given distance (hard to see here, so we will not demonstrate yet)

Modify the code below to rotate the polygon by 90 degrees in any direction.

Note: In this coordinate system, counter-clockwise rotation occurs with a positive angle, and clockwise rotation with a negative angle.

from shapely import affinity

scaled = affinity.rotate(poly, angle=-15, origin='centroid')

scaled
Loading...

Modify the code below to scale the geometry by 10 in the x dimension and 0.5 in the y dimension. Does the result meet your expectations?

from shapely import affinity

scaled = affinity.scale(poly, xfact=10, yfact=0.5)

scaled
Loading...

Overlay transformations

Assuming two polygons a and b:

  1. intersection: new polygon only where a and b overlap

  2. union: new polygon combining regions contained by a and b.

  3. difference: assuming a.difference(b), only areas in a that are not contained by b

from shapely.geometry import Polygon

vertices1 = [(1, 2), (1, 10), (3, 10), (3, 2)]

vertices2 = [(0, 2), (1, 11), (3, 8), (3, 0)]

poly1 = Polygon(vertices1)
poly2 = Polygon(vertices2)

poly2
Loading...

Create an intersection of poly1 and poly2 based on the documentation here: https://shapely.readthedocs.io/en/stable/reference/shapely.intersection.html

Create a union of poly1 and poly2 based on the documentation here: https://shapely.readthedocs.io/en/2.1.1/reference/shapely.union.html#shapely.union

Create a difference of poly1 and poly2 based on the documentation here: https://shapely.readthedocs.io/en/stable/reference/shapely.difference.html