More common spatial analysis tools

Rachel Franklin

Newcastle University

Goals for this session

Get comfortable with a few more foundational GIS spatial analysis tools
Start to see how concepts and tools map onto research questions
Building understanding of spatial analysis workflows

Without further ado, let’s dive in

Thiessen (or Voronoi) polygons

Deriving polygons from points

What about the case where we might want to go from point features to polygons?
For example, suppose we want to define “catchment areas” for each metro station in Rome.
- The assumption being that each person will use the station closest to them

Thiessen (or Voronoi) polygons

Deriving polygons from points

Thiessen (or Voronoi) polygons

Allocating areas to their closest points

In QGIS: Voronoi Polygons tool
- Creates new polygon layer as output
- Default extent of polygons is the bounded rectangle containing the points
  - Use buffer region option to expand boundaries of new polygon layer

Thiessen (or Voronoi) polygons

Allocating areas to their closest points

How to change boundaries of a layer to match another

`Clip`

In QGIS located under Vector > Geoprocessing
- Clips—or cuts—one layer, using boundaries of a second layer
- So here, we can clip Voronoi polygons using the urban area polygon layer

How to change boundaries of a layer to match another

Clip

Other common tools

Near (`distance to nearest hub` in QGIS)

Calculates the straight line distance from the input feature to the nearest feature in a specified feature set

Grids

Creating our own polygons

Useful for generating uniform polygons as inputs for spatial joins
- Overcomes shortcomings of administrative geographies which can vary considerably in size
Also useful for visualizing data in an appealing way
In QGIS, under Research Tools > Create Grid
- Choice of size and shape of output polygons (e.g., hexagons, squares, diamonds)

Grids

Creating our own polygons

Grids

Spatial join with grids: AirBnb

What other tools could we use to improve this visualization?

Density estimation

Take our standard approach so far

Visualising points

Take our standard approach so far

Point in polygon

The usual approach, where we reference counts of occurrences by some spatial unit and/or by area

Density estimation

Here, the darker the polygon, the higher the point count, which can be problematic

Dependent on choice of spatial unit
Assumes that points immediately outside the unit have no impact
An alternative is to generate a surface that estimates the density of points across the study area
We’ll talk about the most common method:
- Kernel Density Estimation

Density estimation

With density estimation, we start with a set of points
These may be simple points (e.g., crime occurrences) or points with data associated with them (e.g., cities with a population attribute)
In this case, we are looking for alternative ways to visualize and measure the density of points
This can be useful for:
- Hot spot identification
- Alternatives to choropleth mapping

Kernel density

Outputs a raster, where we decide the cell size
Cell contents are the estimated density of points for that area
Kernel density methods place a kernel of a particular shape and of a particular bandwidth over each point and distributes its value across the kernel
This kernel is like the neighborhood, except that it surrounds the points and distributes their value underneath the curve
- Cell densities are then the sum of the kernel intersections that occur

Kernel density

Airbnb density in Rome, March 2023

Kernel density

Results will depend on cell size and bandwidth (or radius)
- Larger cell sizes will be grainier
- Larger radii will result in a smoother surface

Spatial statistics, a short overview

Spatial statistics

Spatial statistics are like regular statistics
We have numbers we calculate to describe spatial patterns or how values are distributed across space
Common applications for spatial statistics:
- to help us understand when a pattern really is unusual or where outliers really are
- to estimate models that either measure the impact “space” has on our dependent variable or filter that effect out so we can focus on the variables we’re truly interested in

Spatial statistics

A big umbrella term

Point pattern analysis
- Is the distribution of points random? Uniform? Can we identify clusters?
Measures of spatial autocorrelation or dependence
- Global – Do we observe positive or negative autocorrelation across our study area
- Local – Are values correlated with local neighbors?
  - House values
  - Crime
Capturing spatial heterogenity in relationships and processes
- e.g., Geographically Weighted Regression

Spatial autocorrelation

Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”
A variable’s values are related to each other in space – they’re correlated
This means that observations are often not independent of each other
- For example, house values: If I tell you how much a particular house is worth, does it affect your prediction of the neighboring house’s value?
We distinguish between two types of autocorrelation: positive and negative

Spatial autocorrelation

GeoDa

GeoDa((https://geodacenter.github.io) – An exploratory spatial data analysis package.
It’s free and in some ways much more powerful than QGIS or ArcMap
To use, open GeoDa
- And then, under File, open geopackage or shapefile
Native facility with spatial statistics
- Good first choice for visualization, exploration, and model estimation
- Plus cartograms, variable smoothing tools, and lots of other exploratory tools

Spatial autocorrelation

What is neighborhood?

All measures of spatial association depend on scale
- And how we define neighbors
Neighborhoods can be defined based on distance, contiguity, or nearest neighbors
- Distance: My neighbors are those who live within a mile of me, for example
- Contiguity: Refers to polygons. My neighbors are those I share a border with:
  - Queen’s case: Shared borders and corners count for contiguity
  - Rook’s case: Only shared borders count for contiguity
- k Nearest Neighbors: The k nearest units to mine are considered my neighbors
1st order versus 2nd order, etc: We could choose our immediate neighbors, or those that are neighbors of our neighbors.

Spatial autocorrelation

Quantifying neighborhood

When we define our neighborhood, this is implemented using a “weights matrix”
- Usually 1 and 0’s that indicate yes or no for whether a spatial unit is my neighbor
- Units are not considered neighbors of themselves
- These matrices are generally symmetric – If I’m your neighbor, then you’re my neighbor (but this isn’t always the case – e.g., k nearest neighbors)
How you define your neighborhood can influence results
- So, best to check how sensitive your results are to your choice of weights
Ideally choice of weights is informed by data, theory, and previous research

Spatial autocorrelation

Global – Moran’s I

The most commonly used measure of global spatial autocorrelation
This is how we estimate the extent of spatial dependence across a set of units for a particular variable
This is also how we might typically assess whether model residuals are spatially auto-correlated

Spatial autocorrelation

Moran’s I – Airbnb counts across Rome urban zones

Urban Zones with counts of AirBnb listings
Weights file
Significance is generated through randomization
Interpretation: The line indicates the correlation between numbers of AirBnbs and spatially lagged values
Positive spatial autocorrelation in the distribution of AirBnbs across Urban Zones in Rome

Spatial autocorrelation

Local Indicators of Spatial Association (LISA)

Local Moran’s I

Identifies the presence or absence of significant spatial clusters or outliers for each location. A randomization approach is used to generate a spatially random reference distribution to assess statistical significance
Rather than looking at the entire spatial configuration of units and values at one time—like the global Moran’s I—local indicators of spatial autocorrelation look at each observation at a time, along with values of neighbors

Spatial autocorrelation

Getis-Ord Gi* Statistic

The resulting Z score tells you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features.
A feature with a high value is interesting, but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well.

More common spatial analysis tools

Goals for this session

Without further ado, let’s dive in

Thiessen (or Voronoi) polygons

Deriving polygons from points

Thiessen (or Voronoi) polygons

Deriving polygons from points

Thiessen (or Voronoi) polygons

Allocating areas to their closest points

Thiessen (or Voronoi) polygons

Allocating areas to their closest points

How to change boundaries of a layer to match another

Clip

How to change boundaries of a layer to match another

Clip

Other common tools

Near (distance to nearest hub in QGIS)

Calculates the straight line distance from the input feature to the nearest feature in a specified feature set

Grids

Creating our own polygons

Grids

Creating our own polygons

Grids

Spatial join with grids: AirBnb

Density estimation

Take our standard approach so far

Visualising points

Take our standard approach so far

Point in polygon

Density estimation

Here, the darker the polygon, the higher the point count, which can be problematic

Density estimation

Kernel density

Kernel density

Airbnb density in Rome, March 2023

Kernel density

Spatial statistics, a short overview

Spatial statistics

Spatial statistics

A big umbrella term

Spatial autocorrelation

Spatial autocorrelation

GeoDa

Spatial autocorrelation

What is neighborhood?

Spatial autocorrelation

Quantifying neighborhood

Spatial autocorrelation

Global – Moran’s I

Spatial autocorrelation

Moran’s I – Airbnb counts across Rome urban zones

Spatial autocorrelation

Local Indicators of Spatial Association (LISA)

Spatial autocorrelation

Getis-Ord Gi* Statistic

Spatial autocorrelation

Counts of AirBnb Listings in Rome (GeoDa)

Next up: Tutorial 4!

`Clip`

Near (`distance to nearest hub` in QGIS)