Geographic Boundaries

R
geospatial
census
memphis
sf
tigris
leaflet

Mapping all available Census boundaries for one area.

Author
Published

October 29, 2022

Boundary Map

Below is an interactive map of 2021 TIGRIS boundaries related to the Memphis area. Layers will stack in the order they are selected.

About TIGRIS Boundaries

Every ten years, the Census attempts to count every person living in the US, as required by the Constitution. Data is collected on individuals, households, and housing units and then that data is made public and used to determine important things like political boundaries.

Due to the personal nature of the data, it is anonymized and not released at too fine of a geographic level.1 Instead, housing units are categorized into Census blocks, roughly equivalent to a neighborhood block. These blocks are then used to form other boundaries. Some boundaries nest within each other, like blocks into block groups into tracts, while others do not.

Below is a diagram of Census geographic hierarchies, available at https://www.census.gov/programs-surveys/geography/guidance/hierarchy.html.

Census Geographic Hierarchy

Census Geographic Hierarchy

Boundary data is available from the Census TIGER/Line Shapefiles page. Datasets are also easily accessed in R using the tigris package.

Making the Map

I decided to not include boundaries from the national-state level, instead focusing just on the Memphis region. The broadest geography is the Memphis-Forrest City combined statistical area (CSA), which includes the Memphis Metro Area and Forrest City Micro Area. Metro and Micro areas are collectively known as core-based statistical areas (CBSAs), and are created using counties with “a high degree of social and economic integration” (measured by commute to work).(US Census Bureau 2021) Because Forrest City is relatively small, there is only one county difference between the CSA and CBSA boundaries. I plan to use the CBSA/Metro boundaries more often, which is why that layer is visible by default.

There are a few more boundaries that may have unfamiliar abbreviations. ZCTAs, which stands for ZIP Code Tabulation Areas, are approximate Census boundaries of USPS ZIP Codes and are not bound by county, place, tract, or blocks. PUMAs, or Public Use Microdata Areas, are boundaries set every 10 years which contain at least 100,000 people. They are notably used with Public Use Microdata Sample (PUMS) data, which are anonymized individual-level Census records.2 This data is useful to researchers who want to create custom queries of data rather than using pre-tabulated estimates provided by the Census Bureau.

Using tigris and sf

Boundaries were pulled using the tigris package. A full list of available datasets and their corresponding functions is available in Chapter 5 of the book Analyzing US Census Data.

The Memphis Metro area covers three states, meaning sometimes I needed to run the same function three times for Tennessee, Arkansas, and Mississippi. I used purrr’s map_dfr() to do this automatically. Boundaries were then filtered in one of two ways, either using dplyr’s filter() function for table data or sf’s st_filter() to filter one geometry by another.

In summary, these are the packages I used to get started (I also recommend caching your datasets, but it’s optional).

library(tidyverse)
library(sf)
library(tigris)
options(tigris_use_cache = TRUE)
options(tigris_year = 2021) # currently defaults to 2020

Below is a full list of how I gathered boundaries related to Memphis, grouped by how they’re filtered.

#' Broad regions, filtered by known NAME/DIVISIONs relevant to Memphis Area
nation <- nation()
regions <- regions() %>% filter(NAME == "South")
divisions <- divisions() %>% filter(str_detect(NAME, "South Central"))
states <- states() %>% filter(DIVISION %in% c(6, 7))

# The Memphis MSA include Tennessee, Mississippi, & Arkansas
memST <- list("TN", "MS", "AR")

#' Metro and Urban areas, filtered by "Memphis"
csa <- combined_statistical_areas() %>% filter(str_detect(NAME, "Memphis")) 
cbsa <- core_based_statistical_areas() %>% 
  filter(str_detect(NAME, "Memphis")) 
urb <- urban_areas() %>% filter(str_detect(NAME10, "Memphis"))

#' Areas in Metro or Urban area
csaCounties <- map_dfr((memST), ~{counties(.x)}) %>% 
  st_filter(csa, .predicate = st_within)

counties <- csaCounties[cbsa, op = st_within]

places <- map_dfr((memST), ~{places(.x)}) %>% 
  st_filter(cbsa, .predicate = st_within)

sld <- state_legislative_districts("TN") %>% 
  st_filter(cbsa, .predicate = st_within)

zcta <- zctas(cb = TRUE, starts_with = c("38", "72")) %>% st_filter(urb)
puma <- map_dfr((memST), ~{pumas(.x)}) %>% st_filter(urb)


#' Shelby County and Memphis
shelby <- counties %>% filter(NAME == "Shelby")
memphis <- places %>% filter(NAME == "Memphis")

#' Areas within Shelby County
school <- school_districts("TN") %>% 
  st_filter(shelby, .predicate = st_within)

cong <- congressional_districts("TN") %>% 
  st_filter(shelby, .predicate = st_within)
# cong2 <- congressional_districts("TN") %>% 
#   st_filter(shelby)
# includes the very large district 8

vote <- voting_districts("TN", "Shelby")
csd <- county_subdivisions("TN", "Shelby")
tracts <- tracts("TN", "Shelby")
blkgrp <- block_groups("TN", "Shelby")

The above code may look like a lot, but running it will quickly pull from 18 datasets and automatically filter for the Memphis area. Of course, unless you’re also trying to map all the boundaries at once, there’s no reason to do this. Instead, the above code chunk is meant to be used as a reference, to quickly copy and paste whichever specific boundary is needed.

The final map was created using leaflet. The website and package documentation were useful when I got stuck.

Acknowledgements

To make this post, I heavily referenced the book Analyzing US Census Data by Kyle Walker, particularly Chapter 5, which covers basic usage of tigris, and Chapter 7, which explains how to spatially filter data. There is also a more in–depth explanation of Census boundaries in Chapter 1.

Note on CRS

A coordinate reference system (crs), in essence, tells a map how to look. If your map doesn’t look right, like it’s skewed or warped or whatever, you probably need to reangle the map by setting the crs. If you plan to combine maps, defining the crs ensures projections are consistent.

If you do not define a crs, tigris and sf will default to 4269 (NAD 1983). The package crsuggest can help find the correct crs for your map. I also found the website epsg.io useful; for example, you can quickly see the boundaries for the crs code 6510 at https://epsg.io/6510.

This crs was the top suggestion for my maps, but the projection excluded Tennessee and I found the difference minimal from the default. For simplicity, I stuck with the default.

You can check a table’s crs using st_crs(). If you need to adjust the crs of a dataset, one way is to use st_transform(crs = ####). For more info, see the section on crs in Analyzing US Census Data.

References

US Census Bureau. 2021. “Micropolitan and Metropolitan: About.” https://www.census.gov/programs-surveys/metro-micro/about.html.

Footnotes

  1. The Census also alters some data to hide personally identifying info. For instance, if someone is the only person of a certain race in a block, the Census may swap that person’s info to another block within the block group to protect their identity. According to the Census, these adjustments should not significantly impact data analysis. For more information, see that Census’s page on statistical safeguards.↩︎

  2. Until the API is complete, PUMAs data is available for download from https://www.ipums.org/.↩︎

Citation

BibTeX citation:
@online{johnson2022,
  author = {Johnson, Sarah},
  title = {Geographic {Boundaries}},
  date = {2022-10-29},
  url = {https://sarahjohnson.io/posts/boundaries.html},
  langid = {en}
}
For attribution, please cite this work as:
Johnson, Sarah. 2022. “Geographic Boundaries.” October 29, 2022. https://sarahjohnson.io/posts/boundaries.html.