Racial data vs. Congressional districts
We are now awash with data from different sources, but pulling it all together to gain insights can be difficult for many reasons. In this notebook we show how to combine data of very different types to show previously hidden relationships:
- "Big data": 300 million points indicating the location and racial or ethnic category of each resident of the USA in the 2010 census. See the datashader census notebook for a detailed analysis. Most tools would need to massively downsample this data before it could be displayed.
- Map data: Image tiles from ArcGIS showing natural geographic boundaries. Requires alignment and overlaying to match the census data.
- Geographic shapes: 2015 Congressional districts for the USA, downloaded from census.gov. Requires reprojection to match the coordinate system of the image tiles.
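The reprojection mentioned in the last item can be illustrated with the standard spherical Web Mercator formulas, the coordinate system that web map tiles typically use. This is only a pure-Python sketch of the underlying arithmetic, not what the notebook itself runs (GeoViews and Cartopy handle projection there). The function name `lonlat_to_web_mercator` and the approximate Houston coordinates are assumptions made for this example.

```python
import math

# Sphere radius used by Web Mercator (EPSG:3857), in meters.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters.

    Hypothetical helper for illustration only; valid for latitudes
    strictly between -90 and 90 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Approximate location of Houston, TX (assumed values for the example).
houston_x, houston_y = lonlat_to_web_mercator(-95.37, 29.76)
print(houston_x, houston_y)
```

Once shapes and points share these x/y coordinates, they can be drawn directly on top of the image tiles.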
Few if any tools can handle all of these data sources on their own, but here we'll show how freely available Python packages can easily be combined to explore even large, complex datasets interactively in a web browser. The resulting plots make it simple to explore how the racial distribution of the USA population corresponds to the geographic features of each region and how both of these are reflected in the shape of US Congressional districts. For instance, here's an example of using this notebook to zoom in to Houston, revealing a very precisely gerrymandered Hispanic district:
Here the US population is colored by racial category according to the key shown, with more intense colors indicating a higher population density in that pixel, and the geographic background being dimly visible where population density is low. Racially integrated neighborhoods show up as intermediate or locally mixed colors, but most neighborhoods are quite segregated, and in this case the congressional district boundary shown clearly follows the borders of this segregation.
If you run this notebook and zoom in on any urban region of interest, you can click on an area with a concentration of one racial or ethnic group to see for yourself if that district follows geographic features, state boundaries, the racial distribution, or some combination thereof.
Numerous Python packages are required for this type of analysis to work, all coordinated using conda:
- Numba: Compiles low-level numerical code written in Python into very fast machine code
- Dask: Distributes these numba-based workloads across multiple processing cores in your machine
- Datashader: Using Numba and Dask, aggregates big datasets into a fixed-sized array suitable for display in the browser
- GeoViews (using Cartopy): Projects shapes from longitude, latitude coordinates into Web Mercator and creates visible objects
- HoloViews: Flexibly combine each of the data sources into a just-in-time displayable, interactive plot
Each package is maintained independently and focuses on doing one job really well, but they all combine seamlessly and with very little code to solve complex problems.
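To make the idea of aggregating into a fixed-sized array concrete, here is a minimal pure-Python sketch of binning categorized points into a grid of per-category counts. It is not Datashader's API or implementation, which uses Numba and Dask for speed. The function `aggregate_counts`, the sample points, and the category labels `'a'` and `'b'` are all hypothetical.

```python
from collections import Counter


def aggregate_counts(points, x_range, y_range, width, height):
    """Bin (x, y, category) points into a height x width grid of counts.

    Hypothetical helper for illustration: the output size depends only on
    width and height, never on how many points are supplied.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[Counter() for _ in range(width)] for _ in range(height)]
    for x, y, category in points:
        # Skip points that fall outside the requested viewport.
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # Map the coordinate to a pixel index, clamped to the last pixel.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col][category] += 1
    return grid


# Made-up sample data: four points inside the unit square, one outside it.
sample_points = [
    (0.1, 0.1, 'a'),
    (0.2, 0.2, 'a'),
    (0.9, 0.1, 'b'),
    (0.6, 0.8, 'a'),
    (2.0, 2.0, 'b'),
]
grid = aggregate_counts(sample_points, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
print(grid)
```

Because each pixel keeps a separate count per category, a later step can blend category colors by their share of the pixel and scale intensity by the total count, which is the effect described for the census plot above.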
import holoviews as hv
from holoviews import opts
import geoviews as gv
import datashader as ds
import dask.dataframe as dd
from cartopy import crs
from holoviews.operation.datashader import datashade

hv.extension('bokeh', width=95)

opts.defaults(
    opts.Points(apply_ranges=False),
    opts.RGB(width=1200, height=682, xaxis=None, yaxis=None, show_grid=False),
    opts.Shape(fill_alpha=0, line_width=1.5, apply_ranges=False, tools=['tap']),
    opts.WMTS(alpha=0.5)
)