The Disappearing Walker Lake#

While the losses of the Aral Sea in Kazakhstan and Lake Urmia in Iran have received a lot of attention over the last few decades, this trend is a global phenomenon. Recently a number of papers have been published on the topic, including one focusing on the Decline of the world’s saline lakes. Many of these lakes have lost the majority of their volume over the last century, including Walker Lake (Nevada, USA), which has lost 90 percent of its volume over the last 100 years.

The following example is intended to replicate the typical processing required in change detection studies similar to those in the Decline of the world’s saline lakes.

import warnings
import intake
import numpy as np
import holoviews as hv
from holoviews import opts
import geoviews as gv
import cartopy.crs as ccrs

from dask.array import PerformanceWarning
from colorcet import coolwarm
from holoviews.operation.datashader import rasterize

warnings.simplefilter('ignore', PerformanceWarning)
hv.extension('bokeh', width=80)

In this example, we would like to use Dask to demonstrate how image processing can be distributed across workers, either running locally or across a cluster. In the next cell, we instantiate a Dask distributed Client, requesting eight single-threaded workers with a memory limit of 8GB per worker. You can experiment with different memory limits (e.g. 4GB) and different numbers of workers, but note that each worker should use only one thread, as Datashader manages its own parallelization using Numba:

# arbitrarily choose a memory limit (8GB) to demonstrate the out of core
# processing infrastructure
from dask.distributed import Client
client = Client(memory_limit=8*1e9, n_workers=8, threads_per_worker=1)
# As Datashader uses parallel Numba for raster rendering, we need to use
# single-threaded Dask workers on each CPU to avoid contention.
client

Cluster
  • Workers: 8
  • Cores: 8
  • Memory: 64.00 GB

Landsat Image Data#

To replicate this study, we first have to obtain the data from primary sources. The conventional way to obtain Landsat image data is to download it through USGS’s EarthExplorer or NASA’s Giovanni, but to facilitate this example two images have been downloaded from EarthExplorer and cached.

The two images used by the original study are LT05_L1TP_042033_19881022_20161001_01_T1 and LC08_L1TP_042033_20171022_20171107_01_T1, acquired on 1988-10-22 and 2017-10-22 respectively. Both are Landsat Surface Reflectance Level-2 Science Products.
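These scene IDs follow the Landsat Collection 1 naming convention (sensor, processing level, WRS path/row, acquisition date, processing date, collection number, tier). As an illustrative sketch (not part of the original study), a small helper can unpack them:

from datetime import datetime

def parse_scene_id(scene_id):
    """Split a Landsat Collection 1 scene ID into its components."""
    sensor, level, pathrow, acquired, processed, collection, tier = scene_id.split('_')
    return {'sensor': sensor, 'level': level,
            'path': int(pathrow[:3]), 'row': int(pathrow[3:]),
            'acquired': datetime.strptime(acquired, '%Y%m%d').date(),
            'tier': tier}

parse_scene_id('LT05_L1TP_042033_19881022_20161001_01_T1')
# -> sensor LT05, path 42, row 33, acquired 1988-10-22, tier T1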

Loading into xarray via intake#

In the next cell, we load the Landsat-5 files into a single xarray DataArray using intake. Data sources and caching parameters are specified in a catalog file. Intake is optional, since any other method of creating an xarray.DataArray object would work here as well, but it makes it simpler to work with remote datasets while caching them locally.

cat = intake.open_catalog('./catalog.yml')
landsat_5 = cat.landsat_5()
landsat_5_img = landsat_5.read_chunked()
landsat_5_img
<xarray.DataArray (band: 6, y: 7241, x: 7961)>
dask.array<concatenate, shape=(6, 7241, 7961), dtype=int16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 2.424e+05 2.424e+05 2.425e+05 ... 4.812e+05 4.812e+05
  * y        (y) float64 4.414e+06 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
  * band     (band) int64 1 2 3 4 5 7
Attributes:
    transform:      (30.0, 0.0, 242385.0, 0.0, -30.0, 4414215.0)
    crs:            +init=epsg:32611
    res:            (30.0, 30.0)
    is_tiled:       0
    nodatavals:     (-9999.0,)
    scales:         (1.0,)
    offsets:        (0.0,)
    descriptions:   ('band 1 surface reflectance',)
    AREA_OR_POINT:  Area
    Band_1:         band 1 surface reflectance
landsat_8 = cat.landsat_8()
landsat_8_img = landsat_8.read_chunked()
landsat_8_img
<xarray.DataArray (band: 7, y: 7941, x: 7821)>
dask.array<concatenate, shape=(7, 7941, 7821), dtype=int16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) float64 2.433e+05 2.433e+05 2.434e+05 ... 4.779e+05 4.779e+05
  * y        (y) float64 4.426e+06 4.426e+06 4.426e+06 ... 4.188e+06 4.188e+06
  * band     (band) int64 1 2 3 4 5 6 7
Attributes:
    transform:      (30.0, 0.0, 243285.0, 0.0, -30.0, 4425915.0)
    crs:            +init=epsg:32611
    res:            (30.0, 30.0)
    is_tiled:       0
    nodatavals:     (-9999.0,)
    scales:         (1.0,)
    offsets:        (0.0,)
    descriptions:   ('band 1 surface reflectance',)
    AREA_OR_POINT:  Area
    Band_1:         band 1 surface reflectance
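As noted above, Intake is optional: any route to an xarray.DataArray works. For instance, here is a rough sketch of assembling the same kind of (band, y, x) DataArray directly from local GeoTIFFs with rioxarray (the filenames below are hypothetical, following the usual _sr_bandN.tif naming of these products):

import xarray as xr
import rioxarray  # registers the .rio accessor and provides open_rasterio

bands = [1, 2, 3, 4, 5, 7]  # the Landsat-5 surface-reflectance bands used above
arrays = [rioxarray.open_rasterio(
              f'LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{b}.tif',  # hypothetical local paths
              chunks={'x': 256, 'y': 256}).squeeze('band', drop=True)
          for b in bands]
landsat_5_alt = xr.concat(arrays, dim=xr.DataArray(bands, dims='band', name='band'))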

Now let us view some metadata about this DataArray:

print("The shape of the DataArray is :", landsat_5_img.shape)
print("With attributes:\n ", '\n  '.join('%s=%s'%(k,v) for k,v in landsat_5_img.attrs.items()))
The shape of the DataArray is : (6, 7241, 7961)
With attributes:
  transform=(30.0, 0.0, 242385.0, 0.0, -30.0, 4414215.0)
  crs=+init=epsg:32611
  res=(30.0, 30.0)
  is_tiled=0
  nodatavals=(-9999.0,)
  scales=(1.0,)
  offsets=(0.0,)
  descriptions=('band 1 surface reflectance',)
  AREA_OR_POINT=Area
  Band_1=band 1 surface reflectance

We can use the EPSG code shown above under the crs key (EPSG:32611, i.e. UTM zone 11N) to create a cartopy coordinate reference system that we will use later in this notebook:

crs = ccrs.UTM(11)
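Equivalently, the CRS can be constructed straight from the EPSG code (a small aside; this assumes a cartopy version where ccrs.epsg resolves codes via pyproj):

# EPSG:32611 is WGS 84 / UTM zone 11N, the same CRS as ccrs.UTM(11)
crs = ccrs.epsg(32611)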

Computing the NDVI (1988)#

Now let us compute the NDVI for the 1988 image, where NDVI = (NIR − Red) / (NIR + Red) computed from the surface-reflectance bands. Note that we need to promote the DataArray as returned by rasterio to an xarray Dataset. This restriction should be lifted in the future (see geoviews issue 209).

landsat_5_img = landsat_5_img.where(landsat_5_img != -9999)  # Mask the -9999 nodata values as NaN (promotes to float)
ndvi5_array = (landsat_5_img[4]-landsat_5_img[3])/(landsat_5_img[4]+landsat_5_img[3])
ndvi5 = ndvi5_array.to_dataset(name='ndvi')[['x','y', 'ndvi']]
client.persist(ndvi5)
<xarray.Dataset>
Dimensions:  (x: 7961, y: 7241)
Coordinates:
  * x        (x) float64 2.424e+05 2.424e+05 2.425e+05 ... 4.812e+05 4.812e+05
  * y        (y) float64 4.414e+06 4.414e+06 4.414e+06 ... 4.197e+06 4.197e+06
Data variables:
    ndvi     (y, x) float64 dask.array<chunksize=(256, 256), meta=np.ndarray>
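Note that client.persist returns a new, futures-backed copy of the collection rather than persisting in place, so to compute against the persisted version you would reassign (this reassignment is our addition, not in the original notebook):

ndvi5 = client.persist(ndvi5)  # keep a handle on the persisted Dataset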

Computing the NDVI (2017)#

Now we can do this for the Landsat 8 files for the 2017 image:

landsat_8_img = landsat_8_img.where(landsat_8_img != -9999)  # Mask the -9999 nodata values as NaN (promotes to float)
ndvi8_array = (landsat_8_img[4]-landsat_8_img[3])/(landsat_8_img[4]+landsat_8_img[3])
ndvi8 = ndvi8_array.to_dataset(name='ndvi')[['x','y', 'ndvi']]
client.persist(ndvi8)
<xarray.Dataset>
Dimensions:  (x: 7821, y: 7941)
Coordinates:
  * x        (x) float64 2.433e+05 2.433e+05 2.434e+05 ... 4.779e+05 4.779e+05
  * y        (y) float64 4.426e+06 4.426e+06 4.426e+06 ... 4.188e+06 4.188e+06
Data variables:
    ndvi     (y, x) float64 dask.array<chunksize=(256, 256), meta=np.ndarray>

Resampling to same size#

The two rasters have different shapes and extents, so before comparing them we resample both onto a common grid using a Datashader Canvas (a quick check that the grids now match follows the code below):

import datashader as ds
from datashader import reductions as rd

x_range = (int(max(ndvi5.x.min(), ndvi8.x.min())), int(min(ndvi5.x.max(), ndvi8.x.max())))
y_range = (int(max(ndvi5.y.min(), ndvi8.y.min())), int(min(ndvi5.y.max(), ndvi8.y.max())))

plot_width = min(ndvi8.dims['x'], ndvi5.dims['x'])
plot_height = min(ndvi8.dims['y'], ndvi5.dims['y'])
cvs = ds.Canvas(plot_width=plot_width, plot_height=plot_height,
                x_range=x_range, y_range=y_range)

ndvi5 = cvs.raster(ndvi5, agg=rd.mean('ndvi'))
ndvi8 = cvs.raster(ndvi8, agg=rd.mean('ndvi'))
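Both NDVI rasters now share the same grid; as a quick sanity check (a small addition, not part of the original study), we can verify that their shapes and coordinates agree:

assert ndvi5.shape == ndvi8.shape  # same number of pixels
assert bool((ndvi5.x == ndvi8.x).all()) and bool((ndvi5.y == ndvi8.y).all())  # identical coordinates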

Viewing change via dropdown#

Using Datashader together with GeoViews, we can now easily build an interactive visualization where we select between the 1988 and 2017 images. The use of Datashader allows these images to be dynamically updated according to zoom level (note: it can take Datashader a minute to ‘warm up’ before it becomes fully interactive). For more information on how the dropdown widget was created using HoloMap, please refer to the HoloMap reference.

opts.defaults(
    opts.Curve(width=600, tools=['hover']),
    opts.Image(cmap='viridis', width=450, height=450, tools=['hover'], colorbar=True))
hmap = hv.HoloMap({'1988': gv.Image(ndvi5, crs=crs, vdims=['ndvi'], rtol=10),
                   '2017': gv.Image(ndvi8, crs=crs, vdims=['ndvi'], rtol=10)},
                  # rtol loosens the check that the grid is evenly spaced
                  kdims=['Year']).redim(x='lon', y='lat')  # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
rasterize(hmap)

Computing statistics and projecting display#

The rest of the notebook shows how statistical operations can reduce the dimensionality of the data, computing new features that could be used as part of an ML pipeline.

The mean and sum over the two time points#

The next plot (which may take a minute to compute) shows the mean of the two NDVI images next to their sum:

mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'], 
                    vdims=['ndvi']).relabel('Mean over Year')

summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'], 
                        vdims=['ndvi']).relabel('Sum over Year')

If you are getting a bunch of warnings, it is possible that your two rasters are on slightly different grids. We can check whether summed.data is all null; if it is, we’ll need to regrid.

if summed.data.ndvi.isnull().all():
    res = 3500 if (plot_width < 100) and (plot_height < 100) else 100
    xsamples = (max(ndvi5.x.max(), ndvi8.x.max()) - min(ndvi5.x.min(), ndvi8.x.min())) / res
    ysamples = (max(ndvi5.y.max(), ndvi8.y.max()) - min(ndvi5.y.min(), ndvi8.y.min())) / res
    cvs = ds.Canvas(plot_width=int(xsamples), plot_height=int(ysamples), x_range=x_range, y_range=y_range)
    resampled_ndvi5 = cvs.raster(ndvi5, agg=rd.mean('ndvi'))
    resampled_ndvi8 = cvs.raster(ndvi8, agg=rd.mean('ndvi'))
    hmap = hv.HoloMap({'1988':gv.Image(resampled_ndvi5, crs=crs, vdims=['ndvi']), 
                       '2017':gv.Image(resampled_ndvi8, crs=crs, vdims=['ndvi'])}, 
                       kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
    mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
    mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'], 
                        vdims=['ndvi']).relabel('Mean over Year')

    summed = hmap.collapse(dimensions=['Year'], function=np.sum)
    summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'], 
                            vdims=['ndvi']).relabel('Sum over Year')
    
rasterize(mean_img) + rasterize(summed_image)

Difference in NDVI between 1988 and 2017#

The change in Walker Lake as viewed using the NDVI can be shown by subtracting the NDVI recorded in 1988 from the NDVI recorded in 2017:

diff = np.subtract(hmap['1988'].data, hmap['2017'].data)

difference = gv.Image(diff, crs=crs, kdims=['lon', 'lat'], vdims=['ndvi'])
difference = difference.relabel('Difference in NDVI').redim(ndvi='delta_ndvi')

rasterize(difference).redim.range(delta_ndvi=(-1.0,1.0)).opts(cmap=coolwarm)

You can see a large change (positive delta) in the areas where there is water, indicating a reduction in the size of the lake over this time period.
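To put a rough number on this change, here is a crude sketch (the NDVI < 0 threshold for open water is an assumption of this sketch, not part of the original study):

def water_fraction(img):
    """Fraction of valid pixels classified as open water (NDVI < 0)."""
    da = img.data.ndvi
    return float((da < 0).sum() / da.notnull().sum())

print("water fraction 1988:", water_fraction(hmap['1988']))
print("water fraction 2017:", water_fraction(hmap['2017']))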

Slicing across lon and lat#

As a final example, we can use the sample method to slice across the difference in NDVI along (roughly) the midpoint of the latitude and the midpoint of the longitude. To do this, we define a small helper function that converts a longitude/latitude pair into the corresponding coordinate values used by the Dataset:

def from_lon_lat(x, y):
    return crs.transform_point(x, y, ccrs.PlateCarree())

easting, northing = from_lon_lat(-118, 39)  # Longitude of -118 and Latitude of 39
(difference.sample(lat=northing) + difference.sample(lon=easting)).cols(1)