2018-11-12 (Last Updated: 2024-04-11)

Datashader Dashboard

This notebook contains the code for an interactive dashboard for making Datashader plots from any dataset that has latitude and longitude (geographic) values. Apart from Datashader itself, the code relies on other Python packages from the HoloViz project that are each designed to make it simple to:

lay out plots and widgets into an app or dashboard, in a notebook or for serving separately (Panel)
build interactive web-based plots without writing JavaScript (Bokeh)
build interactive Bokeh-based plots backed by Datashader, from concise declarations (HoloViews and hvPlot)
express dependencies between parameters and code to build reactive interfaces declaratively (Param)
describe the information needed to load and plot a dataset, in a text file (Intake)

import os, colorcet, param as pm, holoviews as hv, panel as pn, datashader as ds
import intake
from holoviews.element import tiles as hvts
from holoviews.operation.datashader import rasterize, shade, spread
from collections import OrderedDict as odict

hv.extension('bokeh', logo=False)

You can run the dashboard here in the notebook with various datasets by editing the dataset variable below to name any dataset defined in catalog.yml. You can also launch a separate, standalone server process in a new browser tab with a command like:

DS_DATASET=nyc_taxi panel serve --show dashboard.ipynb

Here nyc_taxi can be replaced with any of the available datasets: nyc_taxi, nyc_taxi_50k (a tiny version), census, osm-1b, or any dataset whose description you add to catalog.yml. To launch multiple dashboards at once, add --port 5001 (etc.) to give each one a unique port number for the web page to use when communicating with the Bokeh server. Otherwise, be sure to kill the server process before launching another instance.

dataset = os.getenv("DS_DATASET", "nyc_taxi")
catalog = intake.open_catalog('catalog.yml')
source  = getattr(catalog, dataset)
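The environment-variable lookup in the first line above can be tried in isolation. The following is a minimal sketch using only the standard library; `pick_dataset` and the `KNOWN` set are hypothetical names introduced for illustration (the set just repeats the dataset names listed earlier), and the membership check is an optional extra that the dashboard itself does not perform.

```python
import os

# Dataset names mentioned in the text above; a real catalog may define others.
KNOWN = {"nyc_taxi", "nyc_taxi_50k", "census", "osm-1b"}


def pick_dataset(environ=os.environ, default="nyc_taxi"):
    """Return the dataset named by DS_DATASET, or the default if it is unset."""
    name = environ.get("DS_DATASET", default)
    if name not in KNOWN:
        raise ValueError(f"unknown dataset {name!r}; expected one of {sorted(KNOWN)}")
    return name


print(pick_dataset({}))                        # variable unset: falls back to the default
print(pick_dataset({"DS_DATASET": "census"}))  # variable set: its value is used
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without changing the real environment.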

The Intake source object lets us treat data in many different formats the same in the rest of the code here. We can now build a class that captures some parameters that the user can vary along with how those parameters relate to the code needed to update the displayed plot of that data source:

plots  = odict([(source.metadata['plots'][p].get('label',p),p) for p in source.plots])
fields = odict([(v.get('label',k),k) for k,v in source.metadata['fields'].items()])
aggfns = odict([(f.capitalize(),getattr(ds,f)) for f in ['count','sum','min','max','mean','var','std']])

norms  = odict(Histogram_Equalization='eq_hist', Linear='linear', Log='log', Cube_root='cbrt')
cmaps  = odict([(n,colorcet.palette[n]) for n in ['fire', 'bgy', 'bgyw', 'bmy', 'gray', 'kbc']])

maps   = ['EsriImagery', 'EsriUSATopo', 'EsriTerrain', 'StamenWatercolor', 'StamenTonerBackground']
bases  = odict([(name, getattr(hvts, name)().relabel(name)) for name in maps])
gopts  = hv.opts.Tiles(responsive=True, xaxis=None, yaxis=None, bgcolor='black', show_grid=False)

class Explorer(pm.Parameterized):
    plot          = pm.Selector(plots)
    field         = pm.Selector(fields)
    agg_fn        = pm.Selector(aggfns)
    
    normalization = pm.Selector(norms)
    cmap          = pm.Selector(cmaps)
    spreading     = pm.Integer(0, bounds=(0, 5))
    
    basemap       = pm.Selector(bases)
    data_opacity  = pm.Magnitude(1.00)
    map_opacity   = pm.Magnitude(0.75)
    show_labels   = pm.Boolean(True)

    @pm.depends('plot')
    def elem(self):
        return getattr(source.plot, self.plot)()

    @pm.depends('field', 'agg_fn')
    def aggregator(self):
        field = None if self.field == "counts" else self.field
        return self.agg_fn(field)

    @pm.depends('map_opacity', 'basemap')
    def tiles(self):
        return self.basemap.opts(gopts).opts(alpha=self.map_opacity)

    @pm.depends('show_labels')
    def labels(self):
        return hvts.StamenLabels().options(level='annotation', alpha=1 if self.show_labels else 0)

    def viewable(self,**kwargs):
        rasterized = rasterize(hv.DynamicMap(self.elem), aggregator=self.aggregator, width=800, height=400)
        shaded     = shade(rasterized, cmap=self.param.cmap, normalization=self.param.normalization)
        spreaded   = spread(shaded, px=self.param.spreading, how="add")
        dataplot   = spreaded.apply.opts(alpha=self.param.data_opacity, show_legend=False)
        
        return hv.DynamicMap(self.tiles) * dataplot * hv.DynamicMap(self.labels)
    
explorer = Explorer(name="")

If we call the .viewable method on the explorer object we just created, we’ll get a plot that displays itself in a notebook cell. Moreover, because of how we declared the dependencies between each bit of code and each parameter, the corresponding part of that plot will update whenever one of the explorer’s parameters is changed. (Try putting explorer.viewable() in one cell, then set some parameter like explorer.spreading=4 in another cell.) But since what we want is for the user to be able to manipulate the values using widgets, let’s go ahead and create a dashboard out of this object by laying out a logo, widgets for all the parameters, and the viewable object:

logo = "https://raw.githubusercontent.com/pyviz/datashader/main/doc/_static/logo_horizontal_s.png"

panel = pn.Row(pn.Column(logo, pn.Param(explorer.param, expand_button=False)), explorer.viewable())
panel.servable("Datashader Dashboard")

If you are viewing this notebook with a live Python server process running, adjusting one of the widgets above should now automatically update the plot, re-running only the code needed to update that particular item without re-running Datashader if that’s not needed. It should work the same when launched as a separate server process, but without the extra text and code visible as in this notebook. Here the .servable() method call indicates what should be served when run as a separate dashboard with a command like panel serve --show dashboard.ipynb, or you can just copy the code above out of this notebook into a dashboard.py file then do panel serve --show dashboard.py.

How it works

You can use the code above as is, but if you want to adapt it to your own purposes, you can read on to see how it works.

Overview

The code has three main components:

1. source: A dataset with associated metadata managed by Intake, which allows this notebook to ignore details like:
- File formats
- File locations
- Column and field names in the data
  
  Basically, once the source has been defined in the cell starting with dataset, this code can treat all datasets the same, as long as their properties have been declared appropriately in the catalog.yml file. Intake objects support .plot, which uses hvPlot to return a HoloViews and Bokeh-based plot object that is used in the later steps below.
2. explorer: A Parameterized object that declares:
- What parameters we want the user to be able to manipulate
- How to generate the overall plot specified by those parameters, starting from the basic hvPlot-based object and modifying it using HoloViews, GeoViews, and Datashader
- Which bits of the code need to be run when one of the parameters changes
  
  All of these things are declared in a general way that’s not tied to any particular GUI toolkit, as long as whatever is returned by viewable() is something that can be displayed.
3. panel: A Panel-based app/dashboard consisting of:
- A logo (just for looks)
- The user-adjustable parameters of the explorer object
- The viewable HoloViews object returned by explorer.viewable()

You can find out more about how to work with these objects at the websites linked for each one.

If you want to start working with this code for your own purposes, parts 1 and 3 should be simple to get started with. You should be able to add new datasets easily to catalog.yml by copying the description of the simplest dataset (e.g. osm-1b). If you wish, you can then compare that dataset’s description to the other datasets, to see how other fields and metadata can be added if you want there to be more options for users to explore a particular dataset. Similarly, you can easily add additional items to lay out in rows and columns in the panel app; it should be trivial to add anything Panel supports (text boxes, images, other separate plots, etc.) to this layout as described at panel.holoviz.org.
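To see what the extra fields and metadata buy you, it may help to look at how the dashboard turns them into widget choices. The sketch below is an assumption-laden illustration: the `metadata` dictionary is invented (its keys and labels are hypothetical, not copied from the real catalog), and its shape is inferred only from the `plots` and `fields` lines near the top of the dashboard code. The dashboard iterates over `source.plots` for the plot names; here the metadata dictionary is iterated directly to keep the example self-contained.

```python
from collections import OrderedDict as odict

# Hypothetical metadata for one catalog entry; names and labels are invented
# for illustration and only mirror the lookups done by the dashboard code.
metadata = {
    "fields": {
        "counts": {"label": "Ride counts"},
        "passenger_count": {"label": "Passengers"},
        "tip_amount": {},                 # no label given: the key itself is shown
    },
    "plots": {
        "dropoff": {"label": "Drop-off locations"},
        "pickup": {},
    },
}

# Same pattern as the dashboard: map each user-visible label to its internal key.
fields = odict((v.get("label", k), k) for k, v in metadata["fields"].items())
plots = odict((v.get("label", k), k) for k, v in metadata["plots"].items())

print(list(fields.items()))
print(list(plots.items()))
```

An entry without a label still works; the widget simply shows the internal key, which is why a minimal dataset description can get by with very little metadata.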

Expressing parameters and dependencies

Part 2 (the explorer object) is the hard part to specify, because that’s where the complex relationships between the user-visible parameters and the underlying behavior are expressed.

Before we try to understand the full explorer code above, let’s consider a much simpler case. What if our dataset is so small (e.g. nyc_taxi_50k with only 50,000 points) that it would be ok to update the plot every time any widget changed? In that case we could get away with a much simpler object we’ll call explorer2:

class Explorer2(pm.Parameterized):
    plot          = pm.Selector(plots)
    field         = pm.Selector(fields)
    agg_fn        = pm.Selector(aggfns)
    
    normalization = pm.Selector(norms)
    cmap          = pm.Selector(cmaps)
    spreading     = pm.Integer(0, bounds=(0, 5))
    
    basemap       = pm.Selector(bases)
    data_opacity  = pm.Magnitude(1.00)
    map_opacity   = pm.Magnitude(0.75)
    show_labels   = pm.Boolean(True)

    def view(self,**kwargs):
        field      = None if self.field == "counts" else self.field
        rasterized = rasterize(hv.DynamicMap(getattr(source.plot, self.plot)), 
                               aggregator=self.agg_fn(field), width=800, height=400)
        shaded     = shade(rasterized, cmap=self.cmap, normalization=self.normalization)
        spreaded   = spread(shaded, px=self.spreading, how="add")
        dataplot   = spreaded.opts(alpha=self.data_opacity, show_legend=False)
        
        tiles      = self.basemap.opts(gopts).opts(alpha=self.map_opacity)
        labels     = hvts.StamenLabels().options(level='annotation', alpha=1 if self.show_labels else 0)
        return tiles * dataplot * labels
    
explorer2 = Explorer2(name="")

This Explorer2 class declares that it respects each of the listed Parameters (plot, normalization, spreading, and so on), specifying the type and range for each of them (e.g. normalization can be eq_hist, linear, etc., while spreading can be any integer in the range 0 to 5). The view function accesses these values and constructs a plot appropriately, which in this case is a HoloViews Overlay of three components: (1) the underlying map tiles (like Google Maps), (2) the datashaded dataplot (using the aggregation (rasterization), colormapping (shading) with normalization, and spreading functionality from Datashader), and (3) overlaid geographic labels (which also happen to be a tile-based map, but with only text).

If you were to type explorer2.view() in a cell on its own, you would see that the resulting object is viewable outside of Panel and is already controlled by all those parameters, though without any widgets for a user to manipulate graphically.

# explorer2.show_labels = False
# explorer2.view()

But since we do want widgets, we can pass in this object to Panel and we’ll get all the same widgets as the original explorer has, each updating the plot appropriately when any of the widgets changes.

# pn.Row(pn.Column(logo, explorer2.param), explorer2.view)

So if it’s this easy to get a usable dashboard, why is the real Explorer class so much more complex above, with all those methods and explicit dependency declarations? Well, if you do try running explorer2, you should be able to see that it works but it is not very responsive, because it re-runs view() every single time any widget changes value. That’s a very general approach, but even for a 50k dataset, the plot flickers any time a widget is used. For a larger dataset there can be a very annoying lag, as the entire plot is rebuilt from scratch. Slider widgets in particular can be very confusing with a lag like that, making it difficult to explore the data. So this simple version is not the most usable, but it’s still a good first pass – it makes all the right widgets and connects them all up to control the plot that you see.

Expressing fine-grained dependencies

So what if we do anticipate working with larger files and still want the interface to be responsive wherever possible? The full Explorer class shows how to ensure that only the specific bits of code that depend directly on each parameter are re-run when that widget is changed, making it highly responsive even with large datasets. For instance, the map_opacity slider affects only the underlying map tiles, and so in Explorer that slider can be dragged with instantaneous updating regardless of the dataset size; the data plot stays the same while the tiles update. The spreading and cmap widgets do need to access the data, but even they can still be very fast, because they affect only the very last step in the data processing, after aggregation but before the final display.

So, how is this fine-grained control over bits of computation achieved? First, you’ll notice that Explorer has a method named as an adjective “viewable” rather than the imperative “view” method of Explorer2. For Explorer2, we provided the view method directly to Panel, and Panel then finds its dependencies and calls view every time a parameter changes, generating a completely new plot. But we called viewable() only once, with the result of the call provided to Panel. This result is a viewable (and dynamically updatable) object from HoloViews whose parts are precisely tied internally to each of the relevant parameters. (Hence the perhaps too-subtle difference in the names of those two methods; one is a command to view immediately, and the other returns a viewable object that can be kept around and viewed at will.)

To understand the explorer.viewable() method, first consider a simpler version that doesn’t display the data at all:

def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.labels)

Here, hv.DynamicMap(callback) returns a dynamic HoloViews object that calls the provided callback whenever the object needs updating. When given a Parameterized object’s method, hv.DynamicMap understands any dependencies that have been declared for that method. In this case, the map tiles will thus be dynamically updated whenever the map_opacity or basemap parameters change, and the overlaid labels will be updated whenever the show_labels parameter changes (because those are the relationships expressed on def tiles(self) and def labels(self) with the pm.depends decorator in the declaration of Explorer above). The viewable() method here then returns an overlay (constructed by the * syntax for HoloViews objects), retaining the underlying dynamic behavior of the two overlaid items.
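The idea of a callback that is re-invoked only when its declared parameters change can be sketched in plain Python. This is a toy model, not how Param or HoloViews are implemented: `depends`, `Store`, and `DynamicView` are hypothetical names written for this illustration, and the callbacks return strings instead of plot elements.

```python
def depends(*names):
    """Toy dependency declaration: record the names a callback reads."""
    def decorate(fn):
        fn.deps = frozenset(names)
        return fn
    return decorate


class Store:
    """Holds parameter values and refreshes attached views when a value changes."""
    def __init__(self, **values):
        self.values = dict(values)
        self.views = []

    def set(self, name, value):
        self.values[name] = value
        for view in self.views:
            if name in view.callback.deps:    # only views that declared this name
                view.refresh()


class DynamicView:
    """Toy stand-in for a dynamic element: caches the callback's latest result."""
    def __init__(self, callback, store):
        self.callback, self.store, self.calls = callback, store, 0
        store.views.append(self)
        self.refresh()

    def refresh(self):
        self.calls += 1
        self.value = self.callback(self.store.values)


@depends("map_opacity", "basemap")
def tiles(values):
    return f"{values['basemap']} at alpha {values['map_opacity']}"


@depends("show_labels")
def labels(values):
    return "labels shown" if values["show_labels"] else "labels hidden"


store = Store(basemap="EsriImagery", map_opacity=0.75, show_labels=True)
tile_view, label_view = DynamicView(tiles, store), DynamicView(labels, store)

store.set("show_labels", False)   # refreshes label_view only
store.set("map_opacity", 0.5)     # refreshes tile_view only
print(tile_view.value, "|", label_view.value)
print(tile_view.calls, label_view.calls)
```

Each view is computed once when created and once more for the single change that concerns it, which is the behavior the two-DynamicMap overlay relies on.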

Still following along? If not, try changing viewable to the simpler version shown above and play around with the source code to see how those parts fit together. Once that all makes sense, then we could add in a plot of the actual data:

def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.elem) * hv.DynamicMap(self.labels)

Just as before, we use a DynamicMap to call the .elem() method whenever one of its parameter dependencies changes (plot in this case). Don’t actually run this version, though, unless you have a very small dataset (even the tiny nyc_taxi_50k may be too large for some browsers). As written, this code will pass all the data on to your browser, with disastrous results for large datasets! This is where Datashader comes in; to make it safe for large data, we can instead wrap this object in some HoloViews operations that turn it into something safe to display:

def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * spread(shade(rasterize(hv.DynamicMap(self.elem)))) * hv.DynamicMap(self.labels)

This version is now runnable, with rasterize() dynamically aggregating the data using Datashader whenever a new plot is needed, shade() then dynamically colormapping the data into an RGB image, and spread() dynamically spreading isolated pixels so that they become visible data points. But if you try it, you’ll notice that the plot is ignoring all of the rasterization, shading, and spreading parameters we declared above, because those parameters are not declared as dependencies of the elem method that was given to this DynamicMap.
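For intuition about what the three wrapped operations do, here is a deliberately simplified pure-Python sketch. It is not Datashader's implementation: the functions only borrow the names `rasterize`, `shade`, and `spread`, the "colormap" is a handful of text characters, scaling is linear, and spreading uses a square neighborhood. The point it illustrates is that the output size depends on the grid, not on the number of input points.

```python
import random


def rasterize(points, width, height, x_range, y_range):
    """Count the points falling into each cell of a width x height grid."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            col = min(width - 1, int((x - x0) / (x1 - x0) * width))
            row = min(height - 1, int((y - y0) / (y1 - y0) * height))
            grid[row][col] += 1
    return grid


def shade(grid, levels=" .:*#"):
    """Map each count to a character, scaled linearly by the largest count."""
    top = max(max(row) for row in grid) or 1
    return [[levels[0] if c == 0 else levels[1 + (c * (len(levels) - 2)) // top]
             for c in row] for row in grid]


def spread(image, px=1, blank=" "):
    """Copy each non-blank pixel into blank neighbors within px cells."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            if image[r][c] == blank:
                continue
            for rr in range(max(0, r - px), min(height, r + px + 1)):
                for cc in range(max(0, c - px), min(width, c + px + 1)):
                    if out[rr][cc] == blank:
                        out[rr][cc] = image[r][c]
    return out


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(10_000)]
grid = rasterize(points, width=8, height=4, x_range=(0, 1), y_range=(0, 1))
image = spread(shade(grid), px=1)
for row in image:
    print("".join(row))           # always 4 rows of 8 characters, however many points
```

Only the small grid ever needs to reach the display, which is why the rasterized version is safe to send to a browser where the raw points are not.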

We could of course add those parameters as dependencies to .elem, but if we do that, then the whole set of chained operations will need to be re-run every time any one of those parameters changes. For a large dataset, re-running all those steps can take seconds or even minutes, yet some of the changes only affect the very last (and very cheap) stages of the computation, such as spread or shade.

So, we come to the final version of viewable() that’s used in the actual Explorer class definition above, which creates a whole slew of chained hv.DynamicMap objects that each dynamically respond to some of the possible user actions:

hv.DynamicMap(self.elem) returns an appropriate HoloViews element whenever the plot parameter changes.
rasterized applies the Datashader aggregation operation to the result of hv.DynamicMap(self.elem) whenever that result changes or when the dependencies of the self.aggregator method change (the field and agg_fn parameters).
shaded applies the Datashader shading operation to the result of rasterized whenever that result changes or the cmap and normalization parameters change.
spreaded applies the Datashader spreading operation to the result of shaded whenever that result changes or the spreading parameter changes.
dataplot sets options on the result of spreaded whenever that result changes or the data_opacity parameter changes (show_legend is fixed at False).
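The chain above behaves like a pipeline of cached stages, where changing a parameter invalidates the stage that reads it and everything downstream, but nothing upstream. The following plain-Python sketch models only that bookkeeping; the `Pipeline` class is hypothetical, the stage names are taken from the viewable() method, and each "result" is just a tuple rather than a real plot.

```python
class Pipeline:
    """Toy chain of cached stages. Each stage lists the parameters it reads;
    changing a parameter invalidates that stage and every later stage."""

    STAGES = [
        ("elem", {"plot"}),
        ("rasterized", {"field", "agg_fn"}),
        ("shaded", {"cmap", "normalization"}),
        ("spreaded", {"spreading"}),
        ("dataplot", {"data_opacity"}),
    ]

    def __init__(self, **params):
        self.params = dict(params)
        self.cache = {}
        self.runs = {name: 0 for name, _ in self.STAGES}

    def set(self, name, value):
        self.params[name] = value
        stale = False
        for stage, deps in self.STAGES:
            stale = stale or name in deps
            if stale:
                self.cache.pop(stage, None)

    def result(self):
        upstream = None
        for stage, deps in self.STAGES:
            if stage not in self.cache:       # recompute only what was invalidated
                self.runs[stage] += 1
                used = {k: self.params[k] for k in sorted(deps)}
                self.cache[stage] = (stage, used, upstream)
            upstream = self.cache[stage]
        return upstream


p = Pipeline(plot="dropoff", field="counts", agg_fn="count", cmap="fire",
             normalization="eq_hist", spreading=0, data_opacity=1.0)
p.result()
p.set("data_opacity", 0.5)    # touches only the last stage
p.result()
p.set("cmap", "bgy")          # touches shaded and the stages after it
p.result()
print(p.runs)
```

In this run the two expensive early stages execute once each, while the cheap later stages are repeated as needed, which is the trade-off the final viewable() is designed around.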

So far we have only discussed how pm.depends() allows a Parameterized method to declare its dependencies, but there are actually currently four different ways to set up dynamic, responsive behavior, of which Explorer.viewable() uses methods 2, 3, and 4:

1. Method dependency for Panel: Decorating a Parameterized object method with pm.depends('paramname'), and passing that method to Panel so that Panel will call the method when any of the dependencies changes (used for Explorer2.view, but not for Explorer.viewable). Completely general, but very coarse-grained; useful for a first pass and for simple cases.
2. Method dependency for DynamicMap: Decorating a Parameterized object method with pm.depends('paramname'), and passing that method to a HoloViews DynamicMap so that HoloViews will call the method when any of the dependencies changes (used for most of the methods of Explorer).
3. Parameter instance-object argument: Supplying a Param Parameter object as an argument to a HoloViews operation or DynamicMap instead of a concrete value like an integer or float, which causes that operation to re-run its computation when the parameter value changes. This is used for the dataplot object in Explorer, which responds dynamically to cmap, normalization, spreading, and data_opacity because those parameters are supplied as, for example, px=self.param.spreading (the param.Integer Parameter object) rather than px=self.spreading (which is simply equivalent to px=0 if the current value of self.spreading is 0). Dependencies are inferred only when the whole Parameter object is supplied, not just the current value.
4. Other HoloViews streams: Approaches 2 and 3 are based on a feature of HoloViews called streams, which support many types of dynamic behavior other than responding to widgets. For instance, the rasterize operation attaches a RangeXY stream that re-aggregates the data whenever the viewport (x or y range) changes, as a user zooms or pans a Bokeh plot. Other streams can be created manually to perform custom behavior, such as consuming streaming data sources, reacting to arbitrary plot events, and so on.

These sources of dynamic behavior make the dataplot chain of DynamicMaps richly interactive. The interactive dataplot is then overlaid with hv.DynamicMap(self.tiles) (which itself is updated when the map_opacity and basemap parameters change), and with hv.DynamicMap(self.labels) (which itself is updated when the show_labels parameter changes). Now, each part of the plot updates only if the relevant parameters have changed, and otherwise is left as it was, avoiding flicker and providing snappy performance.

As you can see, if we want to be very careful to tie each bit of computation to the values that affect it, then we can precisely determine which code to re-run interactively, providing maximal responsiveness where possible (try dragging the opacity sliders or selecting colormaps), while re-running everything when needed (when aggregation-related parameters change). In your own Panel dashboards and apps, you can often use the simplest approach (which can be as simple as a one-line call to interact), but it is important to know that fine-grained control is available when you need it, and is still typically vastly more concise and explicitly expressed than with other dashboarding approaches.