Written by James A. Bednar
Created: November 12, 2018
Last updated: August 5, 2021
This notebook contains the code for an interactive dashboard for making Datashader plots from any dataset that has latitude and longitude (geographic) values. Apart from Datashader itself, the code relies on other Python packages from the HoloViz project that are each designed to make it simple to:
- lay out plots and widgets into an app or dashboard, in a notebook or for serving separately (Panel)
- build interactive Bokeh-based plots backed by Datashader, from concise declarations (HoloViews and hvPlot)
- express dependencies between parameters and code to build reactive interfaces declaratively (Param)
- describe the information needed to load and plot a dataset, in a text file (Intake)
import os, colorcet, param as pm, holoviews as hv, panel as pn, datashader as ds
import intake
from holoviews.element import tiles as hvts
from holoviews.operation.datashader import rasterize, shade, spread
from collections import OrderedDict as odict

hv.extension('bokeh', logo=False)
You can run the dashboard here in the notebook with various datasets by editing the
dataset variable below to name any dataset defined in
catalog.yml. You can also launch a separate, standalone server process in a new browser tab with a command like:
DS_DATASET=nyc_taxi panel serve --show datashader_dashboard.ipynb
Here, nyc_taxi can be replaced with any of the available datasets (such as
nyc_taxi_50k (tiny version) or
osm-1b, or any dataset whose description you add to
catalog.yml). To launch multiple dashboards at once, you'll need to add
-p 5001 (etc.) to select a unique port number for the web page to use for communicating with the Bokeh server. Otherwise, be sure to kill the server process before launching another instance.
dataset = os.getenv("DS_DATASET", "nyc_taxi")
catalog = intake.open_catalog('catalog.yml')
source = getattr(catalog, dataset)
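The environment-variable lookup used above can be tried on its own. The sketch below is a minimal illustration, assuming only the DS_DATASET variable and the nyc_taxi default from the text; the select_dataset helper is a hypothetical name introduced here and is not part of the dashboard code.

```python
import os


def select_dataset(environ, default="nyc_taxi"):
    """Return the dataset name from DS_DATASET, or the default if it is unset."""
    return environ.get("DS_DATASET", default)


# With no variable set (as when running inside the notebook), the default is used.
print(select_dataset({}))                          # nyc_taxi

# `DS_DATASET=osm-1b panel serve ...` puts the name in the server's environment.
print(select_dataset({"DS_DATASET": "osm-1b"}))    # osm-1b

# The real lookup reads the process environment.
print(select_dataset(os.environ))
```

Passing the mapping in explicitly keeps the helper easy to check without changing the real process environment.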
The source object lets us treat data in many different formats the same way in the rest of the code here. We can now build a class that captures some parameters that the user can vary, along with how those parameters relate to the code needed to update the displayed plot of that data source:
plots = odict([(source.metadata['plots'][p].get('label',p),p) for p in source.plots])
fields = odict([(v.get('label',k),k) for k,v in source.metadata['fields'].items()])
aggfns = odict([(f.capitalize(),getattr(ds,f)) for f in ['count','sum','min','max','mean','var','std']])

norms = odict(Histogram_Equalization='eq_hist', Linear='linear', Log='log', Cube_root='cbrt')
cmaps = odict([(n,colorcet.palette[n]) for n in ['fire', 'bgy', 'bgyw', 'bmy', 'gray', 'kbc']])

maps = ['EsriImagery', 'EsriUSATopo', 'EsriTerrain', 'StamenWatercolor', 'StamenTonerBackground']
bases = odict([(name, getattr(hvts, name)().relabel(name)) for name in maps])
gopts = hv.opts.Tiles(responsive=True, xaxis=None, yaxis=None, bgcolor='black', show_grid=False)

class Explorer(pm.Parameterized):
    plot = pm.Selector(plots)
    field = pm.Selector(fields)
    agg_fn = pm.Selector(aggfns)

    normalization = pm.Selector(norms)
    cmap = pm.Selector(cmaps)
    spreading = pm.Integer(0, bounds=(0, 5))

    basemap = pm.Selector(bases)
    data_opacity = pm.Magnitude(1.00)
    map_opacity = pm.Magnitude(0.75)
    show_labels = pm.Boolean(True)

    @pm.depends('plot')
    def elem(self):
        return getattr(source.plot, self.plot)()

    @pm.depends('field', 'agg_fn')
    def aggregator(self):
        field = None if self.field == "counts" else self.field
        return self.agg_fn(field)

    @pm.depends('map_opacity', 'basemap')
    def tiles(self):
        return self.basemap.opts(gopts).opts(alpha=self.map_opacity)

    @pm.depends('show_labels')
    def labels(self):
        return hvts.StamenLabels().options(level='annotation', alpha=1 if self.show_labels else 0)

    def viewable(self,**kwargs):
        rasterized = rasterize(hv.DynamicMap(self.elem), aggregator=self.aggregator, width=800, height=400)
        shaded = shade(rasterized, cmap=self.param.cmap, normalization=self.param.normalization)
        spreaded = spread(shaded, px=self.param.spreading, how="add")
        dataplot = spreaded.apply.opts(alpha=self.param.data_opacity, show_legend=False)
        return hv.DynamicMap(self.tiles) * dataplot * hv.DynamicMap(self.labels)

explorer = Explorer(name="")
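The plots and fields dictionaries above map a human-readable label to the underlying key, falling back to the key itself when no label is given. The sketch below shows that pattern on its own. The fields_meta contents are hypothetical stand-ins for source.metadata['fields'], not values taken from the real catalog file.

```python
from collections import OrderedDict as odict

# Hypothetical metadata, standing in for source.metadata['fields'].
fields_meta = odict([
    ("counts", {"label": "Ride counts"}),
    ("passenger_count", {"label": "Passengers"}),
    ("tip_amount", {}),  # no label declared, so the key doubles as the label
])

# Same comprehension as in the dashboard code: label -> field key.
fields = odict([(v.get("label", k), k) for k, v in fields_meta.items()])

for label, key in fields.items():
    print(label, "->", key)
```

A selector widget built from such a dictionary can display the labels while the code receives the keys.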
If we call the
.viewable method on the
explorer object we just created, we'll get a plot that displays itself in a notebook cell. Moreover, because of how we declared the dependencies between each bit of code and each parameter, the corresponding part of that plot will update whenever one of the parameters is changed on it. (Try putting
explorer.viewable() in one cell, then set some parameter like
explorer.spreading=4 in another cell.) But since what we want is for the user to be able to manipulate the values using widgets, let's go ahead and create a dashboard out of this object by laying out a logo, widgets for all the parameters, and the viewable object:
logo = "https://raw.githubusercontent.com/pyviz/datashader/master/doc/_static/logo_horizontal_s.png"

panel = pn.Row(pn.Column(logo, pn.Param(explorer.param, expand_button=False)), explorer.viewable())
panel.servable("Datashader Dashboard")
If you are viewing this notebook with a live Python server process running, adjusting one of the widgets above should now automatically update the plot, re-running only the code needed to update that particular item without re-running Datashader if that's not needed. It should work the same when launched as a separate server process, but without the extra text and code visible as in this notebook. Here the
.servable() method call indicates what should be served when run as a separate dashboard with a command like
panel serve --show datashader_dashboard.ipynb, or you can just copy the code above out of this notebook into a
dashboard.py file then do
panel serve --show dashboard.py.
How it works
You can use the code above as is, but if you want to adapt it to your own purposes, you can read on to see how it works.
The code has three main components:
1. source: A dataset with associated metadata managed by Intake, which allows this notebook to ignore details like:
- File formats
- File locations
- Column and field names in the data
Basically, once the
source has been defined in the cell starting with
dataset, this code can treat all datasets the same, as long as their properties have been declared appropriately in the
catalog.yml file. Intake objects support
.plot, which uses hvPlot to return a HoloViews and Bokeh-based plot object that is used in the later steps below.
2. explorer: A Parameterized object that declares:
- What parameters we want the user to be able to manipulate
- How to generate the overall plot specified by those parameters, starting from the basic hvPlot-based object and modifying it using HoloViews, GeoViews, and Datashader.
- Which bits of the code need to be run when one of the parameters changes
All of these things are declared in a general way that's not tied to any particular GUI toolkit, as long as whatever is returned by
viewable() is something that can be displayed.
3. panel: A Panel-based app/dashboard consisting of:
- A logo (just for pretty!)
- The user-adjustable parameters of the explorer object
- The viewable HoloViews object defined by explorer
You can find out more about how to work with these objects at the websites linked for each one.
If you want to start working with this code for your own purposes, parts 1 and 3 should be simple to get started with. You should be able to add new datasets easily to
catalog.yml by copying the description of the simplest dataset (e.g.
osm-1b). If you wish, you can then compare that dataset's description to the other datasets, to see how other fields and metadata can be added if you want there to be more options for users to explore a particular dataset. Similarly, you can easily add additional items to lay out in rows and columns in the
panel app; it should be trivial to add anything Panel supports (text boxes, images, other separate plots, etc.) to this layout as described at panel.holoviz.org.
Expressing parameters and dependencies
Part 2 (the
explorer object) is the hard part to specify, because that's where the complex relationships between the user-visible parameters and the underlying behavior are expressed.
Before we try to understand the full
explorer code above, let's consider a much simpler case. What if our dataset is so small (e.g.
nyc_taxi_50k with only 50,000 points) that it would be ok to update the plot every time any widget changed? In that case we could get away with a much simpler object we'll call Explorer2:
class Explorer2(pm.Parameterized):
    plot = pm.Selector(plots)
    field = pm.Selector(fields)
    agg_fn = pm.Selector(aggfns)

    normalization = pm.Selector(norms)
    cmap = pm.Selector(cmaps)
    spreading = pm.Integer(0, bounds=(0, 5))

    basemap = pm.Selector(bases)
    data_opacity = pm.Magnitude(1.00)
    map_opacity = pm.Magnitude(0.75)
    show_labels = pm.Boolean(True)

    def view(self,**kwargs):
        field = None if self.field == "counts" else self.field
        rasterized = rasterize(hv.DynamicMap(getattr(source.plot, self.plot)), aggregator=self.agg_fn(field), width=800, height=400)
        shaded = shade(rasterized, cmap=self.cmap, normalization=self.normalization)
        spreaded = spread(shaded, px=self.spreading, how="add")
        dataplot = spreaded.opts(alpha=self.data_opacity, show_legend=False)
        tiles = self.basemap.opts(gopts).opts(alpha=self.map_opacity)
        labels = hvts.StamenLabels().options(level='annotation', alpha=1 if self.show_labels else 0)
        return tiles * dataplot * labels

explorer2 = Explorer2(name="")
The Explorer2 class declares that it respects each of the listed Parameters (
spreading, and so on), specifying the type and range for each of them (e.g.
normalization can be
linear, etc. while
spreading can be any integer in the range 0 to 5). The
view function accesses these values and constructs a plot appropriately, which in this case is a HoloViews
Overlay of three components: (1) the underlying map
tiles (like Google Maps), (2) the datashaded
dataplot (using the aggregation (rasterization), colormapping (shading) with normalization, and spreading functionality from Datashader), and (3) overlaid geographic
labels (which also happen to be a tile-based map, but with only text).
If you were to type
explorer2.view() in a cell on its own, you would see that the resulting object is viewable outside of Panel and is already controlled by all those parameters, though without any widgets for a user to manipulate graphically.
# explorer2.show_labels = False
# explorer2.view()
But since we do want widgets, we can pass in this object to Panel and we'll get all the same widgets as the original
explorer has, each updating the plot appropriately when any of the widgets changes.
# pn.Row(pn.Column(logo, explorer2.param), explorer2.view)
So if it's this easy to get a usable dashboard, why is the real
Explorer class so much more complex above, with all those methods and explicit dependency declarations? Well, if you do try running
explorer2, you should be able to see that it works but it is not very responsive, because it re-runs
view() every single time any widget changes value. That's a very general approach, but even for a 50k dataset, the plot flickers any time a widget is used. For a larger dataset there can be a very annoying lag, as the entire plot is rebuilt from scratch. Slider widgets in particular can be very confusing with a lag like that, making it difficult to explore the data. So this simple version is not the most usable, but it's still a good first pass -- it makes all the right widgets and connects them all up to control the plot that you see.
Expressing fine-grained dependencies
So what if we do anticipate working with larger files and still want the interface to be responsive wherever possible? The full
Explorer class shows how to ensure that only the specific bits of code that depend directly on each parameter are re-run when that widget is changed, making it highly responsive even with large datasets. For instance, the
map_opacity slider affects only the underlying map tiles, and so in
Explorer that slider can be dragged with instantaneous updating regardless of the dataset size; the data plot stays the same while the tiles update. The
cmap widgets do need to access the data, but even they can still be very fast, because they affect only the very last step in the data processing, after aggregation but before the final display.
So, how is this fine-grained control over bits of computation achieved? First, you'll notice that
Explorer has a method named as an adjective "
viewable" rather than the imperative "
view" method of
Explorer2. With Explorer2, we provided the
view method directly to Panel, and Panel then finds its dependencies and calls
view every time a parameter changes, generating a completely new plot. But we called
viewable() only once, with the result of the call provided to Panel. This result is a viewable (and dynamically updatable) object from HoloViews whose parts are precisely tied internally to each of the relevant parameters. (Hence the perhaps too-subtle difference in the names of those two methods; one is a command to view immediately, and the other returns a viewable object that can be kept around and viewed at will.)
To understand the
explorer.viewable() method, first consider a simpler version that doesn't display the data at all:
def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.labels)
hv.DynamicMap(callback) returns a dynamic HoloViews object that calls the provided
callback whenever the object needs updating. When given a Parameterized object's method,
hv.DynamicMap understands any dependencies that have been declared for that method. In this case, the map tiles will thus be dynamically updated whenever the
map_opacity or basemap parameters change, and the overlaid labels will be updated whenever the
show_labels parameter changes (because those are the relationships expressed on
def tiles(self) and
def labels(self) with the
pm.depends decorator in the declaration of
Explorer above). The
viewable() method here then returns an overlay (constructed by the
* syntax for HoloViews objects), retaining the underlying dynamic behavior of the two overlaid items.
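To make the idea of per-method dependencies concrete, here is a toy model in plain Python. It is only an illustration of the concept and is not how Param or HoloViews are implemented. The names ToyExplorer, depends, update, and the parameter values are hypothetical, chosen to mirror the tiles and elem methods of Explorer.

```python
def depends(*names):
    """Record which parameter names a method depends on (toy stand-in for pm.depends)."""
    def decorate(fn):
        fn.deps = set(names)
        return fn
    return decorate


class ToyExplorer:
    def __init__(self):
        self.params = {"plot": "dropoff", "basemap": "EsriImagery", "map_opacity": 0.75}
        self.calls = {"elem": 0, "tiles": 0}
        # Build every part once, much as viewable() is called only once.
        self.parts = {name: getattr(self, name)() for name in ("elem", "tiles")}

    @depends("plot")
    def elem(self):
        self.calls["elem"] += 1
        return "points for " + self.params["plot"]

    @depends("map_opacity", "basemap")
    def tiles(self):
        self.calls["tiles"] += 1
        return "{} at alpha={}".format(self.params["basemap"], self.params["map_opacity"])

    def update(self, name, value):
        """Set a parameter, then re-run only the methods that declared it."""
        self.params[name] = value
        for part in self.parts:
            if name in getattr(self, part).deps:
                self.parts[part] = getattr(self, part)()


toy = ToyExplorer()
toy.update("map_opacity", 0.3)   # only tiles() is re-run
print(toy.calls)                 # {'elem': 1, 'tiles': 2}
print(toy.parts["tiles"])        # EsriImagery at alpha=0.3
```

Changing map_opacity leaves the (potentially expensive) elem result untouched, which is the behavior the overlay of DynamicMaps provides for the real plot.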
Still following along? If not, try changing
viewable to the simpler version shown above and play around with the source code to see how those parts fit together. Once that all makes sense, then we could add in a plot of the actual data:
def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * hv.DynamicMap(self.elem) * hv.DynamicMap(self.labels)
Just as before, we use a
DynamicMap to call the
.elem() method whenever one of its parameter dependencies changes (
plot in this case). Don't actually run this version, though, unless you have a very small dataset (even the tiny
nyc_taxi_50k may be too large for some browsers). As written, this code will pass all the data on to your browser, with disastrous results for large datasets! This is where Datashader comes in; to make it safe for large data, we can instead wrap this object in some HoloViews operations that turn it into something safe to display:
def viewable(self,**kwargs):
    return hv.DynamicMap(self.tiles) * spread(shade(rasterize(hv.DynamicMap(self.elem)))) * hv.DynamicMap(self.labels)
This version is now runnable, with
rasterize() dynamically aggregating the data using Datashader whenever a new plot is needed,
shade() then dynamically colormapping the data into an RGB image, and
spread() dynamically spreading isolated pixels so that they become visible data points. But if you try it, you'll notice that the plot is ignoring all of the rasterization, shading, and spreading parameters we declared above, because those parameters are not declared as dependencies of the
elem method that was given to this DynamicMap.
We could of course add those parameters as dependencies to
.elem, but if we do that, then the whole set of chained operations will need to be re-run every time any one of those parameters changes. For a large dataset, re-running all those steps can take seconds or even minutes, yet some of the changes only affect the very last (and very cheap) stages of the computation, such as colormapping or setting the data opacity.
So, we come to the final version of
viewable() that's used in the actual
Explorer class definition above, which creates a whole slew of chained
hv.DynamicMap objects that each dynamically respond to some of the possible user actions:
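- hv.DynamicMap(self.elem) returns an appropriate HoloViews element whenever the plot parameter changes.
- rasterized applies the Datashader aggregation operation to the result of hv.DynamicMap(self.elem) whenever that result changes or when the dependencies of the self.aggregator method change (the field and agg_fn parameters).
- shaded applies the Datashader shading operation to the result of rasterized whenever that result changes or the cmap or normalization parameters change.
- spreaded applies the Datashader spreading operation to the result of shaded whenever that result changes or the spreading parameter changes.
- dataplot sets options on the result of spreaded whenever that result changes or the data_opacity parameter changes.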
So far we have only discussed how
pm.depends() allows a Parameterized method to declare its dependencies, but there are actually currently four different ways to set up dynamic, responsive behavior, of which
Explorer.viewable() uses methods 2, 3, and 4:
1. Method dependency for Panel: Decorating a Parameterized object method with pm.depends('paramname'), and passing that method to Panel so that Panel will call the method when any of the dependencies changes (used for Explorer2.view, but not for Explorer.viewable). Completely general, but very coarse-grained; useful for a first pass and for simple cases.
2. Method dependency for DynamicMap: Decorating a Parameterized object method with pm.depends('paramname'), and passing that method to a HoloViews DynamicMap so that HoloViews will call the method when any of the dependencies change (used for most of the methods of Explorer).
3. Parameter instance-object argument: Supplying a Param Parameter object as an argument to a HoloViews operation or DynamicMap instead of a concrete value like an integer or float, which will cause that operation to re-run its computation when the parameter value changes (used for the shade, spread, and opts steps in Explorer, which respond dynamically to cmap, normalization, spreading, and data_opacity because those parameters are supplied like px=self.param.spreading (a param.Integer Parameter object) rather than px=self.spreading (which is simply equivalent to px=0 if the current value of self.spreading is 0)). Dependencies are inferred only when the whole Parameter object is supplied, not just the current value.
4. Other HoloViews streams: Approaches 2 and 3 are based on a feature of HoloViews called streams, which support many types of dynamic behavior other than responding to widgets. For instance, the rasterize operation attaches a RangeXY stream that re-aggregates the data whenever the viewport (x or y range) changes, as a user zooms or pans a Bokeh plot. Other streams can be created manually to perform custom behavior, such as consuming streaming data sources, reacting to arbitrary plot events, and so on.
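The distinction drawn in the third mechanism above, between supplying a parameter object and supplying its current value, can be illustrated with a toy model in plain Python. This sketch does not use Param or HoloViews. Settings, ParamRef, and resolve are hypothetical names: ParamRef plays the role of self.param.spreading, and a plain attribute read plays the role of self.spreading.

```python
class Settings:
    """Hypothetical stand-in for a Parameterized object with one parameter."""
    def __init__(self):
        self.spreading = 0


class ParamRef:
    """Toy reference to a named attribute, looked up again on every use."""
    def __init__(self, owner, name):
        self.owner = owner
        self.name = name

    def current(self):
        return getattr(self.owner, self.name)


def resolve(arg):
    """Return the live value for a reference, or the argument itself for a plain value."""
    return arg.current() if isinstance(arg, ParamRef) else arg


settings = Settings()
by_value = settings.spreading                    # captures 0 once, like px=self.spreading
by_reference = ParamRef(settings, "spreading")   # stays linked, like px=self.param.spreading

settings.spreading = 4
print(resolve(by_value))       # 0: the captured value never sees the change
print(resolve(by_reference))   # 4: the reference reflects the new value
```

Only the reference carries enough information for a consumer to notice later changes, which is why the whole Parameter object has to be supplied for a dependency to be inferred.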
These sources of dynamic behavior make the
dataplot chain of DynamicMaps richly interactive. The interactive
dataplot is then overlaid with
hv.DynamicMap(self.tiles) (which itself is updated when the
map_opacity or basemap parameters change), and with
hv.DynamicMap(self.labels) (which itself is updated when the
show_labels parameter changes). Now, each part of the plot updates only if the relevant parameters have changed, and otherwise is left as it was, avoiding flicker and providing snappy performance.
As you can see, if we want to be very careful to tie each bit of computation to the values that affect it, then we can precisely determine which code to re-run interactively, providing maximal responsiveness where possible (try dragging the opacity sliders or selecting colormaps), while re-running everything when needed (when aggregation-related parameters change). In your own Panel dashboards and apps, you can often use the simplest approach (which can be as simple as a one-line call to interact), but it is important to know that fine-grained control is available when you need it, and is still typically vastly more concise and explicitly expressed than with other dashboarding approaches.