2017-11-03 (Last Updated: 2024-04-11)

OpenSky

OpenSky flight trajectories

Flight path information for commercial flights is available for some regions of the USA and Europe from the crowd-sourced OpenSky Network. OpenSky collects data from a large number of users monitoring public air-traffic control information. Here we will use a subset of the data that was polled from their REST API at an interval of 1 minute over 4 days (September 5-13, 2016), using the collect_data.py and prepare_data.py scripts. In general, the terms of use for OpenSky data do not allow redistribution, but we have obtained specific permission to distribute the subset of the data used in this project, which is a 200MB Parquet file (1.1GB as the original database). If you want more or different data, you can run the scripts yourself, or you can contact OpenSky to ask for a copy of the dataset.

NOTE: This dataset is also explorable through the Datashader example dashboard. From inside the examples directory, run: DS_DATASET=opensky panel serve --show opensky.ipynb

We’ll only use some of the fields provided by OpenSky, out of: icao24, callsign, origin, time_position, time_velocity, longitude, latitude, altitude, on_ground, velocity, heading, vertical_rate, sensors, timestamp

Here, we’ll load the data and declare that some fields are categorical (which isn’t information fully expressed in the Parquet file):

%%time
import pandas as pd

flightpaths = pd.read_parquet('./data/opensky.parq')
flightpaths['origin']    = flightpaths.origin.astype('category')
flightpaths['ascending'] = flightpaths.ascending.astype('category')
flightpaths.tail()

CPU times: user 1.12 s, sys: 224 ms, total: 1.34 s
Wall time: 1.34 s

	longitude	latitude	ascending	velocity
10227905	-8.845280e+06	4.553381e+06	True	262.14
10227906	-8.862735e+06	4.540946e+06	False	183.28
10227907	-8.876594e+06	4.530873e+06	False	258.15
10227908	-8.894316e+06	4.521176e+06	True	234.24
10227909	NaN	NaN	False	0.00

The default database has about 10 million points, with some metadata for each.
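
As a small illustration of the categorical declaration above, the following sketch uses a made-up three-row DataFrame (the `toy` name and its values are hypothetical, not taken from the OpenSky file) to show what `astype('category')` changes:

```python
import pandas as pd

# Hypothetical toy data; the real file has about 10 million rows.
toy = pd.DataFrame({
    "origin": ["Germany", "France", "Germany"],
    "ascending": [True, False, True],
})

# Same conversion as applied to flightpaths above.
toy["origin"] = toy["origin"].astype("category")
toy["ascending"] = toy["ascending"].astype("category")

# Each column now stores integer codes plus a small table of distinct values.
origin_categories = list(toy["origin"].cat.categories)
origin_codes = list(toy["origin"].cat.codes)
print(origin_categories, origin_codes)
```

The categorical dtype matters later in this notebook because datashader's `count_cat` aggregator expects a categorical column.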

Now let’s define a datashader-based processing pipeline to render images:

import datashader as ds
import datashader.transfer_functions as tf
from colorcet import fire

import numpy as np

plot_width  = 850
plot_height = 600
x_range = (-2.0e6, 2.5e6)
y_range = (4.1e6, 7.8e6)
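
The `x_range` and `y_range` values are not degrees; like the `longitude` and `latitude` columns shown above, they appear to be Web Mercator meters. To check which region they cover, here is a minimal sketch of the inverse spherical Mercator projection. The helper `meters_to_lnglat` is a hypothetical name introduced for illustration and is not part of datashader:

```python
import math

EARTH_RADIUS = 6378137.0  # meters, the sphere radius used by Web Mercator

def meters_to_lnglat(x, y):
    """Inverse spherical Web Mercator: meters -> (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2.0)
    return lon, lat

# Same ranges as defined above.
x_range = (-2.0e6, 2.5e6)
y_range = (4.1e6, 7.8e6)

west, south = meters_to_lnglat(x_range[0], y_range[0])
east, north = meters_to_lnglat(x_range[1], y_range[1])
print(f"longitude {west:.1f} to {east:.1f}, latitude {south:.1f} to {north:.1f}")
```

Under these assumptions the ranges span roughly 18°W to 22°E and 35°N to 57°N, which covers most of western and central Europe.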

We can now create a canvas with these settings and aggregate all of the trajectory information onto it:

cvs = ds.Canvas(plot_width, plot_height, x_range, y_range)

%%time
agg = cvs.line(flightpaths, 'longitude', 'latitude',  ds.count())

CPU times: user 1.75 s, sys: 27.4 ms, total: 1.78 s
Wall time: 1.78 s

tf.set_background(tf.shade(agg, cmap=fire), 'black')

This plot shows all of the trajectories in this database, overlaid in a way that avoids overplotting. With this “fire” color map, a single trajectory shows up as black, while increasing levels of overlap show up as brighter colors.
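
The reason overplotting is avoided is that each pixel stores a count, and color is only assigned afterwards. The sketch below illustrates the idea with isolated points on a tiny 2x2 grid, using NumPy's `histogram2d` as a stand-in for the aggregation step; it is not how `Canvas.line` is implemented, and the sample coordinates are made up:

```python
import numpy as np

# Hypothetical points in a unit square; three of them fall in the same pixel.
xs = np.array([0.1, 0.1, 0.1, 0.9])
ys = np.array([0.1, 0.1, 0.1, 0.9])

# A 2x2 "canvas": each cell counts the samples that land in it.
counts, _, _ = np.histogram2d(xs, ys, bins=2, range=[[0, 1], [0, 1]])
print(counts)
```

A color map applied to `counts` can distinguish the pixel hit three times from the pixel hit once, whereas drawing opaque markers would render both identically.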

A static image on its own like this is difficult to interpret, but if we overlay it on a map we can see where these flights originate, and can zoom in to see detail in specific regions:

import holoviews as hv
from holoviews import opts
from holoviews.operation import datashader as hd
hv.extension('bokeh', 'matplotlib')

opts.defaults(
    opts.RGB(width=850, height=600, xaxis=None, yaxis=None))

tile_url ='http://server.arcgisonline.com/ArcGIS/rest/services/World_Street_Map/MapServer/tile/{Z}/{Y}/{X}.png'
datashaded = hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']), 
                          x_range=x_range, y_range=y_range, aggregator=ds.count())
hv.Tiles(tile_url) * datashaded

For example, try zooming in on London in the figure above; it has a lot of structure that is not visible in the initial rendering. Note that zooming in will only reveal more detail in the datashader plot if you are working with a live server; a static HTML view (e.g. on Anaconda Cloud) will dynamically update the underlying map plot, but not the data.

We can use the metadata associated with each trajectory to show additional information. For instance, we can color each flight by its country of origin, using the key:

UK - Orange
Germany - Blue
Netherlands - Teal
Switzerland - Yellow
France - Purple
Norway - Green
USA - Red

(There are actually more than a hundred different origins, so this key is only approximate.)

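def categorical_color_key(aggregator, cmap):
    """Generate a color key from the given colormap with the appropriate number of colors for flightpaths"""
    # matplotlib.cm.get_cmap was removed in Matplotlib 3.9; use the colormap registry instead
    from matplotlib import colormaps
    from matplotlib.colors import rgb2hex
    colormap = colormaps[cmap]
    ncats = len(flightpaths[aggregator.column].unique())
    # Sample the colormap at ncats evenly spaced positions, one per category
    return [str(rgb2hex(colormap(i))) for i in np.linspace(0, 1, ncats)]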

aggregator = ds.count_cat('origin')
datashaded = hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']), 
                          x_range=x_range, y_range=y_range, aggregator=aggregator, 
                          color_key=categorical_color_key(aggregator, 'hsv_r'))
hv.Tiles(tile_url) * datashaded
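
The color key is simply a list with one color per category. As a dependency-free sketch of the same idea, the following uses the standard library's `colorsys` module as a stand-in for the Matplotlib colormap, so the colors will not match `'hsv_r'` exactly; `evenly_spaced_hex_colors` is a hypothetical helper name:

```python
import colorsys

def evenly_spaced_hex_colors(ncats):
    """Return ncats hex colors with hues spread evenly around the color wheel."""
    colors = []
    for i in range(ncats):
        hue = i / ncats  # in [0, 1), so the wheel is not sampled at both ends
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

key = evenly_spaced_hex_colors(6)
print(key)
```

Dividing by `ncats` instead of sampling both endpoints is a design choice in this sketch: on a cyclic color wheel the two ends are the same hue, so sampling both would give two categories the same color.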

Or we can label ascending (Blue) vs. descending flights (Red), which is particularly informative when zooming in on specific airports:

datashaded = hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']), 
                          x_range=x_range, y_range=y_range, aggregator=ds.count_cat('ascending'))
hv.Tiles(tile_url) * datashaded

Or we can show velocity, which of course decreases (dark colors) when approaching or leaving airports:

datashaded = hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']), 
                          x_range=x_range, y_range=y_range, aggregator=ds.mean('velocity'), cmap=fire[::-1])
hv.Tiles(tile_url) * datashaded
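
Conceptually, a mean aggregation stores the average of a column over the samples that fall in each pixel. The sketch below shows that computation for isolated points with NumPy, as a sum grid divided by a count grid. It is an illustration with made-up values, not datashader's own implementation:

```python
import numpy as np

# Hypothetical samples: two slow points share a pixel, one fast point sits alone.
xs = np.array([0.2, 0.3, 0.8])
ys = np.array([0.2, 0.3, 0.8])
velocity = np.array([80.0, 120.0, 250.0])

bins, extent = 2, [[0, 1], [0, 1]]
counts, _, _ = np.histogram2d(xs, ys, bins=bins, range=extent)
sums, _, _ = np.histogram2d(xs, ys, bins=bins, range=extent, weights=velocity)

# Mean per pixel; empty pixels stay NaN so they can be left transparent.
mean_velocity = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean_velocity, where=counts > 0)
print(mean_velocity)
```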

The flight patterns associated with each airport are clearly visible in these close-ups of various cities; for example, the circular holding patterns for landings (red) stand out around the various airports in London:

hv.output(backend='matplotlib')

opts.defaults(
    opts.RGB(xaxis=None, yaxis=None, bgcolor='black', axiswise=True),
    opts.Layout(hspace=0.1, vspace=0, sublabel_format=None, framewise=True))

from datashader.utils import lnglat_to_meters

cities = {'Frankfurt' : (8.6821, 50.1109),
          'London'    : (-0.1278, 51.5074), 
          'Paris'     : (2.3522, 48.8566),
          'Amsterdam' : (4.8952, 52.3702),
          'Zurich'    : (8.5417, 47.3769),
          'Munich'    : (11.5820, 48.1351)}

radius = 150000
mercator_cities = {city: lnglat_to_meters(lon, lat) for city, (lon, lat) in cities.items()}
# mercator_cities holds projected (x, y) coordinates in meters, not degrees
city_ranges = {city: dict(x_range=(x-radius, x+radius), y_range=(y-radius, y+radius))
               for city, (x, y) in mercator_cities.items()}

aggregator = ds.count_cat('origin')
hv.Layout([hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']), 
                        aggregator=aggregator, 
                        color_key=categorical_color_key(aggregator, 'hsv_r'),
                        dynamic=False, **ranges).relabel(city)
           for city, ranges in sorted(city_ranges.items())]).cols(3)
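
To make the window computation concrete, here is a standard-library sketch of the forward spherical Web Mercator projection applied to London. `lnglat_to_mercator` is a hypothetical helper written for illustration; the notebook itself uses `lnglat_to_meters` from `datashader.utils`:

```python
import math

EARTH_RADIUS = 6378137.0  # meters, the sphere radius used by Web Mercator

def lnglat_to_mercator(lon, lat):
    """Forward spherical Web Mercator: degrees -> meters."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y

radius = 150000  # half-width of the window, in projected meters
lon, lat = -0.1278, 51.5074  # London, as in the cities dict above

x, y = lnglat_to_mercator(lon, lat)
london_ranges = dict(x_range=(x - radius, x + radius),
                     y_range=(y - radius, y + radius))

# Mercator stretches distances by 1/cos(latitude), so the window is smaller on the ground.
ground_half_width_km = radius * math.cos(math.radians(lat)) / 1000.0
print(london_ranges, round(ground_half_width_km, 1))
```

Because the radius is applied in projected units, the same `radius` covers a somewhat different ground distance for each city, depending on its latitude.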

The patterns for a single city can make a nice wallpaper for your desktop if you wish:

hd.datashade(hv.Path(flightpaths, ['longitude', 'latitude']),
             aggregator=aggregator,
             color_key=categorical_color_key(aggregator, 'hsv_r'),
             dynamic=False, **city_ranges["Zurich"]).opts(fig_size=400, bgcolor=None)

As you can see, datashader makes it quite easy to explore even large databases of trajectory information, without trial-and-error parameter setting and experimentation. These examples have millions of datapoints, but datashader could work with billions just as easily, covering long time ranges or large geographic areas. Check out the other datashader notebooks for other examples!

This data was obtained by running the collect_data.py script from a cron job at one-minute intervals over a four-day period. The data was then transformed into the Parquet format using the prepare_data.py script.