IEX Stocks


IEX Trading

Written by Jean-Luc Stevens
Created: November 20, 2019
Last updated: July 30, 2021

IEX Stocks

In the previous notebook we saw how all the trades on the IEX stock exchange could be interactively visualized over the course of a whole day (Monday 21st of October 2019). Using datashader, all the trades are rasterized interactively to reveal the density of trades via a colormap.

When a million trades for a whole day are viewed at once, it is extremely difficult to identify individual trades in the global view. To identify particular trades, it is necessary to zoom into a time window small enough that individual trades can be distinguished, at which point the trade metadata can be inspected using the Bokeh hover tool.

What the global visualization helps reveal is the overall pattern of trades. In this notebook we will focus on interactively revealing the trading patterns for individual stocks by partitioning on a set of stock symbols selected with a widget.

Loading the data

First we will load the data as before, converting the integer timestamps into the correctly offset datetimes before counting the total number of events:

In [1]:
import datetime
import pandas as pd
df = pd.read_csv('./data/IEX_2019-10-21.csv')
df.timestamp = df.timestamp.astype('datetime64[ns]')
df.timestamp -= datetime.timedelta(hours=4)
print('Dataframe loaded containing %d events' % len(df))
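The fixed four-hour shift used above can be compared with a timezone-aware conversion. The following is a minimal sketch on a single synthetic value, assuming the raw integers are nanoseconds since the Unix epoch in UTC and that the exchange's local zone is America/New_York; neither assumption is stated in this notebook.

```python
import pandas as pd

# Synthetic stand-in for the integer timestamp column
# (assumption: nanoseconds since the epoch, in UTC).
raw = pd.Series([pd.Timestamp('2019-10-21 13:30:00').value])

# Approach used in this notebook: interpret as nanoseconds,
# then shift by a fixed four hours.
fixed_offset = pd.to_datetime(raw, unit='ns') - pd.Timedelta(hours=4)

# Timezone-aware alternative: convert from UTC to US Eastern time,
# then drop the timezone so the values are naive local datetimes.
tz_aware = (pd.to_datetime(raw, unit='ns', utc=True)
              .dt.tz_convert('America/New_York')
              .dt.tz_localize(None))

print(fixed_offset.iloc[0], tz_aware.iloc[0])
```

On this date the two approaches agree because US Eastern time was four hours behind UTC; the timezone-aware form would also handle days outside daylight saving time.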

Next we will identify the ten most traded stocks on this day and compute how much of the trading volume (i.e. the sum over the 'size' column) they account for:

In [ ]:
symbol_volumes = df.groupby('symbol')['size'].sum()
top_symbol_volumes = symbol_volumes.sort_values()[-10:]
top_symbols = list(top_symbol_volumes.index)
top_symbol_info = (', '.join(top_symbols),
                   top_symbol_volumes.sum() / symbol_volumes.sum() * 100)
'The top ten symbols are %s and account for %.2f%% of trading volume' % top_symbol_info
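To make the computation concrete, here is a small sketch of the same group-and-rank logic on a made-up table of four trades. The symbols and sizes are invented for illustration, and nlargest is used here as an alternative to sorting and slicing; it returns the largest volumes first.

```python
import pandas as pd

# Invented trades: three symbols with arbitrary sizes.
trades = pd.DataFrame({'symbol': ['A', 'A', 'B', 'C'],
                       'size':   [100, 50, 300, 25]})

# Total traded volume per symbol.
volumes = trades.groupby('symbol')['size'].sum()

# The two most traded symbols, largest first.
top = volumes.nlargest(2)

# Percentage of the total volume covered by those symbols.
share = top.sum() / volumes.sum() * 100

print(list(top.index), round(share, 2))
```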

The following dictionary maps the name of each company to its stock symbol for these ten symbols:

In [ ]:
symbol_info = {
    'Pinterest': 'PINS',
    'Chesapeake Energy Corporation': 'CHK',
    'Snap Inc': 'SNAP',
    'NIO Inc': 'NIO',
    'McDermott International': 'MDR',
    'Teva Pharmaceutical Industries': 'TEVA',
    'Hewlett Packard Enterprise': 'HPE',
    'Bank of America': 'BAC',
    'General Electric': 'GE',  # company name -> symbol, like every other entry
    'Infosys': 'INFY',
    }
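Because the dictionary is keyed by company name, looking up the name for a given symbol requires inverting it. A minimal sketch follows, using a hypothetical two-entry subset and a hypothetical describe helper that are not part of the original notebook.

```python
# Hypothetical two-entry subset of the name -> symbol mapping.
symbol_info_subset = {'Snap Inc': 'SNAP', 'Bank of America': 'BAC'}

# Invert the mapping so that symbols can be looked up directly.
symbol_names = {symbol: name for name, symbol in symbol_info_subset.items()}

def describe(symbol):
    """Return 'SYMBOL (Company name)', or 'unknown' for unlisted symbols."""
    return f"{symbol} ({symbol_names.get(symbol, 'unknown')})"

print(describe('BAC'))
```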

Before we can visualize each of these stocks individually, we will need to import the necessary libraries:

In [ ]:
import holoviews as hv
from bokeh.models import HoverTool
import datashader as ds

from holoviews.operation.datashader import spikes_aggregate
hv.config.image_rtol = 10e-3 # Suppresses datetime issue at high zoom level
hv.extension('bokeh')

Visualizing trade volume by stock

As in the previous notebook, we will create a Spikes element containing our entire dataset. Instead of immediately rasterizing it, we will be selecting individual stocks from it and rasterizing those components individually.

Note: If you display the spikes object at this time, it will probably freeze or crash your browser tab!

In [ ]:
spikes = hv.Spikes(df, ['timestamp'], ['symbol', 'size', 'price'])

Visualizing two stocks at once

To understand how to build a fairly complex, interactive visualization, it is useful to first build a simple prototype that identifies the necessary concepts and shows whether the approach will satisfy our goals. In this section, we will prototype a fixed view that lets us directly compare the trading patterns for the top two stocks (PINS and CHK).

We start by defining some options called raster_opts used to customize the rasterized output of the visualize_symbol_raster function. We will use the responsive=True option to make our rasters fill the screen:

In [ ]:
raster_opts = hv.opts.Image(min_height=400, responsive=True,
                            colorbar=True, cmap='blues', xrotation=90,
                            default_tools=['xwheel_zoom', 'xpan', 'xbox_zoom'])

def visualize_symbol_raster(symbol, offset):
    selection = spikes.select(symbol=symbol)
    return spikes_aggregate(selection, offset=offset,
                            aggregator=ds.sum('size'),
                            spike_length=1).opts(raster_opts)

In the visualize_symbol_raster function, the .select method on our spikes object is used to select only the spikes that match the symbol specified in the argument. This function also takes an integer offset argument that shifts the rasterized Image vertically by that many units; because the spikes have unit length (as specified with spike_length=1), consecutive offsets stack the rasters without overlap.

One other difference from the previous notebook is that now a datashader aggregator over the 'size' column is used in order to visualize the trade volume as opposed to the trade count.
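The difference between counting trades and summing their sizes can be illustrated without Datashader. The sketch below uses NumPy histograms over two time bins as a rough stand-in for two pixel columns; the times and sizes are invented, and this only approximates what a count or sum aggregator computes per pixel.

```python
import numpy as np

# Invented trades: time (in seconds) and trade size.
times = np.array([0.2, 0.7, 1.5])
sizes = np.array([100, 200, 50])

# Two time bins standing in for two pixel columns.
bins = np.array([0.0, 1.0, 2.0])

# Count aggregation: number of trades per bin.
trade_count, _ = np.histogram(times, bins=bins)

# Sum aggregation: total size (volume) per bin, via weights.
trade_volume, _ = np.histogram(times, bins=bins, weights=sizes)

print(trade_count, trade_volume)
```

The first bin holds two trades but most of the volume, which is the kind of distinction a volume-based colormap brings out.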

Now we can use this function with our two chosen stock symbols (PINS and CHK) to create an overlay. Lastly, we want to use the y-axis to label these stocks so we use a custom yticks option and set the ylabel.

In [ ]:
overlay = visualize_symbol_raster('PINS', 0) * visualize_symbol_raster('CHK', 1)
overlay.opts(yticks=[(0.5, 'PINS'), (1.5, 'CHK')], ylabel='Stock Symbol')

We now want to generalize this example in the following ways:

  1. We wish to choose from any of the top ten stocks with a widget.
  2. We want to reveal the stock metadata with the Bokeh hover tool in the same way as the previous notebook.

The next section will demonstrate one way this can be done.

Visualizing the top stocks interactively

Our prototype is generalized in three steps:

  1. First the hover behavior is reintroduced per symbol.
  2. Next the process of overlaying the visualizations for the different symbols is generalized.
  3. Finally, Panel is used to add an interactive widget to select from the top ten stocks.

Adding interactive hover for metadata

To enable the desired hover behavior, we will create a custom Bokeh hover tool that formats the 'Symbol', 'Size', 'Price' and 'Timestamp' fields nicely. In addition, a simple RangeX stream is declared:

In [ ]:
hover = HoverTool(tooltips=[
    ('Symbol', '@symbol'),
    ('Size', '@size'),
    ('Price', '@price'),
    ('Timestamp', '@timestamp{%F %H:%M %Ss %3Nms}')],
                  # Recent Bokeh versions expect the '@' prefix in formatter keys
                  formatters={'@timestamp': 'datetime'})

range_stream = hv.streams.RangeX()

Note that in this instance, the source of the RangeX stream is not defined upon construction: we will set it dynamically later. The xrange_filter function, however, is the same as in the previous notebook (with the number of allowed hover spikes lowered to 200):

In [ ]:
def xrange_filter(spikes, x_range):
    low, high = (None, None) if x_range is None else x_range
    ranged = spikes[pd.to_datetime(low):pd.to_datetime(high)]
    return (ranged if len(ranged) < 200 else ranged.iloc[:0]).opts(spike_length=1, alpha=0)
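The thresholding idea in xrange_filter can be sketched with plain pandas, independently of HoloViews. The filter_by_range helper below is hypothetical and treats both bounds as inclusive, which may differ from HoloViews slicing semantics; a small max_rows value is used so the effect is visible on five synthetic rows.

```python
import pandas as pd

def filter_by_range(trades, x_range, max_rows=200):
    """Keep rows inside x_range, or return no rows if too many remain."""
    low, high = (None, None) if x_range is None else x_range
    keep = pd.Series(True, index=trades.index)
    if low is not None:
        keep &= trades['timestamp'] >= pd.to_datetime(low)
    if high is not None:
        keep &= trades['timestamp'] <= pd.to_datetime(high)
    ranged = trades[keep]
    # An empty frame with the same columns means "nothing to hover over".
    return ranged if len(ranged) < max_rows else ranged.iloc[:0]

# Five synthetic trades, one second apart.
start = pd.to_datetime('2019-10-21 09:30:00')
trades = pd.DataFrame({
    'timestamp': start + pd.to_timedelta(list(range(5)), unit='s'),
    'size': [10, 20, 30, 40, 50],
})

window = ('2019-10-21 09:30:01', '2019-10-21 09:30:03')
zoomed_in = filter_by_range(trades, window, max_rows=4)   # 3 rows: kept
zoomed_out = filter_by_range(trades, None, max_rows=4)    # 5 rows: dropped

print(len(zoomed_in), len(zoomed_out))
```

Returning an empty selection when zoomed out keeps the number of hoverable glyphs sent to the browser small, which is the purpose of the 200-spike limit described above.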

The next function, visualize_symbol, builds on the approach used in visualize_symbol_raster above by overlaying each raster with the appropriate x-range filtered (invisible) Spikes in order to enable hovering.

This is done using the same approach as the previous notebook where we use the apply method on the spikes to apply the xrange_filter function. Note that as the symbol argument changes, the Spikes object returned by the select method also changes. This is why we need to set the source on our stream dynamically.

In addition, to keep everything consistent, we want to use our single range_stream everywhere, including in the DynamicMap returned by spikes_aggregate. This is done by passing range_stream explicitly in the streams argument. This approach of using a single RangeX and setting the source ensures that you can zoom in and then select a different set of stocks to be displayed without resetting the zoom level.

Lastly, we need to pass expand=False to prevent datashader from filling the whole y-range (with NaN colors where there is no data) for each raster generated.

In [ ]:
def visualize_symbol(symbol, offset):
    selection = spikes.select(symbol=symbol)
    range_stream.source = selection
    rasterized = spikes_aggregate(selection, streams=[range_stream],
                                  offset=offset, expand=False,
                                  aggregator=ds.sum('size'),
                                  spike_length=1).opts(raster_opts)
    filtered = selection.apply(xrange_filter, streams=[range_stream])
    return rasterized * filtered.opts(tools=[hover], position=offset)

This visualize_symbol function simply adds hover behavior to visualize_symbol_raster: you can now use the former to visualize the PINS and CHK stocks in exactly the same way as was demonstrated above.

Building a dynamic overlay of stocks

The following overlay_symbols function is a trivial generalization of the prototype that overlays an arbitrary list of stocks according to their symbols. Each DynamicMap returned by visualize_symbol is collected into an Overlay and the corresponding yticks plot option is dynamically generated.

The only new concept is the call to .collate() which is necessary to convert an Overlay container of DynamicMaps into a DynamicMap of Overlays as required by the supported nesting hierarchy.

In [ ]:
def overlay_symbols(symbols):
    items = []
    for offset, symbol in enumerate(symbols):
        items.append(visualize_symbol(symbol, offset))
    yticks = [(i+0.5,sym) for i,sym in enumerate(symbols)]
    return hv.Overlay(items).collate().opts(
        yticks=yticks, yaxis=True).redim(y='Stock Symbol')

The prototype example could now be replicated (with hover) by calling overlay_symbols(['PINS', 'CHK']).

Adding a selector widget with panel

Using the panel library we can easily declare a cross-selector widget specified with the symbol_info dictionary, with our two top stocks selected by default:

In [ ]:
import panel as pn
cross_selector = pn.widgets.CrossSelector(options=symbol_info,
                                          sizing_mode='stretch_width',
                                          value=['PINS','CHK'])

Now we can wrap our overlay_symbols function in visualize_symbols and decorate it with @pn.depends before displaying both the widgets and visualization in a panel Column:

In [ ]:
@pn.depends(cross_selector)
def visualize_symbols(symbols):
    return overlay_symbols(symbols)

stock_selector = pn.Column(cross_selector, visualize_symbols)

We now have a handle called stock_selector on a visualization that allows you to zoom in to any time during the day and view the metadata for the selected stocks (once sufficiently zoomed in).

As a final step, we can build a small dashboard by adding the IEX logo and a short Markdown description to stock_selector and marking the result as servable with .servable().

In [ ]:
dashboard_info = ('This dashboard allows exploration of the top ten stocks by volume '
                  'on the [IEX exchange](https://iextrading.com/) on Monday 21st '
                  'of October 2019. To view the metadata of individual trades, '
                  'enable the Bokeh hover tool and zoom in until you can '
                  'view individual trades.')
pn.Column(
    pn.pane.SVG('./assets/IEX_Group_Logo.svg', width=100),
    dashboard_info, stock_selector).servable()

This dashboard can now be served using panel serve IEX_stock.ipynb.

Conclusion

In this notebook, we have seen how a trade-by-trade stock explorer for arbitrarily large datasets can be built up incrementally using three of the HoloViz tools: HoloViews for declaring the elements, Datashader (through the HoloViews API) for rasterization, and Panel for the widgets.

This web page was generated from a Jupyter notebook and not all interactivity will work on this website. Right click to download and run locally for full Python-backed interactivity.
