2019-11-12 (Last Updated: 2024-04-11)

Portfolio Optimization#

Portfolio Optimization is used for risk-averse investors to construct portfolios to optimize or maximize expected return based on a given level of market risk, emphasizing that risk is an inherent part of higher reward

This notebook:

Runs an example Monte Carlo Simulation for an optimal portfolio with resulting returns
Creates an Efficient Frontier which is used to identify a set of optimal portfolios that offers the highest expected return for a defined level of risk or the lowest risk for a given level of expected return

Monte Carlo Simulation for Optimization Search#

Monte Carlo simulations are used by analyst to determine the expected value and optimal distribution of a portfolio.

import numpy as np
import pandas as pd
import hvplot.pandas  # noqa

stocks = pd.read_csv('./data/stocks.csv', index_col='Date', parse_dates=True)

stocks.head()

	aapl	cisco	ibm	amzn
Date
2012-01-03	53.063218	15.752778	160.830881	179.03
2012-01-04	53.348386	16.057180	160.174781	177.51
2012-01-05	53.940658	15.997991	159.415086	177.61
2012-01-06	54.504543	15.938801	157.584912	182.61
2012-01-09	54.418089	16.040268	156.764786	178.56

mean_daily_ret = stocks.pct_change(1).mean()
mean_daily_ret

aapl     0.000750
cisco    0.000599
ibm      0.000081
amzn     0.001328
dtype: float64

stocks.pct_change(1).corr()

	aapl	cisco	ibm	amzn
aapl	1.000000	0.301990	0.297498	0.235487
cisco	0.301990	1.000000	0.424672	0.284470
ibm	0.297498	0.424672	1.000000	0.258492
amzn	0.235487	0.284470	0.258492	1.000000

Simulating Thousands of Possible Allocations#

stocks.head()

	aapl	cisco	ibm	amzn
Date
2012-01-03	53.063218	15.752778	160.830881	179.03
2012-01-04	53.348386	16.057180	160.174781	177.51
2012-01-05	53.940658	15.997991	159.415086	177.61
2012-01-06	54.504543	15.938801	157.584912	182.61
2012-01-09	54.418089	16.040268	156.764786	178.56

stock_normed = stocks/stocks.iloc[0]
stock_normed.hvplot()

stock_daily_ret = stocks.pct_change(1)
stock_daily_ret.head()

	aapl	cisco	ibm	amzn
Date
2012-01-03	NaN	NaN	NaN	NaN
2012-01-04	0.005374	0.019324	-0.004079	-0.008490
2012-01-05	0.011102	-0.003686	-0.004743	0.000563
2012-01-06	0.010454	-0.003700	-0.011481	0.028152
2012-01-09	-0.001586	0.006366	-0.005204	-0.022178

Log Returns vs Arithmetic Returns#

We will now switch over to using log returns instead of arithmetic returns, for many of our use cases they are almost the same,but most technical analyses require detrending/normalizing the time series and using log returns is a nice way to do that. Log returns are convenient to work with in many of the algorithms we will encounter.

For a full analysis of why we use log returns, check this great article.

log_ret = np.log(stocks/stocks.shift(1))
log_ret.head()

	aapl	cisco	ibm	amzn
Date
2012-01-03	NaN	NaN	NaN	NaN
2012-01-04	0.005360	0.019139	-0.004088	-0.008526
2012-01-05	0.011041	-0.003693	-0.004754	0.000563
2012-01-06	0.010400	-0.003707	-0.011547	0.027763
2012-01-09	-0.001587	0.006346	-0.005218	-0.022428

log_ret.hvplot.hist(bins=100, subplots=True, width=400, group_label='Ticker', grid=True).cols(2)

log_ret.describe().transpose()

	count	mean	std	min	25%	50%	75%	max
aapl	1257.0	0.000614	0.016466	-0.131875	-0.007358	0.000455	0.009724	0.085022
cisco	1257.0	0.000497	0.014279	-0.116091	-0.006240	0.000213	0.007634	0.118862
ibm	1257.0	0.000011	0.011819	-0.086419	-0.005873	0.000049	0.006477	0.049130
amzn	1257.0	0.001139	0.019362	-0.116503	-0.008534	0.000563	0.011407	0.146225

log_ret.mean() * 252

aapl     0.154803
cisco    0.125291
ibm      0.002788
amzn     0.287153
dtype: float64

# Compute pairwise covariance of columns
log_ret.cov()

	aapl	cisco	ibm	amzn
aapl	0.000271	0.000071	0.000057	0.000075
cisco	0.000071	0.000204	0.000072	0.000079
ibm	0.000057	0.000072	0.000140	0.000059
amzn	0.000075	0.000079	0.000059	0.000375

log_ret.cov()*252 # multiply by days

	aapl	cisco	ibm	amzn
aapl	0.068326	0.017854	0.014464	0.018986
cisco	0.017854	0.051381	0.018029	0.019956
ibm	0.014464	0.018029	0.035203	0.014939
amzn	0.018986	0.019956	0.014939	0.094470

Single Run for Some Random Allocation#

# Set seed (optional)
np.random.seed(101)

# Stock Columns
print('Stocks')
print(stocks.columns)
print('\n')

# Create Random Weights
print('Creating Random Weights')
weights = np.array(np.random.random(4))
print(weights)
print('\n')

# Rebalance Weights
print('Rebalance to sum to 1.0')
weights = weights / np.sum(weights)
print(weights)
print('\n')

# Expected Return
print('Expected Portfolio Return')
exp_ret = np.sum(log_ret.mean() * weights) *252
print(exp_ret)
print('\n')

# Expected Variance
print('Expected Volatility')
exp_vol = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))
print(exp_vol)
print('\n')

# Sharpe Ratio
SR = exp_ret/exp_vol
print('Sharpe Ratio')
print(SR)

Stocks
Index(['aapl', 'cisco', 'ibm', 'amzn'], dtype='object')

Creating Random Weights
[0.51639863 0.57066759 0.02847423 0.17152166]

Rebalance to sum to 1.0
[0.40122278 0.44338777 0.02212343 0.13326603]

Expected Portfolio Return
0.15599272049632015

Expected Volatility
0.18502649565909485

Sharpe Ratio
0.843083148392604

Great! Now we can just run this many times over!

num_ports = 15000

all_weights = np.zeros((num_ports,len(stocks.columns)))
ret_arr = np.zeros(num_ports)
vol_arr = np.zeros(num_ports)
sharpe_arr = np.zeros(num_ports)

for ind in range(num_ports):

    # Create Random Weights
    weights = np.array(np.random.random(4))

    # Rebalance Weights
    weights = weights / np.sum(weights)
    
    # Save Weights
    all_weights[ind,:] = weights

    # Expected Return
    ret_arr[ind] = np.sum((log_ret.mean() * weights) *252)

    # Expected Variance
    vol_arr[ind] = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))

    # Sharpe Ratio
    sharpe_arr[ind] = ret_arr[ind]/vol_arr[ind]

sharpe_arr.max()

1.0303260551271078

sharpe_arr.argmax()

all_weights[1419,:]

array([0.26188068, 0.20759516, 0.00110226, 0.5294219 ])

max_sr_ret = ret_arr[1419]
max_sr_vol = vol_arr[1419]

Plotting the data#

import holoviews as hv

scatter = hv.Scatter((vol_arr, ret_arr, sharpe_arr), 'Volatility', ['Return', 'Sharpe Ratio'])
max_sharpe = hv.Scatter([(max_sr_vol,max_sr_ret)])

scatter.opts(color='Sharpe Ratio', cmap='plasma', width=600, height=400, colorbar=True, padding=0.1) *\
max_sharpe.opts(color='red', line_color='black', size=10)

Mathematical Optimization#

There are much better ways to find good allocation weights than just guess and check! We can use optimization functions to find the ideal weights mathematically!

Functionalize Return and SR operations#

def get_ret_vol_sr(weights):
    """
    Takes in weights, returns array or return,volatility, sharpe ratio
    """
    weights = np.array(weights)
    ret = np.sum(log_ret.mean() * weights) * 252
    vol = np.sqrt(np.dot(weights.T, np.dot(log_ret.cov() * 252, weights)))
    sr = ret/vol
    return np.array([ret,vol,sr])

from scipy.optimize import minimize

To fully understand all the parameters, check out the scipy.optimize.minimize documentation.

#help(minimize)

Optimization works as a minimization function, since we actually want to maximize the Sharpe Ratio, we will need to turn it negative so we can minimize the negative sharpe (same as maximizing the postive sharpe)

def neg_sharpe(weights):
    return  get_ret_vol_sr(weights)[2] * -1

# Contraints
def check_sum(weights):
    '''
    Returns 0 if sum of weights is 1.0
    '''
    return np.sum(weights) - 1

# By convention of minimize function it should be a function that returns zero for conditions
cons = ({'type':'eq','fun': check_sum})

# 0-1 bounds for each weight
bounds = ((0, 1), (0, 1), (0, 1), (0, 1))

# Initial Guess (equal distribution)
init_guess = [0.25,0.25,0.25,0.25]

# Sequential Least SQuares Programming (SLSQP).
opt_results = minimize(neg_sharpe,init_guess,method='SLSQP',bounds=bounds,constraints=cons)

opt_results

 message: Optimization terminated successfully
 success: True
  status: 0
     fun: -1.0307168703355527
       x: [ 2.663e-01  2.042e-01  1.098e-17  5.295e-01]
     nit: 7
     jac: [ 5.643e-05  4.183e-05  3.399e-01 -4.449e-05]
    nfev: 35
    njev: 7

opt_results.x

array([2.66289782e-01, 2.04189807e-01, 1.09775470e-17, 5.29520411e-01])

get_ret_vol_sr(opt_results.x)

array([0.21885916, 0.21233683, 1.03071687])

All Optimal Portfolios (Efficient Frontier)#

The efficient frontier is the set of optimal portfolios that offers the highest expected return for a defined level of risk or the lowest risk for a given level of expected return. Portfolios that lie below the efficient frontier are sub-optimal, because they do not provide enough return for the level of risk. Portfolios that cluster to the right of the efficient frontier are also sub-optimal, because they have a higher level of risk for the defined rate of return.

# Our returns go from 0 to somewhere along 0.3
# Create a linspace number of points to calculate x on
frontier_y = np.linspace(0,0.3,100) # Change 100 to a lower number for slower computers!

def minimize_volatility(weights):
    return  get_ret_vol_sr(weights)[1] 

frontier_volatility = []

for possible_return in frontier_y:
    # function for return
    cons = ({'type':'eq','fun': check_sum},
            {'type':'eq','fun': lambda w: get_ret_vol_sr(w)[0] - possible_return})
    
    result = minimize(minimize_volatility,init_guess,method='SLSQP',bounds=bounds,constraints=cons)
    
    frontier_volatility.append(result['fun'])

scatter * hv.Curve((frontier_volatility, frontier_y)).opts(color='green', line_dash='dashed')