Predict S&P 500 using machine learning

Forecasting the S&P 500 is an attractive pursuit for many financial market participants, as it tracks what is widely regarded as the largest and most developed equity market in the world. The index value is derived by summing the market capitalisations of 500 of the largest listed US companies and dividing by an index divisor. It is used as a litmus test for risk sentiment by many around the world. We employ machine learning to forecast its daily direction.
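To make the divisor arithmetic concrete, here is a minimal sketch of the calculation. The market capitalisations and the divisor below are made-up numbers chosen only to show the mechanics; the real index uses float-adjusted capitalisations and a divisor maintained by the index provider.

```python
def index_level(market_caps, divisor):
    """Sum of constituent market capitalisations divided by the index divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return sum(market_caps) / divisor


# Hypothetical figures, for illustration only (not real S&P 500 data).
example_caps = [900.0, 650.0, 450.0]  # market caps, in billions of USD
example_divisor = 8.0                 # made-up divisor
level = index_level(example_caps, example_divisor)
```

The divisor is what lets the index stay continuous when constituents change: the provider adjusts it so that the level does not jump for non-market reasons.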

Factors in play

We collect daily feature data from 1990 to 2018. Features include the FTSE 100, Nikkei 225 and Shanghai Composite stock indices. We also include macro variables such as GBPUSD, USDCNY, USDJPY, Gold and Crude Oil, which are closely watched by traders. Finally, we include momentum indicators of the S&P 500 to capture its autocorrelation properties.
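The article does not say which momentum indicators were used, so the following is only one plausible example: a trailing return over a fixed window. The function name, the window length and the closing levels are all hypothetical.

```python
import numpy as np


def trailing_return(prices, window):
    """Percentage change over the previous `window` observations.

    The first `window` entries are NaN because there is not enough history.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    out[window:] = prices[window:] / prices[:-window] - 1.0
    return out


closes = [100.0, 102.0, 101.0, 105.0, 110.0]  # made-up closing levels
mom_2d = trailing_return(closes, window=2)
```

When such an indicator is used as a feature for day t, it should be computed from closes up to day t-1 only, so that the model never sees the value it is trying to predict.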

For the independent variables, we use the daily percentage change rather than the price level itself, as the change in price is more useful than the level. It tells us whether to stay long or short on any particular day. Had we used a fair value number instead, we would have had to make additional decisions on whether to enter a trade based on the deviation from some threshold.

Our machine learning algorithm iterates over multiple preprocessing methods and decides which to use. For the dependent variable, we use a binary label for S&P 500 directionality: +1 for a positive daily change and -1 for a negative daily change.
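The percentage-change and labelling steps can be sketched as follows. The closing levels are made up, and the treatment of a flat day is an assumption: the text only defines positive and negative changes, so here a zero change is mapped to -1.

```python
import numpy as np


def daily_pct_change(prices):
    """Close-to-close percentage change; one element shorter than the input."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def direction_label(returns):
    """+1 for a positive daily change, -1 otherwise (assumption: flat days are -1)."""
    return np.where(np.asarray(returns) > 0, 1, -1)


closes = [100.0, 101.0, 100.5, 100.5, 102.0]  # made-up closing levels
returns = daily_pct_change(closes)
labels = direction_label(returns)
```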

Data alignment is crucial, as different instruments have different closing times. The two Asian markets, Nikkei 225 and Shanghai Composite, close before the US market opens, so we align their values to the same date as our US stock index prices. In other words, we see how the earlier Asian session performed to decide how we would trade the US session that follows. For the rest of the features, we use the previous day's value in the modelling.
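A minimal sketch of this alignment with pandas is shown below. The function name, column names and numbers are hypothetical; the only logic taken from the text is that the Asian indices are used on the same date while every other feature is lagged by one day.

```python
import pandas as pd


def align_features(spx_ret, asia_ret, other_ret):
    """Build a feature table indexed by US trading date.

    asia_ret:  same-date returns of markets that close before the US open.
    other_ret: returns of the remaining instruments, shifted down one row
               so that only the previous day's value is used.
    """
    features = asia_ret.join(other_ret.shift(1), how="inner")
    table = features.join(spx_ret.rename("target"), how="inner")
    return table.dropna()  # the first row has no previous-day value


# Made-up returns on three dates, for illustration only.
dates = pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04"])
spx = pd.Series([0.01, -0.02, 0.03], index=dates)
asia = pd.DataFrame({"NKY": [0.005, -0.01, 0.02]}, index=dates)
other = pd.DataFrame({"GOLD": [0.001, 0.002, 0.003]}, index=dates)
table = align_features(spx, asia, other)
```

The inner joins also drop dates on which one of the markets was closed, which is one way of handling the differing trading calendars.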

Data Visualisation

We start by examining the trendiness of S&P 500 daily returns:

Figure: Autocorrelation plot of SPX 500 daily returns.

At lags 1 and 2, we see statistically significant mean reversion, even at the 99% confidence level. In other words, when the S&P 500 closes higher today, it is more likely to come down tomorrow than to continue going higher.
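For readers who want to reproduce this kind of check, the sketch below computes a sample autocorrelation and compares it with the usual white-noise band of roughly 2.576 / sqrt(N) for a two-sided 99% level. The input is a synthetic alternating series rather than S&P 500 data, so it only demonstrates the mechanics: a negative autocorrelation below the band is what mean reversion looks like.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def white_noise_band(n_obs, z=2.576):
    """Approximate two-sided 99% band for the autocorrelation of white noise."""
    return z / np.sqrt(n_obs)


# Synthetic, strongly mean-reverting series: up one day, down the next.
series = np.tile([1.0, -1.0], 50)
rho_1 = autocorr(series, lag=1)
band = white_noise_band(len(series))
is_mean_reverting = rho_1 < -band
```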

Figure: Bar chart of the correlation between SPX 500 and the macro/technical feature variables.

We then look at the correlation of the S&P 500 against the feature variables. The most distinct correlation is against NKY (Nikkei 225). The likely reason is that the Nikkei 225 is the key developed market that US traders refer to most when gauging risk sentiment over the Asian session.
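A correlation ranking like the one in the bar chart can be computed along these lines. This is a sketch with made-up return series and hypothetical feature names, not the data behind the chart.

```python
import numpy as np


def feature_correlations(features, target):
    """Pearson correlation of each named feature with the target series."""
    return {
        name: float(np.corrcoef(values, target)[0, 1])
        for name, values in features.items()
    }


# Made-up daily returns, for illustration only.
spx = [0.010, -0.020, 0.015, -0.005, 0.020]
features = {
    "NKY": [0.008, -0.015, 0.010, -0.002, 0.018],
    "GOLD": [-0.001, 0.002, 0.000, 0.001, -0.002],
}
corrs = feature_correlations(features, spx)
# Feature names ordered from strongest to weakest absolute correlation.
ranked = sorted(corrs, key=lambda name: abs(corrs[name]), reverse=True)
```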

Figure: Scatter plots of feature percentage changes against SPX 500.

Figure: The same data after binarisation of the S&P 500 changes.

The scatter plots give a better view of the correlations in the data. We clean the data by excluding rows with missing data points, which can be attributed to differences in the trading calendars of the instruments. Outliers greater than 5 standard deviations (SD) were also removed to prevent skews in the results. Some of these outliers were observed in the CNY data and the stock index values.
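The cleaning step might look like the sketch below. It assumes the 5 SD rule is applied per column as a z-score against that column's mean, which the text does not spell out; the data is synthetic, with one missing value and one outlier planted by hand.

```python
import numpy as np


def clean_rows(X, max_sd=5.0):
    """Drop rows with missing values, then rows where any column lies more
    than `max_sd` standard deviations from that column's mean."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns cannot contain outliers
    z = np.abs((X - X.mean(axis=0)) / std)
    return X[(z <= max_sd).all(axis=1)]


# Synthetic two-column data set: 100 rows of +1/-1 values.
base = np.tile([1.0, -1.0], 50)
raw = np.column_stack([base, base[::-1]])
raw[10, 0] = 50.0    # planted outlier
raw[20, 1] = np.nan  # planted missing value
cleaned = clean_rows(raw)
```

With very short samples no point can be 5 SD from the mean, so a rule like this only bites on long histories such as daily data over many years.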

For algorithmic elegance, we used auto-sklearn, an automated machine learning package for Python. Given a fixed computational budget, it searches for the best-performing model it can find for the provided dataset. The automated procedure decided which classification algorithms to employ, what data preprocessing to apply and which hyperparameters to use. Total training time for this model was about 1 hour.
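A minimal sketch of how such a run could be set up with auto-sklearn follows. The budget values and the seed are assumptions, picked to be consistent with a run of about one hour; the article does not state the settings actually used. The helper names are hypothetical, and the import is deferred so the snippet loads even where auto-sklearn is not installed.

```python
def automl_settings(total_seconds=3600, per_run_seconds=360, seed=1):
    """Keyword arguments for AutoSklearnClassifier (values are assumptions)."""
    return {
        "time_left_for_this_task": total_seconds,  # overall search budget
        "per_run_time_limit": per_run_seconds,     # cap for one pipeline fit
        "seed": seed,
    }


def fit_automl(X_train, y_train, **settings):
    """Run the automated search and return the fitted ensemble."""
    # Imported lazily: requires the auto-sklearn package.
    from autosklearn.classification import AutoSklearnClassifier

    automl = AutoSklearnClassifier(**settings)
    automl.fit(X_train, y_train)
    return automl


settings = automl_settings()
```

Usage would be along the lines of `automl = fit_automl(X_train, y_train, **settings)`, followed by `automl.predict(X_test)` for out-of-sample predictions and `automl.get_models_with_weights()` to list the ensemble members with their weights.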

Ensembles often outperform individual models, especially when the individual models are strong on their own and make uncorrelated errors. We split our daily data set (1990 to 2018), using 75% as training data and the remaining 25% as test data.
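The article does not say whether the 75/25 split was random or chronological. For time series, a chronological split is the safer choice because the test period then lies strictly after the training period, so the sketch below assumes that; the function name is hypothetical.

```python
def chronological_split(rows, train_fraction=0.75):
    """Split time-ordered rows into an earlier training part and a later
    test part, without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Eight time-ordered observations, oldest first (placeholder values).
observations = list(range(8))
train_rows, test_rows = chronological_split(observations)
```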

After training, our model was able to make out-of-sample predictions on the test data with 55.78% accuracy.

Our final model was an ensemble of the following individual components, with weights summing to 1: 0.36 * sgd + 0.24 * sgd + 0.18 * gaussian_nb + 0.14 * passive_aggressive + 0.04 * lda + 0.02 * sgd + 0.02 * lda, where sgd is stochastic gradient descent, gaussian_nb is Gaussian naive Bayes and lda is linear discriminant analysis. The exact hyperparameters used are provided at the end of the article.
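To illustrate how weights like these turn several model outputs into one call, the sketch below takes a weighted average of each component's probability of an up day and thresholds it at 0.5. This is our simplified reading of a weighted ensemble, not a description of auto-sklearn's internals, and the per-model probabilities are invented for the example.

```python
ENSEMBLE_WEIGHTS = [0.36, 0.24, 0.18, 0.14, 0.04, 0.02, 0.02]


def ensemble_direction(prob_up_per_model, weights=ENSEMBLE_WEIGHTS):
    """Weighted average of the models' up-day probabilities.

    Returns (direction, averaged probability), with direction +1 or -1.
    """
    if len(prob_up_per_model) != len(weights):
        raise ValueError("need exactly one probability per weight")
    total = sum(weights)
    avg = sum(w * p for w, p in zip(weights, prob_up_per_model)) / total
    return (1 if avg > 0.5 else -1), avg


# Hypothetical up-day probabilities from the seven component models.
probs = [0.6, 0.4, 0.7, 0.5, 0.2, 0.9, 0.1]
direction, avg_prob = ensemble_direction(probs)
```

Because the first two components carry 60% of the weight, the low-weight models only change the outcome when the larger ones nearly cancel out.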

Figure: Machine learning prediction accuracy for SPX 500, showing the positive and negative errors.

Conclusion

We used a multitude of stock indices, macro asset prices and technical indicators to predict future S&P 500 movements. We also used an automated machine learning technique in which the machine taught itself how to fine-tune its parameters. A predictive accuracy of 55.78% was attained in out-of-sample testing. Such a statistical edge can be used by portfolio managers as an overlay to their positions, with the aim of improving returns.

Weights and hyperparameters used in the final ensemble
[(0.36000000000000004,
SimpleClassificationPipeline({'preprocessor:extra_trees_preproc_for_classification:max_depth': 'None', 'classifier:__choice__': 'sgd', 'preprocessor:extra_trees_preproc_for_classification:bootstrap': 'True', 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'preprocessor:extra_trees_preproc_for_classification:min_samples_leaf': 1, 'classifier:sgd:loss': 'perceptron', 'classifier:sgd:penalty': 'elasticnet', 'preprocessor:extra_trees_preproc_for_classification:max_features': 0.7272215836101141, 'preprocessor:extra_trees_preproc_for_classification:n_estimators': 100, 'preprocessor:extra_trees_preproc_for_classification:min_weight_fraction_leaf': 0.0, 'imputation:strategy': 'most_frequent', 'preprocessor:extra_trees_preproc_for_classification:min_samples_split': 10, 'preprocessor:extra_trees_preproc_for_classification:min_impurity_decrease': 0.0, 'balancing:strategy': 'none', 'classifier:sgd:tol': 1.0509136658813787e-05, 'preprocessor:extra_trees_preproc_for_classification:criterion': 'entropy', 'classifier:sgd:eta0': 0.016340521198734054, 'preprocessor:__choice__': 'extra_trees_preproc_for_classification', 'classifier:sgd:alpha': 0.0011051939453437334, 'rescaling:__choice__': 'minmax', 'classifier:sgd:learning_rate': 'constant', 'preprocessor:extra_trees_preproc_for_classification:max_leaf_nodes': 'None', 'classifier:sgd:average': 'False', 'classifier:sgd:l1_ratio': 0.0001808802162346077},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.24000000000000002,
SimpleClassificationPipeline({'rescaling:quantile_transformer:output_distribution': 'normal', 'classifier:sgd:penalty': 'elasticnet', 'classifier:__choice__': 'sgd', 'balancing:strategy': 'none', 'imputation:strategy': 'median', 'rescaling:quantile_transformer:n_quantiles': 31569, 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:sgd:eta0': 0.01902420330886627, 'preprocessor:__choice__': 'select_percentile_classification', 'classifier:sgd:tol': 0.00015092867817487565, 'classifier:sgd:alpha': 0.0015530751878415228, 'preprocessor:select_percentile_classification:score_func': 'mutual_info', 'rescaling:__choice__': 'quantile_transformer', 'classifier:sgd:learning_rate': 'constant', 'preprocessor:select_percentile_classification:percentile': 80.09400148727232, 'classifier:sgd:loss': 'perceptron', 'classifier:sgd:average': 'True', 'classifier:sgd:l1_ratio': 0.08599797547972958},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.18000000000000002,
SimpleClassificationPipeline({'preprocessor:__choice__': 'kernel_pca', 'rescaling:robust_scaler:q_max': 0.9826214080633513, 'rescaling:robust_scaler:q_min': 0.12185671565664284, 'preprocessor:kernel_pca:n_components': 1890, 'classifier:__choice__': 'gaussian_nb', 'balancing:strategy': 'weighting', 'preprocessor:kernel_pca:kernel': 'cosine', 'rescaling:__choice__': 'robust_scaler', 'imputation:strategy': 'mean', 'categorical_encoding:__choice__': 'one_hot_encoding', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'False'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.14000000000000004,
SimpleClassificationPipeline({'classifier:passive_aggressive:loss': 'squared_hinge', 'classifier:passive_aggressive:C': 0.019936142191500958, 'classifier:passive_aggressive:average': True, 'classifier:__choice__': 'passive_aggressive', 'balancing:strategy': 'weighting', 'classifier:passive_aggressive:tol': 0.09947971183745015, 'preprocessor:fast_ica:fun': 'exp', 'categorical_encoding:__choice__': 'no_encoding', 'preprocessor:__choice__': 'fast_ica', 'preprocessor:fast_ica:whiten': 'True', 'classifier:passive_aggressive:fit_intercept': 'True', 'rescaling:__choice__': 'normalize', 'imputation:strategy': 'most_frequent', 'preprocessor:fast_ica:n_components': 787, 'preprocessor:fast_ica:algorithm': 'deflation'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.04000000000000001,
SimpleClassificationPipeline({'classifier:lda:n_components': 228, 'preprocessor:select_rates:alpha': 0.4086724846658236, 'classifier:lda:shrinkage': 'None', 'categorical_encoding:one_hot_encoding:minimum_fraction': 0.010000000000000004, 'classifier:__choice__': 'lda', 'balancing:strategy': 'none', 'preprocessor:select_rates:score_func': 'f_classif', 'categorical_encoding:__choice__': 'one_hot_encoding', 'categorical_encoding:one_hot_encoding:use_minimum_fraction': 'True', 'preprocessor:__choice__': 'select_rates', 'rescaling:__choice__': 'minmax', 'preprocessor:select_rates:mode': 'fpr', 'imputation:strategy': 'mean', 'classifier:lda:tol': 0.0007094671192004056},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.020000000000000004,
SimpleClassificationPipeline({'rescaling:quantile_transformer:output_distribution': 'normal', 'classifier:sgd:penalty': 'l2', 'classifier:__choice__': 'sgd', 'balancing:strategy': 'none', 'imputation:strategy': 'mean', 'rescaling:quantile_transformer:n_quantiles': 57176, 'classifier:sgd:fit_intercept': 'True', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:sgd:eta0': 0.09660393962402068, 'preprocessor:__choice__': 'fast_ica', 'classifier:sgd:tol': 0.002064399699541523, 'classifier:sgd:alpha': 9.173248006514544e-05, 'preprocessor:fast_ica:whiten': 'True', 'preprocessor:fast_ica:fun': 'exp', 'rescaling:__choice__': 'quantile_transformer', 'classifier:sgd:learning_rate': 'optimal', 'classifier:sgd:loss': 'log', 'classifier:sgd:average': 'True', 'preprocessor:fast_ica:n_components': 1215, 'preprocessor:fast_ica:algorithm': 'parallel'},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False})),
(0.020000000000000004,
SimpleClassificationPipeline({'preprocessor:__choice__': 'select_percentile_classification', 'classifier:lda:n_components': 78, 'preprocessor:select_percentile_classification:score_func': 'mutual_info', 'classifier:lda:shrinkage': 'auto', 'classifier:__choice__': 'lda', 'balancing:strategy': 'none', 'preprocessor:select_percentile_classification:percentile': 18.82176104412942, 'rescaling:__choice__': 'standardize', 'imputation:strategy': 'most_frequent', 'categorical_encoding:__choice__': 'no_encoding', 'classifier:lda:tol': 8.390369963558585e-05},
dataset_properties={
'multilabel': False,
'sparse': False,
'multiclass': False,
'task': 1,
'target_type': 'classification',
'signed': False}))]