30 days of backtesting

Guest Post by Nils-Bertil Wallin

It is a pleasure to host other data science and quantitative trading enthusiasts here at Prognostikon. Please read this very interesting first post on backtesting by Nils-Bertil Wallin, and visit his blog Option Stock Machines to follow the full 30 days of backtesting series and more ideas for your strategies. Check the link below and keep tracking Nils' work; comments will follow.

Day 1. We begin our 30 days of backtesting by first establishing a baseline. Typically, you'd compare a single-asset strategy like the one we plan to use against buy-and-hold, but that assumes you bought on day one of the test, which is a bit unrealistic. Just as Marcos López de Prado warns of the randomness in backtests, there is also randomness in when a benchmark starts. We'll use buy-and-hold as a baseline, but also look at portfolios composed of the SPY and IEF (bond) ETFs. We show the 60/40 and 50/50 SPY/IEF portfolios with and without rebalancing. Here is the link.
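
The with/without-rebalancing comparison can be sketched in a few lines. Everything here is illustrative: the returns are synthetic stand-ins for SPY and IEF, and `portfolio_growth` is a hypothetical helper, not code from the post:

```python
import numpy as np
import pandas as pd

# Synthetic weekly returns standing in for SPY and IEF; the post uses real ETF data.
rng = np.random.default_rng(42)
rets = pd.DataFrame({
    "SPY": rng.normal(0.0015, 0.020, 520),
    "IEF": rng.normal(0.0005, 0.005, 520),
})

def portfolio_growth(rets, w_spy=0.6, rebalance_every=None):
    """Grow a two-asset portfolio from $1; rebalance every N periods, or never (None)."""
    weights = np.array([w_spy, 1 - w_spy])
    holdings = weights.copy()          # dollars held in each asset
    values = []
    for i, (r_spy, r_ief) in enumerate(rets.itertuples(index=False), 1):
        holdings = holdings * (1 + np.array([r_spy, r_ief]))
        values.append(holdings.sum())
        if rebalance_every and i % rebalance_every == 0:
            holdings = holdings.sum() * weights   # reset to target weights
    return pd.Series(values)

drifting = portfolio_growth(rets, 0.6, rebalance_every=None)
rebalanced = portfolio_growth(rets, 0.6, rebalance_every=13)  # roughly quarterly
```

Without rebalancing, the equity sleeve drifts away from its 60% target in trending markets, which is exactly the difference the comparison is meant to expose.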

Day 2. We examine the Hello World of quant strategies, the 200-day moving average, and contrast it with the Buy-and-Hold benchmark. We submit the 200SMA as a separate benchmark because we believe very few investors actually buy and hold. Hence, a simple, rules-based strategy should offer a more realistic comparison. We then compare the 200SMA with Buy-and-Hold using different weights and rebalancing schemes. Our summary findings show that the 200SMA underperforms Buy-and-Hold on a cumulative basis but outperforms it on a risk-adjusted basis. Here is the link.
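
A minimal sketch of the 200SMA rule, run on a synthetic price series rather than the SPY data the post uses; the one-day signal lag avoids trading on the same close that generates the signal:

```python
import numpy as np
import pandas as pd

# Synthetic daily closes as a stand-in for SPY.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 1000))))

sma200 = close.rolling(200).mean()
# Long when yesterday's close was above yesterday's 200SMA, otherwise in cash.
signal = (close > sma200).shift(1, fill_value=False)
daily_ret = close.pct_change().fillna(0.0)

strategy = (1 + daily_ret.where(signal, 0.0)).cumprod()   # 200SMA equity curve
buy_hold = (1 + daily_ret).cumprod()                      # Buy-and-Hold equity curve
```

Note the strategy is flat for the first 200 days by construction, since the moving average does not yet exist.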

Day 3. In Day 3 of our 30 days of backtesting series, we catalog different performance metrics like cumulative returns, Sharpe ratios, and max drawdowns. The goal is to find a good balance between useful, real-world insights and in-depth analysis. We also reduce the number of benchmarks we plan to use going forward to buy-and-hold, the 200-day moving average strategy, and a 60/40 portfolio rebalanced at quarter end. We finish by graphing our chosen metrics in a tearsheet format, which is completely reproducible from the code provided. We'll examine the metrics in the next post. Here is the link.
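
The three core metrics can be computed without any tearsheet library. This is a generic sketch, not the post's code, and assumes simple (not log) returns:

```python
import numpy as np
import pandas as pd

def tearsheet_metrics(returns, periods_per_year=252):
    """Cumulative return, annualized Sharpe ratio, and maximum drawdown."""
    equity = (1 + returns).cumprod()
    cumulative = equity.iloc[-1] - 1
    sharpe = np.sqrt(periods_per_year) * returns.mean() / returns.std()
    drawdown = equity / equity.cummax() - 1     # percent below running peak
    return {"cumulative": cumulative,
            "sharpe": sharpe,
            "max_drawdown": drawdown.min()}

m = tearsheet_metrics(pd.Series([0.01, -0.02, 0.015, 0.005, -0.01]))
```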

Day 4. In Day 4, we resist the urge to jump into backtesting and focus on building a solid foundation instead. Using the 200-day simple moving average (200SMA) as our strategy benchmarked against a 60/40 SPY/IEF allocation, we find that while the strategy kept us out of the market in 2022, it faltered significantly in 2020 and underperformed overall. The Sharpe ratio is decent, but the rolling information ratio is persistently negative. These results underscore the challenges of outperforming a simple buy-and-hold benchmark but also offer worthwhile comparisons to rules-based investing. Next up: crafting our trading hypothesis. Here is the link.
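
The rolling information ratio mentioned above is just the annualized mean active return divided by its tracking error, computed on a moving window. A sketch with synthetic return series standing in for the 200SMA strategy and the 60/40 benchmark:

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for strategy and benchmark (three years of data).
rng = np.random.default_rng(1)
strat = pd.Series(rng.normal(0.0004, 0.008, 756))
bench = pd.Series(rng.normal(0.0005, 0.010, 756))

active = strat - bench        # active (excess-over-benchmark) return
window = 252                  # one trading year
rolling_ir = (np.sqrt(252) * active.rolling(window).mean()
              / active.rolling(window).std())
```

A persistently negative `rolling_ir` means the strategy trails its benchmark in risk-adjusted terms across most one-year stretches, which is the pattern the post reports.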

Day 5. We start backtesting today, but not with a stew of moving averages and namesake indicators. Instead, we use the famous Fama-French factors as a launchpad to develop a hypothesis on what factors might predict market returns. Our cursory analysis suggests Momentum and Profitability stand out as promising. We opt to forgo Profitability -- too error-prone, in our view -- in favor of Momentum. In our next post, we'll develop a hypothesis and start to test how Momentum might predict forward returns. Here is the link.

Day 6. In Day 6 of our series, we explore momentum's predictive power. While the market risk premium proved significant in our earlier analysis, we now turn to momentum. Originally highlighted by Jegadeesh and Titman and later added to the Fama-French model by Carhart, momentum is the factor we look to as a predictor of superior returns. We run 16 different combinations of weekly lookback and forward periods of 3, 6, 9, and 12 weeks, excluding data post-2019 to avoid snooping. Tomorrow, we'll analyze these initial findings in greater detail. Spoiler alert: we find modest reversion in more than half of the models. Here is the link.
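
The 16 combinations boil down to regressing forward cumulative returns on trailing cumulative returns for each (lookback, look forward) pair. A self-contained sketch on synthetic weekly returns (the post fits real market data through 2019):

```python
from itertools import product

import numpy as np
import pandas as pd

# Synthetic weekly returns standing in for the market series.
rng = np.random.default_rng(7)
weekly = pd.Series(rng.normal(0.0015, 0.02, 1000))

def momentum_regression(rets, lookback, lookforward):
    """OLS of forward cumulative return on trailing cumulative return."""
    trailing = rets.rolling(lookback).sum()
    forward = rets.rolling(lookforward).sum().shift(-lookforward)
    df = pd.concat([trailing, forward], axis=1, keys=["x", "y"]).dropna()
    X = np.column_stack([np.ones(len(df)), df["x"].to_numpy()])
    beta, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    return {"intercept": beta[0], "slope": beta[1]}

results = {(lb, lf): momentum_regression(weekly, lb, lf)
           for lb, lf in product([3, 6, 9, 12], repeat=2)}
```

On random data the slopes cluster around zero; the question the post asks is whether real data pushes them meaningfully away from it, and in which direction. A negative slope is the "modest reversion" referred to above.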

Day 7. In Day 7 of our 30 days of backtesting series, we examine the results of our lookback/look forward momentum combinations in greater detail. We discuss size effects, as represented by the coefficient on the lookback variable, and find that about 75% of the values are negative, suggesting the models are modestly better at finding reversals than at forecasting trend continuation. Only the 12-by-12-week lookback and look forward period exhibits a positive effect, an observation we might use when building a trading strategy. Most of the size effects are not statistically significant, apart from the 12-by-12 and two others mentioned in the post. Our next post will explore baseline effects. Here is the link.

Day 8. In Day 8, we delve into baseline effects using the regression models we ran on the 16 different lookback and look forward momentum periods. This should prepare us to analyze "alpha" in the future. Analyzing weekly data from 2000 to 2018, we observe that baseline effects, while small (averaging under 1%), are relatively stable across the same look forward period regardless of the lookback period used. Such consistency amid different market regimes could be the result of an upwardly trending market in the period of analysis, but it would warrant a more detailed investigation that is beyond the scope of these blog posts. Tomorrow, we transition into the next stage: forecasting. Here is the link.

Day 9. In today's post, we discuss our reasoning behind the long lead-up to forecasting forward returns. That is, we want to establish a testable market thesis that has a basis in logic rather than p-hacked indicators. We then apply walk-forward analysis using the 12-by-12 lookback/look forward momentum model to forecast returns. We train the model on 13 weeks and then predict the subsequent 12-week look forward momentum using the next week in the time series. We then repeat this process for our entire data set. We begin our analysis with the canonical graph of actual vs. predicted values, which we'll delve into in more detail in our next post. Here is the link.
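
The walk-forward loop can be sketched as follows: build the (trailing, forward) momentum pairs, fit an OLS on the most recent 13 observations, predict the next one, and slide forward. The data is synthetic and the variable names are ours, not the post's:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
weekly = pd.Series(rng.normal(0.0015, 0.02, 400))   # synthetic weekly returns

lookback = lookforward = 12
x = weekly.rolling(lookback).sum()                         # trailing momentum
y = weekly.rolling(lookforward).sum().shift(-lookforward)  # forward momentum
pairs = pd.concat([x, y], axis=1, keys=["x", "y"]).dropna().reset_index(drop=True)

train = 13
preds, actuals = [], []
for t in range(train, len(pairs)):
    window = pairs.iloc[t - train:t]                       # last 13 weeks only
    slope, intercept = np.polyfit(window["x"], window["y"], 1)
    preds.append(intercept + slope * pairs["x"].iloc[t])   # one-step prediction
    actuals.append(pairs["y"].iloc[t])
```

Plotting `actuals` against `preds` gives the canonical actual-vs-predicted graph the post opens with.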

Day 10. In our walk-forward analysis of a 12-week lookback/look forward model, we assess residuals to gauge model performance. The actual vs. predicted scatterplot suggests limited bias: predicted values exceed actuals about 53% of the time. However, when we plot residuals against predictions, we notice that error variance increases for extreme predictions, especially during market stress such as the Global Financial Crisis. While the model seems to perform well in the -10% to 10% return range, the residual analysis calls for deeper inspection. Our next post will examine residual autocorrelation. Here is the link.
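
One quick way to see this fanning-out of errors is to bucket residuals by the size of the prediction and compare spreads. The data below is synthetic, built with noise that grows with the prediction to mimic the pattern described above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
predicted = rng.normal(0, 0.05, 2000)
# Residuals whose volatility scales with the size of the prediction.
residual = rng.normal(0, 1, 2000) * (0.01 + 0.5 * np.abs(predicted))

buckets = pd.cut(predicted, bins=[-np.inf, -0.05, 0.05, np.inf],
                 labels=["low", "mid", "high"])
spread = pd.Series(residual).groupby(buckets, observed=True).std()
```

A flat `spread` across buckets would support homoscedastic errors; here the tail buckets are visibly noisier, which is the behavior the residual plot flags.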

Day 11. We delve into the 12-by-12 model's residuals further, finding significant autocorrelation at lags 1-7. Rather than diving down a rabbit hole to identify time dependencies or try different models, we shift to iterating various train/forecast split combinations to find a model with the lowest forecast error. Using root mean-squared error to judge performance, we observe that the errors seem to bottom out in the 5-by-1 to 13-by-4 range. We'll look at this approach in more detail in our next post. Here is the link.
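
Residual autocorrelation at lags 1 to 7 is simple to compute directly. The residuals below are simulated AR(1) noise, purely to illustrate the calculation:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Simulated AR(1) residuals with coefficient 0.6.
rng = np.random.default_rng(11)
eps = rng.normal(size=600)
resid = np.empty(600)
resid[0] = eps[0]
for t in range(1, 600):
    resid[t] = 0.6 * resid[t - 1] + eps[t]

acf = [autocorr(resid, k) for k in range(1, 8)]   # lags 1..7
```

Values well outside roughly ±2/√n suggest the model is leaving time structure on the table, which is what the 12-by-12 residuals showed.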

Day 12. In today's post, we extend our analysis by iterating 320 different combinations of training and forecasting windows across the 16 momentum models we've built in our preceding updates. Assessing performance with the root mean-squared error (RMSE), we discover that the lowest-error models typically use a 12-week forecast and a 5-week training period. Shorter lookback periods seem linked to higher RMSEs, except for the 3-by-12 lookback/look forward model with 5 training steps and one forecast step. This warrants further analysis, but could be due to noise. We'll use these results to generate trading signals in our upcoming posts. Here is the link.
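
A compact version of the grid search: for each (training, forecast) split, walk forward through the data, collect out-of-sample errors, and score by RMSE. The grid and data here are toy stand-ins for the 320 combinations in the post:

```python
from itertools import product

import numpy as np
import pandas as pd

# Synthetic (momentum, forward return) pairs with a weak linear relationship.
rng = np.random.default_rng(12)
pairs = pd.DataFrame({"x": rng.normal(size=300)})
pairs["y"] = 0.1 * pairs["x"] + rng.normal(scale=0.5, size=300)

def walkforward_rmse(df, train, step):
    """RMSE of walk-forward OLS forecasts for a given train/forecast split."""
    errors = []
    for t in range(train, len(df) - step + 1, step):
        w = df.iloc[t - train:t]
        slope, intercept = np.polyfit(w["x"], w["y"], 1)
        chunk = df.iloc[t:t + step]
        errors.extend(chunk["y"] - (intercept + slope * chunk["x"]))
    return np.sqrt(np.mean(np.square(errors)))

grid = {(tr, st): walkforward_rmse(pairs, tr, st)
        for tr, st in product([5, 13, 26], [1, 4])}   # 6 toy combinations
best = min(grid, key=grid.get)
```

Ranking `grid` by RMSE is the toy analogue of picking the best of the 320 combinations, and `best` is the winning split under this scoring.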