
Don't trust a backtest that you haven't faked yourself.

Many trading strategies based on market anomalies do not work in real life, or no longer do. One reason may be that the anomalies have been arbitraged away. However, it is at least as likely that they were false discoveries from the start.

The perfect sniper

How hard is it to hit the exact centre of a target drawn on a wall with a long-range shot? Probably about as hard as finding a trading strategy that produces above-average returns in the markets.
But you can make it easier on yourself by shooting at the wall first and then drawing the target around the bullet hole. This may sound ridiculous, but it is regularly practised in capital market research to produce bull's-eyes. US economics professor Gary Smith aptly describes this approach as the "Texas Sharpshooter Fallacy". [1]

Something extremely improbable is not improbable at all if it has already happened.

The example is of course exaggerated, but it makes an important point: examining the common characteristics of companies that were selected precisely because they already turned out successful is not particularly meaningful. According to Smith, a scientific method should be used instead (a minimal code sketch follows the list):

  • Select in advance the characteristics to be studied and justify logically why they should predict subsequent success.
  • Select in advance companies that have these characteristics and companies that do not.
  • Analyse success in the years to come based on the pre-determined criteria.
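Here is a minimal sketch of this protocol in Python. The `fundamentals` table, the `margin` characteristic and the 10% threshold are hypothetical illustrations, not taken from Smith's article:

```python
import pandas as pd

# Hypothetical universe: one row per company, characteristics measured today.
fundamentals = pd.DataFrame(
    {"company": ["A", "B", "C", "D"],
     "margin":  [0.22, 0.05, 0.18, 0.03]}
).set_index("company")

# Step 1: fix the characteristic and threshold BEFORE looking at outcomes.
has_trait = fundamentals["margin"] > 0.10

# Step 2: split the universe in advance into companies with and without it.
treatment, control = fundamentals[has_trait], fundamentals[~has_trait]

# Step 3: only after the evaluation period has passed, compare the groups
# on the pre-determined success criterion (here: mean forward return).
def evaluate(forward_returns: pd.Series) -> float:
    """Difference in mean forward return between the two groups."""
    return (forward_returns[treatment.index].mean()
            - forward_returns[control.index].mean())
```

The point is the ordering: the rule and the groups are frozen before the forward returns exist, so the later comparison cannot be drawn around results that are already known.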

False discoveries

If you search for successful trading strategies in this way, you will find that there are very few of them. The reason is the intense competition in the markets, which leads to a high degree of efficiency. If a profitable strategy is discovered, it usually only works for a limited time. At the same time, the danger of making false discoveries is very high, because, unlike in the natural sciences, statistical findings in the markets can hardly be verified by controlled experiments.
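How easily pure noise produces such a "discovery" can be shown with a short simulation. The sketch below is an illustration of the general point, not taken from the cited sources: it backtests many random strategies and reports the best one.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_strategies = 1_250, 1_000   # ~5 years, 1,000 candidate rules

# Every "strategy" is pure noise: daily returns with a true mean of zero.
returns = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

# Annualised Sharpe ratio of each candidate.
sharpe = returns.mean(axis=1) / returns.std(axis=1) * np.sqrt(252)

print(f"best of {n_strategies} random strategies: Sharpe {sharpe.max():.2f}")
# Typically prints a Sharpe around 1.4 - a seemingly attractive strategy
# found in data that contains no signal at all.
```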
One example is alternative asset classes such as wine or art, which have allegedly outperformed the stock market. On closer inspection, the data often comes from asset managers who are themselves invested in these assets - a clear conflict of interest. The following chart from Factor Research shows which data sources warrant caution when assessing reliability. [2]

Figure 1) Reliable backtest or marketing?
Source: Factor Research [2]

At the same time, many investors are not aware of how high the standards must be before a particular approach can really be trusted. So they often buy what looks good - on the assumption that the developers know what they are doing.
In reality, however, the financial industry is dominated by over-optimised backtests that sooner or later lead to bitter disappointment when implemented. But how can it be that such an unprofessional approach has become common practice?

The problem with backtests

One explanation lies in the way investment strategies are backtested. Developers usually take historical market data and analyse it, with computer assistance, across a multitude of criteria, weightings and combinations. From this, an optimal design is determined, along with the potential return that can be expected based on the simulations.

As the paper "Finance Is Not Excused" describes, this results in over-optimised backtests that are not meaningful for the future. [3] The reason is that far too many variants are tried out relative to the amount of data available, so random patterns are (unconsciously) mistaken for relevant ones. The result: seemingly good strategies disappoint in live implementation - the most "honest" of all out-of-sample tests.
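The mechanism can be reproduced in a few lines. The sketch below is illustrative and not from the paper: it tunes a moving-average crossover rule over hundreds of parameter combinations on the first half of a purely random price series, then evaluates the winning combination on the unseen second half.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# A pure random walk: by construction there is no signal to find.
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2_000))))
rets = prices.pct_change().fillna(0.0)

def sharpe(fast: int, slow: int, period: slice) -> float:
    """Annualised Sharpe of a moving-average crossover on a sub-period."""
    signal = (prices.rolling(fast).mean()
              > prices.rolling(slow).mean()).shift(1).fillna(False)
    strat = signal.astype(float)[period] * rets[period]
    return float(strat.mean() / (strat.std() + 1e-12) * np.sqrt(252))

in_sample, out_of_sample = slice(0, 1_000), slice(1_000, 2_000)

# Try hundreds of parameter combinations on the first half ...
grid = [(f, s) for f in range(2, 30) for s in range(35, 200, 5)]
best = max(grid, key=lambda fs: sharpe(*fs, in_sample))

# ... then watch the "optimal" rule collapse on the second half.
print("best parameters:", best)
print("in-sample Sharpe:    ", round(sharpe(*best, in_sample), 2))
print("out-of-sample Sharpe:", round(sharpe(*best, out_of_sample), 2))
```

With roughly 900 combinations and no real signal, the best in-sample Sharpe usually looks respectable while the out-of-sample value hovers around zero - the same pattern that Figure 2 below shows with real smart beta indices.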

Figure 2) Example of data mining
Source: Huang, S. / Song, Y. / Xiang, H. (2020), The Smart Beta Mirage, p. 5. [4]



The chart above shows two smart beta indices relative to the market. Both reflect the same factor - the grey curve in the original variant created in 1997 and the red curve in the variant "improved" in 2014. While the original variant did not outperform, the new version looked much better in retrospect. An ETF was then launched which, of course, tracked the new index. But its better performance probably existed only in the backtest: the fund underperformed right from the start. [4]

The high computing capacity of modern machines has further exacerbated the problem. Today, millions or even billions of parameter combinations can easily be examined. And if the developers then find "significant" statistical patterns, it is not difficult to draw a suitable explanation around them - just as the clever sniper drew the target around the shot. Unfortunately, according to the study, this seems to be the rule rather than the exception in the development of investment strategies.
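Bailey and de Prado quantify this effect: even if every tested variant is worthless, the best of N trials is expected to show a substantial Sharpe ratio purely by chance. The sketch below follows the spirit of their "False Strategy" result; the exact constants and assumptions are in the paper, and `sr_std` (the dispersion of Sharpe estimates across trials) is set to 1 only for illustration.

```python
from math import e
from statistics import NormalDist

GAMMA = 0.5772156649  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials: int, sr_std: float = 1.0) -> float:
    """Approximate expected maximum Sharpe ratio across n worthless trials."""
    z = NormalDist().inv_cdf
    return sr_std * ((1 - GAMMA) * z(1 - 1 / n_trials)
                     + GAMMA * z(1 - 1 / (n_trials * e)))

for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} trials -> expected best Sharpe ~ {expected_max_sharpe(n):.2f}")
```

The more combinations are tried, the better the best backtest looks - without any skill involved. This is why the number of trials has to be reported and corrected for.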

Over-optimising backtests in the development of investment strategies is probably the rule, not the exception.
Bailey, D. H. / de Prado, M. L. (2021), Finance Is Not Excused: Why Finance Should Not Flout Basic Principles of Statistics.

What interests are being pursued?

The problem extends even further and also includes academic research, where dubious investment strategies are sometimes presented without appropriate adjustment for multiple testing. This cannot be mere negligence. Instead, there are clear interests behind it, on the part of both the journals and the researchers. This is pointed out by Canadian economics professor Campbell Harvey in the paper "Be Skeptical of Asset Management Research". [5]
Academic journals compete on how often their articles are cited by others. Studies with positive results, in which the hypothesis under investigation is confirmed, perform far better than studies without a clear result. At the same time, researchers need a certain number of publications in order to be hired or promoted - and they know they should deliver positive results, as these are what journals demand. Put two and two together, and this situation creates a strong incentive to conjure up positive results somehow.

Subtle manipulation

The methods by which statistically significant results can be deliberately produced are not only diverse, but also hard to trace or verify in detail from the outside. In technical jargon, this is called "p-hacking". Some examples, according to Harvey:

  • Examining a large number of variables and selecting only the best for the study.
  • Transforming variables (logarithmising or volatility scaling) to achieve a better fit.
  • Selecting certain data periods to maximise the significance level (see the sketch after this list).
  • Excluding certain extreme phases (the global financial crisis or the corona crash) for higher significance of the results.
  • Choosing the "better" method, for example weighted least squares instead of ordinary least squares.
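As an illustration of the period-selection point, the following sketch (illustrative only, not from Harvey's paper) scans start and end dates on pure noise and keeps the window with the smallest p-value:

```python
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.01, 2_500)   # ~ten years of pure noise

def p_value(x: np.ndarray) -> float:
    """Two-sided p-value for 'mean return = 0' (normal approximation)."""
    t = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return 2 * (1 - NormalDist().cdf(abs(t)))

print("full sample p-value:", round(p_value(returns), 3))

# "p-hacking": scan start and end dates and keep only the window with
# the smallest p-value, as if the other periods had never been examined.
windows = [(s, t) for s in range(0, 2_000, 50)
                  for t in range(s + 250, 2_501, 50)]
best = min(p_value(returns[s:t]) for s, t in windows)
print(f"best of {len(windows)} cherry-picked windows: p = {best:.4f}")
# The minimum is typically far below 0.05, although the series
# contains no effect at all.
```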

The danger of p-hacking is even greater in the academic world than in practical applications, says Harvey. This is because academia is mainly driven by the positive incentives associated with publications, whereas in capital markets practice real money is at stake.

The incentive problem, together with the misapplication of statistical methods, leads to the unfortunate conclusion that probably half of the empirical research results in finance are wrong. (Campbell Harvey)

For example, performance fees can only be earned if the products perform well after launch, so the provider profits from genuine performance. Many professionals are therefore aware of the high risk of over-optimised backtests and choose moderate, realistic variants. Moreover, providers want to preserve their reputation, so they cannot be indifferent to performance even apart from fees.

Conclusion

One thing should be clear after this article: always be skeptical! You can never be 100 per cent sure whether the creator of a good-looking backtest has actually avoided all the pitfalls and adequately accounted for possible biasing factors. This is true not only for commercial providers, but especially in academia, where the pressure to publish is breathing down scientists' necks.

[1] Smith, G. (2018), Opinion: Calling a Company 'Great' Doesn't Make it a Good Stock, https://www.marketwatch.com/story/why-great-companies-dont-always-make-good-stocks-2018-06-13
[2] Factor Research (2022), Why Are All Illiquid Alts Outperforming?, https://insights.factorresearch.com/research-why-are-all-illiquid-alts-outperforming
[3] Bailey, D. H. / de Prado, M. L. (2021), Finance Is Not Excused: Why Finance Should Not Flout Basic Principles of Statistics.
[4] Huang, S. / Song, Y. / Xiang, H. (2020), The Smart Beta Mirage
[5] Harvey, C. R. (2021), Be Skeptical of Asset Management Research

Would you like to use this article - in full or in part - for your purposes? Then please note the following Creative Commons licence.