A sub-optimal introduction to optimal predictive ability

By S. Arvanitis

The comparison of multiple forecasting models that rely on different statistical information and inference methods was facilitated by the formulation of the hypothesis of Superior Predictive Ability (SPA); see, for example, Hansen (2005). There, given a loss function that represents the analyst's preferences over the risk associated with forecasting error, the (empirically) superior among the competing models is the one that minimizes (empirical) risk, i.e. the (empirically) expected loss w.r.t. the loss function at hand.

 

Different loss functions may focus on different probabilistic properties of the underlying forecasting error distributions. The analyst could face uncertainty about the characteristics she wishes to focus on, i.e. uncertainty about her preferences. In such a situation, robustness (essentially conservativeness) w.r.t. the choice of the loss function could be of interest.
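As a small illustration of how the choice of loss function can change the selected model, the following Python sketch ranks three competing models by empirical risk under two different loss functions. The forecast errors and the model names are made up for the illustration and are not taken from any of the cited studies.

```python
from statistics import fmean


def empirical_risk(errors, loss):
    """Average loss over a series of forecast errors."""
    return fmean(loss(e) for e in errors)


def squared_loss(e):
    return e * e


def absolute_loss(e):
    return abs(e)


# Hypothetical forecast errors for three competing models (illustrative only).
errors_by_model = {
    "A": [0.5, -0.5, 1.0, -1.0],
    "B": [0.1, -0.2, 2.0, -0.1],
    "C": [1.0, 1.0, -1.0, -1.0],
}


def best_model(loss):
    """Name of the model with the smallest empirical risk under `loss`."""
    risks = {m: empirical_risk(errs, loss) for m, errs in errors_by_model.items()}
    return min(risks, key=risks.get)


best_under_squared = best_model(squared_loss)    # model A
best_under_absolute = best_model(absolute_loss)  # model B
```

Model B makes one large error, which the squared loss penalizes heavily, so the empirically superior model changes with the loss function.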

 

To introduce such robustness, Jin, Corradi and Swanson (2017, hereafter JCS17) generalize the SPA hypothesis by introducing a stochastic dominance relation based on a class of loss functions, and by considering the hypothesis that a given forecasting model is a maximum element. A maximum element is a forecasting model that would be chosen over the competing ones by every loss function in the class, in terms of (empirical) risk, i.e. a globally dominant model. Such elements rarely exist, especially if the class of loss functions and/or the number of competing models is non-trivial, hence this generalization could suffer from a lack of discriminatory power.

 

In order to obtain some robustness yet enhance discriminatory power, Arvanitis, Post, Poti and Karabati (2021, hereafter APPK21) use an alternative generalization of the SPA hypothesis, which transfers the criterion of Stochastic Dominance (SD) Portfolio Optimality (see, for example, Post (2017) and the references therein) to the forecasting setting. This is labeled Optimal Predictive Ability (OPA). The ingredients of the OPA hypothesis are similar to those of JCS17: a class of loss functions defines a stochastic dominance relation on the set of competing forecasting models, whereby model A dominates model B according to the particular loss function class if and only if A is chosen over B by every loss function in the class. (Note: this is equivalent to the difference between the expected loss of A and that of B being non-positive for every loss function in the class; the dominance relation is econometrically represented by a potentially large system of moment inequalities.) Optimality, however, is a property of the considered dominance relation that is generally (and considerably) weaker than global dominance: a model is considered optimal if and only if (iff) it is selected over the competing models by at least one loss function in the class. Being a maximum element is generally much stronger, as it requires selection by every loss function in the class. Both concepts are stronger than (Pareto) efficiency: a model is efficient (non-dominated) iff, for every competing model, there exists a loss function (possibly a different one for each competitor) by which the efficient model is selected over that competitor. Efficient models could be ubiquitous and globally dominant models are generally rare or non-existent, so optimal models provide an attractive compromise for applications; APPK21 report significant refinements in the (potentially large) set of considered models in standard forecasting applications whenever the researcher discards from the analysis the models that are not inferred to be optimal.
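The three notions can be contrasted on a toy example. The Python sketch below assumes a finite class of two loss functions and four models with made-up expected losses; the loss classes used in the literature are infinite, so this is only a simplified illustration of the definitions, not of any testing procedure. The strictness convention inside `dominates` is also an assumption of the sketch.

```python
# Hypothetical expected losses (risks): one entry per model, one coordinate per
# loss function in a finite class {L1, L2}. Illustrative numbers only.
risks = {
    "A": (1.0, 3.0),
    "B": (3.0, 1.0),
    "C": (2.5, 2.5),
    "D": (3.0, 3.5),
}
n_losses = 2


def is_maximum(m):
    """Selected over every competitor by every loss function."""
    return all(risks[m][j] <= risks[o][j] for o in risks for j in range(n_losses))


def is_optimal(m):
    """Selected over every competitor by at least one loss function."""
    return any(all(risks[m][j] <= risks[o][j] for o in risks) for j in range(n_losses))


def dominates(a, b):
    """a is weakly better than b under every loss and strictly better under
    at least one (assumed convention)."""
    pairs = list(zip(risks[a], risks[b]))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def is_efficient(m):
    """No competitor dominates m."""
    return not any(dominates(o, m) for o in risks if o != m)


maximum = [m for m in risks if is_maximum(m)]      # []
optimal = [m for m in risks if is_optimal(m)]      # ['A', 'B']
efficient = [m for m in risks if is_efficient(m)]  # ['A', 'B', 'C']
```

In this example no model is a maximum element, models A and B are optimal, and model C is efficient without being optimal: no single competitor beats it under both losses, yet no loss function ranks it first. Model D is dominated by A and is therefore not even efficient.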

 

The above naturally depends on the choice of the class of loss functions the analyst works with. A large class would be associated, as mentioned above, with the lack of maximum models, but very probably with a large number of efficient models, and hopefully with a moderate number of optimal models. APPK21 work with three loss function classes, each a refinement of the previous one: (a) the class of General Loss functions, which contains all right-continuous loss functions that achieve a minimum at zero and do not decrease as the error moves away from zero; (b) the subclass of Convex Loss functions; and (c) the further subclass of Symmetric Convex Loss functions. Classes (a) and (b) were initially considered by JCS17. APPK21 applied their empirical framework to the small-scale empirical study of exchange rate predictability by JCS17, and to the larger study of inflation forecast models of Hansen (2005). A very large majority of the thousands of inflation forecast models was found discardable for all general loss functions. Nevertheless, hundreds of inflation forecasts were classified as optimal at conventional significance levels, even using the more restrictive Symmetric Convex Loss functions class. To avoid indecision and provide additional practical improvements, further model set refinements could be desirable. One way to achieve this would be to strengthen the ordering by further restricting the loss functions in the last class. The spirit of higher-degree SD orders for utility functions could provide a useful guide.
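To make the nesting of the three classes concrete, here is a rough Python sketch with one example loss per class and crude grid-based checks of the defining properties. The three example losses, the weights in `lin_lin_loss`, and the grid are illustrative choices of this sketch; a grid check cannot verify right-continuity and is no substitute for a proof.

```python
def capped_absolute_loss(e):
    """General loss: minimal at zero, non-decreasing away from zero, not convex."""
    return min(abs(e), 1.0)


def lin_lin_loss(e):
    """Convex but asymmetric: positive errors cost twice as much (assumed weights)."""
    return 2.0 * e if e >= 0 else -e


def squared_loss(e):
    """Symmetric and convex."""
    return e * e


GRID = [-3.0 + 0.5 * i for i in range(13)]  # -3.0, -2.5, ..., 3.0


def is_general(loss):
    """Grid check: minimum at zero and non-decreasing as the error moves away from zero."""
    pos = [loss(e) for e in GRID if e >= 0]
    neg = [loss(e) for e in reversed(GRID) if e <= 0]
    monotone = all(a <= b for side in (pos, neg) for a, b in zip(side, side[1:]))
    return monotone and all(loss(0.0) <= loss(e) for e in GRID)


def is_convex(loss):
    """Grid check of midpoint convexity."""
    return all(loss((x + y) / 2) <= (loss(x) + loss(y)) / 2 + 1e-12
               for x in GRID for y in GRID)


def is_symmetric(loss):
    return all(loss(e) == loss(-e) for e in GRID)


# (general, convex, symmetric) flags for each example loss.
membership = {
    loss.__name__: (is_general(loss), is_convex(loss), is_symmetric(loss))
    for loss in (capped_absolute_loss, lin_lin_loss, squared_loss)
}
```

Under these checks the capped absolute loss belongs only to class (a), the asymmetric piecewise-linear loss belongs to classes (a) and (b), and the squared loss belongs to all three.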

 

But what exactly is the empirical framework mentioned above? Given the latency of the expected loss, which is due to the latency of the forecasting error distribution (as well as of the forecasting error per se, when parameters with unknown pseudo-true values appear in the relevant models), APPK21 use a (Blockwise) Empirical Likelihood methodology to construct a statistical procedure that tests whether a given forecasting model is optimal. Given the class of loss functions and the resulting dominance relation, as well as a time series of observable (approximate) forecasting errors for each of the competing models, they consider a Block Empirical Likelihood Ratio (BELR) statistic that is obtained from a piecewise-linear approximation of the empirical moment inequalities that define the empirical version of the dominance relation, and from bi-convex optimization. Then, via a moment selection methodology that is based on a slack augmentation of the moment inequalities involved, an asymptotically conservative rejection region is obtained from a chi-squared distribution that dominates the latent asymptotic distribution of the test statistic under the null.
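The "block" part of a blockwise empirical likelihood can be illustrated in isolation. The Python sketch below only shows the generic blocking step that is commonly used to accommodate serial dependence: loss differentials between two models are averaged over overlapping blocks of consecutive observations. It does not reproduce the APPK21 statistic, its piecewise-linear approximation, or its optimization step; the errors, the squared loss, and the block length are assumptions made for the illustration.

```python
from statistics import fmean


def overlapping_block_means(series, block_length):
    """Means of all overlapping blocks of consecutive observations."""
    n_blocks = len(series) - block_length + 1
    if block_length < 1 or n_blocks < 1:
        raise ValueError("block_length must be between 1 and len(series)")
    return [fmean(series[i:i + block_length]) for i in range(n_blocks)]


def squared_loss(e):
    return e * e


# Hypothetical forecast errors of two models over six periods.
errors_a = [1.0, -1.0, 2.0, 0.0, -1.0, 1.0]
errors_b = [2.0, 0.0, 1.0, -1.0, 2.0, 0.0]

# Loss differentials: negative values favour model A in that period.
differentials = [squared_loss(a) - squared_loss(b) for a, b in zip(errors_a, errors_b)]

# Block averages that would replace the raw observations in a blockwise procedure.
block_means = overlapping_block_means(differentials, block_length=3)
```

The block length plays the role of a tuning parameter: longer blocks capture more of the dependence but leave fewer effective observations.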

 

Several improvements of the BELR procedure, associated with the model set refinements mentioned above and/or with wider applicability of the methodology, could be of interest. For example, the limit theory derivations of APPK21 are involved and potentially boring. They are in any case based on an assumption framework involving: (a) stationarity and mixing properties of the predictive variables; (b) smoothness properties of functions of the unknown parameters inside the forecasting models at hand; (c) asymptotic representations for the estimators of the unknown parameters in the forecasting models; (d) asymptotic rates of several "fudge factors", like slacks, block sizes, etc. Extending the results to broader frameworks, e.g. non-stationarity or local stationarity, could enhance the applicability of the methodology. Work on the optimal selection of the fudge factors could improve the refinements.

 

The conservative character of the APPK21 testing procedure could imply poor power properties on the boundary of the hypothesis of optimality, and meager refinements of the set of forecasting models. One way to circumvent this is to consider rejection regions based on subsampling; this would, however, result in further numerical complications and in the introduction of additional fudge factors, namely the subsampling rates. An analytical approximation of the right tail of the null limiting distribution based on topological arguments, like the notion of the Euler characteristic, seems a mathematically exciting path to getting rid of conservativeness. If the relevant theory were generalized to include forecasting models without pseudo-true values for their parameters, it could also reach a broader class of applications that in some cases includes "asymptotically non-identifiable" mis-specified forecasting models.
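The subsampling idea can be sketched generically as follows. The Python code recomputes a statistic on every block of consecutive observations of length b and uses an empirical quantile of those values as the critical value. The statistic used here is a deliberately simple placeholder and not the BELR statistic, the data are made up, and the choice of b (the extra fudge factor mentioned above) is arbitrary; in theory b should grow with the sample size, but more slowly.

```python
import math


def subsampling_critical_value(series, statistic, b, alpha):
    """Empirical (1 - alpha)-quantile of `statistic` over all consecutive
    subsamples of length b."""
    if not 1 <= b <= len(series):
        raise ValueError("b must be between 1 and len(series)")
    values = sorted(statistic(series[i:i + b]) for i in range(len(series) - b + 1))
    # Smallest k with k / n >= 1 - alpha; rounding guards against float noise.
    k = math.ceil(round((1 - alpha) * len(values), 9))
    return values[max(k, 1) - 1]


def toy_statistic(sample):
    """Placeholder: sample size times the squared sample mean (NOT the BELR statistic)."""
    mean = sum(sample) / len(sample)
    return len(sample) * mean * mean


# Hypothetical loss differentials (illustrative only).
data = [0.5, -1.0, 1.5, 0.0, -0.5, 1.0, -1.5, 0.5, 1.0, -1.0]
critical_value = subsampling_critical_value(data, toy_statistic, b=4, alpha=0.25)
```

A test would then reject when the statistic computed on the full sample exceeds `critical_value`. Each candidate model would require its own set of subsample computations, which is where the additional numerical burden comes from.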

 

Finally, the infusion of further information into the definition of the dominance relation at hand could be of interest. For example, such information could reflect preferences for "statistically simpler" forecasting models; this could be achieved by augmenting the moment inequalities that define the dominance relation with penalties that reflect statistical complexity.

 

References

 

[1] Arvanitis, S., Post, T., Poti, V. and Karabati, S., 2021. Nonparametric tests for optimal predictive ability. International Journal of Forecasting, 37(2), pp.881-898. 

[2] Hansen, P.R., 2005. A test for superior predictive ability, Journal of Business & Economic Statistics 23, 365-380.

[3] Jin, S., V. Corradi and N.R. Swanson, 2017. Robust Forecast Comparison, Econometric Theory 33, 1306-1351.

[4] Post, Th., 2017. Empirical Tests for Stochastic Dominance Optimality, Review of Finance 21, 793-810.