Twinkle,twinkle little STAR

At the recent R/Finance 2014 conference in Chicago I gave a talk on Smooth Transition AR models and a new package for estimating them called twinkle. In this blog post I will provide a short outline of the models and an introduction to the package and its features.

Financial markets have a strong cyclical component linked to the business cycle, and may undergo temporary or permanent shifts related to both endogenous and exogenous shocks. As a result, the distribution of returns is likely to be different depending on which state of the world we are in. It is therefore advantageous to model such dynamics within a framework which can accommodate and explain these different states. Within time series econometrics, the switching between states has been based either on unobserved components, giving rise to the Markov Switching (MS) models introduced by Hamilton (1989), or observed components leading to Threshold type Autoregressive (TAR) models popularized by Tong (1980). This post and the accompanying package it introduces considers only observed component switching, though there are a number of good packages in R for Markov Switching.

The Smooth Transition AR Model

The smooth transition AR model was introduced in Teräsvirta (1994), and has been used in numerous studies from GDP modelling to forex and equity market volatility. See the presentation from the talk above which includes a number of selected references.

The s-state model considered in the twinkle package takes the following form:
{y_t} = \sum\limits_{i = 1}^s {\left[ {\left( {{{\phi ‘}_i}y_t^{\left( p \right)} + {{\xi ‘}_i}{x_t} + {{\psi ‘}_i}e_t^{\left( q \right)}} \right){F_i}\left( {{z_{t – d}};{\gamma _i},{\alpha _i},{c_i},{\beta _i}} \right)} \right]} + {\varepsilon _t}
y_t^{\left( p \right)} = {\left( {1,\tilde y_t^{\left( p \right)}} \right)^\prime },\quad \tilde y_t^{\left( p \right)} = {\left( {{y_{t – 1}}, \ldots ,\quad {y_{t – p}}} \right)^\prime },{\phi _i} = {\left( {{\phi _{i0}},{\phi _{i1}}, \ldots ,\quad {\phi _{ip}}} \right)^\prime } \\
\varepsilon _t^{\left( q \right)} = {\left( {{\varepsilon _{t – 1}}, \ldots ,\quad {\varepsilon _{t – q}}} \right)^\prime },{{\psi ‘}_i} = {\left( {{\psi _{i1}}, \ldots ,{\psi _{iq}}} \right)^\prime }\\
{x_t}{\text{ }} = {\left( {{x_1}, \ldots ,{x_l}} \right)^\prime },\quad {{\xi ‘}_i}{\text{ }} = {\left( {{\xi _{i1}}, \ldots ,{\xi _{il}}} \right)^\prime } \\
and we allow for a variance mixture so that \( {\varepsilon _t} \sim iid\left( {0,{\sigma _i},\eta } \right) \) with \( \eta \) denoting any remaining distributional parameters which are common across states. The softmax function is used to model multiple states such that:
{F_i}\left( {{z_{t – d}};{\gamma _i},{\alpha _i},{c_i},{\beta _i}} \right) = \frac{{{e^{{\pi _{i,t}}}}}}
{{1 + \sum\limits_{i = 1}^{s – 1} {{e^{{\pi _{i,t}}}}} }} \\
{F_s}\left( {{z_{t – d}};{\gamma _i},{\alpha _i},{c_i},{\beta _i}} \right) = \frac{1}
{{1 + \sum\limits_{i = 1}^{s – 1} {{e^{{\pi _{i,t}}}}} }}\\
where the state dynamics \( \pi_{i,t} \) also include the option of lag-1 autoregression:
{\pi _{i,t}} = {\gamma _i}\left( {{{\alpha ‘}_i}{z_{t – d}} – {c_i}} \right) + {{\beta ‘}_i}{\pi _{i,t – 1}},\quad \gamma_i>0
with initialization conditions given by:

\[\pi _{i,0} = \frac{{{\gamma _i}\left( {{{\alpha ‘}_i}\bar z – {c_i}} \right)}}{{1 – {\beta _i}}},\quad \left| \beta  \right| < 1\]

The parameter \( \gamma_i \) is a scaling variable determining the smoothness of the transition between states, while \( c_i \) is the threshold intercept about which switching occurs.

The transition variable(s) \( z_t \) may be a vector of external regressors each with its own lag, or a combination of lags of \( y_t \) in which case the model is ‘rudely’, as one participant in the conference noted, called ‘self-exciting’. Should \( z_t \) be a vector, then for identification purposes \( \alpha_1 \) is fixed to 1. Additionally, as will be seen later, it is also possible to pass a function \( f\left(.\right) \) which applies a custom transformation to the lagged values in the case of the self-exciting model. While it may appear at first that the same can be achieved by pre-transforming those values and passing the transformed variables in the switching equation, this leaves you dead in the water when it comes to s(w)imulation and n-ahead forecasting where the value of the transition variable depends on the lagged simulated value of that same variable.

Finally, the transition function \( F_i \) has usually been taken to be either the logistic or exponential transform functions. As the 2 figures below illustrate, the logistic nests the TAR model as \( \gamma_i\to \infty \) and collapses to the linear case as \( \gamma_i\to 0 \). The exponential on the other hand collapses to the linear case as \( \gamma_i \) approaches the limits and has a symmetric shape which is sometimes preferred for exchange rate modelling because of the perceived symmetric exchange rate adjustment behavior. Currently, only the logistic is considered in the twinkle package.

Logistic Transition

Logistic Transition

Exponential Transition

Exponential Transition



I follow a similar construction paradigm for the twinkle package as I have in my other packages. Namely, an S4 framework which includes steps for specifying the model, estimation, inference and tests, visual diagnostics, filtering, forecasting, simulation and rolling estimation/forecasting. I consider these to be the minimum methods for creating a complete modelling environment for econometric analysis. Additional methods are summarized in the following table:

setfixed<-Set fixed parameters[1]>setfixed(spec)<-list(s1.phi0=0)
setstart<-Set starting parameters[1]>setstart(spec)<-list(s1.phi0=0)
setbounds<-Set parameter bounds[1]>setbounds(spec)<-list(s1.phi0=c(0,1))
nonlinearTestLuukkonen Test[1][2]>nonlinearTest(fit, robust=TRUE)
modelmatrixmodel matrix[1][2]>modelmatrix(fit, linear=FALSE)
coefcoef vector[2][3]>coef(fit)
fittedconditional mean[2][3][4][5][6]>fitted(fit)
statesconditional state probabilities[2][3][4][5][6]>states(fit)
likelihoodlog likelihood[2][3]>likelihood(fit)
infocriterianormalized information criteria[2][3][7]>infocriteria(fit)
vcovparameter covariance matrix[2]>vcov(fit)
convergencesolver convergence[2][7]>convergence(fit)
scorenumerical score matrix[2]>score(fit)
sigmaconditional sigma[2][3][4][5][6]>sigma(fit) density[7]>
quantileconditional quantiles[2][3][4][5][6][7]>quantile(fit)
pitconditional probability integral transformation[2][3][7]>pit(fit)

The classes in column 3 are: [1] STARspec, [2] STARfit, [3] STARfilter, [4] STARforecast, [5] STARsim, [6] STARpath, [7] rollSTAR.

Model Specification

The model specification function has a number of options which I will briefly discuss in the section.

## function (mean.model = list(states = 2, include.intercept = c(1,
##     1), arOrder = c(1, 1), maOrder = c(0, 0), matype = 'linear',
##     statevar = c('y', 's'), s = NULL, ylags = 1, statear = FALSE,
##     yfun = NULL, xreg = NULL, transform = 'logis'), variance.model = list(dynamic = FALSE,
##     model = 'sGARCH', garchOrder = c(1, 1), submodel = NULL,
##     vreg = NULL, variance.targeting = FALSE), distribution.model = 'norm',
## = list(), = list(), fixed.prob = NULL,
##     ...)
Mean Equation

Upto 4 states may be modelled, with a choice of inclusion or exclusion of an intercept in each state (include.intercept), the number of AR (arOrder) and MA (maOrder) parameters per state and whether to include external regressors in each state (xreg should be a prelagged xts matrix). Note that the default for the moving average terms is to include them outside the states, but this can be changed by setting matype=’state’. The statevar denotes what the state variable is, with “y” being the self-exciting model and “s” a set of external regressors passed as a prelagged xts matrix to s. If the choice is “y”, the ylags should be a vector of the lags for the variable with a choice like c(1,3) denoting lag-1 and lag-3. Finally, the yfun option was discussed in the previous section and is a custom transformation function for y returning the same number of points as given.

Variance Equation

The variance can be either static (default) or dynamic (logical), in which case it can be one of 3 GARCH models (‘sGARCH’,’gjrGARCH’ or ‘eGARCH’) or ‘mixture’ as discussed previously. The rest of the options follow from those in the rugarch package in the case of GARCH type variance.

Other options

The same distributions as those in rugarch are implemented, and there is the option of passing fixed or starting parameters to the specification, although the methods ‘setfixed< –‘ and ‘setstart<-‘ allow this to be set once the specification has been formed. There is also a ‘setbounds<-‘ method for setting parameter bounds which for the unconstrained solvers (the default to use with this type of model) means using a logistic bounding transformation. Finally, the fixed.probs option allows the user to pass a set of fixed state probabilities as an xts matrix in which case the model is effectively linear and may be estimated quite easily.

Parameter naming

It is probably useful to have the naming convention of the parameters used in the package should starting, fixed or bounds need to be set. These are summarized in the list below and generally follow the notation used in the representation of the model in the previous section:

  • All state based variables are preceded by their state number (s1.,s2.,s3.,s4.)
  • Conditional Mean Equation:
    • intercept: phi0 (e.g. s1.phi0, s2.phi0)
    • AR(p): phi1, …, phip (e.g. s1.phi1, s1.phi2, s2.phi1, s2.phi2)
    • MA(q): psi1, …, psiq (e.g. s1.psi1, s1.psi2, s2.psi1, s2.psi2). Note that in the case of matype=’linear’, the states are not used.
    • X(l): xi1, …, xil (e.g. s1.xi1, s2.xi2, x3.xi1)
  • State Equation:
    • scaling variable: gamma (e.g. s1.gamma)
    • Threshold: c (e.g. s1.c)
    • Threshold Variables (k): alpha2, …, alphak (e.g. s1.alpha2, s1.alpha3). Note that the first variable (alpha1) is constrained to be 1 for identification purposes so cannot be changed. This will always show up in the summary with NAs in the standard errors since it is not estimated.
    • Threshold AR(1): beta (e.g. s1.beta)
  • Variance Equation:
    • sigma (s): If dynamic and mixture then s1.sigma, s2.sigma etc. If static then just sigma.
    • GARCH parameters follow same naming as in the rugarch package
  • Distribution:
    • skew
    • shape

I’ll define a simple specification to use with this post and based on examples from the twinkle.tests folder in the src distribution (under the inst folder). This is based on a weekly realized measure of the Nasdaq 100 index for the period 02-01-1996 to 12-10-2001, and the specification is for a simple lag-1 self-exciting model with AR(2) for each state.

ndx.ret2 = ROC(Cl(ndx), na.pad = FALSE)^2
ndx.rvol = sqrt(apply.weekly(ndx.ret2, FUN = 'sum'))
colnames(ndx.rvol) = 'RVOL'
spec = starspec(mean.model = list(states = 2, arOrder = c(2, 2), statevar = 'y',
    ylags = 1))

Before proceeding, I also check the presence of STAR nonlinearity using the test of Luukkonen (1988) which is implemented in the nonlinearTest method with an option for also testing with the robust assumption (to heteroscedasticity):

tmp1 = nonlinearTest(spec, data = log(ndx.rvol))
tmp2 = nonlinearTest(spec, data = log(ndx.rvol), robust = TRUE)
testm = matrix(NA, ncol = 4, nrow = 2, dimnames = list(c('Standard', 'Robust'),
    c('F.stat', 'p.value', 'Chisq.stat', 'p.value')))
testm[1, ] = c(tmp1$F.statistic, tmp1$F.pvalue, tmp1$chisq.statistic, tmp1$chisq.pvalue)
testm[2, ] = c(tmp2$F.statistic, tmp2$F.pvalue, tmp2$chisq.statistic, tmp2$chisq.pvalue)
print(testm, digit = 5)
##          F.stat   p.value Chisq.stat   p.value
## Standard 3.7089 0.0014366     21.312 0.0016123
## Robust   2.5694 0.0193087     15.094 0.0195396

We can safely reject the linearity assumption under the standard test at the 1% significance level, and under the robust assumption at the 5% significance level. Note that this example is taken from the excellent book of Zivot (2007) (chapter on nonlinear models) and the numbers should also agree with what is printed there.

Model Estimation

Estimating STAR models is a challenging task, and for this purpose a number of options have been included in the package.

## function (spec, data, out.sample = 0, solver = 'optim', solver.control = list(),
##     fit.control = list(stationarity = 0, = 0, rec.init = 'all'),
##     cluster = NULL, n = 25, ...)

The data must be an xts object with the same time indices as any data already passed to the STARspec object and contain only numeric data without any missing values. The out.sample is used to indicate how many data points to optionally leave out in the estimation (from the end of the dataset) for use in out-of-sample forecasting later on when the estimated object is passed to the starforecast routine. Perhaps the most important choice to be made is the type of solver to use and it’s control parameters solver.control which should not be omitted. The following solvers and ‘strategies’ are included:

  • optim. The preferred choice is the BFGS solver. The choice of solver is controlled by the method option in the solver.control list. All parameter bounds are enforced through the use of a logistic transformation.
  • nlminb. Have had little luck getting the same performance as the BFGS solver.
  • solnp. Will most likely find a local solution.
  • cmaes. Even though it is a global solver, it requires careful tweaking of the control parameters (and there are many). This is the parma package version of the solver.
  • deoptim. Another global solver. May be slow and require tweaking of the control parameters.
  • msoptim. A multistart version of optim with option for using the cluster option for parallel evaluation. The number of multi-starts is controlled by the n.restarts option in the solver.control list.
  • strategy. A special purpose optimization strategy for STAR problems using the BFGS solver. It cycles between keeping the state variables fixed and estimating the linear variables (conditional mean, variance and any distribution parameters), keeping the linear variables fixed and estimating the state variables, and a random re-start optimization to control for possibly local solutions. The argument n in the routine controls the number of times to cycle through this strategy. The solver.control list should pass control arguments for the BFGS solver. This is somewhat related to concentrating the sum of squares methodology in terms of the estimation strategy, but does not minimize the sum of squares, opting instead for a proper likelihood evaluation.

The strategy and msoptim solver strategies should be the preferred choice when estimating STARMA models.

I continue with the example already covered in the specification section and estimate the model, leaving 50 points for out of sample forecasting and filtering later on:

mod = starfit(spec, data = log(ndx.rvol), out.sample = 50, solver = 'strategy',
    n = 8, solver.control = list(alpha = 1, beta = 0.4, gamma = 1.4, reltol = 1e-12))
## *---------------------------------*
## *          STAR Model Fit         *
## *---------------------------------*
## states       : 2
## statevar     : y
## statear      : FALSE
## variance     : static
## distribution : norm
## Optimal Parameters (Robust Standard Errors)
## ------------------------------------
##            Estimate  Std. Error    t value Pr(>|t|)
## s1.phi0    -3.54380    0.034260 -103.43760 0.000000
## s1.phi1    -0.64567    0.426487   -1.51393 0.130043
## s1.phi2     0.10950    0.319605    0.34262 0.731886
## s2.phi0    -2.51982    0.849927   -2.96475 0.003029
## s2.phi1     0.10902    0.214009    0.50944 0.610444
## s2.phi2     0.17944    0.062210    2.88447 0.003921
## s1.gamma    3.22588    1.941072    1.66190 0.096532
## s1.c       -2.52662    0.347722   -7.26620 0.000000
## s1.alpha1   1.00000          NA         NA       NA
## sigma       0.39942    0.019924   20.04776 0.000000
## LogLikelihood : -126.3
## Akaike       1.0738
## Bayes        1.1999
## Shibata      1.0714
## Hannan-Quinn 1.1246
## r.squared         :  0.3167
## r.squared (adj)   :  0.2913
## RSS               :  40.2
## skewness (res)    :  -0.235
## ex.kurtosis (res) :  0.4704
## AR roots
##         Moduli1 Moduli2
## state_1   1.274   7.170
## state_2   2.076   2.684



Model Filtering

Filtering a dataset with an already estimated set of parameters has been already extensively discussed in a related post for the rugarch package. The method takes the following arguments:

## function (spec, data, out.sample = 0, n.old = NULL, rec.init = 'all', ...)

The most important argument is probably n.old and denotes, in the case that the new dataset is composed of the old dataset (on which estimation took place) and the new data, the number of points composing the original dataset. This is so as to use the same initialization values for certain recursions and return the exact same results for those points in the original dataset. The following example illustrates:

specf = spec
setfixed(specf) < - as.list(coef(mod))
N = nrow(ndx.rvol) - 50
modf = starfilter(specf, data = log(ndx.rvol), n.old = N)
print(all.equal(fitted(modf)[1:N], fitted(mod)))
## [1] TRUE
print(all.equal(states(modf)[1:N], states(mod)))
## [1] TRUE


Model Forecasting

Nonlinear models are considerable more complex than their linear counterparts to forecast. For 1-ahead this is quite simple, but for n-ahead there is no closed form solution as in the linear case. Consider a general nonlinear first order autoregressive model:
{y_t} = F\left( {{y_{t – 1}};\theta } \right) + {\varepsilon _t}
The 1-step ahead forecast is simply:
{{\hat y}_{t + 1\left| t \right.}} = E\left[ {{y_{t + 1}}\left| {{\Im _t}} \right.} \right] = F\left( {{y_t};\theta } \right)
However, for n-step ahead, and using the Chapman-Kolmogorov relationship \( g\left( {{y_{t + h}}\left| {{\Im _t}} \right.} \right) = \int_{ – \infty }^\infty {g\left( {{y_{t + h}}\left| {{y_{t + h – 1}}} \right.} \right)g\left( {{y_{t + h – 1}}\left| {{\Im _t}} \right.} \right)d{y_{t + h – 1}}} \), we have:
\[E\left[ {{y_{t + h}}\left| {{\Im _t}} \right.} \right] = \int_{ – \infty }^\infty  {E\left[ {{y_{t + h}}\left| {{y_{t + h – 1}}} \right.} \right]g\left( {{y_{t + h – 1}}\left| {{\Im _t}} \right.} \right)d{y_{t + h – 1}}}\]
where there is no closed form relationship since \( E\left[ {F\left( . \right)} \right] \ne F\left({E\left[ . \right]} \right) \).
The trick is to start at h=2:
{{\hat y}_{t + 2\left| t \right.}} = \frac{1}{T}\sum\limits_{i = 1}^T {F\left( {{{\hat y}_{t + 1\left| t \right.}} + {\varepsilon _i};\theta } \right)}
and using either quadrature integration or monte carlo summation obtain the expected value. Use that value for the next step, rinse and repeat.

In the twinkle package, both quadrature and monte carlo summation are options in the starforecast method:

## function (fitORspec, data = NULL, n.ahead = 1, n.roll = 0, out.sample = 0,
##     external.forecasts = list(xregfor = NULL, vregfor = NULL,
##         sfor = NULL, probfor = NULL), method = c('an.parametric',
##         'an.kernel', 'mc.empirical', 'mc.parametric', 'mc.kernel'),
##     mc.sims = NULL, ...)

with added options for either parametric, empirical or kernel fitted distribution for the residuals. The method also allows for multiple dispatch methods by taking either an object of class STARfit or one of class STARspec (with fixed parameters and a dataset). The example below illustrates the different methods:

forc1 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'an.parametric',
    mc.sims = 10000)
forc2 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'an.kernel', mc.sims = 10000)
forc3 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.empirical',
    mc.sims = 10000)
forc4 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.parametric',
    mc.sims = 10000)
forc5 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.kernel', mc.sims = 10000)
par(mfrow = c(2, 3))
## *------------------------------------*
## *        STAR Model Forecast         *
## *------------------------------------*
## Horizon        : 20
## Roll Steps     : 2
## STAR forecast  : an.parametric
## Out of Sample  : 20
## 0-roll forecast [T0=2000-10-27]:
##      Series
## T+1  -2.684
## T+2  -2.820
## T+3  -2.948
## T+4  -3.061
## T+5  -3.157
## T+6  -3.231
## T+7  -3.286
## T+8  -3.324
## T+9  -3.350
## T+10 -3.368
## T+11 -3.379
## T+12 -3.387
## T+13 -3.392
## T+14 -3.395
## T+15 -3.397
## T+16 -3.398
## T+17 -3.399
## T+18 -3.400
## T+19 -3.400
## T+20 -3.400


The nice thing about the monte carlo method is that the density of each point forecast is now available, and used in the plot method to draw quantiles around that forecast. It can be extracted by looking at the slot object@forecast$yDist, which is list of length n.roll+1 of matrices of dimensions mc.sims by n.ahead.


Model Simulation

Simulation in the twinkle package, like in rugarch, can be carried out directly on the estimated STARfit object else on a specification object of class STARspec with fixed parameters. Achieving equivalence between the two relates to start-up initialization conditions and is always a good check on reproducibility and code correctness, and shown in the example that follows:

sim = starsim(mod, n.sim = 1000, rseed = 10)
path = starpath(specf, n.sim = 1000, prereturns = tail(log(ndx.rvol)[1:N], rseed = 10)
all.equal(fitted(sim), fitted(path))
all.equal(states(sim), states(path))

The fitted method extracts the simulated series as an n.sim by m.sim matrix, while the states method extracts the simulated state probabilities (optionally takes “type” argument with options for extracting the simulated raw dynamics or conditional simulated mean per state) and can be passed the argument sim to indicate which m.sim run to extract (default: 1). Passing the correct prereturns value and the same seed as in starsim, initializes the simulation from the same values as the test of equality shows between the 2 methods. Things become a little more complicated when using external regressors or GARCH dynamics, but with careful preparation the results should again be the same.

Rolling estimation and 1-step ahead forecasting

The final key modelling method, useful for backtesting, is that of the rolling estimation and 1-step ahead forecasting which has a number of options to define the type of estimation window to use as well as a resume method which re-estimates any windows which did not converge during the original run. This type of method has already been covered in related posts of rugarch so I will reserve a more in-depth demo for a later date.

Final Thoughts

This post provided an introduction to the use of the twinkle package which should hopefully make it to CRAN from bitbucket soon. It is still in beta, and it will certainly take some time to mature, so please report bugs or feel free to contribute patches. The package departs from traditional implementations, sparse as they are, in the area of STAR models by offering extensions in the form of (MA)(X) dynamics in the conditional mean, (AR) dynamics in the conditional state equation, a mixture model for the variance, and a softmax representation for the multi-state model. It brings a complete modelling framework, developed in the rugarch package, to STAR model estimation with a set of methods which are usually lacking elsewhere. It also brings, at least for the time being, a promise of user engagement (via the R-SIG-FINANCE mailing list) and maintenance.


Download and installation instructions can be found here.


[1] Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica: Journal of the Econometric Society, 357-384.

[2] Tong, H., & Lim, K. S. (1980). Threshold Autoregression, Limit Cycles and Cyclical Data. Journal of the Royal Statistical Society. Series B (Methodological), 245-292.

[3] Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models. Journal of the American Statistical association, 89(425), 208-218.

[4] Luukkonen, R., Saikkonen, P., & Teräsvirta, T. (1988). Testing linearity against smooth transition autoregressive models. Biometrika, 75(3), 491-499.

[5] Zivot, E., & Wang, J. (2007). Modeling Financial Time Series with S-PLUS® (Vol. 191). Springer.

The realized GARCH model

The last model added to the rugarch package dealt with the modelling of intraday volatility using a multiplicative component GARCH model. The newest addition is the realized GARCH model of Hansen, Huang and Shek (2012) (henceforth HHS2012) which relates the realized volatility measure to the latent volatility using a flexible representation with asymmetric dynamics. This short article discusses the model, its implementation in rugarch and a short empirical application.

The latest version of rugarch (1.3-0) is available from the teatime repository on r-forge.

The Model

The realized GARCH (realGARCH) model of HHS2012 provides for an excellent framework for the joint modelling of returns and realized measures of volatility. Unlike the naive augmentation of GARCH processes by a realized measures, the realGARCH model relates the observed realized measure to the latent volatility via a measurement equation, which also includes asymmetric reaction to shocks, making for a very flexible and rich representation. Formally, let:

\[ \begin{gathered}
{y_t} = {\mu _t} + {\sigma _t}{z_t},{\text{ }}{z_t} \sim i.i.d\left( {0,1} \right) \\
\log \sigma _t^2 = \omega + \sum\limits_{i = 1}^q {{\alpha _i}\log {r_{t – i}} + \sum\limits_{i = 1}^p {{\beta _i}\log \sigma _{t – i}^2} } \\
\log r_t = \xi + \delta \log \sigma _t^2 + \tau \left( {{z_t}} \right) + {u_t},{\text{ }}{u_t} \sim N\left( {0,\lambda } \right) \\

where we have defined the dynamics for the returns (\( y_t \)), the log of the conditional variance (\( \sigma^2_t \)) and the log of the realized measure (\( r_t \)). Note that the notation varies slightly from that of HHS2012, and the rugarch vignette contains a footnote of the differences. The asymmetric reaction to shocks comes via the \( \tau\left(.\right) \) function which is based on the Hermite polynomials and truncated at the second level to give a simple quadratic form:

\[ \tau\left(z_t\right) = \eta_1z_t + \eta_2\left(z^2_t-1\right)
which has the very convenient property that \( E\tau\left(z_t\right)=0 \). The function also forms the basis for the creation of a type of news impact curve \( \nu \left( z \right) \), defined as:
\[ \begin{gathered}
\nu \left( z \right) = E\left[ {\log {\sigma _t}\left| {{z_{t – 1}} = z} \right.} \right] – E\left[ {\log {\sigma _t}} \right] = \delta \tau \left( z \right) \\
so that \( 100\times\nu\left(z\right) \) is the percent change in volatility as a function of the standardized innovations.

A key feature of this model is that it preserves the ARMA structure which characterize many standard GARCH models and adapted here from Proposition 1 of the HHS2012 paper:
\[ \begin{gathered}
\log \sigma _t^2 = {\mu _\sigma } + \sum\limits_{i = 1}^{p \vee q} {\left( {\delta {\alpha _i} + {\beta _i}} \right)\log \sigma _{t – 1}^2 + } \sum\limits_{j = 1}^q {{\alpha _j}{w_{t – j}}} \\
\log {r_t} = {\mu _r} + \sum\limits_{i = 1}^{p \vee q} {\left( {\delta {\alpha _i} + {\beta _i}} \right)\log {r_{t – 1}} + {w_t}} – \sum\limits_{j = 1}^{p \vee q} {{\beta _j}{w_{t – j}}} \\
{\mu _\sigma } = \omega + \xi \sum\limits_{i = 1}^q {{\alpha _i}} ,{\mu _r} = \delta \omega + \left( {1 – \sum\limits_{i = 1}^p {{\beta _i}} } \right)\xi \\
where \( w_t=\tau\left(z_t\right)+u_t \), \( {\mu _\sigma } = \omega + \xi \sum\limits_{i = 1}^q {{\alpha _i}} \) and \( {\mu _r} = \delta \omega + \left( {1 – \sum\limits_{i = 1}^p {{\beta _i}} } \right)\xi \), and the convention \( {\beta _i} = {\alpha _j} = 0 \) for \( i>p \) and \( j\lt p \). It is therefore a simple matter to show that the persistence \(\hat P\) of the process is given by:
\hat P = \sum\limits_{i = 1}^p {{\beta _i} + } \delta \sum\limits_{i = 1}^q {{\alpha _i}}
while the unconditional (long-run) variance may be written as:
{\hat \sigma ^2} = \frac{{\omega + \xi \sum\limits_{i = 1}^q {{\alpha _i}} }}
{{1 – \hat P}}

Demonstration: Estimation

For the demonstration, I will use the SPY dataset which accompanied the paper of HHS2012 including the realized volatility based on the realized kernel (parzen) method of Barndorff-Nielsen et al (2008) (this dataset is now included in rugarch as of version 1.3-0). In general, the highfrequency package can be used on intraday data to generate any number of realized volatility measures (and is currently the only package AFAIK that provides this functionality in R).

spec = ugarchspec(mean.model = list(armaOrder = c(0, 0), include.mean = FALSE), variance.model = list(model = 'realGARCH', garchOrder = c(2, 1)))
fit = ugarchfit(spec, spyreal[, 1] * 100, solver = 'hybrid', realizedVol = spyreal[,2] * 100)

Notice how the additional realized measure is passed via the realizedVol argument which must be an xts object with matching index as the returns data. Also note the use of the hybrid solver which is the recommended method for solving this type of model.

I’m going to compare the results with those of HHS2012:

cf = coef(fit)
se = fit@fit$matcoef[, 2]
names(se) = names(cf)
# HHS2012 divide the period into in-(T=1492) and out- of sample, even though they are estimating the model using the entire dataset (T=1659) comment.
benchmark.LL = c('logL' = -2388.75, 'pLogL' = -1710.34) = c('logh0' = -0.289213754, 'omega' = 0.041246175, 'alpha1' = 0.450676773, 'alpha2' = -0.176042911, 'beta1' = 0.701210369, 'xi' = -0.179994885, 'delta' = 1.037491982, 'eta1' = -0.067809509, 'eta2' = 0.07015778, 'lambda' = sqrt(0.145369801)) = c('logh0' = 0.278787604, 'omega' = 0.015515016, 'alpha1' = 0.038027571, 'alpha2' = 0.06335417, 'beta1' = 0.055974942, 'xi' = 0.043611245, 'delta' = 0.057726499, 'eta1' = 0.010309694, 'eta2' = 0.006620828, 'lambda' = 0.00594321)
# rugarch does not estimate h0, instead uses either mean(residuals^2), else a choice of variance targeting with options
rugarch.LL = c('logL' = sum(-fit@fit$log.likelihoods[1:1492]), 'pLogL' = sum(-fit@fit$partial.log.likelihoods[1:1492])) = c('logh0' = NA, 'omega' = cf['omega'], 'alpha1' = cf['alpha1'], 'alpha2' = cf['alpha2'], 'beta1' = cf['beta1'], 'xi' = cf['xi'], 'delta' = cf['delta'], 'eta1' = cf['eta11'], 'eta2' = cf['eta21'], 'lambda' = cf['lambda']) = c('logh0' = NA, 'omega' = se['omega'], 'alpha1' = se['alpha1'], 'alpha2' = se['alpha2'], 'beta1' = se['beta1'], 'xi' = se['xi'], 'delta' = se['delta'], 'eta1' = se['eta11'], 'eta2' = se['eta21'], 'lambda' = se['lambda'])
names( = names( = c('logh0', 'omega', 'alpha1', 'alpha2', 'beta1', 'xi', 'delta', 'eta1', 'eta2', 'lambda')
parsdf = cbind(,, -
sedf = cbind(,, -
LRE.vars = -log(abs( -, base = 10) = -log(abs( -, base = 10)
test = cbind(LRE.vars,
tmp1 = t(cbind(rugarch =, benchmark =
tmp2 = t(cbind(rugarch =, benchmark =
# print the results:
## parameters:
##             logh0  omega alpha1  alpha2  beta1      xi delta    eta1   eta2  lambda
## rugarch        NA 0.0451 0.4766 -0.2027 0.7050 -0.1898 1.027 -0.0616 0.0718  0.3815
## benchmark -0.2892 0.0412 0.4507 -0.1760 0.7012 -0.1800 1.038 -0.0678 0.0702  0.3813
## standard errors:
##            logh0  omega alpha1 alpha2  beta1     xi  delta   eta1   eta2 lambda
## rugarch       NA 0.0137 0.0289 0.0450 0.0381 0.0390 0.0403 0.0097 0.0062 0.0066
## benchmark 0.2788 0.0155 0.0380 0.0634 0.0560 0.0436 0.0577 0.0103 0.0066 0.0059
## Log Relative Error Test:
##          logh0  omega alpha1 alpha2  beta1     xi delta  eta1  eta2 lambda
## LRE.vars    NA 1.0293 1.2408 0.8191 2.2627 1.2640 2.012 1.041 1.623 3.1413
##      NA 0.9355 0.6212 0.5374 0.4947 0.9733 0.519 1.204 1.208 0.9435

The results are pretty close, and all parameters are found to be highly significant. Note that unlike HHS2012, rugarch does not include \( h_0 \) in the parameter estimation set but instead, like the other models, estimates it based on the filtered squared residuals. Finally, we reproduce the news impact curve from their paper:

ni = newsimpact(fit, z = seq(-2, 2, length.out = 100))
plot(ni$zx, (ni$zy), ylab = ni$yexpr, xlab = ni$xexpr, type = 'l', main = 'News Impact realGARCH')
abline(v = 0)
abline(h = 0)

The interested reader should read the HHS2012 paper which has some very good insights. In particular, the coefficient on the log conditional variance in the measurement equation (\(\delta\)) provides interesting insights depending on whether open-close or close-close returns are used.

Demonstration: Forecasting

In rugarch, it is possible to create both rolling 1-ahead forecasts (assuming the out.sample option was used in the estimation) and long-run n-ahead forecasts, and combinations of both. In the realGARCH model, n.ahead>1 forecasts contain uncertainty because \( u_t \) is not known. As such, simulation methods are used and the ugarchforecast routine takes on additional arguments related to this. These are n.sim for the number of random samples to use per period for the generation of the discrete time density of the forecast variance (and also the realized volatility), and returnDistribution (logical) indicates whether to include these estimates of the density in the returned object. Whatever the case, the point estimate is always returned and based on the mean of the sampled values.

The first example below demonstrates the equivalence of the ugarchfilter and ugarchforecast (using either a uGARCHfit or uGARCHspec objects) methods.

fit = ugarchfit(spec, spyreal[, 1] * 100, out.sample = 25, solver = 'hybrid', realizedVol = spyreal[, 2] * 100)
specf = spec
setfixed(specf) < - as.list(coef(fit))
filt = ugarchfilter(specf, data = spyreal[, 1] * 100, n.old = nrow(spyreal) - 25, realizedVol = spyreal[, 2] * 100)
forc1 = ugarchforecast(fit, n.ahead = 1, n.roll = 25)
# forecast from spec
forc2 = ugarchforecast(specf, n.ahead = 1, n.roll = 25, data = spyreal[, 1] * 100, out.sample = 25, realizedVol = spyreal[, 2] * 100)
filts = tail(sigma(filt), 25)
colnames(filts) = 'filter'
forcs1 = xts(sigma(forc1)[1, ], move(as.Date(names(sigma(forc1)[1, ])), by = 1))
forcs2 = xts(sigma(forc2)[1, ], move(as.Date(names(sigma(forc2)[1, ])), by = 1))
colnames(forcs1) = 'fit2forecast'
colnames(forcs2) = 'spec2forecast'
ftest = cbind(filts, forcs1, forcs2)
# last forecast is completely out of sample, so not available from the
# filter method (which filters given T-1)
print(round(ftest, 5))
##            filter fit2forecast spec2forecast
## 2008-07-28 1.1065       1.1065        1.1065
## 2008-07-29 1.0131       1.0131        1.0131
## 2008-07-30 0.9885       0.9885        0.9885
## 2008-07-31 1.0828       1.0828        1.0828
## 2008-08-01 1.0685       1.0685        1.0685
## 2008-08-04 1.1434       1.1434        1.1434
## 2008-08-05 1.0460       1.0460        1.0460
## 2008-08-06 1.0351       1.0351        1.0351
## 2008-08-07 0.9206       0.9206        0.9206
## 2008-08-08 0.9933       0.9933        0.9933
## 2008-08-11 1.0083       1.0083        1.0083
## 2008-08-12 0.9368       0.9368        0.9368
## 2008-08-13 0.9564       0.9564        0.9564
## 2008-08-14 1.0243       1.0243        1.0243
## 2008-08-15 0.9903       0.9903        0.9903
## 2008-08-18 0.9432       0.9432        0.9432
## 2008-08-19 0.9751       0.9751        0.9751
## 2008-08-20 0.9453       0.9453        0.9453
## 2008-08-21 1.0326       1.0326        1.0326
## 2008-08-22 0.9930       0.9930        0.9930
## 2008-08-25 0.8638       0.8638        0.8638
## 2008-08-26 0.9082       0.9082        0.9082
## 2008-08-27 0.9154       0.9154        0.9154
## 2008-08-28 0.8658       0.8658        0.8658
## 2008-08-29 0.8235       0.8235        0.8235
## 2008-09-01     NA       0.8103        0.8103

The advantage of being able to call the ugarchforecast method with a uGARCHspec object (with fixed pars) should be obvious. Anytime we have new data arriving, we can simply augment the old data and create new forecasts without having to re-estimate the model (unless we wish to update the parameter set). In the next example, a long-horizon forecast is created an compared to the unconditional value:

forc3 = ugarchforecast(fit, n.ahead = 400, n.sim = 5000)
plot(sigma(forc3), type = 'l', main = 'realGARCH long-run forecast')
abline(h = sqrt(uncvariance(fit)), col = 2)
legend('topright', c('long-run forecast', 'unconditional value'), col = 1:2, lty = c(1, 1), bty = 'n')


The final example shows how the forecast density is extracted from the forecasted object. This is an array of dimensions (n.ahead x n.sim x n.roll+1).

forc4 = ugarchforecast(fit, n.ahead = 25, n.sim = 10000)
# dim(forc4@forecast$sigmaDF)
sigmaDF = forc4@forecast$sigmaDF
meansig = sqrt(exp(rowMeans(log(sigmaDF[, , 1]^2))))
boxplot(t(sigmaDF[, , 1]), main = '25-ahead volatility forecast (realGARCH)', col = 'orange')
points(as.numeric(meansig), col = 'green')
# note that for the 1-ahead there is no uncertainty (unless we were doing this Bayes-style so that parameter uncertainty would have an impact).



Demonstration: Simulation

There are 2 ways to simulate from GARCH models in the rugarch package: either directly from an estimated object (uGARCHfit class) or from a GARCH spec (uGARCHspec with fixed parameters), similarly to the ugarchforecast method. However, instead of having a single dispatch method, there is ugarchsim (for the uGARCHfit class) and ugarchpath (for the uGARCHspec class). In the example that follows I show how to generate equivalent values from the two methods:

spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE), variance.model = list(model = 'realGARCH', garchOrder = c(1, 1)))
fit = ugarchfit(spec, spyreal[, 1] * 100, out.sample = 25, solver = 'hybrid', realizedVol = spyreal[, 2] * 100)
specf = spec
setfixed(specf) < - as.list(coef(fit))
T = nrow(spyreal) - 25
sim1 = ugarchsim(fit, n.sim = 1000, m.sim = 1, n.start = 0, startMethod = 'sample', rseed = 12)
sim2 = ugarchsim(fit, n.sim = 1000, m.sim = 1, n.start = 0, startMethod = 'sample', rseed = 12, prereturns = as.numeric(tail(spyreal[1:T, 1] * 100, 1)), presigma = as.numeric(tail(sigma(fit), 1)), preresiduals = as.numeric(tail(residuals(fit), 1)), prerealized = as.numeric(tail(spyreal[1:T, 2] * 100, 1)))
sim3 = ugarchpath(specf, n.sim = 1000, m.sim = 1, n.start = 0, rseed = 12, prereturns = as.numeric(tail(spyreal[1:T, 1] * 100, 1)), presigma = as.numeric(tail(sigma(fit), 1)), preresiduals = as.numeric(tail(residuals(fit), 1)), prerealized = as.numeric(tail(spyreal[1:T, 2] * 100, 1)))
print(cbind(head(sigma(sim1)), head(sigma(sim2)), head(sigma(sim3))))
##       [,1]   [,2]   [,3]
## T+1 1.0735 1.0735 1.0735
## T+2 0.9836 0.9836 0.9836
## T+3 1.1091 1.1091 1.1091
## T+4 1.0306 1.0306 1.0306
## T+5 0.9609 0.9609 0.9609
## T+6 0.8771 0.8771 0.8771

Again, it should be obvious what the applications are for ugarchpath in a live environment.

Application: Rolling Forecasts and Value at Risk

The final part of this article provides a small empirical application comparing the realGARCH and eGARCH models for risk management.

cl = makePSOCKcluster(5)
spec1 = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE), variance.model = list(model = 'realGARCH', garchOrder = c(1, 1)))
spec2 = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE), variance.model = list(model = 'eGARCH', garchOrder = c(1, 1)))
roll1 = ugarchroll(spec1, spyreal[, 1] * 100, forecast.length = 500, solver = 'hybrid', refit.every = 25, refit.window = 'recursive', realizedVol = spyreal[, 2] * 100, cluster = cl)
roll2 = ugarchroll(spec2, spyreal[, 1] * 100, forecast.length = 500, refit.every = 25, refit.window = 'recursive', cluster = cl)
## VaR Backtest Report
## ===========================================
## Model:           realGARCH-norm
## Backtest Length: 500
## Data:
## ==========================================
## alpha:               1%
## Expected Exceed:     5
## Actual VaR Exceed:   10
## Actual %:            2%
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 3.914
## LR.uc Critical:  3.841
## LR.uc p-value:   0.048
## Reject Null:     YES
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##                  Independence of Failures
## Statistic: 4.323
## Critical:  5.991
## p-value:   0.115
## Reject Null:     NO
## VaR Backtest Report
## ===========================================
## Model:           eGARCH-norm
## Backtest Length: 500
## Data:
## ==========================================
## alpha:               1%
## Expected Exceed:     5
## Actual VaR Exceed:   16
## Actual %:            3.2%
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 15.467
## LR.uc Critical:  3.841
## LR.uc p-value:   0
## Reject Null:     YES
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##                  Independence of Failures
## Statistic: 16.527
## Critical:  5.991
## p-value:   0
## Reject Null:     YES
plot(as.xts([, 'Sigma', drop = FALSE]), main = 'realGARCH vs eGARCH\n(out-of-sample volatility forecast)',  auto.grid = FALSE, minor.ticks = FALSE)
lines(as.xts([, 'Sigma', drop = FALSE]), col = 2)
legend('topleft', c('realGARCH', 'eGARCH'), col = 1:2, lty = c(1, 1), bty = 'n')





As widely reported in various applications and papers, there is no denying the value of including realized measures in the modelling process, particularly when done in a flexible and rich framework as offered by the realized GARCH model.

Happy 2014.


Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6), 1481-1536.

Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized GARCH: a joint model for returns and realized measures of volatility. Journal of Applied Econometrics, 27(6), 877-906.

A Review of Risk Parity

What is risk parity (RP)?

Simply put, it is a method of allocating equal risk shares to each asset in the portfolio. In more traditional allocation schemes, equity, being the riskiest asset (and hence providing the highest reward), has typically received the lion’s share. With RP, equalization of risk contribution means that equity and other similarly risky assets receive a reduced allocation and low risk assets such as government bonds an increased allocation. As a result, in order to achieve ‘equity-like’ total returns, leverage has typically been used in this context, possibly using a target risk level. Under certain circumstances, namely when all assets have the same risk-to-reward ratios, the RP allocation is equivalent to the tangent portfolio. In the case where only the risks are equal, then the RP allocation is generally equivalent to the equal weight (1/n) portfolio, whilst it coincides with the minimum risk portfolio when cross diversification is highest (though that may be a tricky concept to describe when going beyond the mean-variance paradigm).

Of course, like all portfolio problems, we are speaking of our expectations of future risk. And this is one of the key benefits, argue the proponents of RP, since variance is much more stable and easier to forecast than the returns. Therefore, lying somewhere between the minimum risk and optimal risk portfolios, making use of only the ‘risk’ rather than the reward in the calculation, and backed up by an apparently stellar performance resulting from the recent decline in US Treasury rates, one might wonder why portfolio allocation is still taught at university.

In this author’s opinion, RP is not a proper model of asset allocation, does not contain that key ingredient called an active forecast which is after all what managers are rewarded for producing, and is likely to create complacency because of its oversimplified approach to the problem of forecast uncertainty. In this article I provide 2 applications based on different datasets with different characteristics in order to highlight some issues with the RP approach. I examine different risk\(^1\) measures, within a simple parametric DGP framework for generating forecasts, and find that even with such a simple approach RP cannot outperform the optimal risk-reward strategy or even the minimum risk strategy in a well diversified universe.

Initial Formalities

Formally, consider the marginal contribution to risk (MCR) of each asset (\(i\) of \(n\)) given a risk measure \( \rho\left(x\right) \):
MC{R_i} = \frac{{\partial \rho \left( x \right)}}{{\partial {x_i}}}
which when multiplied by the asset’s share and summed leads to the total portfolio risk (TR):
TR = \sum\limits_{\forall i} {{x_i}MC{R_i}}
One way to solve the general RP problem was already suggested by Maillard, Roncalli and Teïletche (2010) as a squared error minimization type problem which can be easily solved with an SQP solver:
\mathop {{\text{min}}}\limits_{\mathbf{x}} {\text{ }}\sum\limits_{j = 1}^n {\sum\limits_{i = 1}^n {{{\left( {{x_i}MC{R_i} – {x_j}MC{R_j}} \right)}^2}} }\\
s.t. \\
\sum\limits_{\forall i} {{x_i} = b} \\
{\mathbf{x}} \geqslant 0 \\
where \( b \) is the budget constraint. It should be clear that MCR is just the gradient of the risk function. Since all NLP risk functions in the parma package have analytic gradients (see the vignette for details), then the extension of RP to other measures beyond variance is quite trivial. For the case of variance, a fast algorithm using Newton’s method proposed by Chaves, Hsu and Shakernia (2012) is also available and tests conducted indicate that it is upto 100x faster than the equivalent SQP optimization (though we are already talking about tenths of a second anyway).

Set 1: Low Diversity Universe

The dataset consists of weekly (Friday) returns of international equity ETFs for the period November-2000 to December 2013. It is a highly correlated dataset with low diversification possibilities in a long-only setup. As such, it serves to illustrate the similarities between the equally weighted and RP portfolios.

Figure 1 aptly illustrates the very high correlation of the dataset whilst Figure 2 provides an indication of the risk and risk-return profile of the dataset, though one should bear in mind that this is for the entire period and does not necessarily reflect the situation at any particular point in time.







The backtest uses a static normal copula to model the covariance, with first stage AR(1)-GARCH(1,1)-JSU dynamics. The choice of a static rather than dynamic correlation model was motivated by the size of the dataset which is quite small, whilst for the conditional first stage GARCH dynamics the JSU distribution was used to account for univariate higher moments. Finally, and again motivated by the size of the dataset, an empirical transformation was used (see Genest, Ghoudi and Rivest (1995) and the rmgarch vignette for more details). At each time T (a Friday), data from 1:T was used (hence an expanding window choice) for the modelling, after which the T+1 simulated forecast density (the scenario) was generated for use in the optimization model. For the Mean-Variance model, the covariance of the simulated T+1 forecast density was used, whilst for all other risk measures the actual scenario was used in the optimization.
Three models were evaluated: the minimum risk (MR), risk parity (RP) and optimal risk-reward (OPT) using fractional programming. Within those models, four risk measures were evaluated: mean-variance (EV), mean absolute deviation (MAD) and lower partial moments of orders 2 and 4 (LPM2 and LPM4) representing different aversions to below threshold losses\( ^{2} \).
The weights generated by each model where then used to buy the underlying assets at the close of the day after (i.e. Monday) and held until the following re-formation period (i.e. the next Monday). In this way, there was no contemporaneous signal-portfolio formation bias, which may be significant for weekly and higher frequency models. Trading costs were not included nor was price impact and other related costs.

Table 1 provides a summary of the results (using the equal weight as the benchmark), with the key points being:

  • Among the allocation models, there is a negligible difference between the different risk measures. This may represent the frequency of the dataset and/or the DGP used (normal copula).
  • The RP and equal weight portfolios appear very close, as might be expected from the very close ex-ante variance of the assets and high correlation.
  • The MR portfolio shows dismal performance, and this is in line with the little diversification potential of this dataset.
  • The OPT portfolio appears superior to the RP model on all measures. Statistical significance of this statement was not checked, and with the exception of the Sharpe ratio one is challenged to do so for a large number of measures even on moderately sized datasets.



Finally, Figure 3 provides the ‘eye-catching’ illustration of terminal wealth trajectories of the various allocation models under the variance risk measure.




Set 2: Higher Diversity Universe

This is a more typical dataset for active asset allocation based on a more diverse universe of asset classes, though we are still limited in this application by readily available instruments and history within an open-data paradigm. The set covers the period June-2006 to December-2013, and I have again used weekly (Friday) returns. The backtest follows the exact methodology of Set 1.

Figure 4 displays good diversification potential in terms of correlations, whilst Figure 5 provides an indication of the risk and risk-return profile of the dataset, with the same caveats as mentioned for Figure 2.







The results in Table 2 provide for some very interesting insights. The higher diversification potential of this dataset has resulted in an exact opposite ranking for the MR portfolio which now has the highest Sharpe ratio, though a much lower CAGR, and the OPT portfolio again beats the RP portfolio. However, the RP portfolio now comfortably outperforms the equal weight portfolio, in line with expectations, given that the risks in the unconditional dataset were not very close (unlike set 1).






It is certainly fashionable to publish about uncertainty of inputs in the portfolio literature. Michaud (1989) did it, DeMiguel, Garlappi and Uppal (2009) did it (see this blog for a critique), and so have the proponents of Risk Parity. I don’t see how an investment manager can justify the use of any of these methodologies since he is rewarded for the quality of his inputs (active bets), though they can and certainly should serve a role in sensitivity analysis. If everyone could just allocate resources without a forecast then we would not need investment/resource managers. Even with simple econometric based forecasts such as those generated from a ‘simple’ AR(1) model we still managed to beat the RP and EW portfolios. Perhaps a little more attention should be given to the modelling of the underlying dynamics and a little less to ‘fashionable’ trends in asset allocation and catchy phrases meant to provide yet another sales drive, particularly given the resulting highly levered bets on a certain asset class whose rally depends heavily on a printing press which is fast running out of steam.


If you want the code to replicate the results or the RP code for use with parma, contact me by email stating your real name and affiliation.


Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research logistics quarterly, 9(3-4), 181-186.

Chaves, D., Hsu, J., Li, F., & Shakernia, O. (2012). Efficient Algorithms for Computing Risk Parity Portfolio Weights. The Journal of Investing, 21(3), 150-163.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?. Review of Financial Studies, 22(5), 1915-1953.

Genest, C., Ghoudi, K., & Rivest, L. P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3), 543-552.

Maillard, S., Roncalli, T., & Teïletche, J. (2010). The Properties of Equally Weighted Risk Contribution Portfolios. The Journal of Portfolio Management, 36(4), 60-70.

Michaud, R. O. (1989). The Markowitz optimization enigma: is’ optimized’optimal?. Financial Analysts Journal, 31-42.

Stoyanov, S. V., Rachev, S. T., & Fabozzi, F. J. (2007). Optimal financial portfolios. Applied Mathematical Finance, 14(5), 401-436.


\( ^{1} \) Risk, despite the numerous papers written on risk parity including the seminal one by Maillard, Roncalli and Teïletche (2010) (although the concept appears to have originated as far back as 1996 with Ray Dalio of Bridgewater Associates, with the actual term ‘Risk Parity’ coined by E.Qian of PanAgora in 2006) is certainly not equivalent to variance. It can be any number of measures deemed appropriate (see the parma vignette section for a discussion of risk and deviation measures), and this blog article shows how the parma package can be used to calculate RP under a number of different measures in addition to variance, including Mean Absolute Deviation (MAD) and Lower Partial Moments (LPM).
\( ^{2} \) The threshold was based on the portfolio mean which is a case with especially desirable properties as discussed in the parma vignette.

A note on the co-moments in the IFACD model

The Independent Factor Autoregressive Conditional Density (IFACD) model of Ghalanos, Rossi and Urga (2014) uniquely, in its class of parametric models, generates time varying higher co-moment forecasts, as a consequence of the ACD specification of the conditional density of the standardized innovations. In this short note I discuss in more detail the properties of the conditional co-moments of this model, certain interesting properties as relates to the higher moment CAPM and a fast algorithm for populating these very large flattened tensors.

Conditional Co-Moments

The conditional co-moments of \( \mathbf{r}_t \) of order 3 and 4 are represented as tensor matrices
\[ \label{eq1}
\mathbf{M}_{t}^3 = \mathbf{A} \mathbf{M}_{f,t}^3(\mathbf{A} \otimes \mathbf{A})’, \quad \\
\mathbf{M}_{t}^4 = \mathbf{A} \mathbf{M}_{f,t}^4(\mathbf{A} \otimes \mathbf{A} \otimes \mathbf{A})’
where \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \) are the \( N \times N^2 \) conditional third comoment matrix and the \( N \times N^3 \) conditional fourth comoment matrix of the factors, respectively. \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), are defined as
\[ \begin{eqnarray}
\mathbf{M}_{f,t}^3 & =&
\mathbf{M}_{f,t}^4 & = &
where \( \mathbf{M}_{k,f,t}^3, k=1,\ldots,N \) and \( \mathbf{M}_{kl,f,t}^4, k,l=1,\ldots,N \) are the \( N\times N \) submatrices of \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), respectively, with elements
m_{ijk,f,t}^3 & = & E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] \\
m_{ijkl,f,t}^4 & = & E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}].
Since the factors \( f_{it} \) can be decomposed as \( z_{it}\sqrt{h_{it}} \), and given the assumptions on \( z_{it} \), then \( E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] = 0\ \). It is also true that for \( i \neq j\neq k \neq l \), then \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = 0 \) and when \( i=j \) and \( k=l \), then \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = h_{it}h_{kt}. \)
Thus, under the assumption of mutual independence, all elements in the conditional co-moments matrices with at least 3 different indices are zero. Finally, we standardize the conditional co-moments to obtain conditional coskewness and cokurtosis of \( \mathbf{r}_t \)
\mathbf{S}_{ijk,t} = \frac{m_{ijk,t}^3}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}})}, \quad \\
\mathbf{K}_{ijkl,t} = \frac{m_{ijkl,t}^4}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}}{\sigma _{l,t}})},
where \( \mathbf{S}_{ijk,t} \) represents the coskewness between elements \( i,j,k \) of \( \mathbf{r}_t \), \( \sigma_{i,t} \) the standard deviation of \( \mathbf{r}_{i,t} \), and in the case of \( i=j=k \) represents the skewness of asset \( i \) at time \( t \), and similarly for the cokurtosis tensor \( \mathbf{K}_{ijkl,t} \)

Location of non-zero entries

The flattened tensors grow quite quickly in size as n (factors) and m (moment) become larger. Populating the factor matrices with the values from the ACD dynamics in a fast and efficient manner is key if this model is to be called ‘feasible’ and estimation ‘large-scale’. For the third co-moment matrix, this is very simply since the column based vectorized index of the location of non-zero entries is found to be \( ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \).
These represent the (column based vectorized) location in the third co-moment matrix of the factor unstandardized skewness estimated from the ACD model (and determined jointly by the skew and shape dynamics). In the case of the fourth co-moment matrix of the factors, the situation is slightly more involved since in addition to the entries \( \{i=j=k=l\} \) for which the column based vectorized location is found to be \( ((1:n)-1)n^3 + ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \), there are also the entries \( \{i=j,k=l\} \) to consider as discussed in the previous section. To this end, consider the \( n\times n^3 \) matrix of the flattened tensor \( M_{f,ijkl}^4 \), with the indices illustrated as in the \( 4\times4^3 \) matrix below:

\[ \small
{j:} & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & {…} & 4 & 4 \\
{k:} & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & {…} & 4 & 4 \\
{l:} & 1 & 2 & 3 & 4 & 1 & 2 & 3 & 4 & {…} & 3 & 4 \\
{i:1} & {\{ 1111\} } & {\{ 1112\} } & {\{ 1113\} } & {\{ 1114\} } & {\{ 1121\} } & {\{ 1122\} } & {\{ 1123\} } & {\{ 1124\} } & {…} & {\{ 1443\} } & {\{ 1444\} } \\
{i:2} & {\{ 2111\} } & {\{ 2112\} } & {\{ 2113\} } & {\{ 2114\} } & {\{ 2121\} } & {\{ 2122\} } & {\{ 2123\} } & {\{ 2124\} } & {…} & {\{ 2443\} } & {\{ 2444\} } \\
{i:3} & {\{ 3111\} } & {\{ 3112\} } & {\{ 3113\} } & {\{ 3114\} } & {\{ 3121\} } & {\{ 3122\} } & {\{ 3123\} } & {\{ 3124\} } & {…} & {\{ 3443\} } & {\{ 3444\} } \\
{i:4} & {\{ 3111\} } & {\{ 4112\} } & {\{ 4113\} } & {\{ 4114\} } & {\{ 3121\} } & {\{ 4122\} } & {\{ 4123\} } & {\{ 4124\} } & {…} & {\{ 4443\} } & {\{ 4444\} } \\
We are interested in finding the index location of all the pairs where \( \{i=j,k=l\} \) and their permutations, e.g. {1122},{1212},{2211},{2112},{1221},{2121}.
To do this, we first note the following:

  • There are \( \frac{n!}{2(n-2)!} \) unique pairs.
  • Each pair has 6 permutations.
  • The number of unique pairs whose difference is \( d \) is \( n-d \) e.g. for \( d=1 \),\( n=4 \), we have 3 unique pairs {2,1},{3,2} and {4,3}.
  • The first pair has columnwise vector based index \( v=n+2 \) e.g. for \( n=4 \), the first pair is {2112} in the example matrix above with columnwise vector based index of 6.
  • The \( p^{th} \) unique pair, representing the first in the set of pairs with \( n-d \) differences, has vector based index \( (p-1)+(n+1) \), which given the starting pair with index \( n+2 \) means that we can calculate the indices of each unique pair in the matrix given the previous unique pair’s position.
  • Within each \( p^{th} \) unique pair, there are 6 permutations. Starting from the index of the first pair in the series of 6, denoted \( v_1 \), the second permutation has index \( v_1+((n-1)n)d \), the third \( v_2+(n-1)d \), the fourth \( v_3+d(n+1)(n-1)^2 \), the fifth \( v_4+(n-1)d \) and the sixth \( v_5+((n-1)n)d \).
  • The first pair of the next set of pairs whose difference is \( n-d \) has and index which is equal to \( n^3+n^2+n+1 \) more than the previous pair.

This fast method for calculating the location of each entry and populating it accordingly is already implemented in the rmgarch package for the GO-GARCH (NIG/GH) model. In addition, in order to avoid memory problems when it comes time to perform the kronecker multiplications using the mixing matrix \( \mathbf{A} \) on the factor higher co-moments to arrive at the asset higher co-moments, the method of Buis amd Dyksen (1996) is used which is also implemented in the klin package.

The Higher Moment Time Varying Statistical Factor CAPM

A very interesting application made possible as a result of the model’s properties is the estimation of the (higher moment) time varying betas from the CAPM model. Consider a universe of \( n \) assets with returns \( \mathbf{r}_t \), with benchmark \( b \) whose return is \( r_{b,t} \) and determined by a linear combination based on a pre-specified weighting vector \( \mathbf{w}_t \):
r_{b,t} = \mathbf{w}_t’ \mathbf{r}_t
such that \( \mathbf{w}_t’\mathbf{1}=1 \). It could be the case that \( \mathbf{w}_t \) represents the weights of the benchmark index or some other pre-deteremined weighting scheme (e.g. equal weight). To calculate the CAPM betas with respect to the benchmark, we need not model the pairwise combination of assets-benchmark but instead only the assets which comprise the benchmark. The following formulae outline the steps:

  • Define the conditional covariance: \( M_t^2=AH_tA’ \), where \( {{\mathbf{H}}_{\mathbf{t}}} = E\left[ {{f_t}{{f’}_t}{{\left| \Im \right.}_{t – 1}}} \right], \in {\mathbb{R}^{N \times N}} \)
  • Define the conditional 3rd co-moment: \( {\mathbf{M}}_{\mathbf{t}}^3{\mathbf{ = AM}}_{{\mathbf{f,t}}}^{\mathbf{3}}\left( {{\mathbf{A’}} \otimes {\mathbf{A’}}} \right), \in {\mathbb{R}^{N \times {N^2}}} \)
  • Define the conditional 4th co-moment: \( {\mathbf{M}}_{\mathbf{t}}^4{\mathbf{ = AM}}_{{\mathbf{f,t}}}^4\left( {{\mathbf{A’}} \otimes {\mathbf{A’}} \otimes {\mathbf{A’}}} \right), \in {\mathbb{R}^{N \times {N^3}}} \)

The conditional beta Covariance:
{\beta _{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right)\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^2}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^2{\mathbf{w}}}}
where \( {\mathbf{m}}_{i,t}^2: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^{\mathbf{2}}\left( {i = 1,…,N} \right) \).

The conditional beta Coskewness:
{s_{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right){{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^2}} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^3}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^3\left( {{\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
{{{\mathbf{w’M}}_{\mathbf{t}}^{\mathbf{3}}\left( {{\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
where \( {\mathbf{m}}_{i,t}^3: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^3\left( {i = 1,…,N} \right) \).

The conditional beta Cokurtosis:
{{\text{k}}_{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right){{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^3}} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^4}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^4\left( {{\mathbf{w}} \otimes {\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
{{{\mathbf{w’M}}_{\mathbf{t}}^4\left( {{\mathbf{w}} \otimes {\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
where \( {\mathbf{m}}_{i,t}^4: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^4\left( {i = 1,…,N} \right) \)

Thus, considering that we can include dimensionality reduction at the PCA whitening stage (the n.comp argument in the rmgarch::fastica algorithm), the IFACD (and the non-time varying higher moments restricted GO-GARCH sub-model) provides a clear avenue for estimating a large scale statistical factor based time varying higher moment CAPM model. A demonstration is available here.

Fast weighted VaR calculation using the Cornish-Fisher Expansion

In the IFACD model, the conditional weighted density can be calculated using the inversion of the NIG/GH characteristic function via FFT as discussed in Ghalanos, Rossi and Urga (2014) and the rmgarch vignette (for the GO-GARCH model). Methods for working with the weighted density are already available in the rmgarch package including methods for the density, distribution and quantile (dfft, pfft and qfft) on estimated, forecast and simulated objects. An alternative avenue for the calculation of the weighted quantile is to use the Cornish-Fisher expansion which makes use of the conditional higher moments:
Va{R_{t,\alpha }} = {\mu _t} + {\sigma _t}\left( {\phi + (1 – {\phi ^2})\frac{{{S_t}}}
{6} + \left( {{\phi ^3} – 3\phi } \right)\frac{{{K_t}}}
{{24}} + \left( {5\phi – 2{\phi ^3}} \right)\frac{{S_t^2}}
{{36}}} \right)
where \( \phi={\Phi ^{ – 1}}\left( \alpha \right) \), represents the quantile of the standard normal distribution evaluated at the coverage level \( \alpha \), \( S_t \) the skewness at time \( t \) and \( K_t \) the excess kurtosis at time \( t \). The weighted moments \( (\mu_t,\sigma_t,S_t,K_t) \) can be calculated as follows:
\[ \begin{gathered}
\mu_t = \mathbf{w}_t’ \mathbf{M_{t}^1},\\
\sigma_{{t}}^2 = \mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t, \\
{S_{t}} = \frac{{\mathbf{w}_t’\mathbf{M}_{_t}^3(\mathbf{w}_t \otimes \mathbf{w}_t)}}{{{{(\mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t)}^{3/2}}}}, \\
{K_{t}} = \frac{{\mathbf{w}_t’\mathbf{M}_{_t}^4(\mathbf{w}_t \otimes \mathbf{w}_t \otimes \mathbf{w}_t)}}{{{{(\mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t)}^2}}}, \\
where \( \mathbf{M}_{t}^1 \) is the conditional mean vector, \( \mathbf{M}_{t}^3 \) and \( \mathbf{M}_{t}^4 \) are the third and fourth co-moment matrices described in the previous section, and \( \mathbf{\Sigma}_{t} \) the conditional covariance (the notation \( \mathbf{M}_{t}^2 \) has also been used).
A demonstration is available here.


Buis, P. E., & Dyksen, W. R. (1996). Efficient vector and parallel manipulation of tensor products. ACM Transactions on Mathematical Software (TOMS), 22(1), 18-23.

Ghalanos, A., Rossi, E. & Urga G. (2014). Independent Factor Autoregressive Conditional Density model. Econometric Reviews (forthcoming).

Development Update

This is a quick update regarding the status of my R packages on google code. Since google decided to disallow uploads from Jan-2014 for existing projects, and immediately for new ones (meaning that the tarballs and zips could not be hosted on their servers anymore), I have had no choice but to return the development of the packages to r-forge. All packages are now hosted in the teatime (Tools for Econometric Analysis of TIME series) project.

A new package for dynamic binary models (dbm) is also out and a blog post will follow shortly.

As of 2014, the development repository has moved to bitbucket (see the downloads page for how to download and install).


The Fallacy of 1/N and Static Weight Allocation

In the last few years there has been a increasing tendency to ignore the value of a disciplined quantitative approach to the portfolio allocation process in favor of simple and static weighting schemes such as equal weighting or some type of adjusted volatility weighting. The former simply ignores the underlying security dynamics, assuming equal risk-return, whilst the latter assumes risk is equal to variance in which case it also ignores covariance between securities and other pertinent features of observed security dynamics. It is true that a simplified approach is easier to explain to clients (and easier to understand by most practitioners), and may be justified in cases when there is little data available for modelling the underlying dynamics.

In a much cited paper DeMiguel, Garlappi, and Uppal (2009) argued that it is difficult to beat the equal weighting strategy, considering a number of mostly mean-variance related models on limited history monthly data. The premise of their argument was based on the uncertainty in estimation of quantities such as covariance. While it is certainly true that uncertainty about such quantities is related to the size of the dataset, using a very short history of monthly data to make that point and publish the paper does not appear to me to provide for a scholarly answer to the paper’s title (“Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?”). Additionally, they do not even test the significance in the difference of the Sharpe ratios which makes it rather difficult to draw any conclusions.

In defense of optimization, and a direct reply to that study, which was criticized for using very short histories for the estimation of the covariance, Kritzman, Page, and Turkington (2010) provide a very thorough empirical application across a number of portfolios using the mean-variance criterion, including one comprising daily returns of the point in time constituents of the S&P500 from 1998 to 2008, and show substantially large differences (though again not statistically tested) in Sharpe ratios against the equal weighting strategy. The arguments against 1/N are readily generalized to any static weight allocation scheme, such as employed by most benchmark indices, with clear implications for benchmark policy choice and investment manager reward schemes. The question of the efficiency of benchmark indices was addressed some time ago by Grinold (1992) who asked whether index benchmarks are efficient, using the test of Gibbons, Ross, and Shanken (1989) on the country indices of Australia, Germany, Japan, U.K. and U.S. for the period ending 1991 (with start dates as early as 1973 for the U.S.). He found that 4 out of the 5 indices were not efficient ex-post. More recently, Demey, Maillard, and Roncalli (2010) also argued that neither capitalization nor price weighted indices are efficient, exposing investors to what should be diversifiable risk.

In this blog article, I will try to provide one empirical application, using a popular Global Tactical Asset Allocation model based on an approach proposed by Faber(2007), to show the value of a quantitive approach to asset allocation over the naive 1/N strategy. I make use of the rmgarch and parma packages, together with the blotter package for realistic portfolio accounting. Because of the complexity of the setup, the code to generate the tables and figures will be provided seperately as a link (see bottom of page) rather than displayed inline and there are a number of custom functions written on top of blotter. While the quantstrat package provides one excellent avenue for implementing such strategies, I have not used it in this case, opting for a more custom solution which provides for more transparency in what is actually happening under the hood. Feel free to drop me a note if you find any errors or have any suggestions.

Data Universe Setup

One of the issues in many of the backtests one reads about is a failure to take into account companies which have delisted. This leads to the so called survivorship bias problem when backtesting equities, and is even more pronounced in hedge fund databases where reporting is voluntary (e.g. funds which do badly in one year may decide not to report). A realistic test of the efficiency of a benchmark index would require having access to the point in time constituents of that index, which would probably require access to a proprietary database. In this application, I will instead consider a set of market aggregates, which alleviates this problem, and also provides for two additional benefits. First, the implementation of the strategy becomes scalable and there are no issues of liquidity (for most of the indices), and second, the use of aggregates eliminates some of the idiosyncratic noise found in individual securities which enables the identification of trends using a moving average filter much easier. Because ETF’s usually have a limited history going back to the early 2000’s, I have opted to use where appropriate a Vanguard fund as the instrument of choice for testing. Nevertheless, almost half of the securities do not trade for most of 1990 and early 2000 making it a little difficult to draw firm conclusions for the whole period since portfolio optimization benefits from a large and diversified group of securities.

The universe is comprised of the following assets:

  • VFINX (US Equity s:1987), VEURX (EU Equity s:1990), VPACX (Asia Pacific Equity s:1990), VEIEX (Emerging Market Equity s:1995)
  • SH (ProShares Short S&P 500 s:2006)
  • VGSIX (REIT s:1996)
  • IXC (Global Energy s:2001), GLD (Gold s:2004)
  • VWEHX (High Yield Corporate s:1989), VCLT (Long Term Corporate s:2009)
  • DBV (Carry Trade G10 s:2006)

This represents a select exposure to global Equities, US Corporates, Commodities, US Real Estate and Rates/Currencies. The use of the Short S&P 500 index is to obtain some additional diversification in times of global crises, since the strategy is long only.

Data Universe Characteristics

A first look at some of the statistics of this dataset reveals that it exhibits the typical set of stylized facts.


There is serial correlation (Ljung-Box test) and ARCH effects (ARCH-LM test) in addition to clear rejection of normality (Jarque-Bera test). This is of course not surprising for daily returns, though the question of such effects at lower frequencies is debatable (there is usually not enough data to form adequate tests, whilst aggregation methods require a host of additional assumptions e.g. weak GARCH etc). Given the presence of such such dynamics in the dataset, with varying degrees of skewness and kurtosis across the securities, it is certainly not reasonable to opt for a 1/N approach to the allocation of capital.


The figure shows the pairwise complete correlations of the securities used in this application which range from the highly correlated (Equities, Energy, REITS and Carry Trade) to the low correlation (Fixed Income and Precious Metals) and the negatively correlated (Equity Short).

The Modelling and Optimization Process

There are 2 competing considerations to take into account when modelling security dynamics and the generation of forecasts. Obviously, more data is preferred to less when using such models as GARCH in order to obtain good estimates of the persistence. On the other hand, the higher the frequency of the data the more the noise in any signal generating mechanism such as the moving average system. In addition, we want to optimize and rebalance based on a lower frequency (monthly) basis so as to eliminate transaction costs and turnover. The signal generating mechanism I apply is based on the crossover of a slow and fast exponential moving average on monthly prices, the length of which is determined dynamically at each point in time. This filter is applied to all securities with a minimum of 600 daily data points, and those with a positive signal are then considered part of the modelling universe. If there are 3 or less securities in the universe, the optimization is not undertaken and instead a fixed weight is assigned to each security (and this may add up to less than 100% investment). The modelling of the universe of positive signal prospects is undertaken using an AR(1)-GARCH(1,1)-student model and the joint dynamics using a static student Copula model (using Kendall’s tau for the correlation) and a parametric transformation (effectively a CCC-Student model). In order to generate the monthly forecast scenario from the higher frequency daily data modelling process, 6000 25-ahead simulated points are generated and collapsed by summation to approximate the monthly forecast return distribution. This scenario matrix of size 6000 x n.securities is then optimized using a fractional mean-LPM(4) model with short-sale constraints and position bounds. The LPM(4) model was chosen in order to capture extreme losses below the portfolio mean (the LPM threshold is the portfolio mean), and a minimum bound of 2.5% was also imposed in order that all securities with a positive signal belong to the final trade set. An upper bound of 50% was also imposed on each security as a concentration limit (this is high because of the small universe of securities used and the effective universe once filtered for positive trends). The fractional optimization model and the LPM measure is described in detail in the vignette of the parma package which also provides some insights into the scaling property of this measure in the case when the threshold is the portfolio mean. The steps are detailed below:

  • Convert prices to monthly.
  • Determine, for each security, and each time point, the optimal slow and fast window lengths of the EMA filter using a Total Return to Volatility metric, and generate the signals at the end of each month.
  • Model those securities with positive signals and generate the month ahead simulated scenario density.
  • Optimize the scenario using a fractional mean-LPM(4) model and obtain the allocation weights.


Once the weights are dynamically determined, the model is ready to backtest. The allocation/rebalancing is performed the day after the weights are determined, which in this case is the first tradeable day of the month. In order to provide for a realistic backtest, a $20 roundtrip commission is charged per trade and 1% yearly management fee deducted monthly on a pro-rata basis. The blotter package is used for the order generation and tracking of the portfolio equity, with a number of custom wrapper functions. The same approach is applied to the equal weighted portfolio so that the strategies are directly comparable and individually provide for a realistic representation of their respective performance. In addition to the optimal (OP) and equal weight (EQ) models, an alternative (OPalt) model is also tested which is similar to the OP model but takes into account the forecasted portfolio reward/risk portfolio measure which is generated by the model scenario and optimal weights in order to ‘deleverage’ from a full investment constraint to a 2/3 investment in times when the reward/risk ratio is below ½. The reason I have included this portfolio is in order to gain some additional insight into the timing value of the information in the econometric model as opposed to that of the EMA filter which is known to lag turning points (it is the compromise made for eliminating costly noise in terms of whipsaw patterns).

Figure 1 shows the total number of securities at any given time in the OP portfolio. Notice the red bars represent those times that there were less than 4 securities in the portfolio in which case the optimization was not run and equal weighting given to each security.

Results and Discussion


1994-2013OPaltOPEQS&P500 2000-2013OPaltOPEQS&P500
LW p-value0.1130.154LW p-value0.030.078
2003-2013OPaltOPEQS&P500 2010-2013OPaltOPEQS&P500
LW p-value0.0650.1LW p-value0.0540.098

Tables 1 and 2 present the results for various subperiods of the 2 optimal portfolios, the equal weighted portfolio and the S&P 500 index (as captured by the VFINX fund). I use subperiod analysis since half of the securities in the universe only start in the early to mid 2000s, which means that the 1990-2000 period is contaminated with a small number of assets and hence more likely to have equal weights (when the number of signals is less than 4 the model allocates equally to each). The results provide for a very clear picture of the incremental value of each portfolio. The OP and EW portfolios, over the whole sample appear to have the same drawdowns. This is likely related to the fact that both portfolios have the same signal generating mechanism which somewhat lags the turning points and is based on monthly data (so that daily P&L accounting is likely to reveal larger intra-month fluctuations). The OPalt portfolio on the other hand, which makes use of information from the DGP and optimal weight vector has lower drawdowns which indicates that there is value in making use of information from this model beyond what the signals provide for timing purposes. The Sharpe ratios for the optimal portfolios are higher than that of the equal weighted portfolio in all subperiods, but significantly different at the 10% level only the period after 2000 (the ‘LW p-value’ row contains the p-values from the test of Ledoit and Wolf (2008) of the optimal portfolio against the equal weighted portfolio under the null hypothesis that their Sharpe ratio differences are zero), again for the reasons hypothesized earlier. Overall, one may cautiously conclude from this small example that there is value in adopting a disciplined quantitative approach to the allocation of capital beyond the naive 1/N approach.

Figure 2a gives the contribution to P&L of each security over 2 subperiods, whilst Figure 2b gives the active contribution (over the equal weighted portfolio) of each security to P&L, which gives some additional insight to where the bets where concentrated.



Figure 3a shows the average yearly allocation per security of the optimal portfolio, whilst Figure 3b the active allocation (over the equal weighted portfolio). The figures show a clear difference between the optimal and equal allocation methods, with the former having made positive active bets on Gold and High Yield Corporates over the entire period, and underweighted Asia Pacific Equities in the 1990s.



Finally, Figure 4 provides the evolution of Terminal Wealth for the 3 portfolios, though this is always somewhat less informative than the subperiod analysis since it always depends on the starting point and will not usually be a good indicator of the average risk-return profile, as can be deduced from the OPalt portfolio which has the lowest Terminal Wealth but the highest Sharpe Ratio and lowest drawdowns.



There is really no substitute for a disciplined approach to the modelling of security dynamics and stochastic programming to obtain the optimal weights. Even when there is not enough data, it is certainly possible to map the data available to either a set of higher frequency factors or closely related prospects with more data. This article provided for a very simple example of what is possible. Interesting extensions to the approach adopted might include:

  • A more sophisticated signal generating mechanism, taking into account explanatory variables (not just looking at the price trend) for turning point or directional analysis.
  • More advanced money management overlay to limit intra-month losses. The blotter wrapper function allows for this and a simple example is included (though not used) in the code. To my knowledge it remains an open question whether stop losses add or subtract value, and it is my experience that including them within a strategy for intra-month risk reduction should be done with caution.
  • No short sale constraint. Technically, there is no reason not to remove the constraint, but the realistic modelling of this is more complicated because of the need to track margin etc. The blotter wrapper is not yet setup to deal with this case.


There are 5 files in the zip (unstarched_code). Place them in an empty directory which you will designate as your working directory in R (‘setwd’). The main file which you will need to run is the ‘unstarched_script_run.R’ file. Be patient, it takes some time to complete, mainly because of the portfolio trades/accounting, the progress of which will be displayed on screen. I estimate about 60 minutes to complete.


Demey, P., Maillard, S., & Roncalli, T. (2010). Risk-based indexation. Available at SSRN 1582998.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?. Review of Financial Studies, 22(5), 1915-1953.

Faber, M. T. (2007). A quantitative approach to tactical asset allocation. Journal of Wealth Management.

Gibbons, M. R., Ross, S. A., & Shanken, J. (1989). A test of the efficiency of a given portfolio. Econometrica: Journal of the Econometric Society, 1121-1152.

Grinold, R. C. (1992). Are Benchmark Portfolios Efficient?. The Journal of Portfolio Management, 19(1), 34-40.

Kritzman, M., Page, S., & Turkington, D. (2010). In defense of optimization: the fallacy of 1/N. Financial Analysts Journal, 66(2), 31.

Ledoit, O., & Wolf, M. (2008). Robust performance hypothesis testing with the Sharpe ratio. Journal of Empirical Finance, 15(5), 850-859.

Time Varying Higher Moments with the racd package.

The Autoregressive Conditional Density (ACD) model of Hansen (1994) extended GARCH models to include time variation in the higher moment parameters. It was a somewhat natural extension to the premise of time variation in the conditional mean and variance, though it probably raised more questions than it, or subsequent research have been able to answer. In my mind, some of these questions are :

  • What is the rationale and source for time variation in the skew and shape parameters of a conditional density?
  • Can one single source of randomness adequately model both the odd and even moment variation?
  • What is the impact on the coefficients of the lower moments?
  • What are the implications of the time varying density for the standardized residuals on long run relationships, simulation methods and the consistency of the parameters?
  • Do the marginal benefits, in applied work, outweigh the costs of the difficult estimation?

The first question is quite open and difficult to answer. In a GARCH model, most of the variation in the underlying hypothesized density, its expansion and contraction with the news arrival, is captured by the conditional variance dynamics. Given a suitably chosen skewed and shaped distribution, extreme events can also be accommodated within this framework as can changes in the asymmetry of the distribution. If we believe that there is value in allowing higher moment parameters to vary, which we can at times infer from the estimation and forecast performance, is this the result of some underlying inadequacy of the GARCH model or does the ‘true’ and unknown data generating process actually contain such dynamics (and why)?

For the second question, stochastic volatility models have already provided one way to test this at least for GARCH models, though at a considerable computational cost. For ACD models, this is probably still an open research question.

For the third question, some evidence was presented in Harvey and Siddique (1999) where the inclusion of time varying skewness affected the persistence of the conditional variance and caused some of the asymmetries in the variance to disappear (through a reduced asymmetry coefficient in the variance dynamics). I would also actually expect that for standard GARCH models with very high persistence, the inclusion of time variation in the shape parameter (and hence more directly on the higher even moments) to lead to a reduction in such persistence as some of the extreme variations from shocks could be better accommodated. I also wonder whether the inclusion of time variation in the skew parameter would have an impact on the first moment parameters, particularly when their dynamics might include threshold effects.

The fourth question is very difficult to answer, particularly the consistency of the parameters. There is little theoretical work in the ACD literature on this and there have only been some footnotes in the research about simulations confirming that the parameters of ACD models tested had the standard \( \sqrt(N) \) consistency. I remain very skeptical on this point considering the ‘3-dimensional’ nature of ACD simulation. That is, for each period \( t \) that is generated in a simulation, the density of the standardized residuals is varying, unlike GARCH model simulation where a long path results in a sampling from the standardized distribution.

The final question is the one most recent papers have focused on, and an extensive literature is available in the vignette of the racd package. The results are mixed, though one would not necessarily conclude this from just reading the abstract of any one of the papers where there is the natural tendency to portray the results in the best possible light.

Following the rather long intro, the remaining article will provide for an introduction to the usage of the racd package which shares many features with the rugarch package (since it extends it). If you have some unique insights into these models, would like to add something or comment, feel free to drop me an email. The racd package is currently hosted in the teatime repository on r-forge and there are no plans at present to release it to CRAN (see r-downloads).

The Model Specification

The model specification follows closely the implementation in the rugarch package with the exception of the ‘distribution.model’ option which is now a list with a number of choices for defining the conditional skew and shape dynamics. For the conditional variance equation, the GARCH models which are included are the standard GARCH (‘sGARCH’), component GARCH (‘csGARCH’) and the multiplicative component GARCH (‘mcsGARCH’) for intraday data. The reason for only including these models has to do with the calculation of their long run properties and persistence which do not require simulation in light of the time variation of the higher moments.

# setup
plot2xts = function(x, ...) {
 plot(x, y = NULL, type = 'l', auto.grid = FALSE, major.ticks = 'auto', minor.ticks = FALSE,
 major.format = FALSE, bar.col = 'grey', candle.col = 'white', ann = TRUE,
 axes = TRUE, cex.main = 0.9, cex.axis = 0.9, ...)
## function (variance.model = list(model = 'sGARCH', garchOrder = c(1,
## 1), external.regressors = NULL, variance.targeting = FALSE),
## mean.model = list(armaOrder = c(1, 1), include.mean = TRUE,
## archm = FALSE, arfima = FALSE, external.regressors = NULL),
## distribution.model = list(model = 'snorm', skewOrder = c(1,
## 1, 1), skewshock = 1, skewshocktype = 1, skewmodel = 'quad',
## skew.regressors = NULL, shapeOrder = c(0, 1, 1), shapeshock = 1,
## shapeshocktype = 1, shapemodel = 'quad', shape.regressors = NULL,
## exp.rate = 1), = list(), = list())

The distribution.model list contains the details of the conditional higher moment specification:

  • model. This is the conditional distribution, with the same choice of skewed and shaped distributions as in the rugarch package.
  • skewOrder (skewmodel) and shapeOrder (shapemodel). Denote the order of the skew and shape (models) under the different parameterizations available and described in the package’s vignette. Not all models have 3 parameters. For example, the ‘xar’ shape model has 2 parameters whilst the ‘xar’ in skew has 3 (since it is based on the signed rather than absolute shocks).
  • skewshocktype and shapeshocktype. A value of 1 denotes the use of squared shocks while any other value denotes absolute value shocks.
  • skewshock and shapeshock. A value of 1 denotes the use of the standardized residuals as shocks while any other value denotes the use of the residuals.
  • skew.regressors and shape.regressors. A matrix of regressors for the skew and shape dynamics. This should be considered experimental at present, until further testing.
  • exp.rate. The scaling value for the exponential transformation of the unbounded shape dynamics (the skew dynamics use a logistic transformation without extra parameters.)

For the test example, I’ll use the following specification:

spec = acdspec(mean.model = list(armaOrder = c(0, 1)), variance.model = list(variance.targeting = TRUE),
distribution.model = list(model = 'nig', skewOrder = c(1, 1, 1), shapeOrder = c(1,1,1), skewmodel = 'quad', shapemodel = 'pwl'))

The Estimation

As described in the vignette, estimation in ACD models is highly nonlinear and there is no guarantee of a global optimum. For this reason, it is suggested that the problem is estimated from different starting parameters which is why the routine includes a number of solvers preceded by ‘ms’ to denote a multistart strategy. The included solvers are optim (and msoptim), ucminf (and msucminf), solnp (and mssolnp), nlminb (and msnlminb) and a local implementation of cmaes (bound constrained global solver). For the unconstrained solvers (optim and ucminf), the parameters are transformed into a bounded domain via the logistic map. The use of parallel evaluation in the multistart solvers is enabled by passing a cluster object from the parallel package, as the example which follows illustrates:

cl = makePSOCKcluster(10)
fit = acdfit(spec, sp500ret, solver = 'msoptim', solver.control = list(restarts = 10),
    cluster = cl)
## *---------------------------------*
## *          ACD Model Fit          *
## *---------------------------------*
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,1)
## Distribution : nig
## Conditional Skew Dynamics
## -----------------------------------
## ACD Skew Model   : quad(1,1,1)
## Shock type:  Std.Residuals
## Conditional Shape Dynamics
## -----------------------------------
## ACD Shape Model  : pwl(1,1,1)
## Shock type:  Std.Residuals
## Optimal Parameters
## ------------------------------------
##           Estimate  Std. Error  t value Pr(>|t|)
## mu        0.000579    0.000109   5.3235 0.000000
## ma1       0.009191    0.014274   0.6439 0.519643
## alpha1    0.089522    0.008213  10.8999 0.000000
## beta1     0.902153    0.009429  95.6765 0.000000
## skcons   -0.080024    0.031859  -2.5118 0.012011
## skalpha1  0.284640    0.055745   5.1061 0.000000
## skgamma1  0.027768    0.013604   2.0412 0.041233
## skbeta1   0.570018    0.093124   6.1210 0.000000
## shcons    0.443515    0.193872   2.2877 0.022157
## shalpha1  0.014926    0.018471   0.8081 0.419033
## shgamma1  0.783159    0.119464   6.5556 0.000000
## shbeta1   0.657811    0.088193   7.4588 0.000000
## omega     0.000001          NA       NA       NA
## Robust Standard Errors:
##           Estimate  Std. Error  t value Pr(>|t|)
## mu        0.000579    0.000107  5.43342 0.000000
## ma1       0.009191    0.014895  0.61709 0.537178
## alpha1    0.089522    0.010646  8.40916 0.000000
## beta1     0.902153    0.012237 73.72243 0.000000
## skcons   -0.080024    0.030544 -2.61994 0.008795
## skalpha1  0.284640    0.060626  4.69498 0.000003
## skgamma1  0.027768    0.015036  1.84676 0.064782
## skbeta1   0.570018    0.087479  6.51609 0.000000
## shcons    0.443515    0.318005  1.39468 0.163112
## shalpha1  0.014926    0.009775  1.52695 0.126773
## shgamma1  0.783159    0.144317  5.42667 0.000000
## shbeta1   0.657811    0.146672  4.48492 0.000007
## omega     0.000001          NA       NA       NA
## LogLikelihood : 18154
## Information Criteria
## ------------------------------------
##                  ACD   GARCH
## Akaike       -6.5697 -6.5540
## Bayes        -6.5554 -6.5480
## Shibata      -6.5698 -6.5540
## Hannan-Quinn -6.5647 -6.5519
## [truncated remaining output]
## Elapsed time : 1.818

As can be inferred from the robust standard error, the majority of the higher moment parameters are significant, and the information criteria indicate an improvement over the non time varying GARCH equivalent model.
There are a number of extractor functions, in addition to the standard ones such as ‘fitted’,’sigma’,’residuals’, ‘quantile’ and ‘pit’, such as ‘shape’ and ‘skew’ which extract the conditional time varying skew and shape xts vectors with option for returning either the ‘transformed’ (default TRUE) or unbounded values. Additionally, the methods ‘skewness’ and ‘kurtosis’ return the implied conditional time varying skewness and excess kurtosis xts vectors. The following figure provides a visual illustration of the estimated dynamics:

par(mfrow = c(3, 2), mai = c(0.75, 0.75, 0.3, 0.3))
plot2xts(fitted(fit), col = 'steelblue', main = 'Conditional Mean')
plot2xts(abs(as.xts(sp500ret)), col = 'grey', main = 'Conditional Sigma')
lines(sigma(fit), col = 'steelblue')
plot2xts(skew(fit, transform = FALSE), col = 'grey', main = 'Skew')
lines(skew(fit), col = 'steelblue')
legend('topleft', c('Unbounded', 'Bounded'), col = c('grey', 'steelblue'), lty = c(1,1), bty = 'n', cex = 0.8)
plot2xts(shape(fit, transform = FALSE), col = 'grey', main = 'Shape', ylim = c(0,10))
lines(shape(fit), col = 'steelblue')
legend('topleft', c('Unbounded', 'Bounded'), col = c('grey', 'steelblue'), lty = c(1,1), bty = 'n', cex = 0.8)
plot2xts(skewness(fit), col = 'steelblue', main = 'Skewness')
plot2xts(kurtosis(fit), col = 'steelblue', main = 'Kurtosis (ex)')


Forecasting and Filtering

Forecasting in the racd package can only be done on an estimated (ACDfit) object (unlike rugarch where a specification object can also be used), but for 1-ahead forecasting it is possible to use the acdfilter method instead. For n-ahead (n>1) forecasts, for the higher moment dynamics, this is done via simulation as there is no closed form solution as explained in the vignette. There is nothing particularly interesting to note about the acdforecast method here so I will go directly to the rolling forecast and backtesting method (acdroll). This has a number of extra options compared to the ugarchroll method and these are explained in detail in the vignette. Suffice to say, these extra options are related to restrictions on the dynamics to facilitate convergence.

roll = acdroll(spec, sp500ret, n.start = 2000, refit.every = 100, refit.window = 'recursive',
    solver = 'msoptim', solver.control = list(restarts = 10), cluster = cl,
    fixARMA = TRUE, fixGARCH = TRUE, fixUBShape = TRUE, UBSHapeAdd = 3, compareGARCH = 'LL')
# check convergence(roll) if it is not zero, resubmit via the &apos;resume&apos;
# method.
gspec = ugarchspec(mean.model = list(armaOrder = c(0, 1)), variance.model = list(variance.targeting = TRUE), distribution = 'nig')
rollg = ugarchroll(gspec, sp500ret, n.start = 2000, refit.every = 100, refit.window = 'recursive')
## *-------------------------------------*
## *              ACD Roll               *
## *-------------------------------------*
## No.Refits        : 36
## Refit Horizon    : 100
## No.Forecasts     : 3523
## GARCH Model      : sGARCH(1,1)
## Mean Model       : ARFIMA(0,0,1)
## ACD Skew Model   : pwl(1,1,1)
## ACD Shape Model  : pwl(1,1,1)
## Distribution     : nig
## Forecast Density:
##               Mu  Sigma   Skew  Shape Shape(GIG) Realized
## 1995-02-03 2e-04 0.0055 0.0501 0.6855          0   0.0123
## 1995-02-06 2e-04 0.0060 0.3130 0.1493          0   0.0052
## 1995-02-07 2e-04 0.0060 0.3292 0.4318          0  -0.0007
## 1995-02-08 3e-04 0.0059 0.2219 0.8460          0   0.0008
## 1995-02-09 3e-04 0.0058 0.1515 0.9846          0  -0.0021
## 1995-02-10 3e-04 0.0057 0.0580 1.0217          0   0.0026
## ..........................
##                 Mu  Sigma    Skew  Shape Shape(GIG) Realized
## 2009-01-23  0.0008 0.0273 -0.1161 1.2626          0   0.0054
## 2009-01-26  0.0003 0.0264 -0.1010 1.6388          0   0.0055
## 2009-01-27  0.0003 0.0255 -0.0890 1.7417          0   0.0109
## 2009-01-28  0.0001 0.0248 -0.0623 1.6254          0   0.0330
## 2009-01-29 -0.0004 0.0253  0.0364 0.6087          0  -0.0337
## 2009-01-30  0.0013 0.0258 -0.1027 1.3595          0  -0.0231
## Elapsed: 31.52 mins
vartable = rbind( = 0.01, actual = roll@forecast$VaR[,
    'realized'], VaR = roll@forecast$VaR[, 'alpha(1%)'])[c(1, 2, 11, 12)], row.names = c('ACD(1%)')), = 0.05, actual = roll@forecast$VaR[, 'realized'],
        VaR = roll@forecast$VaR[, 'alpha(5%)'])[c(1, 2, 11, 12)], row.names = c('ACD(5%)')), = 0.01, actual = rollg@forecast$VaR[, 'realized'],
        VaR = rollg@forecast$VaR[, 'alpha(1%)'])[c(1, 2, 11, 12)], row.names = c('GARCH(1%)')), = 0.05, actual = rollg@forecast$VaR[, 'realized'],
        VaR = rollg@forecast$VaR[, 'alpha(5%)'])[c(1, 2, 11, 12)], row.names = c('GARCH(5%)')))
print(vartable, digits = 3)
##           expected.exceed actual.exceed cc.LRp       cc.Decision
## ACD(1%)                35            28  0.357 Fail to Reject H0
## ACD(5%)               176           203  1.000 Fail to Reject H0
## GARCH(1%)              35            26  0.215 Fail to Reject H0
## GARCH(5%)             176           191  0.571 Fail to Reject H0
print(rbind(, tail.test = TRUE, alpha = 0.01)[c(1, 2, 4, 6)], row.names = 'ACD'),, tail.test = TRUE, alpha = 0.01)[c(1, 2, 4, 6)], row.names = 'GARCH')), digits = 4)
##          uLL    rLL     LRp            Decision
## ACD   -163.2 -164.3 0.35633 fail to reject NULL
## GARCH -157.8 -160.3 0.07761 fail to reject NULL
HL = cbind(HLTest(as.numeric(pit(roll)))$statistic, HLTest(as.numeric(pit(rollg)))$statistic)
colnames(HL) = c('ACD', 'GARCH')
print(HL, digits = 4)
##            ACD  GARCH
## M(1,1)  0.2349  5.787
## M(2,2)  6.2283 19.640
## M(3,3)  9.6166 26.599
## M(4,4) 11.2113 28.929
## M(1,2)  0.8564  4.306
## M(2,1)  8.4187 24.476
## W      18.3033 11.897

At the 1% and 5% coverage levels, neither the ACD nor GARCH models can be rejected, where the null is the correct number of and independence of the VaR exceedances, with higher p-values for the ACD model. In terms of the goodness of fit of the tail of the density, and using the tail test of Berkowitz (2001) at the 1% quantile, the ACD model appears to generate a significantly higher p-value than the GARCH model which can be rejected at the 10% level of significance. Note that the ‘pit’ method returns the probability integral transformation of the realized data given the conditional forecasted density. Another test which uses the ‘pit’ method is that of Hong and Li (2005), a Portmanteau type test, which evaluates the goodness of fit on the whole density. While both models are rejected as providing a ‘correct fit’ (W statistic), the indications from the moment based statistics (M statistics) indicate that the ACD model has significantly better fit, as can be inferred from the lower values (the statistics are distributed as N(0,1)).


Unlike GARCH models where there is one call to the random number of size n.sim, in ACD models there are n.sim calls of size 1 because of the time variation in the conditional standardized residuals density. Simulation may be carried out either from a fitted object (using acdsim) or from a specification object (using acdpath). For the latter, this is not enabled for the mcsGARCH model. The following example provides a short illustration of the method and shows how to obtain equivalence between the simulation from fit and spec:

sim1 = acdsim(fit, n.sim = 250, m.sim = 5, rseed = 100:104)
# re-define the spec without variance targeting (only used in estimation
# routine)
spec = acdspec(mean.model = list(armaOrder = c(0, 1)), variance.model = list(variance.targeting = FALSE),
    distribution.model = list(model = 'nig', skewOrder = c(1, 1, 1), shapeOrder = c(1,
        1, 1), skewmodel = 'quad', shapemodel = 'pwl'))
setfixed(spec) < - as.list(coef(fit))
sim2 = acdpath(spec, n.sim = 250, m.sim = 5, rseed = 100:104, prereturns = tail(sp500ret[,1], 1), presigma = as.numeric(tail(sigma(fit), 1)), preshape = as.numeric(tail(shape(fit),1)), preskew = as.numeric(tail(skew(fit), 1)))
# check
c(all.equal(fitted(sim1), fitted(sim2)), all.equal(sigma(sim1), sigma(sim2)), all.equal(skew(sim1), skew(sim2)), all.equal(shape(sim1), shape(sim2)))
S = skewness(sim1)
K = kurtosis(sim1)
R = fitted(sim1)
V = sigma(sim1)
par(mfrow = c(2, 2), mai = c(0.75, 0.75, 0.3, 0.3), cex.axis = 0.8)
matplot(S, type = 'l', col = 1:5, main = 'Simulated Skewness', xlab = '')
matplot(K, type = 'l', col = 1:5, main = 'Simulated Kurtosis (ex)', xlab = '')
matplot(apply(R, 2, 'cumsum'), type = 'l', col = 1:5, main = 'Simulated Paths',
    ylab = '', xlab = '')
matplot(V, type = 'l', col = 1:5, main = 'Simulated Sigma', xlab = '')



Time varying higher moments within a GARCH modelling framework (ACD) provide for a natural extension to time variation in the conditional mean and variance. Whether the added marginal benefits of their inclusion justify the complexity in their estimation remains an open empirical question which hopefully the racd package will enable to be researched in greater depth and transparency.


Berkowitz, J. (2001). Testing density forecasts, with applications to risk management. Journal of Business & Economic Statistics, 19(4), 465-474.

Hansen, B. E. (1994). Autoregressive Conditional Density Estimation. International Economic Review, 35(3), 705-30.

Harvey, C. R., & Siddique, A. (1999). Autoregressive conditional skewness. Journal of financial and quantitative analysis, 34(4), 465-497.

Hong, Y., and Li, H. (2005), Nonparametric specification testing for continuous-time models with applications to term structure of interest rates, Review of Financial Studies, 18(1), 37-84.

High Frequency GARCH: The multiplicative component GARCH (mcsGARCH) model

The interest in high frequency trading and models has grown exponentially in the last decade. While I have some doubts about the validity of any signals emerging from all the noise at higher and higher frequencies, I have nevertheless decided to look at the statistical modelling of intraday returns using GARCH models. Unlike daily and lower frequency returns, intraday data has certain particular characteristics which make the use of standard modelling approaches invalid. For the purpose of this article, I will use 1-min returns for Citigroup for the period 02-Jan-2008 to 29-Feb-2008, which is available to download here with permission from QuantQuote. This dataset is supplied dividend and split adjusted and with outliers removed. The intraday time range considered is 09:30 to 16:00, the official NYSE trading hours. As with most such studies on intraday data modelling, the first return of the day is removed. Daily data for the symbol is downloaded from Yahoo finance.

Sys.setenv(TZ = 'GMT')
R_i = read.csv('C_2008_1minret.csv')
R_i = xts(R_i[, 2], as.POSIXct(R_i[, 1]))
getSymbols('C', from = '2000-01-01')
C = adjustOHLC(C, use.Adjusted = TRUE)
R_d = ROC(Cl(C), na.pad = FALSE)

Consider the correlogram of the absolute 1-min returns for Citigroup during the sample period in question:

par(cex.main = 0.85, col.main = 'black')
acf(abs(as.numeric(R_i)), lag.max = 4000, main = '1-min absolute returns\nCitigroup (2008 Jan-Feb)', cex.lab = 0.8)


The regular pattern is quite clear, repeating approximately every 390 periods (1-day) and showing an increase in volatility around the opening and closing times. GARCH, and more generally ARMA type models can only handle an exponential decay, and not the type of pattern seen here. Several approaches have been suggested in the literature in order to de-seasonalize the absolute returns such as the flexible Fourier method of Andersen and Bollerslev (1997), and the periodic GARCH model of Bollerslev and Ghysels (1996). Unfortunately I have found none of these, or closely related models particularly easy to work with. More recently, Engle and Sokalska (2012) (henceforth ES2012) introduced the multiplicative component GARCH model as a parsimonious alternative, which I have now included in the rugarch package (ver 1.01-6). This article discusses its implementation, challenges and specific details of working with this model, which allows a rather simple but powerful way to use GARCH for regularly spaced intraday returns.

The Model

Consider the continuously compounded return \( r_{t,i} \), where \( t \) denotes the day and \( i \) the regularly spaced time interval at which the return was calculated. Under this model, the conditional variance is a multiplicative product of daily, diurnal and stochastic (intraday) components, so that the return process may be represented as:
\[ \begin{gathered}
{r_{t,i}} = {\mu _{t,i}} + {\varepsilon _{t,i}}\\
{\varepsilon _{t,i}} = \left( {{q_{t,i}}{\sigma _t}{s_i}} \right){z_{t,i}}
\end{gathered} \]
where \( q_{t,i} \) is the stochastic intraday volatility, \( \sigma_t \) a daily exogenously determined forecast volatility, \( s_i \) the diurnal volatility in each regularly spaced interval \( i \), \( z_{t,i} \) the i.i.d (0,1) standardized innovation which conditionally follows some appropriately chosen distribution. In ES2012, the forecast volatility \( \sigma_t \) is derived from a multifactor risk model externally, but it is just as possible to generate such forecasts from a daily GARCH model. The seasonal (diurnal) part of the process is defined as:
\[ {s_i} = \frac{1}{T}\sum\limits_{t = 1}^T {\left( {\varepsilon _{_{t,i}}^2/\sigma _t^2} \right)}. \]

Dividing the residuals by the diurnal and daily volatility gives the normalized residuals (\( \bar\varepsilon \)):
\[ {{\bar \varepsilon }_{t,i}} = {\varepsilon _{t,i}}/\left( {{\sigma _t}{s_i}} \right) \]
which may then be used to generate the stochastic component of volatility \( q_{t,i} \) with GARCH motion dynamics. In the rugarch package, unlike the paper of ES2012, the conditional mean and variance equations (and hence the diurnal component on the residuals from the conditional mean filtration) are estimated jointly. Furthermore, and unlike ES2012, it is possible to include ARMAX dynamics in the conditional mean, though because of the complexity of the model and its use of time indices, ARCH-m is not currently allowed. Finally, as an additional departure from ES2012, the diurnal component in the rugarch package is estimated using the median rather than the mean function (since version 1.2-3), providing a more robust alternative given the type and length of the data typically used. The next sections provide a demonstration of the model using the Citigroup dataset.


Unlike all other GARCH models implemented in rugarch, the mcsGARCH model requires the user to pass an xts object of the forecast daily variance of the data for the period under consideration.

# Find the unique days in the intraday sample
n = length(unique(format(index(R_i), '%Y-%m-%d')))
# define a daily spec
spec_d = ugarchspec(mean.model = list(armaOrder = c(1, 1)), variance.model = list(model = 'eGARCH', garchOrder = c(2, 1)), distribution = 'nig')
# use the ugarchroll method to create a rolling forecast for the period in
# question:
roll = ugarchroll(spec_d, data = R_d['/2008-02-29'], forecast.length = n, refit.every = 5, refit.window = 'moving', moving.size = 2000, calculate.VaR = FALSE)
# extract the sigma forecast
df =
f_sigma = as.xts(df[, 'Sigma', drop = FALSE])
# now estimate the intraday model
spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE), variance.model = list(model = 'mcsGARCH'), distribution = 'nig')
# DailyVar is the required xts object of the forecast daily variance
fit = ugarchfit(data = R_i, spec = spec, DailyVar = f_sigma^2)

The following plots show the decomposition of the volatility into its various components. The regular pattern of the Total volatility would have been impossible to capture using standard GARCH models. Note that although the volatility series are stored as xts objects, they cannot be properly plotted using the standard plot.xts function which is why I make use of the axis function with a numeric series.

ep < - axTicksByTime(fit@model$DiurnalVar)
par(mfrow = c(4, 1), mar = c(2.5, 2.5, 2, 1))
plot(as.numeric(fit@model$DiurnalVar^0.5), type = 'l', main = 'Sigma[Diurnal]', col = 'tomato1', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(as.numeric(fit@model$DailyVar^0.5), type = 'l', main = 'Sigma[Daily-Forecast]', col = 'tomato2', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(fit@fit$q, type = 'l', main = 'Sigma[Stochastic]', col = 'tomato3', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(as.numeric(sigma(fit)), type = 'l', main = 'Sigma[Total]', col = 'tomato4', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)



The biggest challenge in writing code for the forecast was dealing with the aligning and matching of times, particularly future time/dates, since the model depends on the diurnal component which is time specific. As a key component of the forecast routine, I wrote a little function which creates a sequence of time/dates, similar to seq.POSIXt, but with the extra option of defining the time interval which dictates the start and end of the trading day. For example, considering the opening and closing times of the NYSE, 09:30 to 16:00, I would like to be able to create a set of n future periods starting from T0 within only this interval, and excluding weekends. The function is defined as:
ftseq(T0, length.out, by, interval, exclude.weekends = TRUE)
where T0 is a POSIXct date/time of the starting period, length.out the periods ahead to consider, by the difftime (e.g. “mins”), and interval a character vector of the start and end times which T0 must belong to and is a multiple of by.

# create the interval
interval = format(seq(as.POSIXct('2008-01-02 09:31:00'), as.POSIXct('2008-01-02 16:00:00'), by = 'min'), '%H:%M:%S')
ForcTime = ftseq(T0 = as.POSIXct('2008-02-29 16:00:00'), length.out = 390 * 2 + 1, by = 'mins', interval = interval)
tail(ForcTime, 25)
##  [1] '2008-03-04 15:37:00 GMT' '2008-03-04 15:38:00 GMT'
##  [3] '2008-03-04 15:39:00 GMT' '2008-03-04 15:40:00 GMT'
##  [5] '2008-03-04 15:41:00 GMT' '2008-03-04 15:42:00 GMT'
##  [7] '2008-03-04 15:43:00 GMT' '2008-03-04 15:44:00 GMT'
##  [9] '2008-03-04 15:45:00 GMT' '2008-03-04 15:46:00 GMT'
## [11] '2008-03-04 15:47:00 GMT' '2008-03-04 15:48:00 GMT'
## [13] '2008-03-04 15:49:00 GMT' '2008-03-04 15:50:00 GMT'
## [15] '2008-03-04 15:51:00 GMT' '2008-03-04 15:52:00 GMT'
## [17] '2008-03-04 15:53:00 GMT' '2008-03-04 15:54:00 GMT'
## [19] '2008-03-04 15:55:00 GMT' '2008-03-04 15:56:00 GMT'
## [21] '2008-03-04 15:57:00 GMT' '2008-03-04 15:58:00 GMT'
## [23] '2008-03-04 15:59:00 GMT' '2008-03-04 16:00:00 GMT'
## [25] '2008-03-05 09:31:00 GMT'

As can be seen, the first time is immediately after T0 (T0 is not included), and the sequence only runs for the defined interval, and optionally (default TRUE) skips weekends. This comes in very handy in the forecast routine.

Like the estimation method, the forecast routine also requires that you supply the forecast volatility for the period under consideration. However, because of the possible use of the out.sample in the estimation routine, it is not known beforehand whether this will eventually be needed since there may be enough intraday data in the out.sample period and the combination of n.ahead+n.roll chosen so that this user supplied forecast is not required. If you do not supply it and it is needed the routine will check and let you know with an error message. Finally, the presence of the diurnal component complicates the long run unconditional forecast of the underlying variance, so that the use of the uncvariance and related methods will always return the value for the component variance rather than the actual total variance (unlike the csGARCH model which returns both).

fit2 = ugarchfit(data = R_i, spec = spec, DailyVar = f_sigma^2, out.sample = 300)
# won't supply DailyVar to get an error
forc = ugarchforecast(fit2, n.ahead = 10, n.roll = 299)
## Error: DailyVar requires forecasts for: 2008-03-03 ...resubmit.

Notice the error which indicates we need 2008-03-03 forecast. Since we don’t have it, we re-estimate with ugarchforecast:

fit_d = ugarchfit(spec_d, data = R_d['2002/2008-02-29'])
forc_d = ugarchforecast(fit_d, n.ahead = 1)
f_sigma = xts(as.numeric(sigma(forc_d)), as.POSIXct('2008-03-03'))
# intraday forecast
forc = ugarchforecast(fit2, n.ahead = 10, n.roll = 299, DailyVar = f_sigma^2)
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: mcsGARCH
## Horizon: 10
## Roll Steps: 299
## Out of Sample: 10
## 0-roll forecast [T0=2008-02-29 11:00:00]:
##          Series Sigma[Total] Sigma[Stochastic]
## T+1  -7.681e-06    0.0015132            0.8702
## T+2  -7.664e-06    0.0057046            0.8718
## T+3  -7.657e-06    0.0055551            0.8734
## T+4  -7.654e-06    0.0058834            0.8750
## T+5  -7.653e-06    0.0063295            0.8766
## T+6  -7.653e-06    0.0013036            0.8781
## T+7  -7.653e-06    0.0012846            0.8797
## T+8  -7.653e-06    0.0011227            0.8812
## T+9  -7.652e-06    0.0008177            0.8827
## T+10 -7.652e-06    0.0009259            0.8842

Note that plot methods for this model are not yet fully implemented for reasons described previously.


Unlike standard GARCH simulation, the interval time is important in intraday GARCH since we are generating paths which follow very specific regularly sampled time points. Additionally, simulated or forecast daily variance needs to again be supplied for the simulation period under consideration. This is an xts object, and can also optionally have m.sim columns so that each independent simulation is based on the adjusted residuals by an independent simulation of the daily variance. The following example code shows the simulation of 10,000 points at 1-min intervals into the future and illustrates the effect of the seasonality component:

T0 = tail(index(R_i), 1)
# model$dtime contains the set of unique interval points in the dataset
# (and available from all rugarch objects for the mcsGARCH model)
# model$dvalues contains the diurnal component for each interval
ftime = ftseq(T0, length.out = 10000, by = fit@model$modeldata$period, interval = fit@model$dtime)
dtime = unique(format(ftime, '%Y-%m-%d'))
sim_d = ugarchsim(fit_d, n.sim = length(dtime), m.sim = 1)
var_sim = xts(as.matrix(sigma(sim_d)^2), as.POSIXct(dtime))
sim = ugarchsim(fit, n.sim = 10000, n.start = 0, m.sim = 1, DailyVar = var_sim, rseed = 10)
ep < - axTicksByTime(sim@simulation$DiurnalVar)
par(mfrow = c(4, 1), mar = c(2.5, 2.5, 2, 1))
plot(as.numeric(sim@simulation$DiurnalVar^0.5), type = 'l', main = 'Sigma[Diurnal]', col = 'tomato1', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(as.numeric(sim@simulation$DailyVar^0.5), type = 'l', main = 'Sigma[Daily-Simulated]', col = 'tomato2', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(sim@simulation$qSim[, 1], type = 'l', main = 'Sigma[Stochastic]', col = 'tomato3', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
plot(as.numeric(sigma(sim)), type = 'l', main = 'Sigma[Total]', col = 'tomato4',
    xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)



A rolling backtest and Value at Risk

The ugarchroll function is quite useful for testing a model’s adequacy in a backtest application, and the code below illustrates this for the mcsGARCH model for the data and period under consideration.

n = length(index(R_d['2008-01-01/2008-03-01']))
spec_d = ugarchspec(mean.model = list(armaOrder = c(1, 1)), variance.model = list(model = 'sGARCH'), distribution = 'std')
roll = ugarchroll(spec_d, data = R_d['/2008-02-29'], forecast.length = n, refit.every = 5, refit.window = 'moving', moving.size = 2000, calculate.VaR = FALSE)
df =
f_sigma = as.xts(df[, 'Sigma', drop = FALSE])
spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE), variance.model = list(model = 'mcsGARCH'), distribution = 'std')
roll = ugarchroll(spec, data = R_i, DailyVar = f_sigma^2, forecast.length = 3000, refit.every = 390, refit.window = 'moving', moving.size = 3000, calculate.VaR = TRUE)
# Generate the 1% VaR report
## VaR Backtest Report
## ===========================================
## Model: mcsGARCH-std
## Backtest Length: 3000
## ==========================================
## alpha:             1%
## Expected Exceed:   30
## Actual VaR Exceed: 33
## Actual %:          1.1%
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 0.294
## LR.uc Critical:  3.841
## LR.uc p-value:   0.588
## Reject Null:     NO
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##                  Independence of Failures
## Statistic: 1.028
## Critical:  5.991
## p-value:   0.598
## Reject Null:     NO

Not bad at all. Who say’s GARCH models are not good!?

The VaRplot function has been adjusted to work nicely with intraday data as shown below. The spikes in the VaR observed are the result of the seasonal component around the opening of trading.

D = as.POSIXct(rownames(roll@forecast$VaR))
VaRplot(0.01, actual = xts(roll@forecast$VaR[, 3], D), VaR = xts(roll@forecast$VaR[,1], D))


Further Developments

It is quite ‘easy’ to add additional GARCH flavors to the multiplicative model such as the eGARCH, GJR etc, and might do so in due course, time permitting. Another possible direction for expansion would be to treat the diurnal effect separately for each day of the week. I estimate that this would not be too hard to implement, providing a marginal slowdown in the estimation as a result of the increased lookup time for the matching of time and days.

Finally, this model is not ‘plug-and-play’, requiring some thought in the use of time & dates, and the preparation of the intraday returns. The garbage-in garbage-out rule clearly applies.

Please remember that questions about this or other issues with rugarch should be addressed to the r-sig-finance mailing list.


Bollerslev, T., & Ghysels, E. (1996). Periodic autoregressive conditional heteroscedasticity. Journal of Business & Economic Statistics, 14(2), 139–151.

Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, 4(2), 115–158.

Engle, R. F., & Sokalska, M. E. (2012). Forecasting intraday volatility in the US equity market. Multiplicative component GARCH. Journal of Financial Econometrics, 10(1), 54–83.

Whats new in rugarch (ver 1.01-5)

Since the last release of rugarch on CRAN (ver 1.0-16), there have been many changes and new features in the development version of the package (ver 1.01-5). First, development of the package (and svn) has been moved to google code from r-forge. Second, the package now features exclusive use of xts based time series for input data and also outputs some of the results as xts as well. The sections that follow highlight some of the key changes to the package which I hope will make it easier to work with.

Extractor Methods

sigma, fitted and residuals

The main extractor method is no longer the, but instead, sigma and fitted will now extract the conditional sigma (GARCH) and mean (ARFIMA) values from all objects. The old extractor methods are now mostly deprecated, with the exception of certain classes where it still makes sense to use them (i.e. uGARCHroll, uGARCHdistribution, uGARCHboot).

fit = ugarchfit(ugarchspec(), sp500ret, out.sample = 10)
c(is(sigma((fit))), is(fitted(fit)), is(residuals(fit)))
## [1] 'xts' 'xts' 'xts'
plot(xts(fit@model$modeldata$data, fit@model$modeldata$index), auto.grid = FALSE,
    minor.ticks = FALSE, main = 'S&P500 Conditional Mean')
lines(fitted(fit), col = 2)


plot(xts(abs(fit@model$modeldata$data), fit@model$modeldata$index), auto.grid = FALSE,
    minor.ticks = FALSE, main = 'S&P500 Conditional Sigma', col = 'grey')
lines(sigma(fit), col = 'steelblue')

Apart from getting the nice xts charts, rugarch can now handle any type of time-formatted data (which can be coerced to xts) including intraday.

A key change has also been made to the output of the forecast class (uGARCHforecast) which merits particular attention. Because rugarch allows to combine both rolling and unconditional forecasts, this creates a rather challenging problem in how to meaningfully output the results, with simple extractor methods such as sigma and fitted. The output in this case will be an n.ahead by (n.roll+1) matrix, where the column headings are now the T+0 dates, since they will always exist to use within a forecast framework, whilst the row names are labelled as T+1, T+2, …,T+n.ahead. The following example illustrates, and shows some simple solutions to deal with portraying n.ahead dates which are completely in the future (i.e. not in the available out of sample testing period).

f = ugarchforecast(fit, n.ahead = 25, n.roll = 9)
sf = sigma(f)
##     2009-01-15 2009-01-16 2009-01-20 2009-01-21 2009-01-22 2009-01-23  2009-01-26 2009-01-27 2009-01-28 2009-01-29
## T+1    0.02176    0.02078    0.02587    0.02730    0.02646    0.02519     0.02400    0.02301    0.02388    0.02487
## T+2    0.02171    0.02073    0.02580    0.02722    0.02638    0.02512     0.02393    0.02295    0.02382    0.02480
## T+3    0.02166    0.02069    0.02573    0.02714    0.02630    0.02505     0.02387    0.02289    0.02375    0.02473
## T+4    0.02160    0.02064    0.02565    0.02706    0.02623    0.02498     0.02380    0.02283    0.02369    0.02466
## T+5    0.02155    0.02059    0.02558    0.02697    0.02615    0.02491     0.02374    0.02277    0.02363    0.02459
## T+6    0.02150    0.02054    0.02550    0.02689    0.02607    0.02484     0.02368    0.02271    0.02356    0.02452

Therefore, the column headings are the T+0 dates i.e. the date at which the forecast was made. There are 10 columns since the n.roll option is zero based. To create an xts representation of only the rolling output is very simple:

print(sf[1, ])
## 2009-01-15 2009-01-16 2009-01-20 2009-01-21 2009-01-22 2009-01-23 2009-01-26 2009-01-27 2009-01-28 2009-01-29
##    0.02176    0.02078    0.02587    0.02730    0.02646    0.02519    0.02400    0.02301    0.02388    0.02487
print(f.roll < - as.xts(sf[1, ]))
##               [,1]
## 2009-01-15 0.02176
## 2009-01-16 0.02078
## 2009-01-20 0.02587
## 2009-01-21 0.02730
## 2009-01-22 0.02646
## 2009-01-23 0.02519
## 2009-01-26 0.02400
## 2009-01-27 0.02301
## 2009-01-28 0.02388
## 2009-01-29 0.02487

This gives the T+0 dates. If you want to generate the actual T+1 rolling dates, use the move function on the T+0 dates which effectively moves the dates 1-period ahead, and appends an extra 1 day to the end (which is not in the sample date set, therefore this is a ‘generated’ date):

print(f.roll1 < - xts(sf[1, ], move(as.POSIXct(colnames(sf)), by = 1)))
##               [,1]
## 2009-01-16 0.02176
## 2009-01-20 0.02078
## 2009-01-21 0.02587
## 2009-01-22 0.02730
## 2009-01-23 0.02646
## 2009-01-26 0.02519
## 2009-01-27 0.02400
## 2009-01-28 0.02301
## 2009-01-29 0.02388
## 2009-01-30 0.02487

For the n.ahead forecasts, I have included in the rugarch package another simple date function called generatefwd which can be used to generate future non-weekend dates. To use this, you will need to know that the model slot of any class object in rugarch which took a method with a data option will keep that data, its format and periodicity, which can be used as follows:

## function (T0, length.out = 1, by = 'days')
## Time difference of 1 days
i = 1
DT0 = generatefwd(T0 = as.POSIXct(colnames(sf)[i]), length.out = 25, by = fit@model$modeldata$period)
print(fT0 < - xts(sf[, i], DT0))
##               [,1]
## 2009-01-16 0.02176
## 2009-01-19 0.02171
## 2009-01-20 0.02166
## 2009-01-21 0.02160
## 2009-01-22 0.02155
## 2009-01-23 0.02150
## 2009-01-26 0.02145
## 2009-01-27 0.02139
## 2009-01-28 0.02134
## 2009-01-29 0.02129
## 2009-01-30 0.02124
## 2009-02-02 0.02119
## 2009-02-03 0.02114
## 2009-02-04 0.02109
## 2009-02-05 0.02104
## 2009-02-06 0.02099
## 2009-02-09 0.02094
## 2009-02-10 0.02089
## 2009-02-11 0.02084
## 2009-02-12 0.02079
## 2009-02-13 0.02075
## 2009-02-16 0.02070
## 2009-02-17 0.02065
## 2009-02-18 0.02060
## 2009-02-19 0.02055

The same applies to the fitted method. This concludes the part on the uGARCHforecast class and the changes which have been made. More details can always be found in the documentation (if you are wondering how to obtain documentation for an S4 class object such as uGARCHforecast, you need to append a ‘-class’ to the end and enclose the whole thing in quotes e.g. help(‘uGARCHforecast-class’)).

quantile and pit

The quantile method extracts the conditional quantiles of a rugarch object, subject to an extra option (probs) as in the S3 class method in stats, while pit calculates and returns the probability integral transformation of a fitted, filtered or rolling object (objects guaranteed to have ‘realized’ data to work with).

head(quantile(fit, c(0.01, 0.025, 0.05)))
##             q[0.01] q[0.025]  q[0.05]
## 1987-03-10 -0.02711 -0.02276 -0.01901
## 1987-03-11 -0.02673 -0.02247 -0.01881
## 1987-03-12 -0.02549 -0.02141 -0.01791
## 1987-03-13 -0.02449 -0.02058 -0.01722
## 1987-03-16 -0.02350 -0.01972 -0.01647
## 1987-03-17 -0.02271 -0.01903 -0.01586
##               pit
## 1987-03-10 0.7581
## 1987-03-11 0.4251
## 1987-03-12 0.5973
## 1987-03-13 0.3227
## 1987-03-16 0.2729
## 1987-03-17 0.9175

These should prove particularly useful in functions which require the quantiles or PIT transformation, such as the risk (VaRTest and VaRDurTest) and misspecification tests (BerkowitzTest and HLTest).

Distribution Functions

rugarch exports a set of functions for working with certain measures on the conditional distributions included in the package. Some of these functions have recently been re-written to take advantage of vectorization, whilst others re-written to take advantage of analytical representations (e.g. as regards to skewness and kurtosis measures for the sstd and jsu distributions). In the latest release, I’ve also included some functions which provide for a graphical visualization of the different distributions with regards to their higher moment features, and demonstrated below:

## Warning: skew lower bound below admissible region...adjusting to
## distribution lower bound.


The distplot function, available for all skewed and/or shaped distributions provides a visual 3D representation of the interaction of the skew and shape parameters in determining the Skewness and Kurtosis. In cases where the distribution is only skewed or shaped, a simpler 2D plot is created as in the case of the std distribution:


Another interesting plot is that of a distribution’s ‘authorized domain’, which shows the region of Skewness-Kurtosis for which a density exists. This is related to the Hamburger moment problem, and the maximum attainable Skewness (S) given kurtosis (K) ( see Widder (1946) ) . The skdomain function returns the values needed to visualize these relationships, and demonstrated below:

# plot the first one
XYnig = skdomain('nig', legend = FALSE)
XYsstd = skdomain('sstd', plot = FALSE)
XYhyp = skdomain('ghyp', plot = FALSE, lambda = 1)
XYjsu = skdomain('jsu', plot = FALSE)
# the values returned are the bottom half of the domain (which is
# symmetric)
lines(XYjsu$Kurtosis, XYjsu$Skewness, col = 3, lty = 2)
lines(XYjsu$Kurtosis, -XYjsu$Skewness, col = 3, lty = 2)
lines(XYsstd$Kurtosis, XYsstd$Skewness, col = 4, lty = 3)
lines(XYsstd$Kurtosis, -XYsstd$Skewness, col = 4, lty = 3)
lines(XYhyp$Kurtosis, XYhyp$Skewness, col = 5, lty = 4)
lines(XYhyp$Kurtosis, -XYhyp$Skewness, col = 5, lty = 4)
legend('topleft', c('MAX', 'NIG', 'JSU', 'SSTD', 'HYP'), col = c(2, 'steelblue', 3, 4, 5), lty = c(1, 1, 2, 3, 4), bty = 'n')

From the plot, it is clear that the skew-student has the widest possible combination of skewness and kurtosis for values of kurtosis less than ~6, whilst the NIG has the widest combination for values greater than ~8. It is interesting to note that the Hyperbolic distribution is not defined for values of kurtosis greater than ~9.

Model Reduction

The reduce function eliminates non-significant coefficients (subject to a pvalue cut-off argument) from a model by fixing them to zero (in rugarch this is equivalent to eliminating them) and re-estimating the model with the remaining parameters as the following example illustrates:

spec = ugarchspec(mean.model = list(armaOrder = c(4, 4)), variance.model = list(garchOrder = c(2, 2)))
fit = ugarchfit(spec, sp500ret[1:1000, ])
round(fit@fit$robust.matcoef, 4)
##         Estimate  Std. Error  t value Pr(>|t|)
## mu        0.0005      0.0004    1.250   0.2111
## ar1       0.2250      0.0164   13.677   0.0000
## ar2       1.6474      0.0175   94.046   0.0000
## ar3      -0.0661      0.0188   -3.517   0.0004
## ar4      -0.8323      0.0164  -50.636   0.0000
## ma1      -0.2328      0.0065  -35.825   0.0000
## ma2      -1.6836      0.0039 -430.423   0.0000
## ma3       0.0980      0.0098    9.953   0.0000
## ma4       0.8426      0.0075  112.333   0.0000
## omega     0.0000      0.0000    1.527   0.1267
## alpha1    0.2153      0.1265    1.702   0.0888
## alpha2    0.0000      0.0000    0.000   1.0000
## beta1     0.3107      0.1149    2.705   0.0068
## beta2     0.3954      0.0819    4.828   0.0000
# eliminate alpha2
fit = reduce(fit, pvalue = 0.1)
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(2,2)
## Mean Model   : ARFIMA(4,0,4)
## Distribution : norm
## Optimal Parameters
## ------------------------------------
##         Estimate  Std. Error   t value Pr(>|t|)
## mu       0.00000          NA        NA       NA
## ar1      0.23153    0.006583   35.1698 0.000000
## ar2      1.63558    0.018294   89.4050 0.000000
## ar3     -0.07264    0.018277   -3.9743 0.000071
## ar4     -0.82110    0.014031  -58.5206 0.000000
## ma1     -0.23883    0.009659  -24.7253 0.000000
## ma2     -1.68338    0.005401 -311.6821 0.000000
## ma3      0.11434    0.017035    6.7121 0.000000
## ma4      0.83198    0.013140   63.3179 0.000000
## omega    0.00000          NA        NA       NA
## alpha1   0.14532    0.008997   16.1530 0.000000
## alpha2   0.00000          NA        NA       NA
## beta1    0.26406    0.086397    3.0564 0.002240
## beta2    0.58961    0.088875    6.6342 0.000000
## Robust Standard Errors:
##         Estimate  Std. Error  t value Pr(>|t|)
## mu       0.00000          NA       NA       NA
## ar1      0.23153    0.061687   3.7533 0.000175
## ar2      1.63558    0.093553  17.4829 0.000000
## ar3     -0.07264    0.024012  -3.0251 0.002485
## ar4     -0.82110    0.028395 -28.9168 0.000000
## ma1     -0.23883    0.037806  -6.3172 0.000000
## ma2     -1.68338    0.021478 -78.3771 0.000000
## ma3      0.11434    0.066989   1.7068 0.087851
## ma4      0.83198    0.052093  15.9711 0.000000
## omega    0.00000          NA       NA       NA
## alpha1   0.14532    0.057150   2.5428 0.010996
## alpha2   0.00000          NA       NA       NA
## beta1    0.26406    0.129725   2.0356 0.041793
## beta2    0.58961    0.103879   5.6759 0.000000
## LogLikelihood : 3091
## #####


Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453-497.
Widder, D. V. (1959). The Laplace Transform. 1946. Princeton University Press, Princeton, New Jersey. Supplementary Bibliography Is. McShane, EJ,“ A canonical form for antiderivatives,” Illinois Jour, of Math, 3, 334-351.

Does anything NOT beat the GARCH(1,1)?

In their paper on GARCH model comparison, Hansen and Lunde (2005) present evidence that among 330 different models, and using daily data on the DM/$ rate and IBM stock returns, no model does significantly better at predicting volatility (based on a realized measure) than the GARCH(1,1) model, for an out of sample period of about 250 days (between 1992/1993 for the currency, and 1999/2000 for the stock). In this demonstration, I instead ask if anything does NOT beat the GARCH(1,1) using a range of operational tests such as VaR exceedances, and model comparison using the model confidence set of Hansen, Lunde and Nason (2011) on a loss function based on the conditional quantiles. Using the S&P 500 index, and the latest 1500 days as the out of sample forecast period, I find that there are few models that do not beat the GARCH(1,1)-Normal.

Data Setup

Given that it is the most widely cited and followed equity index in the world, the S&P 500 is a natural choice for testing a hypothesis, and for the purposes of this demonstration the log returns of the SPY index tracker were used. Starting on 23-01-2007, a rolling forecast and re-estimation scheme was adopted, with a moving window size of 1500 periods, and a re-estimation period of 50. The data was obtained from Yahoo finance and adjusted for dividends and splits.

getSymbols('SPY', from = '1997-01-01')
SPY = adjustOHLC(SPY, use.Adjusted = TRUE)
R = ROC(Cl(SPY), type = 'continuous', na.pad = FALSE)

The following plot highlights the importance of the forecast period under consideration, which includes the credit crisis and an overall exceptional testing ground for risk model backtesting.

plot(Cl(SPY), col = 'black', screens = 1, blocks = list(start.time = paste(head(tail(index(SPY),1500), 1)), end.time = paste(index(tail(SPY, 1))), col = 'whitesmoke'), main = 'SPY', minor.ticks = FALSE)


Model Comparison

For the backtest the following GARCH models were chosen:

  • GJR
  • component GARCH

with four distributions, the Normal, Skew-Student (Fernandez and Steel version), Normal Inverse Gaussian (NIG) and Johnson’s SU (JSU). The conditional mean was based on an ARMA(2,1) model, while the GARCH order was set at (1,1) and (2,1), giving a total combination of 64 models. To create the rolling forecasts, the ugarchroll function was used, which has been completely re-written in the latest version of rugarch and now also includes a resume method for resuming from non-converged estimation windows.

model = c('sGARCH', 'gjrGARCH', 'eGARCH', 'apARCH', 'csGARCH', 'fGARCH', 'fGARCH', 'fGARCH')
submodel = c(NA, NA, NA, NA, NA, 'AVGARCH', 'NGARCH', 'NAGARCH')
spec1 = vector(mode = 'list', length = 16)
for (i in 1:8) spec1[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(model = model[i], submodel = if (i > 5) submodel[i] else NULL))
for (i in 9:16) spec1[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(garchOrder = c(2, 1), model = model[i - 8], submodel = if ((i - 8) > 5) submodel[i - 8] else NULL))
spec2 = vector(mode = 'list', length = 16)
for (i in 1:8) spec2[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(model = model[i], submodel = if (i > 5) submodel[i] else NULL), distribution = 'sstd')
for (i in 9:16) spec2[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(garchOrder = c(2, 1), model = model[i - 8], submodel = if ((i - 8) > 5) submodel[i - 8] else NULL), distribution = 'sstd')
spec3 = vector(mode = 'list', length = 16)
for (i in 1:8) spec3[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(model = model[i], submodel = if (i > 5) submodel[i] else NULL), distribution = 'nig')
for (i in 9:16) spec3[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(garchOrder = c(2, 1), model = model[i - 8], submodel = if ((i - 8) > 5) submodel[i - 8] else NULL), distribution = 'nig')
spec4 = vector(mode = 'list', length = 16)
for (i in 1:8) spec4[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(model = model[i], submodel = if (i > 5) submodel[i] else NULL), distribution = 'jsu')
for (i in 9:16) spec4[[i]] = ugarchspec(mean.model = list(armaOrder = c(2, 1)), variance.model = list(garchOrder = c(2, 1), model = model[i - 8], submodel = if ((i - 8) > 5) submodel[i - 8] else NULL), distribution = 'jsu')
spec = c(spec1, spec2, spec3, spec4)
cluster = makePSOCKcluster(15)
clusterExport(cluster, c('spec', 'R'))
clusterEvalQ(cluster, library(rugarch))
# Out of sample estimation
n = length(spec)
fitlist = vector(mode = 'list', length = n)
for (i in 1:n) {
    tmp = ugarchroll(spec[[i]], R, n.ahead = 1, forecast.length = 1500, refit.every = 50, refit.window = 'moving', windows.size = 1500, solver = 'hybrid', calculate.VaR = FALSE, cluster = cluster, keep.coef = FALSE)
    if (!is.null(tmp@model$noncidx)) {
        tmp = resume(tmp, solver = 'solnp', fit.control = list(scale = 1), solver.control = list(tol = 1e-07, delta = 1e-06), cluster = cluster)
        if (!is.null(tmp@model$noncidx))
            fitlist[[i]] = NA
    } else {
        fitlist[[i]] =, which = 'density')

The method of using with option “density”, returns a data.frame with the forecast conditional mean, sigma, any skew or shape parameters for the estimation/forecast window, and the realized data for the period. Because I have made use of a time based data series, the returned data.frame has rownames the forecast dates.

DateMuSigmaSkewShapeShape (GIG)Realized

In order to compare the models, the conditional coverage test for VaR exceedances of Christoffersen (1998) and the tail test of Berkowitz (2001) were used. The following code calculates the quantiles and applies the relevant tests described above.

vmodels = c('sGARCH(1,1)', 'gjrGARCH(1,1)', 'eGARCH(1,1)', 'apARCH(1,1)', 'csGARCH(1,1)','AVGARCH(1,1)', 'NGARCH(1,1)', 'NAGARCH(1,1)', 'sGARCH(2,1)', 'gjrGARCH(2,1)','eGARCH(2,1)', 'apARCH(2,1)', 'csGARCH(2,1)', 'AVGARCH(2,1)', 'NGARCH(2,1)','NAGARCH(2,1)')
modelnames = c(paste(vmodels, '-N', sep = ''), paste(vmodels, '-sstd', sep = ''), paste(vmodels, '-nig', sep = ''), paste(vmodels, '-jsu', sep = ''))
q1 = q5 = px = matrix(NA, ncol = 64, nrow = 1500)
dist = c(rep('norm', 16), rep('sstd', 16), rep('nig', 16), rep('jsu', 16))
# use apply since nig and gh distributions are not yet vectorized
for (i in 1:64) {
    q1[, i] = as.numeric(apply(fitlist[[i]], 1, function(x) qdist(dist[i], 0.01, mu = x['Mu'], sigma = x['Sigma'], skew = x['Skew'], shape = x['Shape'])))
    q5[, i] = as.numeric(apply(fitlist[[i]], 1, function(x) qdist(dist[i], 0.05, mu = x['Mu'], sigma = x['Sigma'], skew = x['Skew'], shape = x['Shape'])))
    px[, i] = as.numeric(apply(fitlist[[i]], 1, function(x) pdist(dist[i], x['Realized'], mu = x['Mu'], sigma = x['Sigma'], skew = x['Skew'], shape = x['Shape'])))
VaR1cc = apply(q1, 2, function(x) VaRTest(0.01, actual = fitlist[[1]][, 'Realized'], VaR = x)$cc.LRp)
VaR5cc = apply(q5, 2, function(x) VaRTest(0.05, actual = fitlist[[1]][, 'Realized'], VaR = x)$cc.LRp)
BT1 = apply(px, 2, function(x) BerkowitzTest(qnorm(x), tail.test = TRUE, alpha = 0.01)$LRp)
BT5 = apply(px, 2, function(x) BerkowitzTest(qnorm(x), tail.test = TRUE, alpha = 0.05)$LRp)
VTable = cbind(VaR1cc, VaR5cc, BT1, BT5)
rownames(VTable) = modelnames
colnames(VTable) = c('VaR.CC(1%)', 'VaR.CC(5%)', 'BT(1%)', 'BT(5%)')

From a summary look at the head and tails of the tables, it is quite obvious that the GARCH(1,1)-Normal model is NOT hard to beat. More generally, the (1,1)-Normal models fare worst, and this is not surprising since financial markets, particularly for this crisis prone period under study, cannot possibly be well modelled without fat tailed and asymmetric distributions. A more interesting result is that the (2,1)-Non Normal models beat (1,1)-Non Normal models on average. Given the widely adopted (1,1) approach almost everywhere, this is a little surprising. It should however be noted that 1500 points where used for each estimation window…with less data, parameter uncertainty is likely to induce noise in a higher order GARCH model.

Top 10


Bottom 10



The following plots provide for a clearer (hopefully!) visual representation of the results.

The models with conditional distribution the Normal are excluded since none of them are above the 5% significance cutoff. All other barplots start at 5% so that any bar which shows up indicates acceptance of the null of the hypothesis (for both tests this is phrased so that it indicates a correctly specified model). A careful examination of all the models indicates that the GJR (1,1) and (2,1) model from any of the 3 skewed and shaped conditional distributions passes all 4 tests. The SSTD distribution fares worst among the three, with the NIG and JSU appearing to provide an equally good fit. The final plot shows the VaR performance of the GARCH(1,1)-Normal and the GJR(2,1)-NIG for the period.

MCS Test on Loss Function

It is often said that the VaR exceedance tests are rather crude and require a lot of data in order to offer any significant insight into model differences. An alternative method for evaluating competing models is based on the comparison of some relevant loss function using a test such as the Model Confidence Set (MCS) of Hansen, Lunde and Nason (2011). In keeping with the focus on the tail of the distribution, a VaR loss function at the 1% and 5% coverage levels was used, described in González-Rivera et al.(2004) and defined as:
$${Q_{loss}} \equiv {N^{ – 1}}\sum\limits_{t = R}^T {\left( {\alpha  – 1\left( {{r_{t + 1}} < VaR_{t + 1}^\alpha } \right)} \right)} \left( {{r_{t + 1}} – VaR_{t + 1}^\alpha } \right)$$
where \( P = T – R \) is the out-of-sample forecast horizon, \( T \) the total horizon to include in estimation, and \( R \) the start of the out-of-sample forecast. This is an asymmetric loss function, linearly penalizing exceedances more heavily by \( (1-\alpha) \). This is not yet exported in rugarch but can still be called as follows:

LossV5 = apply(q5, 2, function(x) rugarch:::.varloss(0.05, fitlist[[1]][, 'Realized'], x))
LossV1 = apply(q1, 2, function(x) rugarch:::.varloss(0.01, fitlist[[1]][, 'Realized'], x))

The MCS function is also not yet included, but may be in a future version. The following barplot shows the result of running the MCS test using a stationary bootstrap with 5000 replications.




While no models are excluded from the model confidence set, at the 95% significance level, the ranking is still somewhat preserved for the top and bottom models, with the (2,1) NIG and JSU models being at the top. Based on this test, it would appear that the eGARCH(2,1)-NIG is the superior model at both coverage rates, with the GARCH(1,1)-Normal being at the very bottom with a 7% and 20% probability of belonging to the set at the 1% and 5% coverage rates respectively.


The normality assumption does not realistically capture the observed market dynamics, and neither does the GARCH(1,1)-N model. There were few models which did not beat it in this application. In fact, higher order GARCH models were shown to provide significant out performance on a range of tail related measures, and distributions such as the NIG and JSU appeared to provide the most realistic representation of the observed dynamics.


Berkowitz, J. (2001). Testing density forecasts, with applications to risk management. Journal of Business & Economic Statistics, 19(4), 465-474.
Christoffersen, P. F. (1998). Evaluating Interval Forecasts. International Economic Review, 39(4), 841-62.
González-Rivera, G., Lee, T. H., & Mishra, S. (2004). Forecasting volatility: A reality check based on option pricing, utility function, value-at-risk, and predictive likelihood. International Journal of Forecasting, 20(4), 629-645.
Hansen, P. R., & Lunde, A. (2005). A forecast comparison of volatility models: does anything beat a GARCH (1, 1)?. Journal of applied econometrics, 20(7), 873-889.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453-497.