If a time series is assumed to follow an auto-regressive AR(N) model of the form

$$x[n] = -a_1 x[n-1] - a_2 x[n-2] - \cdots - a_N x[n-N] + w[n]$$

where $w[n]$ is white noise with variance $\sigma^2$, the natural tendency is to estimate the model parameters $a_1, a_2, \ldots, a_N$.
The least squares method can be applied here to estimate the model parameters, but the computations become cumbersome as the order N increases.
Fortunately, the AR model coefficients can instead be solved for using the Yule-Walker equations.
Yule-Walker Equations
The Yule-Walker equations relate the auto-regressive model parameters to the auto-covariance $r_{xx}[k]$ of the random process $x[n]$. They can be applied to find the AR(N) model parameters from given time-series data $x[n]$ that can reasonably be assumed to follow an AR process (visually examining the auto-correlation function (ACF) and partial auto-correlation function (PACF) may give clues as to whether the data should be modeled as an AR or MA process, and with what model order N).
Steps to find the model parameters using the Yule-Walker equations:
- Given $x[n]$, estimate the auto-covariance of the process, $r_{xx}[k]$
- Solve the Yule-Walker equations (derived below) for the coefficients $a_k$ ($k=1,\ldots,N$) and the noise variance $\sigma^2$
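These two steps can be sketched in Python with NumPy (a minimal illustration, not the article's MATLAB code; the function name `yule_walker_fit` and the biased auto-covariance estimator are choices made here):

```python
import numpy as np

def yule_walker_fit(x, N):
    """Estimate AR(N) coefficients [1, a1, ..., aN] and the driving-noise
    variance from data x by forming and solving the Yule-Walker equations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Step 1: biased auto-covariance estimates rxx[0..N]
    r = np.array([x[:n - k] @ x[k:] / n for k in range(N + 1)])
    # Step 2: solve the N x N Toeplitz system R a = -r for a1..aN
    R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])
    a = np.linalg.solve(R, -r[1:])
    sigma2 = r[0] + a @ r[1:]  # noise variance from the lag-0 equation
    return np.concatenate(([1.0], a)), sigma2
```

With enough samples from a stable AR process, the returned coefficients approach the true ones.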
To solve the Yule-Walker equations, one has to know the model order. Although the Yule-Walker equations can be used to find the model parameters, they give no direct insight into the model order N. Several methods exist to ascertain the model order:
- ACF and PACF: The autocorrelation function (ACF) and partial autocorrelation function (PACF) may give insight into the model order of the underlying process; for an AR(N) process, the PACF cuts off after lag N.
- Akaike Information Criterion (AIC)[1]: Relates the number of model parameters to the goodness of fit, and includes a penalty factor to avoid over-fitting (increasing the model order always improves the fit, so the fitting process has to be stopped somewhere).
- Bayesian Information Criterion (BIC)[2]: Very similar to AIC, but with a stronger penalty on the number of parameters.
- Cross-validation[3]: The given time series is broken into subsets. One subset of data is taken at a time and the model parameters are estimated from it; the estimated model parameters are then cross-validated against data from the remaining subsets.
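As an illustration of the AIC approach, the Python sketch below fits AR models of increasing order via Yule-Walker and scores each fit. The helper names and the specific penalty form, $n \log \hat{\sigma}^2 + 2(N+1)$ for a Gaussian AR fit, are assumptions of this sketch, not from the original post:

```python
import numpy as np

def yw_noise_variance(x, N):
    # Yule-Walker fit; returns the estimated driving-noise variance
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] / n for k in range(N + 1)])
    R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])
    a = np.linalg.solve(R, -r[1:])
    return r[0] + a @ r[1:]

def aic_scores(x, max_order):
    # AIC for a Gaussian AR(N) fit: n*log(sigma2_hat) + 2*(N+1)
    n = len(x)
    return {N: n * np.log(yw_noise_variance(x, N)) + 2 * (N + 1)
            for N in range(1, max_order + 1)}
```

`min(scores, key=scores.get)` then gives the order with the lowest AIC; for a clear AR(3) process, orders below 3 are heavily penalized by their larger residual variance.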
Deriving Yule-Walker Equations
A generic AR(N) model

$$x[n] = -\sum_{k=1}^{N} a_k x[n-k] + w[n] \quad \cdots (1)$$

can be written in compact form as

$$\sum_{k=0}^{N} a_k x[n-k] = w[n] \quad \cdots (2)$$

Note that the scaling factor of the $x[n]$ term is $a_0 = 1$.
Multiplying (2) by $x[n-l]$ and taking the expectation,

$$E\left\{ \sum_{k=0}^{N} a_k x[n-k]\, x[n-l] \right\} = E\{ w[n]\, x[n-l] \} \quad \cdots (3)$$

One can readily identify the auto-correlation and cross-correlation terms as

$$\sum_{k=0}^{N} a_k r_{xx}[l-k] = r_{wx}[l] \quad \cdots (4)$$
The next step is to compute the identified cross-correlation term $r_{wx}[l]$ and relate it to the auto-correlation term $r_{xx}[l-k]$.

The term $x[n-l]$ can also be obtained from equation (1) as

$$x[n-l] = -\sum_{k=1}^{N} a_k x[n-l-k] + w[n-l] \quad \cdots (5)$$

so that

$$r_{wx}[l] = E\{w[n]x[n-l]\} = -\sum_{k=1}^{N} a_k E\{x[n-l-k]\,w[n]\} + E\{w[n]\,w[n-l]\}$$

Note that data and noise are always uncorrelated, therefore $E\{x[n-l-k]\,w[n]\} = 0$ for $l \ge 0$, $k \ge 1$ (the sample $x[n-l-k]$ depends only on noise up to time $n-l-k < n$). Also, the auto-correlation of white noise is zero at all lags except lag 0, where its value equals $\sigma^2$ (remember the flat power spectral density of white noise and its impulse-like auto-correlation). Using these two properties and restricting the lags to positive values and zero,

$$r_{wx}[l] = \sigma^2 \delta[l], \quad l \ge 0 \quad \cdots (6)$$

Substituting (6) in (4),

$$\sum_{k=0}^{N} a_k r_{xx}[l-k] = \sigma^2 \delta[l], \quad l \ge 0 \quad \cdots (7)$$
Here there are two cases to be solved: $l > 0$ and $l = 0$.

For the $l > 0$ case, equation (7) becomes

$$\sum_{k=1}^{N} a_k r_{xx}[l-k] = -r_{xx}[l], \quad l = 1, 2, \ldots, N \quad \cdots (8)$$

Clue: notice that the lower limit of the summation changed from $k=0$ to $k=1$; the $k=0$ term, $r_{xx}[l]$, has moved to the right-hand side.
Now, equation (8) can be written in matrix form (using the symmetry $r_{xx}[-k] = r_{xx}[k]$):

$$\begin{bmatrix} r_{xx}[0] & r_{xx}[1] & \cdots & r_{xx}[N-1] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[N-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = -\begin{bmatrix} r_{xx}[1] \\ r_{xx}[2] \\ \vdots \\ r_{xx}[N] \end{bmatrix} \quad \cdots (9)$$

These are the Yule-Walker equations, which comprise a set of $N$ linear equations in $N$ unknown parameters.
Representing equation (9) in compact form,

$$\mathbf{R}\,\mathbf{a} = -\mathbf{r}$$

the solution is

$$\mathbf{a} = -\mathbf{R}^{-1}\mathbf{r}$$

Once we solve for $\mathbf{a}$, the noise variance $\sigma^2$ follows from the $l = 0$ case of equation (7):

$$\sigma^2 = r_{xx}[0] + \sum_{k=1}^{N} a_k r_{xx}[k]$$
MATLAB's aryule function efficiently solves the Yule-Walker equations using the Levinson-Durbin algorithm[4][5].
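The Levinson-Durbin recursion exploits the Toeplitz symmetry of equation (9) to solve it in $O(N^2)$ operations instead of the $O(N^3)$ of a general linear solver. A minimal Python sketch (the function name and array conventions are choices of this sketch, not MATLAB's internals):

```python
import numpy as np

def levinson_durbin(r, N):
    """Solve the Yule-Walker equations for an AR(N) model given
    auto-covariances r[0..N]; returns ([1, a1..aN], noise variance)."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(N + 1)
    a[0] = 1.0
    e = r[0]  # order-0 prediction error
    for m in range(1, N + 1):
        # reflection coefficient for order m
        k = -(r[m] + a[1:m] @ r[m - 1:0:-1]) / e
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]  # update lower-order coefficients
        a[m] = k
        e *= (1.0 - k * k)  # prediction error shrinks at each order
    return a, e
```

At the final order, the prediction error `e` equals the noise variance $\sigma^2$ obtained from the lag-0 Yule-Walker equation.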
Simulation:
Let's generate an AR(3) process and pretend that we do not know anything about the model parameters. We will take this as the input data to the Yule-Walker method and check whether it can estimate the model parameters properly.
Generating the data from the AR(3) process given above:
a=[1 0.9 0.6 0.5]; %model parameters
x=filter(1,a,randn(1000,1)); %data generation with gaussian noise - 1000 samples
plot(x);
title('Sample data from AR(3) process');
xlabel('Sample index');
ylabel('Value');
Plot the periodogram (PSD plot) for reference. The PSD plots from the estimated models will be checked against this reference plot later.
figure(2);
periodogram(x); hold on; %plot the original frequency response of the data
Running the simulation for three model orders and checking which model order suits the data best:
N=[2,3,4];
for i=1:3
[a,p] = aryule(x,N(i)) %estimated coefficients and noise variance (no semicolon, so they are displayed)
[H,w]=freqz(sqrt(p),a); %frequency response of the estimated AR model
hp = plot(w/pi,20*log10(2*abs(H)/(2*pi))); %plotting PSD in log scale
end
xlabel('Normalized frequency (\times \pi rad/sample)');
ylabel('One-sided PSD (dB/rad/sample)');
legend('PSD estimate of x','PSD of model output N=2','PSD of model output N=3','PSD of model output N=4');
The estimated model parameters and the noise variances computed by the Yule-Walker system are given below. The estimated parameters are approximately the same as the expected ones. See how the error (the noise variance p) decreases as the model order N increases. The optimum model order in this case is N=3, since the error does not change significantly when the order is increased further, and the additional parameter a4 of the AR(4) fit is not significantly different from zero.
References
[1] Akaike, H. (1998). "Information Theory and an Extension of the Maximum Likelihood Principle," in Parzen, E., Tanabe, K., Kitagawa, G. (eds), Selected Papers of Hirotugu Akaike, Springer Series in Statistics (Perspectives in Statistics), Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1694-0_15
[2] McQuarrie, A. D. R., and Tsai, C.-L. (1998). Regression and Time Series Model Selection. World Scientific.
[3] Arlot, S., and Celisse, A. (2010). "A survey of cross-validation procedures for model selection," Statistics Surveys, 4:40-79.
[4] Durbin, J. (1960). "The fitting of time series models," Review of the International Statistical Institute, 28:233-243.
[5] Golub, G. H., and Van Loan, C. F. (1996). Matrix Computations, third edition, The Johns Hopkins University Press.
Can you please tell me why the LS method requires more computations?
The least squares method is mathematically tractable up to N=2 (that is, the AR(2) model). For N>2, the least squares solution yields very long expressions and becomes unwieldy to work out by hand.
The following document explains this with examples. Hope this helps!
http://www-stat.wharton.upenn.edu/~steele/Courses/956/Resource/YWSourceFiles/YW-Eshel.pdf
I thought p is the variance estimate of the white noise input to the AR model, and not the error in the prediction.
Is there another way to exhibit the error of the estimated filter?
Thanks.
Excuse me, I don't understand the way you used the function freqz in the script ([H,w]=freqz(sqrt(p),a);).
As defined in the MATLAB documentation, [h,w] = freqz(d,n) returns the n-point complex frequency response of the digital filter d.
So why, in place of the filter, did you put the standard deviation of the input white noise, and in place of the number of points, the filter coefficients?
Sorry, but I am not getting it.
Thanks in advance.
With the Yule-Walker equations one can estimate the parameters of the filter (a0, a1, …, aN in equation (1) above) that, when excited with white noise w[n], produces output samples closely matching the signal characteristics of x[n]. In linear prediction, we would like to predict future samples from the past samples (the recorded time series). This can be done if we have an auto-regressive model (filter) with a specified model order and the corresponding filter coefficients.
The following function estimates the filter coefficients by applying the Yule-Walker equations to the recorded time series x[n]:
[d,p] = aryule(x,N) %N is the model order and x is the recorded time series
Here d holds the estimated filter coefficients and p is the required variance of the white-noise input.
To recreate the original sequence x[n], or to predict future samples, all we have to do is filter white noise of variance p through the AR filter whose coefficients are set to d.
Equivalently, the AR process is the output of an all-pole (IIR) filter whose transfer function takes the form
H(z) = sqrt(p)/D(z)
where sqrt(p) is the standard deviation of the required white-noise input and D(z) is the polynomial whose coefficients are the entries of d (the denominator). Since I would like to know the PSD of the AR process, all I have to do is plot the frequency response of the above-mentioned filter, which is done using freqz(sqrt(p), d).
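For readers without MATLAB, the same computation can be sketched in Python with NumPy (`ar_psd` is a name invented for this sketch): it evaluates H(e^{jw}) = sqrt(p)/D(e^{jw}) on a frequency grid, which is essentially what the freqz(sqrt(p), d) call does.

```python
import numpy as np

def ar_psd(d, p, n_freq=512):
    """PSD of an AR model: |sqrt(p) / D(e^{jw})|^2 evaluated on
    n_freq points in [0, pi]; d = [1, d1, ..., dN], p = noise variance."""
    d = np.asarray(d, dtype=float)
    w = np.linspace(0, np.pi, n_freq)
    # D(e^{jw}) = sum_k d[k] * exp(-j*w*k)
    D = np.exp(-1j * np.outer(w, np.arange(len(d)))) @ d
    return w, p / np.abs(D) ** 2
```

For an AR(1) model with d = [1, 0.5] and p = 1, the PSD is 1/|1 + 0.5e^{-jw}|^2, largest at w = pi where the denominator is smallest.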
I assume that there is a typo in equation (4): the x should also be in the subscript.
Thanks for notifying. Typo corrected.