Yule-Walker Estimation and Simulation in Matlab

If time-series data is assumed to follow an auto-regressive model of order N, AR(N), of the form

x[n] + a_1 x[n-1] + a_2 x[n-2] + \cdots + a_N x[n-N] = w[n]

the natural next step is to estimate the model parameters a1, a2, …, aN.

The least-squares method can be applied to estimate the model parameters, but the computations become cumbersome as the order N increases.

Fortunately, the AR model coefficients can be solved for using the Yule-Walker equations.

Yule-Walker Equations

The Yule-Walker equations relate the auto-regressive model parameters to the auto-covariance rxx[k] of the random process x[n]. They can be applied to find the AR(N) model parameters from given time-series data x[n] that can reasonably be assumed to be an AR process (visually examining the auto-correlation function (ACF) and partial auto-correlation function (PACF) may give a clue on whether the data can be modeled as an AR or MA process with an appropriate model order N).

Steps to find model parameters using Yule-Walker Equations:

  • Given x[n], estimate the auto-covariance of the process, \hat{r}_{xx}[k]
  • Solve the Yule-Walker equations to find ak (k=1,…,N) and the noise variance σ² from \hat{r}_{xx}[k] (a Matlab sketch of both steps is given after equation (11) below)

To solve the Yule-Walker equations, one has to know the model order. Although the Yule-Walker equations can be used to find the model parameters, they do not give any direct insight into the model order N. Several methods exist to ascertain the model order.

  • ACF and PACF: the autocorrelation function (ACF) and partial autocorrelation function (PACF) may give insight into the model order of the underlying process.
  • Akaike Information Criterion (AIC)[1]: relates the number of model parameters to the goodness of fit, and adds a penalty factor to avoid over-fitting, since increasing the model order always improves the fit and the fitting process has to be stopped somewhere (a Matlab sketch follows this list).
  • Bayesian Information Criterion (BIC)[2]: very similar to AIC, but with a stronger penalty on the number of parameters.
  • Cross Validation[3]: the given time series is broken into subsets; one subset is taken at a time and the model parameters are estimated from it, then the estimated model is cross-validated against data from the remaining subsets.
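
As an illustration, here is a minimal AIC-based order-selection sketch, assuming x holds the observed series and using the common AR-model approximation AIC ≈ M ln(σ̂²) + 2N; the variable names maxN, aic and bestN are purely illustrative.

M = length(x); maxN = 10; %number of samples and the largest candidate order
aic = zeros(maxN,1);
for i = 1:maxN
    [~,p] = aryule(x,i); %p is the prediction error (noise) variance for order i
    aic(i) = M*log(p) + 2*i; %AIC, up to an additive constant
end
[~,bestN] = min(aic) %order with the smallest AIC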

Deriving Yule-Walker Equations

A generic AR model

x[n] + a_1 x[n-1] + a_2 x[n-2] + \cdots + a_N x[n-N] = w[n] \quad\quad (1)

can be written in compact form as

\displaystyle{\sum_{k=0}^{N} a_k x[n-k]=w[n], \;\;\; a_0=1} \quad\quad (2)

Note that the coefficient of the x[n] term is a0=1.

Multiplying (2) by x[n-l] and taking the expectation on both sides,

\displaystyle{ \sum_{k=0}^{N} a_k E\left\{ x[n-k]x[n-l] \right\}=E \left\{ w[n]x[n-l] \right\}, \;\;\; a_0=1} \quad\quad  (3)

Yule-Walker Equations

One can readily identify the auto-correlation and cross-correlation terms as

\displaystyle{\sum_{k=0}^{N} a_k \underbrace{E\left\{ x[n-k] x[n-l]\right\}}_{r_{xx}(l-k)} = \underbrace{E\left\{ w[n] x[n-l]\right\}}_{r_{wx}(l)}}

\displaystyle{\sum_{k=0}^{N} a_k r_{xx}[l-k]=r_{wx}[l]}, \;\;\; a_0=1 \quad\quad (4)

The next step is to compute the cross-correlation term rwx[l] and relate it to the auto-correlation term rxx[l-k].

The term x[n-l] can also be obtained from equation (1) as

\displaystyle{x[n-l]=-\sum_{k=1}^{N} a_k x[n-k-l]+w[n-l]} \quad\quad (5)

Note that past samples of the process are uncorrelated with the present noise sample, therefore E{x[n-k-l]w[n]}=0. Also, the auto-correlation of white noise is zero at all lags except lag 0, where its value equals σ² (recall the flat power spectral density of white noise and its impulse-like auto-correlation). These two properties are used in the following steps. Restricting the lags to zero and positive values,

\displaystyle{ \begin{aligned} r_{wx}(l) &=E \left\{ w[n]x[n-l] \right\} \\ &= E \left\{ w[n] \left ( -\sum_{k=1}^{N} a_kx[n-k-l]+w[n-l] \right ) \right\}\\ &= E\left \{ -\sum_{k=1}^{N} a_kx[n-k-l]w[n]+w[n-l]w[n] \right \} \\ &= E\left \{ 0 + w[n-l]w[n] \right \}\\ &= E\left \{ w[n-l]w[n] \right \}\\ &=\begin{cases}0 \;\; ,l>0 \\ \sigma^2 \;,l=0\end{cases} \end{aligned}} \quad\quad (6)

Substituting (6) in (4),

\displaystyle{\sum_{k=0}^{N} a_kr_{xx}[l-k]=\begin{cases}0 \;\; ,l>0 \\ \sigma^2 \;,l=0\end{cases} },\;\;\; a_0=1 \quad\quad (7)

Here there are two cases to solve: l>0 and l=0.
For the l>0 case, equation (7) becomes

\displaystyle{ \sum_{k=1}^{N} a_k r_{xx}[l-k]=-r_{xx}[l]} \quad\quad (8)

Note that the lower limit of the summation changed from k=0 to k=1, since the a0 term was moved to the right-hand side.

Now, equation (8) can be written in matrix form by evaluating it for l = 1, 2, …, N:

\displaystyle{ \begin{bmatrix} r_{xx}[0] & r_{xx}[-1] & \cdots & r_{xx}[1-N] \\ r_{xx}[1] & r_{xx}[0] & \cdots & r_{xx}[2-N] \\ \vdots & \vdots & \ddots & \vdots \\ r_{xx}[N-1] & r_{xx}[N-2] & \cdots & r_{xx}[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = - \begin{bmatrix} r_{xx}[1] \\ r_{xx}[2] \\ \vdots \\ r_{xx}[N] \end{bmatrix} } \quad\quad (9)

These are the Yule-Walker equations: a set of N linear equations in N unknown parameters.

Representing equation (9) in a compact form

\bar{R}\bar{a}=-\bar{r} \quad\quad (10)

The solution \bar{a} is obtained as

\bar{a}=-\bar{R}^{-1}\bar{r}\quad\quad (11)

Once we solve for \bar{a} (equivalently, the model parameters ak), the noise variance σ² can be found by applying the estimated values of ak in equation (7) with l=0, which gives σ² = a0 rxx[0] + a1 rxx[1] + … + aN rxx[N], using the fact that rxx is an even function for a real-valued process.
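
For illustration, a minimal sketch of this procedure in Matlab, assuming x is a zero-mean time series stored as a column vector and N is the chosen model order:

r = xcorr(x,N,'biased'); %biased auto-correlation estimates at lags -N..N
r = r(N+1:end); %keep the non-negative lags rxx[0..N]
R = toeplitz(r(1:N)); %N-by-N symmetric Toeplitz matrix of equation (9)
a = [1; -R\r(2:N+1)]; %equation (11), with a0=1 prepended
sigma2 = a.'*r %equation (7) evaluated at l=0, using rxx[-k]=rxx[k]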

Matlab's aryule function efficiently solves the Yule-Walker equations using the Levinson-Durbin algorithm[4][5].
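
For reference, given the auto-correlation estimates r from the sketch above, Matlab's levinson function applies the same recursion directly (a usage sketch):

[a_hat,p_hat] = levinson(r,N); %a_hat = [1 a1 ... aN], p_hat = noise variance estimate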

Simulation:

Let's generate an AR(3) process and pretend that we do not know anything about the model parameters. We will feed this data to the Yule-Walker system and check whether it can estimate the model parameters properly.

x[n] = -0.9 x[n-1] - 0.6 x[n-2] - 0.5 x[n-3] + w[n]

Generating the data from the AR(3) process given above

a=[1  0.9   0.6 0.5]; %model parameters
x=filter(1,a,randn(1000,1)); %data generation with gaussian noise - 1000 samples
plot(x);
title('Sample data from AR(3) process');
xlabel('Sample index');
ylabel('Value');
Figure 1: Simulated data from AR(3) process

Plot the periodogram (PSD plot) for reference. The PSD plots from the estimated models will be checked against this reference plot later.

figure(2);
periodogram(x); hold on; %PSD estimate of the data, used as the reference

Run the simulation for three model orders and check which model order suits best.

N=[2,3,4]; %candidate model orders

for i=1:3
    [a,p] = aryule(x,N(i)) %estimated AR coefficients and noise variance (printed)
    [H,w] = freqz(sqrt(p),a); %frequency response of the fitted model sqrt(p)/A(z)
    plot(w/pi,20*log10(2*abs(H)/(2*pi))); %scaled to match the periodogram's one-sided PSD in dB
end
xlabel('Normalized frequency (\times \pi rad/sample)');
ylabel('One-sided PSD (dB/rad/sample)');
legend('PSD estimate of x','PSD of model output N=2','PSD of model output N=3','PSD of model output N=4');
Figure 2: Power spectral density of the modeled data

The estimated model parameters and the noise variances computed by the Yule-Walker system are given below. The estimated parameters are approximately the same as expected. See how the error decreases as the model order N increases. The optimum model order in this case is N=3, since the error does not change significantly when the order is increased further, and the parameter a4 of the AR(4) fit is not significantly different from zero.

Figure 3: Estimated model parameters and prediction errors for the given model orders
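
As a quick numerical companion to Figure 3, the prediction-error variance can be tabulated against the candidate orders (a sketch reusing x from the simulation above):

for n = 1:5
    [~,p_hat] = aryule(x,n); %fit an AR(n) model to the data
    fprintf('N = %d, noise variance estimate = %.4f\n', n, p_hat);
end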


References

[1] Akaike, H. (1998). "Information Theory and an Extension of the Maximum Likelihood Principle." In: Parzen E., Tanabe K., Kitagawa G. (eds), Selected Papers of Hirotugu Akaike, Springer Series in Statistics, Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1694-0_15
[2] McQuarrie, A. D. R., and Tsai, C.-L. (1998). Regression and Time Series Model Selection. World Scientific.
[3] Arlot, S., and Celisse, A. (2010). "A survey of cross-validation procedures for model selection." Statistics Surveys, 4:40-79.
[4] Durbin, J. (1960). "The Fitting of Time-Series Models." Review of the International Statistical Institute, 28:233-243.
[5] Golub, G. H., and Van Loan, C. F. (1996). Matrix Computations, third edition. The Johns Hopkins University Press.



7 thoughts on "Yule-Walker Estimation and Simulation in Matlab"

  1. I thought p is the variance estimate of the white noise input to the AR model, and not the error in the prediction.
    Is there another way to exhibit the error of the estimated filter?
    Thanks.

  2. Excuse me, I don't understand the way you used the function freqz in the script ([H,w]=freqz(sqrt(p),d);).
    As defined in the MATLAB documentation, [h,w] = freqz(d,n) returns the n-point complex frequency response for the digital filter d.
    So why, instead of the filter, did you put the std of the input white noise, and, instead of the number of samples, the filter coefficients?
    Sorry, but I am not getting it.
    Thanks in advance.

    • With the Yule-Walker equations one can estimate the parameters of the filter (a0, a1, …, aN in eqn (1) above) that, when excited with white noise w[n], produces output samples that closely match the signal characteristics of x[n]. In linear prediction, we would like to predict the future samples from the past samples (the recorded time series). This can be done if we have an auto-regressive model (filter) with a specified model order and the filter coefficients.

      The following function estimates the filter coefficients using the Yule-Walker equations on the recorded time series x[n].
      [d,p] = aryule(x,N) %N is the model order and x is the recorded time series

      Here d is the vector of estimated filter coefficients and p is the required variance of the white-noise input.

      To recreate the original sequence x[n], or to predict future samples, all we have to do is filter white noise with variance p through the AR filter whose coefficients are set to d.

      Equivalently, the AR process is an all-pole IIR filter whose transfer function takes the form
      H(z) = X(z)/W(z) = sqrt(p)/D(z)

      where sqrt(p) is the standard deviation of the required white-noise input and D(z) is the denominator polynomial formed by the coefficients d.

      Since I would like to know the PSD of the AR process, all I have to do is plot the frequency response of the above-mentioned filter, which is done using freqz(sqrt(p), d).

