Linear Models – Least Squares Estimator (LSE)

Key focus: Understand, step by step, the least squares estimator for parameter estimation, with a hands-on example that fits a curve using least squares estimation.

Background:

Classical estimation techniques such as Maximum Likelihood Estimation (MLE), Minimum Variance Unbiased Estimation (MVUE) and the Best Linear Unbiased Estimator (BLUE) require assumptions about, or knowledge of, the statistics of the data (the PDF, or at least second order statistics such as the covariance) before they can be applied. The least squares estimator discussed here does not require a statistical model to begin with. It only requires a signal model in linear form.

Linear models are ubiquitously used in various fields for studying the relationship between two or more variables. Linear models include regression analysis models, ANalysis Of VAriance (ANOVA) models, variance component models etc. Here, one variable is considered as a dependent (response) variable which can be expressed as a linear combination of one or more independent (explanatory) variables.

Studying the dependence between variables is fundamental to linear models. Applying these concepts to a real application involves the following procedure:

  1. Problem identification
  2. Model selection
  3. Statistical performance analysis
  4. Criticism of the model based on statistical analysis
  5. Conclusions and recommendations

The following text elaborates on linear models applied to parameter estimation using Ordinary Least Squares (OLS).

Linear Regression Model

A regression model relates a dependent (response) variable y to a set of k independent explanatory variables {x1, x2 ,…, xk} using a function. When the relationship is not exact, an error term e is introduced.

y = f(x_1,x_2,...,x_k) + e \quad\quad (1)

If the function f is not linear, the above model is referred to as a Non-Linear Regression Model. If f is linear, equation (1) is expressed as a linear combination of the independent variables {x1, x2,…, xk} weighted by the unknown parameters θ = {θ1, θ2,…, θk} that we wish to estimate.

y = x_1 \theta_1 + x_2 \theta_2 + ... + x_k \theta_k + e \quad\quad (2)

Equation (2) is referred to as the Linear Regression Model. When n such observations are made,

y_i = x_{1i} \theta_1 + x_{2i} \theta_2 + ... + x_{ki} \theta_k + e_i , \left(i=1,2,...,n \right) \quad (3)

where,
yi – response variable (i-th observation)
xji – independent variables – known values, collected in the observation matrix X with rank k
θj – parameters to be estimated
ei – disturbances/measurement errors – modeled as a noise vector e with PDF N(0, σ²I)

It is convenient to express all the variables in matrix form when n observations are made.

y=\begin{bmatrix} y_1\\ \vdots \\ y_n \end{bmatrix} ,\; X=\begin{bmatrix} x_{11} & x_{21} & ... & x_{k1} \\ \vdots &\vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & ... & x_{kn} \end{bmatrix} ,\; \theta =\begin{bmatrix} \theta_1\\ \vdots \\ \theta_k \end{bmatrix} ,\; e=\begin{bmatrix} e_1\\ \vdots \\ e_n \end{bmatrix} \quad (4)

Denoting equation (3) using (4),

y = X \theta + e \quad\quad (5)

Except for X, which is an n ⨉ k matrix, all other variables are column vectors.
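As a rough illustration of equations (4) and (5), the following sketch builds the matrix form for a tiny made-up case with n = 4 observations and k = 2 parameters. All numbers, including the parameter values and the disturbance vector, are assumptions chosen only to make the dimensions visible.

```matlab
% Hypothetical example: n = 4 observations, k = 2 parameters
x2    = [0; 1; 2; 3];            % one explanatory variable (column vector)
X     = [ones(4,1) x2];          % n-by-k observation matrix, first column is x_1 = 1
theta = [5.3; 1.2];              % k-by-1 parameter vector (assumed values)
e     = [0.1; -0.2; 0.05; 0.0];  % n-by-1 disturbance vector (made-up numbers)

y = X*theta + e;                 % equation (5): n-by-1 vector of observations

disp(size(X));                   % dimensions of X: n-by-k
disp(size(y));                   % dimensions of y: n-by-1
```

Each row of X holds the explanatory variables for one observation, so the product X*theta evaluates equation (3) for all n observations at once.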

Ordinary Least Squares Estimation (OLS)

In OLS, all errors are weighted equally, as opposed to Weighted Least Squares, where some errors are considered more significant than others.

If \hat{\theta} is a k ⨉ 1 vector of estimates of θ, then the estimated model can be written as

y = X \hat{\theta} + e \quad\quad(6)

Thus the error (residual) vector e can be computed from the observed data vector y and the estimate \hat{\theta} as

e = y-X \hat{\theta} \quad\quad (7)

Here, the errors are assumed to follow a multivariate normal distribution with zero mean and covariance σ²I (that is, each error has variance σ²).

To determine the least squares estimator, we write the sum of squares of the residuals (as a function of \hat{\theta} ) as

\begin{aligned} S(\hat{\theta})&=\sum e^2_i = e^Te=(y-X\hat{\theta})^T(y-X\hat{\theta})\\ &=y^Ty-y^T X \hat{\theta} -\hat{\theta}^TX^Ty + \hat{\theta}^TX^TX\hat{\theta} \end{aligned} \quad (8)
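The expansion in equation (8) can be checked numerically. The sketch below uses made-up data and an arbitrary trial estimate (both are assumptions, not values from this article) and evaluates the sum of squared residuals in three equivalent ways.

```matlab
X = [1 0; 1 1; 1 2; 1 3];       % n-by-k observation matrix (made-up data)
y = [5.4; 6.3; 7.75; 8.9];      % n-by-1 observations (made-up data)
theta_hat = [5; 1];             % arbitrary trial estimate, not the optimum

e  = y - X*theta_hat;           % residual vector, equation (7)
S1 = sum(e.^2);                 % sum of squared residuals
S2 = e'*e;                      % same quantity written as an inner product
% expanded form, second line of equation (8)
S3 = y'*y - y'*X*theta_hat - theta_hat'*X'*y + theta_hat'*(X'*X)*theta_hat;

fprintf('%.6f %.6f %.6f\n', S1, S2, S3);   % the three values agree
```

The two middle terms of the expansion are scalars and transposes of each other, which is why they combine into the single term -2X^Ty after differentiation.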

The least squares estimator is obtained by minimizing S(\hat{\theta}). In order to get the estimate that gives the least square error, differentiate with respect to \hat{\theta} and equate to zero.

\begin{aligned} \frac{\partial S}{\partial \hat{\theta}}&= -2X^Ty+2X^TX\hat{\theta} = 0\\ &\Rightarrow \hat{\theta} = \left (X^TX \right )^{-1}X^Ty \end{aligned}\quad (9)

Thus, the least squares estimate of θ is given by

\boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty }

where the superscript T denotes the transpose. For complex-valued data it is replaced by the Hermitian transpose (conjugate transpose).
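A minimal sketch of the closed-form estimate on the same kind of made-up data is given below. It solves the normal equations from equation (9), compares the result with Matlab's backslash operator (mldivide), and evaluates the gradient at the solution. The data values and variable names are assumptions for illustration.

```matlab
X = [1 0; 1 1; 1 2; 1 3];       % n-by-k observation matrix (made-up data)
y = [5.4; 6.3; 7.75; 8.9];      % n-by-1 observations (made-up data)

theta_ne = (X'*X) \ (X'*y);     % solve the normal equations X'X*theta = X'y
theta_bs = X \ y;               % least squares solve using backslash

% gradient of S from equation (9), evaluated at the estimate
grad = -2*X'*y + 2*(X'*X)*theta_ne;

disp(theta_ne');                % estimated parameters
disp(norm(grad));               % close to zero at the minimum
```

Solving the linear system with backslash avoids forming the explicit inverse of X'X, which is generally preferable numerically even though both routes give the same estimate here.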

Summary of computations

  1. Choice of variables. Choose the variable to be explained (y) and the explanatory variables { x1, x2 ,…, xk }. Often x1 is taken as a constant that always takes the value 1 (optional); this incorporates a DC component in the model.
  2. Collect data. Collect n observations of y for a set of known values of { x1, x2 ,…, xk }. Example: { x1, x2 ,…, xk } is the pilot data in OFDM, from which we would like to estimate the channel impulse response θ, and y is the received vector of samples. Store the observed data y in an n⨉1 vector and the data on the explanatory variables in the n⨉k matrix X.
  3. Compute the estimates. Compute the least squares estimates by the formula
    \boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty }

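For real-valued data the superscript T is the ordinary transpose. For complex-valued data, as in the OFDM example, it indicates the Hermitian transpose (conjugate transpose) operation.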
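The three steps can be sketched for complex-valued data as follows. In Matlab the operator ' is the conjugate transpose and .' is the plain transpose, so the distinction matters here. The regressors, parameters and disturbance values are made-up assumptions and do not represent a real OFDM pilot pattern.

```matlab
% Steps 1 and 2: made-up complex data, n = 4 observations, k = 2 parameters
X = [1+1i, 1-1i;
     1-1i, 2+0i;
     0+2i, 1+1i;
     2+0i, 0-1i];                          % known complex regressors
theta = [0.5-0.2i; -0.3+0.8i];             % assumed "true" parameters
e = [0.05-0.02i; -0.03+0.04i; 0.01+0.06i; -0.04-0.01i];  % made-up disturbances
y = X*theta + e;                           % observed vector

% Step 3: compute the estimate
theta_h = (X'*X) \ (X'*y);     % ' is the conjugate transpose: least squares estimate
theta_t = (X.'*X) \ (X.'*y);   % .' is the plain transpose: not least squares here

fprintf('residual norm with Hermitian transpose: %g\n', norm(y - X*theta_h));
fprintf('residual norm with plain transpose:     %g\n', norm(y - X*theta_t));
```

The Hermitian version minimizes the residual norm, so its residual is never larger than the one obtained with the plain transpose.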

Key Points

  • We do not need a probabilistic assumption but only a deterministic signal model.
  • As a result, it has a broader range of applications.
  • The least squares estimator is unbiased when the linear model is correct and the errors have zero mean.
  • The disturbance variance can be estimated from the residuals (k parameters to estimate and n observations available):
    \hat{\sigma}^2 = \frac{e^Te}{n-k}
  • To keep the variance low, the number of observations n must be greater than the number of parameters k to estimate.
  • The observation matrix X should have full column rank k, that is, linearly independent columns, which is usually the case with real data. This ensures that (XTX) is invertible.
  • Least Squares Estimator can be used in block processing mode with overlapping segments – similar to Welch’s method of PSD estimation.
  • Useful in time-frequency analysis.
  • Adaptive filters are utilized for non-stationary applications.
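The rank condition and the variance estimate from the list above can be sketched as below. To keep the example repeatable, a deterministic sine sequence stands in for the noise; that sequence, the parameter values and the variable names are assumptions for illustration only.

```matlab
x = (-5:0.1:5)';                 % known x-values
n = length(x);                   % number of observations
k = 2;                           % number of parameters to estimate
X = [ones(n,1) x];               % observation matrix
e_true = 0.5*sin(37*(1:n)');     % deterministic stand-in for noise (assumption)
y = X*[5.3; 1.2] + e_true;       % observations from assumed parameters

if rank(X) < k                   % full column rank is needed for X'X to be invertible
    error('X does not have full column rank');
end

theta_hat  = X \ y;              % least squares estimate
e          = y - X*theta_hat;    % residuals
sigma2_hat = (e'*e)/(n - k);     % estimate of the disturbance variance

fprintf('estimated variance: %g\n', sigma2_hat);
```

The divisor n - k rather than n accounts for the k parameters that were fitted from the same data.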

LSE applied to curve fitting

A Matlab snippet that implements the least squares estimate to fit a curve is given below.

x = -5:.1:5; % set of x-values - known explanatory variables
y = 5.3 + 1.2*x; % straight line without noise
e = randn(size(y)); % zero-mean, unit-variance Gaussian noise
y = y + e; % adding random noise to get the observed variable
% Linear model: y = X*a + e, where a holds the parameters to be estimated

X = [ones(length(x),1) x']; % first column is all ones since x_1 = 1
y = y'; % column vector for proper dimensions during multiplication
a = X\y % Least Squares Estimator - same result as inv(X'*X)*X'*y, without forming the inverse
plot(x, y, 'o'); % observed data
hold on;
plot(x, a(1) + a(2)*x, 'r-'); % fitted line
legend('observed samples', ['y=' num2str(a(1)) '+' num2str(a(2)) 'x']);
title('Least Squares Estimate for Curve Fitting');
xlabel('X values');
ylabel('Y values');
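The same estimator fits curves that are nonlinear in x as long as the model stays linear in the parameters. The sketch below fits a quadratic by adding an x^2 column to X and compares the result with Matlab's polyfit. The coefficients and the cosine term used as a deterministic stand-in for noise are assumptions for illustration.

```matlab
x = (-5:0.1:5)';                               % known x-values (column vector)
y = 2.0 - 0.7*x + 0.4*x.^2 + 0.3*cos(11*x);    % quadratic plus a deterministic perturbation

X = [ones(size(x)) x x.^2];    % columns: 1, x, x^2 (model is linear in the parameters)
a = X \ y;                     % least squares estimate [a1; a2; a3]

p = polyfit(x, y, 2);          % polyfit returns the highest power first: [a3 a2 a1]

disp(a');                      % estimated coefficients, lowest power first
disp(fliplr(p));               % polyfit coefficients reordered for comparison
```

Only the construction of X changes between the straight line and the quadratic; the estimation step is identical.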

Simulation Results

Figure 1: Least Squares Estimate for Curve Fitting

4 thoughts on “Linear Models – Least Squares Estimator (LSE)”

  1. Hello Sir

    I want to do channel equalization and I am using the zero forcing equalizer.

    I am using this code.

    enbtx=dlmread(‘input.txt’);

    uerx_cap=dlmread(‘output.txt’);

    enbtx=enbtx(:,1)+1i*enbtx(:,2);

    enbtx_norm=enbtx/max(abs(enbtx));

    uerx_cap=uerx_cap(:,1)+1i*uerx_cap(:,2);

    uerx_cap_norm=uerx_cap/max(abs(uerx_cap));

    x=enbtx_norm; % I/P

    y=uerx_cap_norm; %o/p

    X=fft(x,);

    Y=fft(y,);

    H=Y*pinv(X); channel estimation

    H_zf=pinv(H); making 1/H(z)

    As channel is estimated then I take new data which is passed by the same channel

    z is the new data taken

    Z=fft(z);

    Y_eq=H_zf*Y;

    y_eq=ifft(Y_eq);

    But for the new input output the equalizer is not working
    Kindly help me, I am stuck in it.

    With warm regards

  2. can u please tell me how to do same estimation of parameter in linear model using Maximum likelihood? as soon as possible…in MLE u have solved only x=A+wn but I want to know for x = H*s(n)+w
