Linear Models - Least Squares Estimator (LSE)

Key focus: Understand, step by step, the least squares estimator for parameter estimation, with a hands-on example that fits a curve using least squares estimation.

Background:

Estimation techniques such as Maximum Likelihood Estimation (MLE), Minimum Variance Unbiased Estimation (MVUE) and the Best Linear Unbiased Estimator (BLUE), all of which fall under the umbrella of classical estimation, require assumptions about, or knowledge of, second-order statistics (covariance) before they can be applied. The least squares estimator discussed here does not require any statistical model to begin with; it only requires a signal model in linear form.

Linear models are ubiquitously used in various fields for studying the relationship between two or more variables. Linear models include regression analysis models, ANalysis Of VAriance (ANOVA) models, variance component models etc. Here, one variable is considered as a dependent (response) variable which can be expressed as a linear combination of one or more independent (explanatory) variables.

Studying the dependence between variables is fundamental to linear models. Applying these concepts to a real application involves the following procedure:

Problem identification
Model selection
Statistical performance analysis
Criticism of the model based on statistical analysis
Conclusions and recommendations

The following text elaborates on linear models applied to parameter estimation using Ordinary Least Squares (OLS).

Linear Regression Model

A regression model relates a dependent (response) variable y to a set of k independent explanatory variables {x₁, x₂ ,…, x_k} using a function. When the relationship is not exact, an error term e is introduced.

$y = f(x_1,x_2,...,x_k) + e \quad\quad (1)$

If the function f is not linear, the above model is referred to as a Non-Linear Regression Model. If f is linear, equation (1) is expressed as a linear combination of the independent variables {x₁, x₂ ,…, x_k} weighted by the unknown parameters θ = {θ₁, θ₂,…, θ_k } that we wish to estimate.

$y = x_1 \theta_1 + x_2 \theta_2 + ... + x_k \theta_k + e \quad\quad (2)$

Equation (2) is referred to as the Linear Regression Model. When N such observations are made,

$y_i = x_{1i} \theta_1 + x_{2i} \theta_2 + ... + x_{ki} \theta_k + e_i , \left(i=1,2,...,N \right) \quad (3)$

where,
y_i – response variable (i-th observation)
x_{1i}, …, x_{ki} – independent variables – known, collected in the observation matrix X with rank k
θ₁, …, θ_k – set of parameters to be estimated
e_i – disturbances/measurement errors – modeled as a noise vector e with PDF N(0, σ² I)

It is convenient to express all the variables in matrix form when N observations are made.

$y=\begin{bmatrix} y_1\\ \vdots \\ y_N \end{bmatrix} ,\; X=\begin{bmatrix} x_{11} & x_{21} & ... & x_{k1} \\ \vdots &\vdots & \ddots & \vdots \\ x_{1N} & x_{2N} & ... & x_{kN} \end{bmatrix} ,\; \theta =\begin{bmatrix} \theta_1\\ \vdots \\ \theta_k \end{bmatrix} ,\; e=\begin{bmatrix} e_1\\ \vdots \\ e_N \end{bmatrix} \quad (4)$

Denoting equation (3) using (4),

$y = X \theta + e \quad\quad (5)$

Except for X, which is an N ⨉ k matrix, all other variables are column vectors.
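
As a small illustration of equations (4) and (5), the following Matlab sketch stacks a handful of made-up observations into y, X, θ and e. All numerical values, including the parameter vector theta, are assumptions chosen only to show the dimensions; in practice θ is unknown and is what we want to estimate.

```matlab
% Hypothetical example: N = 4 observations, k = 2 explanatory variables
x1 = [1; 1; 1; 1];           % constant regressor (always 1)
x2 = [0; 1; 2; 3];           % second explanatory variable
theta = [2; 0.5];            % parameter vector (assumed here for illustration only)
e = [0.1; -0.2; 0.0; 0.1];   % illustrative disturbance values

X = [x1 x2];                 % N-by-k observation matrix, one row per observation
y = X*theta + e;             % N-by-1 response vector, equation (5)

disp(size(X));               % dimensions of X: N rows, k columns
disp(size(y));               % dimensions of y: N rows, 1 column
```

Each row of X holds the explanatory variables of one observation, which is why X has as many rows as y has entries.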

Ordinary Least Squares Estimation (OLS)

In OLS, all errors are weighted equally, as opposed to Weighted Least Squares, where some errors are considered more significant than others.

If $\hat{\theta}$ is a k ⨉ 1 vector of estimates of θ, then the estimated model can be written as

$y = X \hat{\theta} + e \quad\quad(6)$

Thus the error vector e can be computed from the observed data vector y and the estimate $\hat{\theta}$ as

$e = y-X \hat{\theta} \quad\quad (7)$

Here, the errors are assumed to follow a multivariate normal distribution with zero mean and variance σ² (covariance matrix σ² I).

To determine the least squares estimator, we write the sum of squares of the residuals (as a function of $\hat{\theta}$ ) as

$\begin{aligned} S(\hat{\theta})&=\sum e^2_i = e^Te=(y-X\hat{\theta})^T(y-X\hat{\theta})\\ &=y^Ty-y^T X \hat{\theta} -\hat{\theta}^TX^Ty + \hat{\theta}^TX^TX\hat{\theta} \end{aligned} \quad (8)$

The least squares estimator is obtained by minimizing $S(\hat{\theta})$ . In order to get the estimate that gives the least square error, differentiate with respect to $\hat{\theta}$ and equate to zero.

$\begin{aligned} \frac{\partial S}{\partial \hat{\theta}}&= -2X^Ty+2X^TX\hat{\theta} = 0\\ &\Rightarrow \hat{\theta} = \left (X^TX \right )^{-1}X^Ty \end{aligned}\quad (9)$
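
For readers who want the intermediate step, the derivative in (9) can be obtained term by term from (8). The sketch below assumes real-valued data and uses the standard identities that the gradient of $a^T\hat{\theta}$ is $a$ and the gradient of $\hat{\theta}^TA\hat{\theta}$ is $2A\hat{\theta}$ for a symmetric matrix $A$ (the symbols $a$ and $A$ are introduced here only for illustration).

$\begin{aligned} \frac{\partial}{\partial \hat{\theta}} \left( y^Ty \right) &= 0 \\ \frac{\partial}{\partial \hat{\theta}} \left( y^TX\hat{\theta} \right) = \frac{\partial}{\partial \hat{\theta}} \left( \hat{\theta}^TX^Ty \right) &= X^Ty \\ \frac{\partial}{\partial \hat{\theta}} \left( \hat{\theta}^TX^TX\hat{\theta} \right) &= 2X^TX\hat{\theta} \\ \frac{\partial^2 S}{\partial \hat{\theta}\, \partial \hat{\theta}^T} &= 2X^TX \end{aligned}$

Substituting the first three lines into (8) with their signs gives $-2X^Ty+2X^TX\hat{\theta}$. The last line is the Hessian; it is positive definite when X has rank k, so the stationary point is a minimum rather than a maximum or a saddle point.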

Thus, the least squares estimate of θ is given by

$\boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty }$

where the operator T denotes the Hermitian transpose (conjugate transpose), which reduces to the ordinary transpose for real-valued data.
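
The closed-form estimate can be checked numerically. The following Matlab sketch, on made-up data (the values of x and y and the perturbation are assumptions for illustration), evaluates the gradient from equation (9) at the estimate and compares the sum of squared residuals from equation (8) against a slightly perturbed parameter vector.

```matlab
% Hypothetical data: N = 5 observations, k = 2 (constant term and slope)
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 7.1; 8.8];
X = [ones(size(x)) x];

theta_hat = (X'*X) \ (X'*y);             % solves X'X*theta = X'y, i.e. equation (9)
S = @(t) (y - X*t)' * (y - X*t);         % sum of squared residuals, equation (8)

grad = -2*X'*y + 2*(X'*X)*theta_hat;     % gradient at the estimate, numerically ~0
S_min = S(theta_hat);                    % value of S at the estimate
S_perturbed = S(theta_hat + [0.05; -0.02]);  % S at a nearby parameter vector (larger)
```

Solving the linear system with the backslash operator gives the same estimate as forming `inv(X'*X)` explicitly, and is the usual numerically safer choice.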

Summary of computations

Step 1: Choice of variables. Choose the variable to be explained (y) and the explanatory variables { x₁, x₂ ,…, x_k }, where x₁ is often (optionally) taken as a constant that always takes the value 1, in order to incorporate a DC component in the model.
Step 2: Collect data. Collect N observations of y for a set of known values of { x₁, x₂ ,…, x_k }. Example: { x₁, x₂ ,…, x_k } is the pilot data in OFDM, from which we would like to estimate the channel impulse response θ, and y is the received vector of samples. Store the observed data y in an N⨉1 vector and the data on the explanatory variables in the N⨉k matrix X.
Step 3: Compute the estimates. Compute the least squares estimates by the formula
$\boxed{ \hat{\theta} = \left (X^TX \right )^{-1}X^Ty }$

The superscript T indicates Hermitian Transpose (conjugate transpose) operation.
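
The three steps above can be sketched in Matlab for a hypothetical model with k = 3 explanatory variables. The variable names u and v and all data values are assumptions for illustration; the data are generated without noise so that the expected estimate is known in advance.

```matlab
% Step 1: choose variables. Hypothetical model: y = theta1*1 + theta2*u + theta3*v
% Step 2: collect data (made-up, noise-free values generated with theta = [1; 2; -3])
u = [0; 1; 2; 3; 4; 5];
v = [1; 0; 2; 1; 3; 0];
y = 1 + 2*u - 3*v;                 % N-by-1 vector of observations, N = 6
X = [ones(size(u)) u v];           % N-by-k observation matrix, k = 3

% Step 3: compute the least squares estimate
theta_hat = (X'*X) \ (X'*y);       % same result as inv(X'*X)*X'*y
disp(theta_hat');                  % estimated parameters as a row
```

Because the data contain no noise and X has independent columns, the estimate recovers the generating parameters up to rounding error.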

Key Points

We do not need a probabilistic assumption but only a deterministic signal model.
It has a broader range of applications.
The least squares estimator is unbiased, provided the linear model (5) holds and the errors have zero mean.
Estimating the disturbance variance (k parameters to estimate and N observations available):
$\hat{\sigma}^2 = \frac{e^Te}{N-k}$
To keep the variance low, the number of observations must be greater than the number of parameters to estimate.
The observation matrix X should have full column rank (rank k), meaning that its k columns are linearly independent. This usually holds with real data but is not guaranteed, for example when one explanatory variable is a linear combination of the others. Full column rank ensures that (X^TX) is invertible.
Least Squares Estimator can be used in block processing mode with overlapping segments – similar to Welch’s method of PSD estimation.
Useful in time-frequency analysis.
Adaptive filters are utilized for non-stationary applications.
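
Two of the key points above, the disturbance variance estimate and the rank requirement on X, can be illustrated with a short Matlab sketch. The data values and the names X_bad, full_rank and deficient are assumptions for illustration only.

```matlab
% Hypothetical data: N = 5 observations, k = 2 parameters
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 7.1; 8.8];
X = [ones(size(x)) x];
[N, k] = size(X);

full_rank = (rank(X) == k);          % true when the columns of X are independent

theta_hat = X \ y;                   % least squares estimate
r = y - X*theta_hat;                 % residual vector, equation (7)
sigma2_hat = (r'*r) / (N - k);       % disturbance variance estimate

% Rank-deficient case: the third column is twice the second one,
% so X_bad'*X_bad is singular and the closed-form estimate is not unique
X_bad = [X 2*x];
deficient = (rank(X_bad) < size(X_bad, 2));
```

Checking `rank(X)` before computing the estimate is a cheap way to detect redundant explanatory variables.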

LSE applied to curve fitting

A Matlab snippet that implements the Least Squares Estimate to fit a curve is given below.

x = -5:.1:5; % set of x-values - known explanatory variables
y = 5.3 + 1.2*x; % straight line without noise
e = randn(size(y)); % zero-mean, unit-variance Gaussian noise
y = y + e; % adding random noise to get the observed variable
% Linear model: y = X*a + e, where a holds the parameters to be estimated

X = [ones(length(x),1) x']; % first column is all ones since x_1 = 1
y = y'; % column vector for proper dimensions during multiplication
a = (X'*X)\(X'*y) % Least Squares Estimator, equals inv(X'*X)*X'*y; equivalent code: X\y
plot(x, y, 'o'); % observed data
hold on;
plot(x, a(1) + a(2)*x, 'r-'); % fitted line
legend('observed samples', ['y=' num2str(a(1)) '+' num2str(a(2)) 'x'])
title('Least Squares Estimate for Curve Fitting');
xlabel('X values');
ylabel('Y values');

Simulation Results

*Figure 1: Least Squares Estimate for Curve Fitting*
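
The same estimator applies to any curve that is linear in its parameters, not only to straight lines. As a minimal sketch, the following Matlab code fits a quadratic by adding a column of squared x-values to the observation matrix. The coefficients and x-values are made up for illustration, and the data are noise-free so the result can be checked by inspection.

```matlab
x = (-2:1:3)';                     % known explanatory values (column vector)
y = 1 - 2*x + 0.5*x.^2;            % hypothetical noise-free quadratic
X = [ones(size(x)) x x.^2];        % columns: 1, x, x^2 (model is linear in a)
a = X \ y;                         % least squares estimate of [a1; a2; a3]
y_fit = X*a;                       % fitted curve evaluated at the observed x
disp(a');                          % estimated coefficients as a row
```

The model is nonlinear in x but linear in the parameters a, which is what the least squares formula requires.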


4 thoughts on “Linear Models – Least Squares Estimator (LSE)”

Girish

August 12, 2015 at 2:25 pm

Hello Sir

I want to do channel equalization and I am using the zero forcing equalizer.

I am using this code.

enbtx=dlmread('input.txt');
uerx_cap=dlmread('output.txt');
enbtx=enbtx(:,1)+1i*enbtx(:,2);
enbtx_norm=enbtx/max(abs(enbtx));
uerx_cap=uerx_cap(:,1)+1i*uerx_cap(:,2);
uerx_cap_norm=uerx_cap/max(abs(uerx_cap));
x=enbtx_norm; % I/P
y=uerx_cap_norm; % o/p
X=fft(x,);
Y=fft(y,);
H=Y*pinv(X); % channel estimation
H_zf=pinv(H); % making 1/H(z)

As channel is estimated then I take new data which is passed by the same channel

z is the new data taken

Z=fft(z);

Y_eq=H_zf*Y;

y_eq=ifft(Y_eq);

But for the new input output the equalizer is not working
Kindly help me, I am stuck in it.

With warm regards

Nivedita negi

December 8, 2014 at 9:45 pm

can u please tell me how to do same estimation of parameter in linear model using Maximum likelihood? as soon as possible…in MLE u have solved only x=A+wn but I want to know for x = H*s(n)+w
- Mathuranathan
  
  December 8, 2014 at 11:11 pm
  
  For your question on x=H*s(n)+w, I assume your goal is to estimate the channel – ‘H’. This problem is very specific to the application and the nature of the channel (channel model dependent).
  
  To apply MLE for channel estimation, you need to first understand the channel model. Then develop a statistical model that represents the mix of received signal, noise and interference (if any).
  
  An excellent example would be pilot estimation algorithms in OFDM systems. Some of them can be found here.
  http://www.freescale.com/files/dsp/doc/app_note/AN3059.pdf
  - Nivedita negi
    
    March 30, 2015 at 2:14 pm
    
    thank you so much.
