BLUE estimator

Why BLUE:

We have discussed the Minimum Variance Unbiased Estimator (MVUE) in one of the previous articles. The following points should be considered when applying the MVUE to an estimation problem:

  • The MVUE is the optimal estimator.
  • Finding an MVUE requires full knowledge of the PDF (Probability Density Function) of the underlying process.
  • Even if the PDF is known, finding an MVUE is not guaranteed.
  • If the PDF is unknown, it is impossible to find an MVUE using techniques like the Cramer Rao Lower Bound (CRLB).
  • In practice, the PDF of the underlying process is often unknown.

Considering all the points above, the best possible solution is to resort to a sub-optimal estimator. When we resort to finding a sub-optimal estimator:

  • We may not be sure how much performance we have lost, since we will not be able to find the MVUE for benchmarking (due to the non-availability of the underlying PDF of the process).
  • We can live with this, as long as the variance of the sub-optimal estimator is well within the specification limits.

Common Approach for finding sub-optimal Estimator:

  • Restrict the estimator to be linear in data
  • Find the linear estimator that is unbiased and has minimum variance
  • This leads to Best Linear Unbiased Estimator (BLUE)
  • To find the BLUE, full knowledge of the PDF is not needed. Just the first two moments (mean and variance) of the PDF are sufficient.

Definition of BLUE:

Consider a data set \(y[n]= \{ y[0],y[1], \cdots ,y[N-1] \} \) whose parameterized PDF \(p(y ;\beta) \) depends on the unknown parameter \(\beta\). Since the BLUE restricts the estimator to be linear in the data, the estimate of the parameter can be written as a linear combination of the data samples with some weights \(a_n\):

$$\hat{\beta} = \displaystyle{\sum_{n=0}^{N-1} a_n y[n] = \textbf{a}^T \textbf{y}}$$

Here \(\textbf{a}\) is a vector of constants whose values we seek to find in order to meet the design specifications. Thus, the entire estimation problem boils down to finding the vector of constants \(\textbf{a}\). The above equation may lead to multiple solutions for the vector \(\textbf{a}\). However, we need to choose the set of values of \(\textbf{a}\) that provides unbiased estimates with minimum variance.
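As a quick illustration of this linear-in-data form (a hypothetical example, not part of the derivation), the sketch below computes \(\hat{\beta} = \textbf{a}^T \textbf{y}\) for the simple choice \(a_n = 1/N\), which turns the linear estimator into the familiar sample mean.

import numpy as np

# Hypothetical example: estimate a constant level beta from noisy samples y[n] = beta + noise
rng = np.random.default_rng(0)
N = 1000
beta_true = 3.0
y = beta_true + rng.normal(0, 1, N)

# Linear-in-data estimator: beta_hat = a^T y, here with equal weights a_n = 1/N (the sample mean)
a = np.full(N, 1.0 / N)
beta_hat = a @ y
print(f"Estimate of beta: {beta_hat:.3f}")  # should be close to 3.0

Different choices of the weight vector \(\textbf{a}\) give different linear estimators; the rest of this article is about picking the particular \(\textbf{a}\) that is unbiased and has the smallest variance.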

Thus, the set of values of \(\textbf{a}\) chosen for the BLUE, besides providing minimum variance, must satisfy the following two constraints:

  1. The estimator must be linear in data
  2. Estimate must be unbiased

Constraint 1: Linearity Constraint:

The linearity constraint was already given above; it is repeated here for convenience.

$$\hat{\beta} = \displaystyle{\sum_{n=0}^{N-1} a_n y[n] = \textbf{a}^T \textbf{y}} \quad\quad(1) $$

Constraint 2: Constraint for unbiased estimates:

For the estimate to be considered unbiased, the expectation (mean) of the estimate must be equal to the true value of the parameter.

$$E[\hat{\beta}] = \beta \quad\quad(2)$$

Thus,

$$ \displaystyle{\sum_{n=0}^{N-1} a_n E \left(y[n] \right)=\beta}\quad\quad (3) $$

Combining constraints (1) and (2) (equivalently, (3)),

$$ E[\hat{\beta}] =\displaystyle{\sum_{n=0}^{N-1} a_n E \left( y[n] \right) = \textbf{a}^T E[\textbf{y}]=\beta}\quad\quad (4) $$

Now, the million dollar question is: “When can we meet both the constraints?”. We can meet both constraints only when the observations are linear in the parameter, that is, when \(y[n]\) is of the form \(y[n] = x[n] \beta\), where \(\beta\) is the unknown parameter that we wish to estimate.

Consider a data model, as shown below, where the observed samples are in linear form with respect to the parameter to be estimated.

$$ y[n] = x[n] \beta + \epsilon[n]\quad\quad (5) $$

Here, \(\epsilon[n]\) is a zero-mean noise process whose PDF can take any form (uniform, Gaussian, colored, etc.). The mean of the above equation is given by

$$E(y[n]) = E(x[n] \beta + \epsilon[n]) = x[n] \beta\quad\quad(6)$$

Substituting (6) in (4),

$$E[\hat{\beta}] = \displaystyle{\sum_{n=0}^{N-1} a_n E \left( y[n] \right) = \beta \sum_{n=0}^{N-1} a_n x[n] = \beta \textbf{a}^T \textbf{x} = \beta} \quad\quad (7)$$

Looking at the last equality,

$$ \beta \; \textbf{a}^T \textbf{x}=\beta\quad\quad(8) $$

The above equality can be satisfied only if

$$\textbf{a}^T \textbf{x} =1 \quad\quad (9)$$
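For instance, in the familiar DC-level-in-noise problem (an illustrative special case, not covered in the article itself), \(x[n] = 1\) for all \(n\), so the unbiasedness constraint reduces to

$$\textbf{a}^T \textbf{x} = \sum_{n=0}^{N-1} a_n = 1$$

which is satisfied, for example, by the equal weights \(a_n = 1/N\) that give the sample mean.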

Given this condition is met, the next step is to minimize the variance of the estimate. Minimizing the variance of the estimate,

$$\begin{aligned} var(\hat{\beta}) &=E\left [ \left (\sum_{n=0}^{N-1}a_n y[n] - E\left [\sum_{n=0}^{N-1}a_n y[n] \right ] \right )^2 \right ]\\ &=E\left [ \left ( \textbf{a}^T \textbf{y} - \textbf{a}^T E[\textbf{y}] \right )^2\right ]\\ &=E\left [ \left ( \textbf{a}^T \left [\textbf{y}- E(\textbf{y}) \right ] \right )^2\right ]\\ &=E\left [ \textbf{a}^T \left [\textbf{y}- E(\textbf{y}) \right ]\left [\textbf{y}- E(\textbf{y}) \right ]^T \textbf{a} \right ]\\ &=\textbf{a}^T E\left [ \left [\textbf{y}- E(\textbf{y}) \right ]\left [\textbf{y}- E(\textbf{y}) \right ]^T \right ] \textbf{a}\\ &=\textbf{a}^T \textbf{C} \textbf{a} \end{aligned} \quad\quad (10) $$

Here, \(\textbf{C} = E\left[ \left(\textbf{y}-E(\textbf{y})\right)\left(\textbf{y}-E(\textbf{y})\right)^T \right]\) is the covariance matrix of the observations.
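As a quick numerical check (a sketch using made-up example values, not part of the original derivation), the closed-form variance \(\textbf{a}^T \textbf{C} \textbf{a}\) can be compared against a Monte Carlo estimate of \(var(\textbf{a}^T \textbf{y})\):

import numpy as np

# Assumed example: N = 3 samples with a made-up covariance matrix C and weight vector a
rng = np.random.default_rng(1)
C = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.5],
              [0.0, 0.5, 1.5]])
a = np.array([0.5, 0.3, 0.2])

# Closed-form variance a^T C a
var_closed_form = a @ C @ a

# Monte Carlo check: draw correlated samples and measure the variance of a^T y
n_trials = 200_000
y = rng.multivariate_normal(mean=np.zeros(3), cov=C, size=n_trials)
var_monte_carlo = np.var(y @ a)

print(f"a^T C a          : {var_closed_form:.4f}")
print(f"Monte Carlo check: {var_monte_carlo:.4f}")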

Finding BLUE:

As discussed above, in order to find the BLUE for a given set of data, the two constraints (linearity and unbiased estimates) must be satisfied and the variance of the estimate should be minimum. Thus the goal is to minimize the variance of \(\hat{\beta}\), which is \(\mathbf{a}^T\mathbf{C} \mathbf{a}\), subject to the constraint \(\mathbf{a}^T\mathbf{x} = 1\). This is a typical Lagrangian multiplier problem, which can be posed as minimizing the following cost function with respect to \(\mathbf{a}\) (remember, this is the vector we would like to find).

Lagrangian Multiplier

The Lagrangian multiplier is a mathematical tool used in optimization problems, particularly in constrained optimization. It helps find the local maxima and minima of a function subject to equality constraints. The method of Lagrange multipliers transforms a constrained optimization problem into an unconstrained one by incorporating the constraints into the objective function.

$$J = \textbf{a}^T \textbf{C} \textbf{a} + \lambda(\textbf{a}^T \textbf{x} -1)\quad\quad (11) $$

Minimizing \(J\) with respect to \(\textbf{a}\) is equivalent to setting the first derivative of \(J\) w.r.t \(\textbf{a}\) to zero.

$$\begin{aligned} \frac{\partial J}{\partial \textbf{a}} &= 2\textbf{C}\textbf{a} + \lambda \textbf{x}=0 \\ & \Rightarrow \boxed {\textbf{a}=-\frac{\lambda}{2}\textbf{C}^{-1}\textbf{x}} \end{aligned} \quad \quad (12)$$

Substituting (12) in (9)

$$\textbf{a}^T \textbf{x} = -\frac{\lambda}{2}\textbf{x}^{T}\textbf{C}^{-1} \textbf{x}=1 \Rightarrow \boxed {-\frac{\lambda}{2}=\frac{1}{\textbf{x}^{T}\textbf{C}^{-1}\textbf{x}}} \quad\quad (13)$$

Finally, from (12) and (13), the coefficients of the BLUE estimator (the vector of constants that weights the data samples) are given by

$$\boxed{\textbf{a} = \frac{\textbf{C}^{-1}\textbf{x}}{\textbf{x}^{T}\textbf{C}^{-1}\textbf{x}}} \quad\quad\quad (14)$$

The BLUE estimate and the variance of the estimate are as follows

$$\boxed{ \hat{\beta}_{BLUE} =\textbf{a}^{T} \textbf{y} =\frac{\textbf{x}^{T}\textbf{C}^{-1} \textbf{y}}{\textbf{x}^{T}\textbf{C}^{-1}\textbf{x}}}\quad\quad\quad (15)$$
$$\boxed {var(\hat{\beta})= \frac{1}{\textbf{x}^{T}\textbf{C}^{-1}\textbf{x}}}\quad\quad\quad (16)$$
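The sketch below (a minimal numerical example, assuming a made-up heteroscedastic noise covariance) applies equations (14) to (16) directly: it forms the weights \(\textbf{a}\), the BLUE estimate and its variance, and compares the result with the plain sample average.

import numpy as np

# Assumed example: y[n] = x[n]*beta + noise, with known x and a made-up diagonal noise covariance C
rng = np.random.default_rng(7)
N = 50
beta_true = 4.0
x = np.ones(N)                          # DC-level model: x[n] = 1
sigma = np.linspace(0.5, 3.0, N)        # heteroscedastic noise standard deviations
C = np.diag(sigma**2)                   # noise covariance matrix (diagonal here)
y = x * beta_true + rng.normal(0, sigma)

# Equation (14): BLUE weights a = C^{-1} x / (x^T C^{-1} x)
C_inv = np.linalg.inv(C)
a = C_inv @ x / (x @ C_inv @ x)

# Equations (15) and (16): BLUE estimate and its variance
beta_blue = a @ y
var_blue = 1.0 / (x @ C_inv @ x)

print(f"BLUE estimate    : {beta_blue:.3f} (variance {var_blue:.4f})")
print(f"Plain sample mean: {np.mean(y):.3f} (variance {np.mean(sigma**2)/N:.4f})")

With unequal noise variances, the BLUE down-weights the noisier samples and therefore attains a lower variance than the equal-weight average.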

Gauss-Markov theorem

We can generalize the BLUE estimator using the Gauss-Markov theorem. The properties of BLUE follow from the Gauss-Markov theorem, which states that:

In a linear regression model where the errors have an expected value of zero, are uncorrelated and have constant variance, the Ordinary Least Squares (OLS) estimator is the BLUE.

Suppose we have a model:

$$ Y = X \beta + \epsilon $$

Where \(Y\) is the vector of observed samples representing the dependent variable, \(X\) is the matrix of predictors representing the independent variables, \(\beta\) is the vector of coefficients/parameters to be estimated, and \(\epsilon\) represents the random error term with \(E[\epsilon] = 0\) and constant variance \(Var(\epsilon) = \sigma^2 I\). Then the OLS estimator, given by the following equation, is BLUE:

$$\hat{\beta} = \left(X^T X \right)^{-1} X^T Y $$
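As an aside (a minimal sketch with made-up data, not part of the original article), the closed-form OLS expression above can be evaluated directly with NumPy; np.linalg.lstsq gives an equivalent but numerically safer solution that avoids forming the explicit inverse.

import numpy as np

# Made-up linear data: Y = X beta + noise
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])   # intercept column + one predictor
beta_true = np.array([1.0, 2.5])
Y = X @ beta_true + rng.normal(0, 1, n)

# OLS via the normal equation: beta_hat = (X^T X)^{-1} X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Equivalent (and numerically preferable) least-squares solve
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("Normal equation:", np.round(beta_hat, 3))
print("lstsq          :", np.round(beta_hat_lstsq, 3))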

Limitations

While BLUE has many advantages, there are some limitations.

  • The validity of BLUE relies on several assumptions (linearity, independence of errors, homoscedasticity). If these assumptions are violated, BLUE may not be optimal.
  • Like other linear estimators, BLUE can be sensitive to outliers in data.
  • BLUE does not provide robustness against violations of normality or heteroscedasticity.

Homoscedasticity

The spread of the error terms is constant across all values of the predictor variables.

Heteroscedasticity

The spread (variance) of the error term in a regression model is not constant across all levels of the independent variable.

Python code

To illustrate the BLUE concept, let's generate synthetic data representing a linear relationship (with added noise), fit a linear model using Ordinary Least Squares (OLS), estimate the intercept and slope from the fitted model, and visualize the results by plotting the original data points along with the fitted line representing the BLUE estimate.

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic data
n = 100  # Number of samples
X = np.linspace(0, 10, n)  # Independent variable
true_slope = 2.5
true_intercept = 1.0

# Generate random noise
noise = np.random.normal(0, 1, n)  # Gaussian noise with mean=0 and std=1

# Dependent variable (Y)
Y = true_intercept + true_slope * X + noise

# Fit the model using Ordinary Least Squares (OLS)
X_with_const = sm.add_constant(X)  # Add a constant term for the intercept
model = sm.OLS(Y, X_with_const)  # Create OLS model
results = model.fit()  # Fit the model

# Get estimated parameters
estimated_intercept, estimated_slope = results.params

# Print estimated parameters
print(f"Estimated Intercept: {estimated_intercept:.2f}")
print(f"Estimated Slope: {estimated_slope:.2f}")

# Plotting the results
plt.figure(figsize=(10, 6))
plt.scatter(X, Y, label='Data Points', color='blue', alpha=0.5)
plt.plot(X, estimated_intercept + estimated_slope * X, color='red', label='BLUE Estimate (OLS)', linewidth=2)
plt.axhline(y=true_intercept, color='green', linestyle='--', label='True Intercept')
plt.axvline(x=0, color='black', linewidth=0.5)
plt.title('Best Linear Unbiased Estimator (BLUE) Illustration')
plt.xlabel('Independent Variable (X)')
plt.ylabel('Dependent Variable (Y)')
plt.legend()
plt.grid()
plt.show()
Figure: BLUE estimate using Ordinary Least Squares (OLS)


Similar topics

[1]An Introduction to Estimation Theory
[2]Bias of an Estimator
[3]Minimum Variance Unbiased Estimators (MVUE)
[4]Maximum Likelihood Estimation
[5]Maximum Likelihood Decoding
[6]Probability and Random Process
[7]Likelihood Function and Maximum Likelihood Estimation (MLE)
[8]Score, Fisher Information and Estimator Sensitivity
[9]Introduction to Cramer Rao Lower Bound (CRLB)
[10]Cramer Rao Lower Bound for Scalar Parameter Estimation
[11]Applying Cramer Rao Lower Bound (CRLB) to find a Minimum Variance Unbiased Estimator (MVUE)
[12]Efficient Estimators and CRLB
[13]Cramer Rao Lower Bound for Phase Estimation
[14]Normalized CRLB - an alternate form of CRLB and its relation to estimator sensitivity
[15]Cramer Rao Lower Bound (CRLB) for Vector Parameter Estimation
[16]The Mean Square Error – Why do we use it for estimation problems
[17]How to estimate unknown parameters using Ordinary Least Squares (OLS)
[18]Essential Preliminary Matrix Algebra for Signal Processing
[19]Why Cholesky Decomposition ? A sample case:
[20]Tests for Positive Definiteness of a Matrix
[21]Solving a Triangular Matrix using Forward & Backward Substitution
[22]Cholesky Factorization - Matlab and Python
[23]LTI system models for random signals – AR, MA and ARMA models
[24]Comparing AR and ARMA model - minimization of squared error
[25]Yule Walker Estimation
[26]AutoCorrelation (Correlogram) and persistence – Time series analysis
[27]Linear Models - Least Squares Estimator (LSE)
[28]Best Linear Unbiased Estimator (BLUE)

