Key focus: Understand maximum likelihood estimation (MLE) using a hands-on example. Know the importance of the log-likelihood function and its use in estimation problems.
Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of a statistical model. The core idea behind MLE is to find the parameter values that maximize the likelihood of observing the given data under the assumed statistical model.
Likelihood Function:
The likelihood function measures how likely it is to observe the given data for different parameter values. The MLE seeks the parameter values that make the observed data most likely.
Suppose \(X = \left(x_1, x_2, \cdots, x_n \right)\) is the observed data, consisting of \(n\) independent and identically distributed (i.i.d.) samples drawn from a distribution parameterized by \(\theta\), where \(\theta\) represents the parameter (or vector of parameters) we want to estimate from the observed data. The underlying distribution has probability density function (PDF) or probability mass function (PMF) \( f ( x | \theta) \). The likelihood function is given by

\[ L(\theta | X) = \prod_{i=1}^{n} f(x_i | \theta) \]
Although the above expression is numerically identical to the joint PDF/PMF of the samples, its interpretation differs significantly: in the joint probability calculation, the data \(X\) are treated as random for a fixed \(\theta\), whereas in the likelihood function the observed data are fixed and \(L\) is viewed as a function of the unknown parameter \(\theta\) to be estimated.
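To make this concrete, here is a minimal Matlab sketch (with hypothetical sample values, not from the article's simulation) that evaluates the Gaussian likelihood at two candidate mean values for a fixed dataset; the candidate closer to the data yields the larger likelihood:

% Sketch: evaluating the likelihood at two candidate parameter values
x = [1.2 0.8 1.5];                            %hypothetical observed samples
L = @(mu) prod(exp(-(x-mu).^2/2)/sqrt(2*pi)); %Gaussian likelihood, assuming sigma=1
disp(L(1.0))                                  %comparatively large: data plausible under mu=1.0
disp(L(3.0))                                  %much smaller: data unlikely under mu=3.0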
Maximum Likelihood Estimation
The maximum likelihood estimate is obtained by solving:

\[ \hat{\theta}_{MLE} = \arg \max_{\theta} \; L(\theta | X) \]
In practice, it is often easier to work with the log-likelihood function, which simplifies computations by converting products into sums:

\[ \ln L(\theta | X) = \sum_{i=1}^{n} \ln f(x_i | \theta) \]

Since the natural logarithm is a monotonically increasing function, maximizing the log-likelihood yields the same parameter estimate as maximizing the likelihood itself.
This is particularly useful when implementing the likelihood metric in digital signal processors.
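To see the numerical benefit, consider a minimal sketch (with arbitrary simulated samples): in double precision, the product of many small density values underflows to zero, while the sum of their logarithms stays well scaled:

% Sketch: product of many densities underflows; sum of log-densities does not
x = randn(1,1000);               %1000 samples from a standard normal
p = exp(-x.^2/2)/sqrt(2*pi);     %per-sample Gaussian density values
disp(prod(p))                    %underflows to exactly 0 for N this large
disp(sum(log(p)))                %log-likelihood: a finite, well-scaled number (around -1400)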
Example: Estimating Parameters of a Normal Distribution
Suppose we have a dataset that we believe follows a normal distribution with unknown mean (\(\mu\)) and known variance (\(\sigma^2\)).
The first step is to choose a statistical model that describes how the data are generated. The PDF (statistical model) of the normal distribution is:

\[ f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{(x - \mu)^2}{2 \sigma^2} \right) \]
The likelihood function is given by

\[ L(\mu | X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{(x_i - \mu)^2}{2 \sigma^2} \right) \]
and the log-likelihood function is

\[ \ln L(\mu | X) = - \frac{n}{2} \ln \left( 2 \pi \sigma^2 \right) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \]
After constructing the log-likelihood function, we use calculus or numerical optimization techniques to find the parameter values that maximize it.
To maximize this log-likelihood with respect to \( \mu \), take its derivative and set it to zero:

\[ \frac{\partial}{\partial \mu} \ln L(\mu | X) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 \]
Solving gives:

\[ \hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Thus, the maximum likelihood estimate for \(\mu\) is simply the sample mean.
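As a quick sanity check (a minimal Matlab sketch with assumed values for the true mean and known variance, not part of the original derivation), a coarse grid search over \(\mu\) lands approximately on the sample mean:

% Sketch: a grid search over mu recovers the sample mean (to grid resolution)
x = 5 + 2*randn(1,100);          %hypothetical data: true mu=5, known sigma=2
muGrid = 4:0.01:6;               %candidate values of mu
logL = arrayfun(@(m) -sum((x-m).^2), muGrid); %log-likelihood, up to constants and a positive scale
[~,k] = max(logL);
fprintf('Grid MLE: %.2f, sample mean: %.2f\n', muGrid(k), mean(x));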
Example:
Consider the DC estimation problem presented in the previous article, where a transmitter transmits a continuous stream of data samples representing a constant value \(A\). The data samples sent via the communication channel get corrupted by additive white Gaussian noise \(w[n] \sim \mathcal{N} (\mu =0, \sigma^2 =1) \). The receiver receives the samples, and its goal is to estimate the actual DC component \(A\) in the presence of noise.
Likelihood as an Estimation Metric:
Let’s use the likelihood function as the estimation metric. The estimation of \(A\) depends on the PDF of the underlying noise \(w[n]\), and the estimation accuracy depends on the variance of the noise: the larger the variance, the lower the accuracy of estimation, and vice versa.
Let’s fix \(A = 1.3\) and generate 10 samples from the above model (use the Matlab script given below to test this; you may get a different set of numbers). Now we pretend that we do not know anything about the model, and all we want to do is estimate the DC component (the parameter to be estimated is \(\theta = A\)) from the observed samples.
Assuming a variance of 1 for the underlying PDF, we will try a range of values for \(A\) from \(-2.0\) to \(+5.0\) in steps of \(0.1\) and calculate the likelihood function for each value of \(A\).
Matlab script:
% Demonstration of Maximum Likelihood Estimation in Matlab
% Author: Mathuranathan (https://www.gaussianwaves.com)
% License : creative commons : Attribution-NonCommercial-ShareAlike 3.0 Unported

A=1.3;
N=10; %Number of Samples to collect
s=1;  %Assume standard deviation s=1

%Generate the observed samples: constant A corrupted by white Gaussian noise
x=A+randn(1,N);

rangeA=-2:0.1:5; %Range of values of estimation parameter to test
L=zeros(1,length(rangeA)); %Place holder for likelihoods

%Calculate Likelihoods for each parameter value in the range.
%The constant term (1/(sqrt(2*pi)*s))^N is neglected, as it pulls the likelihood toward zero for increasing N
for i=1:length(rangeA)
    L(i) = exp(-sum((x-rangeA(i)).^2)/(2*s^2));
end

[maxL,index]=max(L); %Select the parameter value with Maximum Likelihood
display('Maximum Likelihood of A');
display(rangeA(index));

%Plotting Commands
plot(rangeA,L); hold on;
stem(rangeA(index),L(index),'r'); %Point the Maximum Likelihood Estimate
displayText=['\leftarrow Likelihood of A=' num2str(rangeA(index))];
title('Maximum Likelihood Estimation of unknown Parameter A');
xlabel('\leftarrow A');
ylabel('Likelihood');
text(rangeA(index),L(index)/3,displayText,'HorizontalAlignment','left');

figure(2);
plot(rangeA,log(L)); hold on;
YL = ylim; YMIN = YL(1);
plot([rangeA(index) rangeA(index)],[YMIN log(L(index))],'r'); %Point the Maximum Likelihood Estimate
title('Log Likelihood Function');
xlabel('\leftarrow A');
ylabel('Log Likelihood');
text(rangeA(index),YMIN/2,displayText,'HorizontalAlignment','left');
Simulation Result:
For the above-mentioned 10 samples of observation, the likelihood function over the range \(-2\) to \(+5\) of DC component values is plotted below. The likelihood is maximized at \(A = 1.4\), as shown in the figure, so the maximum likelihood estimate is \(\hat{A} = 1.4\).
The estimation accuracy improves as the number of observed samples increases. Try the simulation with the number of samples \(N\) set to \(5000\) or \(10000\) and observe the estimated value of \(A\) for each run.
It is often useful to calculate the log-likelihood function instead, as it reduces the computation above to a series of additions instead of a product of many terms. The corresponding log-likelihood plot is given below.
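Note that for very large \(N\) (such as the 5000 or 10000 samples suggested above), the exp(...) in the script underflows to zero before log is applied. A safer variant, sketched below under the assumption that x, rangeA, and s are still in the workspace from the script above, computes the log-likelihood directly with the constant term dropped:

% Sketch: compute the log-likelihood directly to avoid underflow at large N
% (assumes x, rangeA and s exist from the script above)
logL = zeros(1,length(rangeA));
for i=1:length(rangeA)
    logL(i) = -sum((x-rangeA(i)).^2)/(2*s^2); %log of L(i), constant term omitted
end
[~,index] = max(logL);           %same maximizing value of A as before
display(rangeA(index));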
Advantages of Maximum Likelihood Estimation:
- Asymptotically efficient – MLEs achieve the minimum variance among unbiased estimators under regularity conditions [2].
- Asymptotically normal – the distribution of the estimates approaches a normal distribution as the sample size grows large.
- Asymptotically consistent – MLEs converge to the true parameter values as the sample size increases.
- MLE can handle a wide variety of statistical models and distributions [3].
- Estimation requires no prior information about the parameter.
- The estimates agree closely with the data, since they are chosen to make the observed data most probable.
Challenges in Maximum Likelihood Estimation:
- For complex models or large datasets, maximizing the likelihood can be computationally expensive [4].
- Non-linear models may lead to multiple local maxima in the likelihood function, making global optimization challenging.
- MLE assumes that the chosen model correctly represents how the data are generated; incorrect model assumptions can lead to biased estimates.
- MLE does not utilize any prior information about the parameter. In real-world scenarios we often do have such prior knowledge, and it can be used to our advantage (for example, via Bayesian estimation), even though it introduces bias into the estimates.
For further reading
2. Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury Press.
3. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
4. Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
Comments:

Can we use the same principle with an inverse Gaussian distribution? If so, can we calculate the likelihood simply from the exponent part?
Could you please tell me how to do this for the multivariate case? I have 1000 samples of 5 variables (X = Xtrue + error) and I want to estimate sigma_e (the covariance matrix of the error) using MLE, where the error does not change with respect to the samples.
Could you please tell me why you start the loop “i=1:length(rangeA)” at 1? That is line 17.
It supplies the index for each value contained in the array named “rangeA”.
OK, thank you.
In line 10 of your code you make x=A+randn(1,N), but this doesn’t affect the outcome at all. Isn’t something missing?
okay. Thanks for your comment. Let me know if you find any mistake.
How do I find the variance when the mean is zero using MLE?