Keywords: maximum likelihood decoding, digital communication, data storage, noise, interference, wireless communication systems, optical communication systems, digital storage systems, probability, likelihood estimation, python
Introduction
Maximum likelihood decoding is a technique used to determine the most likely transmitted message in a digital communication system, based on the received signal and statistical models of noise and interference. The method uses maximum likelihood estimation to calculate the probability of each possible transmitted message and then selects the one with the highest probability.
To perform maximum likelihood decoding, the receiver uses a set of pre-defined models to estimate the likelihood of each possible transmitted message based on the received signal. The method is commonly used in various digital communication and data storage systems, such as wireless communication and digital storage. However, it can be complex and time-consuming, particularly in systems with large message spaces or complex noise and interference models.
Maximum Likelihood Decoding:
Consider a set of possible codewords (valid codewords – set \(Y\)) generated by an encoder in the transmitter side. We pick one codeword out of this set ( call it \(y\) ) and transmit it via a Binary Symmetric Channel (BSC) with probability of error \(p\) ( To know what is a BSC – click here ). At the receiver side we receive the distorted version of \(y\) ( call this erroneous codeword \(x\)).
Maximum Likelihood Decoding chooses one codeword from \(Y\) (the list of all possible codewords) which maximizes the following probability.
Meaning that the receiver computes \(P(y_1,x) , P(y_2,x) , P(y_3,x),\cdots,P(y_n,x)\). and chooses a codeword (\(y\)) which gives the maximum probability. In practice we don’t know \(y\) (at the receiver) but we know \(x\). So how to compute the probability ? Maximum Likelihood Estimation (MLE) comes to our rescue. For a detailed explanation on MLE – refer here[1] The aim of maximum likelihood estimation is to find the parameter value(s) that makes the observed data most likely. Understanding the difference between prediction and estimation is important at this point. Estimation differs from prediction in the following way … In estimation problems, likelihood of the parameters is estimated based on given data/observation vector. In prediction problems, probability is used as a measure to predict the outcome from known parameters of a model.
Examples for “Prediction” and “Estimation” :
1) Probability of getting a “Head” in a single toss of a fair coin is \(0.5\). The coin is tossed 100 times in a row.Prediction helps in predicting the outcome ( head or tail ) of the \(101^{th}\) toss based on the probability.
2) A coin is tossed 100 times and the data ( head or tail information) is recorded. Assuming the event follows Binomial distribution model, estimation helps in determining the probability of the event. The actual probability may or may not be \(0.5\). Maximum Likelihood Estimation estimates the conditional probability based on the observed data ( received data – \(x\)) and an assumed model.
Example of Maximum Likelihood Decoding:
Let \(y=11001001\) and \(x=10011001\) . Assuming Binomial distribution model for the event with probability of error \(0.1\) (i.e the reliability of the BSC is \(1-p = 0.9\)), the Hamming distance between codewords is \(2\) . For binomial model,
where \(d\) =the hamming distance between the received and the sent codewords n= number of bit sent
\(p\)= error probability of the BSC.
\(1-p\) = reliability of BSC
Substituting \(d=2, n=8\) and \(p=0.1\) , then \(P(y\;received \mid x\;sent) = 0.005314\).
Note : Here, Hamming distance is used to compute the probability. So the decoding can be called as “minimum distance decoding” (which minimizes the Hamming distance) or “maximum likelihood decoding”. Euclidean distance may also be used to compute the conditional probability.
As mentioned earlier, in practice \(y\) is not known at the receiver. Lets see how to estimate \(P(y \;received \mid x\; sent)\) when \(y\) is unknown based on the binomial model.
Since the receiver is unaware of the particular \(y\) corresponding to the \(x\) received, the receiver computes \(P(y\; received \mid x\; sent)\) for each codeword in \(Y\). The \(y\) which gives the maximum probability is concluded as the codeword that was sent.
Python code implementing Maximum Likelihood Decoding:
The following program for demonstrating the maximum likelihood decoding, involves generating a noisy signal from a transmitted message and then using maximum likelihood decoding to estimate the transmitted message from the noisy signal.
- The
maximum_likelihood_decoding
function takes three arguments:received_signal
,noise_variance
, andmessage_space
. - The
calculate_probabilities
function is called to calculate the probability of each possible message given the received signal, using the known noise variance. - The probabilities are normalized so that they sum to 1.
- The
maximum_likelihood_decoding
function finds the index of the most likely message (i.e., the message with the highest probability). - The function returns the most likely message.
- An example usage is demonstrated where a binary message space is defined (
[0, 1]
), along with a noise variance and a transmitted message. - The transmitted message is added to noise to generate a noisy received signal.
- The
maximum_likelihood_decoding
function is called to decode the noisy signal. - The transmitted message, received signal, and decoded message are printed to the console for evaluation.
import numpy as np
import matplotlib.pyplot as plt
# Define a function to calculate the probability of each possible message given the received signal
def calculate_probabilities(received_signal, noise_variance, message_space):
probabilities = np.zeros(len(message_space))
for i, message in enumerate(message_space):
error = received_signal - message
probabilities[i] = np.exp(-np.sum(error ** 2) / (2 * noise_variance))
return probabilities / np.sum(probabilities)
# Define a function to perform maximum likelihood decoding
def maximum_likelihood_decoding(received_signal, noise_variance, message_space):
probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
most_likely_message_index = np.argmax(probabilities)
return message_space[most_likely_message_index]
# Example usage
message_space = np.array([0, 1])
noise_variance = 0.4
transmitted_message = 1
received_signal = transmitted_message + np.sqrt(noise_variance) * np.random.randn()
decoded_message = maximum_likelihood_decoding(received_signal, noise_variance, message_space)
print('Transmitted message:', transmitted_message)
print('Received signal:', received_signal)
print('Decoded message:', decoded_message)
# Plot probability distribution
probabilities = calculate_probabilities(received_signal, noise_variance, message_space)
plt.bar(message_space, probabilities)
plt.title('Probability Distribution for Received Signal = {}'.format(received_signal))
plt.xlabel('Transmitted Message')
plt.ylabel('Probability')
plt.ylim([0, 1])
plt.show()
The probability of the received signal given a specific transmitted message is calculated as follows:
- Compute the difference between the received signal and the transmitted message.
- Compute the sum of squares of this difference vector.
- Divide this sum by twice the known noise variance.
- Take the negative exponential of this value.
This results in a probability density function (PDF) for the received signal given the transmitted message, assuming that the noise is Gaussian and zero-mean.
The probabilities for each possible transmitted message are then normalized so that they sum to 1. This is done by dividing each individual probability by the sum of all probabilities.
The maximum_likelihood_decoding
function determines the most likely transmitted message by selecting the message with the highest probability, which corresponds to the maximum likelihood estimate of the transmitted message given the received signal and the statistical model of the noise.
Sample outputs
Transmitted message: 1
Received signal: 0.21798306949364643
Decoded message: 0
Transmitted message: 1
Received signal: -0.5115453787966966
Decoded message: 0
Transmitted message: 1
Received signal: 0.8343088336355061
Decoded message: 1
Transmitted message: 1
Received signal: -0.5479891887158619
Decoded message: 0
The probability distribution for the last sample output is shown below
Reference :
[1] – Maximum Likelihood Estimation – a detailed explanation by S.Purcell
Books by the author
Related Topics:
Hello! Great explanation. It helped me a lot.
I wanted to tell you that I think there is a small mistake. One of your paragraphs begin with “In practice we don’t know Y (at the receiver) but we know x.” I think you mean ‘y’ (non capital) instead of ‘Y’. I think at the receiver we know Y, the set of all codewords. Later on, we can compute the hamming distances because we know this set. Isn’t this correct? Thanks for the explanation
Thank for catching the mistake. It is corrected now. Yes, the transmitter and receiver should agree on the set of codewords (the coding scheme) used. Receiver computes likelihood for all set of codewords and selects the one that maximizes the likelihood.