Key focus: Markov chains are probabilistic models that describe a sequence of observations in which each observation depends statistically only on the previous one.
● Time-series data such as speech signals and stock price movements.
● Words in a sentence.
● Base pairs on the rung of a DNA ladder.
States and transitions
Assume that we want to model the behavior of a driver behind the wheel. The possible behaviors are
● accelerate (state 1)
● constant speed (state 2)
● idling – engine running slowly but the vehicle is not moving (state 3)
● brake (state 4)
Let’s refer to each of these behaviors as a state. In this example, there are N = 4 states, which we denote as Q = {q1, q2, q3, q4}.
We observe the following pattern in the driver’s behavior (Figure 1). That is, the driver operates the vehicle through a certain sequence of states. In the graph shown in Figure 1, the states are represented as nodes and the transitions as edges.
We see that the driver sometimes changes the vehicle from one state to another and sometimes stays in the same state (as indicated by the arrows).
We also note that, at each step, the vehicle either stays in the same state or changes to another state. Therefore, in this model, if we want to predict the future state, all that matters is the current state of the vehicle. The past states have no bearing on the future state except through the current state. Take note of this important assumption for now.
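To make this concrete, here is a minimal Python sketch of the state set and the transition structure. The particular edges listed are illustrative assumptions; the actual edge set is the one drawn in Figure 1.

```python
# The four driving behaviors, modeled as states of a Markov chain.
STATES = ["accelerate", "constant speed", "idling", "brake"]

# Allowed transitions (edges in the state graph). These edges are
# assumed for illustration; the true edges are those in Figure 1.
TRANSITIONS = {
    "idling": ["idling", "accelerate"],
    "accelerate": ["accelerate", "constant speed", "brake"],
    "constant speed": ["constant speed", "accelerate", "brake"],
    "brake": ["brake", "idling"],
}
```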
Probabilistic model
We know that we cannot be certain about the driver’s behavior at any given point in time. To account for this uncertainty, we turn the model into a probabilistic one. A probabilistic model allows us to quantify the likelihood of each behavior or change of state.
An example of a probabilistic model for this problem is shown in Figure 2.
In this probabilistic model, we have assigned probability values to the transitions. These probabilities are collectively called transition probabilities. For example, considering the state named “idling”, the probability that the car transitions from this state to the “accelerate” state is 0.8. In probability terms, this is expressed as a conditional probability, conditioned on the previous state.
p(state = “accelerate” | previous state = “idling”) = 0.8
Usually, the transition probabilities are arranged in a matrix called the transition probability matrix, shown in Figure 2. In a transition matrix, denoted A, each element aij represents the probability of transitioning from state i to state j. The elements of the transition matrix satisfy the following property:

ai1 + ai2 + … + aiN = 1, for every state i

That is, the sum of the transition probabilities leaving any given state is 1.
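As a sketch, the transition matrix for the driving example might look as follows in Python. Only the idling-to-accelerate probability (0.8) comes from the text above; the remaining entries are made-up values for illustration, since the real ones depend on Figure 2.

```python
import numpy as np

# Transition probability matrix A: A[i, j] = p(next state j | current state i).
# Row/column order: accelerate, constant speed, idling, brake.
# Only the idling -> accelerate entry (0.8) is given in the text;
# the other entries are assumed for illustration.
A = np.array([
    [0.5, 0.3, 0.0, 0.2],   # from "accelerate"
    [0.2, 0.6, 0.0, 0.2],   # from "constant speed"
    [0.8, 0.0, 0.2, 0.0],   # from "idling"
    [0.3, 0.0, 0.4, 0.3],   # from "brake"
])

# Each row must sum to 1: the transitions leaving a given state
# form a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0)
```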
As we know, in this example, the driver cannot start the car in an arbitrary state (for example, it is impossible to start the car in the “constant speed” state). He can only start the car from rest (i.e., the brake state). To model this, we introduce πi – the probability that the Markov chain starts in a given state i. The set of starting probabilities for all N states is called the initial probability distribution (π = π1, π2, …, πN). In Figure 3, the starting probabilities are denoted by green arrows.
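Continuing the sketch above, the initial distribution can be stored as a vector. Following the text, all of the starting probability mass is placed on the brake state; the actual values are the ones attached to the green arrows in Figure 3.

```python
# Initial probability distribution pi: pi[i] = p(chain starts in state i).
# The car can only be started at rest, so all mass sits on "brake".
pi = np.array([0.0, 0.0, 0.0, 1.0])  # accelerate, constant speed, idling, brake

assert np.isclose(pi.sum(), 1.0)  # pi must itself be a valid distribution
```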
Markov Assumption
As noted in the definition, the Markov chain in this example assumes that the occurrence of each event/observation depends statistically only on the previous one. This is a first-order Markov chain (termed a bigram language model in natural language processing applications). For the states Q = {q1, …, qn}, predicting the probability of a future state depends only on the current observation; all other previous observations do not matter. In probabilistic terms, the first-order Markov chain assumption is written as

p(qi | q1, q2, …, qi−1) = p(qi | qi−1)
Extending the assumption to an mth-order Markov chain, predicting the probability of a future observation depends only on the previous m observations. This corresponds to an (m+1)-gram model (a first-order chain being the bigram case).
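The first-order assumption is easy to see in code: when sampling a sequence, each new state is drawn using only the row of A indexed by the previous state. This sketch reuses the STATES, A, and pi arrays assumed above.

```python
rng = np.random.default_rng(0)

def sample_sequence(pi, A, length):
    """Sample a state sequence from a first-order Markov chain.

    The first state is drawn from pi; every later state is drawn
    conditioned only on the immediately preceding state, which is
    exactly the first-order Markov assumption.
    """
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(length - 1):
        states.append(rng.choice(len(A), p=A[states[-1]]))
    return [STATES[s] for s in states]

print(sample_sequence(pi, A, 6))
```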
Given a set of n arbitrary random variables/observations Q = {q1, …, qn}, their joint probability distribution is usually computed by applying the following chain rule:

p(q1, q2, …, qn) = p(q1) · p(q2 | q1) · p(q3 | q1, q2) · … · p(qn | q1, …, qn−1)
However, if the random observations in Q are sequential in nature and follow a generic mth-order Markov chain model, then the computation of the joint probability simplifies: each factor conditions on at most the previous m observations. For a first-order chain this gives

p(q1, q2, …, qn) = p(q1) · p(q2 | q1) · p(q3 | q2) · … · p(qn | qn−1)
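In code, this first-order factorization reduces to one lookup in pi and one lookup in A per consecutive pair of states, again using the arrays assumed earlier.

```python
def sequence_probability(pi, A, state_indices):
    """Joint probability of a state sequence under a first-order chain.

    Implements p(q1, ..., qn) = pi[q1] * product over i of A[q(i-1), qi]:
    only consecutive pairs matter, not the full chain-rule history.
    """
    prob = pi[state_indices[0]]
    for prev, cur in zip(state_indices, state_indices[1:]):
        prob *= A[prev, cur]
    return prob

# brake -> idling -> accelerate, with the index order used above:
print(sequence_probability(pi, A, [3, 2, 0]))  # 1.0 * 0.4 * 0.8 = 0.32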
The Markov assumptions for first and second order of Markov models are summarized in Figure 4.
Hidden Markov Model (HMM)
Markov chains are useful for computing the probability of events that are observable. However, in many real-world applications, the events we are interested in are hidden, that is, we don’t observe them directly. These hidden events need to be inferred. For example, given a sentence in a natural language, we only observe the words and characters directly; the parts of speech of the sentence are hidden and have to be inferred. This brings us to the next topic of discussion – the hidden Markov model.