Tuesday, February 11, 2014

Sequential data analysis

Consider the sequence AABABABBABAB, where each occurrence of A or B can be viewed as a state. This sequence has length 12 and 2 states, and it contains 11 transitions, one between each pair of adjacent positions. The number of transitions does not depend on the number of states, only on the length of the sequence: a sequence of length n has n - 1 transitions.
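The count can be checked with a minimal Python sketch that pairs each position with the next one. The helper name `transitions` is hypothetical. The second sequence is the 3-state timed example from below, used here only to show that two sequences of the same length give the same number of transitions.

```python
def transitions(seq):
    """Return the (previous, current) pairs of adjacent states in a sequence."""
    # zip stops at the shorter argument, so a sequence of length n yields n - 1 pairs
    return list(zip(seq, seq[1:]))

two_states = "AABABABBABAB"
three_states = "OOSSSOOFFSFS"

for seq in (two_states, three_states):
    pairs = transitions(seq)
    # length, number of distinct states, number of transitions
    print(len(seq), len(set(seq)), len(pairs))
# prints:
# 12 2 11
# 12 3 11
```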

Types of event sequences:

• Event sequence data: sequences are recorded in order, without the duration of each event, e.g. OSOFSFS (assuming the states are O, S, F).
• Timed event sequence data: sequences are recorded together with the duration of each event, e.g. OOSSSOOFFSFS (assuming the states are O, S, F).
• Multiple event sequence data: more than one category can occur at the same time, e.g. mother holds infant, infant vocalizes, mother vocalizes.
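The timed example can be reduced to the plain event example by collapsing runs of the same code. The Python sketch below assumes that each letter in the timed sequence is one unit of time and that consecutive identical letters belong to a single event. `to_event_sequence` is a hypothetical helper, not part of any library.

```python
from itertools import groupby

def to_event_sequence(timed):
    """Collapse a timed event sequence into (event sequence, run durations).

    Assumption: each character is one time unit, and consecutive identical
    codes are one continuing event.
    """
    # groupby groups consecutive equal characters, e.g. "OO" -> ("O", 2)
    runs = [(state, len(list(group))) for state, group in groupby(timed)]
    events = "".join(state for state, _ in runs)
    return events, runs

events, runs = to_event_sequence("OOSSSOOFFSFS")
print(events)  # OSOFSFS
print(runs)    # [('O', 2), ('S', 3), ('O', 2), ('F', 2), ('S', 1), ('F', 1), ('S', 1)]
```

The run lengths are exactly the durations that plain event sequence data discards. Without them, OSOFSFS cannot be turned back into the timed form. Under this assumption, two separate consecutive events with the same code would also be merged into one.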

Sequential analysis has two goals:
• To find a stochastic pattern in data
• To assess the effect of contextual or explanatory variables on sequential structure

Returning to the first example sequence, AABABABBABAB, let's compute some probabilities.

P(A) = n(A) / n = 6/12 = 0.50
P(B) = n(B) / n = 6/12 = 0.50 ...(1)
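Equation (1) is a simple relative frequency. In Python it can be computed with `collections.Counter`, as in this minimal sketch:

```python
from collections import Counter

seq = "AABABABBABAB"
counts = Counter(seq)   # n(A) = 6, n(B) = 6
n = len(seq)            # n = 12
# P(state) = n(state) / n
marginal = {state: count / n for state, count in sorted(counts.items())}
print(marginal)  # {'A': 0.5, 'B': 0.5}
```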

In the sequence, A occurs 6 times, and in 5 of those cases it is immediately followed by B. So the probability of B occurring immediately after A is

P(B | A) = 5/6 ≈ 0.83 ...(2)
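Equation (2) counts transitions rather than single states. A minimal Python sketch follows; the function name `conditional_probability` is hypothetical. An occurrence of the "given" state in the last position has no successor, so it is not counted. In this sequence the last state is B, so all 6 occurrences of A are counted.

```python
def conditional_probability(seq, given, target):
    """Estimate P(target | given) as n(given -> target) / n(given -> any state)."""
    pairs = list(zip(seq, seq[1:]))
    # transitions that start in `given`
    n_given = sum(1 for prev, _ in pairs if prev == given)
    # transitions that start in `given` and end in `target`
    n_both = sum(1 for prev, cur in pairs if prev == given and cur == target)
    return n_both / n_given

p_b_given_a = conditional_probability("AABABABBABAB", "A", "B")
print(round(p_b_given_a, 2))  # 0.83
```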

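Comparing (1) and (2), knowing that the previous state was A raises the probability of B from 0.50 to 0.83. The prior state therefore gives us more information about the current state than the overall state probabilities alone. Whether this difference is statistically significant is a separate question.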
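One common way to put a number on "more information" is Shannon entropy, although the post itself does not use it. The idea is to compare the uncertainty, in bits, about the current state with and without knowledge of the previous state. The difference is the mutual information between consecutive states. In this Python sketch, both quantities are estimated from the 11 transitions, and the helper name `entropy` is hypothetical.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

seq = "AABABABBABAB"
pairs = list(zip(seq, seq[1:]))

# Uncertainty about the current state, ignoring the past
h_current = entropy(Counter(cur for _, cur in pairs).values())

# Uncertainty once the previous state is known: the entropy of each
# "next state" distribution, weighted by how often that previous state occurs
h_conditional = 0.0
for prev, n_prev in Counter(prev for prev, _ in pairs).items():
    row = Counter(cur for p, cur in pairs if p == prev)
    h_conditional += n_prev / len(pairs) * entropy(row.values())

reduction = h_current - h_conditional
print(f"H(current)            = {h_current:.3f} bits")
print(f"H(current | previous) = {h_conditional:.3f} bits")
print(f"reduction             = {reduction:.3f} bits")
```

For this sequence the reduction is about 0.31 bits out of roughly 0.99. Knowing the previous state removes part, but not all, of the uncertainty.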

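For any given sequence, the transitions can be summarized by:

1. A frequency state transition matrix, whose cell in row i and column j counts how often state j immediately follows state i
2. A probability state transition matrix, obtained by dividing each row of the frequency matrix by its row total, so that each cell estimates P(j | i)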
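Both matrices can be built in one pass over adjacent pairs. In the minimal Python sketch below, `transition_matrices` is a hypothetical helper and the matrices are plain nested lists. The rows are the previous states and the columns are the current states, both in the order given by `states`.

```python
def transition_matrices(seq, states):
    """Return (frequency matrix, row-normalized probability matrix).

    freq[i][j] counts how often states[j] immediately follows states[i];
    prob[i][j] estimates P(states[j] | states[i]).
    """
    index = {s: i for i, s in enumerate(states)}
    freq = [[0] * len(states) for _ in states]
    for prev, cur in zip(seq, seq[1:]):
        freq[index[prev]][index[cur]] += 1
    prob = []
    for row in freq:
        total = sum(row)
        # A state that never precedes another state keeps an all-zero row
        prob.append([c / total if total else 0.0 for c in row])
    return freq, prob

states = ["A", "B"]
freq, prob = transition_matrices("AABABABBABAB", states)
for s, f_row, p_row in zip(states, freq, prob):
    print(s, f_row, [round(p, 2) for p in p_row])
# prints:
# A [1, 5] [0.17, 0.83]
# B [4, 1] [0.8, 0.2]
```

The frequency matrix sums to the 11 transitions, and the A→B cell of the probability matrix is the 5/6 from (2).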

To model sequential data, we must check whether knowledge of the past states reduces our uncertainty about the current state. Several methods can test whether this reduction is significant:
• Binomial test
• Chi-squared test
• Likelihood ratio test
• Logit transformation
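Here is a minimal sketch of one form of binomial test, using only the Python standard library. It treats the 6 transitions out of A as 6 trials and asks how likely it is to see B at least 5 times if B followed A with its overall probability P(B) = 0.50 from (1). Both choices are assumptions of this sketch: P(B) as the null probability, and a one-sided exact test. Other formulations, such as a normal-approximation z-score, are also used. `binomial_upper_tail` is a hypothetical helper.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

seq = "AABABABBABAB"
pairs = list(zip(seq, seq[1:]))
n_trials = sum(1 for prev, _ in pairs if prev == "A")                 # transitions out of A: 6
n_hits = sum(1 for prev, cur in pairs if prev == "A" and cur == "B")  # A -> B transitions: 5
p_null = seq.count("B") / len(seq)                                    # P(B) from (1): 0.5

p_value = binomial_upper_tail(n_hits, n_trials, p_null)
print(f"P(at least {n_hits} of {n_trials} | p = {p_null}) = {p_value:.4f}")
```

Here the p-value is 7/64, about 0.11. With only 6 trials, this one-sided exact test does not reject randomness at the usual 0.05 level. Short sequences give such tests little power.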

These help us determine whether the sequence is random or follows some pattern.
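The chi-squared and likelihood ratio tests can both be applied to the 2x2 frequency transition matrix. They test whether the current state is independent of the previous state. The Python sketch below uses only the standard library and computes Pearson's chi-squared and the likelihood-ratio statistic G² = 2 Σ O ln(O/E). It then uses the fact that, for 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)). `chi2_and_g2` is a hypothetical helper. It assumes a 2x2 table with no empty row or column.

```python
from math import erfc, log, sqrt

def chi2_and_g2(table):
    """Pearson chi-squared and likelihood-ratio G^2 for a 2x2 table,
    each with its p-value from the chi-squared distribution with 1 df."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
            if observed > 0:                              # 0 * log(0) is taken as 0
                g2 += 2 * observed * log(observed / expected)
    # Upper tail of chi-squared with 1 degree of freedom
    return chi2, erfc(sqrt(chi2 / 2)), g2, erfc(sqrt(g2 / 2))

seq = "AABABABBABAB"
states = ["A", "B"]
pairs = list(zip(seq, seq[1:]))
# Frequency transition matrix: rows = previous state, columns = current state
freq = [[pairs.count((a, b)) for b in states] for a in states]

chi2, p_chi2, g2, p_g2 = chi2_and_g2(freq)
print(f"chi-squared = {chi2:.2f}, p = {p_chi2:.3f}")
print(f"G^2         = {g2:.2f}, p = {p_g2:.3f}")
```

Both statistics (about 4.41 and 4.75) exceed 3.84, the 5% critical value for 1 degree of freedom. However, all four expected counts in this table are below 5, which is a common rule-of-thumb threshold. The chi-squared approximation behind these p-values should therefore be treated with caution for a sequence this short.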