Sunday, August 17, 2014

Survival Analysis

While working on a few assignments on exploring disorders in cohort studies, I came across the concept of survival analysis. It seems very useful in many real-life scenarios.

What is Survival analysis?

"Survival analysis is a branch of statistics which deals with analysis of time duration until one or more events happen".[1] The event of interest can be, for example, the onset of a disease, the failure of a mechanical system or a person getting married.

In survival analysis, subjects are generally followed over a certain time period and the focus is on the time at which the event of interest occurs.

Censoring: An observation is censored when its survival time is only partially known. We know the time lies within some range, but we do not know its exact value.

Types of Censoring

Right censoring: Consider a study in which the event of interest is getting divorced, and subjects are followed for 20 years. A subject who does not get divorced (does not experience the event of interest) during the study is said to be right censored. For this person we only know that their survival time is at least as long as the duration of the study.

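Left censoring: A subject is left censored when the event of interest had already happened before observation started. For this subject we only know that their survival time is shorter than the time at which they were first observed. In the divorce example, this would be a subject who was already divorced, on an unknown date, when first interviewed.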
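One way to make the two kinds of censoring concrete is to record, for each subject, only the interval known to contain the true survival time T. The following is a minimal Python sketch. The names `Observation` and `observe` are hypothetical, and the times (years since marriage) are made up. Real survival packages use their own data formats.

```python
import math
from typing import NamedTuple

class Observation(NamedTuple):
    """What the study actually knows about one subject: lower <= T <= upper."""
    lower: float
    upper: float

def observe(true_time: float, entry: float, end: float) -> Observation:
    """Record a subject who is watched from time `entry` until time `end`.

    - Event happened before observation began: left censored, T <= entry.
    - No event by the end of follow-up: right censored, T >= end.
    - Otherwise the event is seen, so the exact time is known.
    """
    if true_time < entry:
        return Observation(0.0, entry)          # left censored
    if true_time > end:
        return Observation(end, math.inf)       # right censored
    return Observation(true_time, true_time)    # exact event time

# Hypothetical subjects in a 20-year divorce study (times in years).
print(observe(true_time=7.5, entry=0.0, end=20.0))   # divorced in year 7.5
print(observe(true_time=35.0, entry=0.0, end=20.0))  # still married at year 20
print(observe(true_time=2.0, entry=5.0, end=20.0))   # already divorced when first seen
```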

Analogy with regression: The setup is quite similar to regression, with a dependent variable and multiple independent variables. The output is similar too: coefficients, standard errors, p-values and so on. So why can't we simply use linear regression instead? Linear regression cannot handle censored data properly. It would have to either drop the censored subjects or treat their censoring time as if it were the event time. Under right censoring, both choices bias the estimates toward survival times that are too short.
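The bias is easy to see in a small simulation. This Python sketch assumes, purely for illustration, that times to the event follow an exponential distribution with a mean of 10 years and that follow-up stops at 10 years. The function names are hypothetical.

The naive mean treats every recorded time as an event time. The censoring-aware estimate is the exponential maximum-likelihood estimate: the rate is the number of events divided by the total time at risk. This uses the follow-up time of censored subjects without pretending their event occurred.

```python
import random

def simulate(n, mean_time, follow_up, seed=0):
    """Draw exponential event times and right-censor them at `follow_up`.

    Returns (observed_time, event) pairs, where event=1 if the event was seen.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.expovariate(1.0 / mean_time)
        data.append((min(t, follow_up), 1 if t <= follow_up else 0))
    return data

def naive_mean(data):
    # Treats every recorded time as if the event happened at that time.
    return sum(t for t, _ in data) / len(data)

def exponential_mle_mean(data):
    # Censoring-aware MLE under an exponential model:
    # rate = events / total time at risk, so mean = total time / events.
    events = sum(e for _, e in data)
    exposure = sum(t for t, _ in data)
    return exposure / events

data = simulate(n=10_000, mean_time=10.0, follow_up=10.0)
print(f"naive mean:          {naive_mean(data):.2f}")
print(f"censoring-aware MLE: {exponential_mle_mean(data):.2f}")
```

With these settings the naive mean should come out near 10(1 - e^(-1)) ≈ 6.3 years, which is the expected value of min(T, 10). The censoring-aware estimate should be close to the true 10 years.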

Difference from regression: In survival analysis the dependent variable is made up of two variables:
  • Time to the event of interest
  • The event status 
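The event status tells us whether the recorded time is the time at which the event actually occurred, or only a censoring time (for example, the end of follow-up). This lets incomplete observations contribute what we do know about them instead of being discarded. The analysis then focuses on estimating two functions: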
  • Survival function S(t): the probability that the event of interest has not yet happened by time t, i.e. S(t) = P(T > t), where T is the time to the event.
  • Hazard function h(t): the instantaneous rate at which the event occurs at time t, given that the subject has survived (not experienced the event) up to time t.
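Both functions can be estimated directly from (time, event status) pairs. The following minimal Python sketch uses two estimators that the post does not name:

  • the Kaplan-Meier estimator, a standard nonparametric estimator of S(t);
  • the simple discrete-time hazard d/n, i.e. the number of events at time t divided by the number of subjects still at risk just before t.

The dataset and the function name are hypothetical.

```python
from collections import Counter

def kaplan_meier(data):
    """Kaplan-Meier estimate of S(t) from (time, event) pairs.

    Returns (t, n_at_risk, events, hazard, survival) for each distinct event
    time, where hazard = events / n_at_risk is the discrete-time hazard at t.
    """
    events_at = Counter(t for t, e in data if e == 1)
    leaving_at = Counter(t for t, _ in data)   # events + censorings at each t
    n_at_risk = len(data)
    survival = 1.0
    table = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            hazard = d / n_at_risk
            survival *= 1.0 - hazard           # S(t) = product of (1 - d/n)
            table.append((t, n_at_risk, d, hazard, survival))
        # Subjects censored at t still count as at risk at t, then leave.
        n_at_risk -= leaving_at[t]
    return table

# Hypothetical divorce study: (years followed, 1 = divorced / 0 = censored).
data = [(3, 1), (5, 0), (6, 1), (6, 1), (8, 0), (10, 1), (20, 0), (20, 0)]
for t, n, d, h, s in kaplan_meier(data):
    print(f"t={t:>2}  at risk={n}  events={d}  hazard={h:.3f}  S(t)={s:.3f}")
```

For this data the estimate steps down at t = 3, 6 and 10, ending at S(10) = 7/18 ≈ 0.389. Censored subjects (at 5, 8 and 20) never cause a drop, but they do shrink the risk set for later event times.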
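A popular model used in survival analysis is the Cox proportional hazards regression model. It assumes that each independent variable multiplies a shared baseline hazard by a constant factor. In R you can find it as the coxph() function in the "survival" package.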
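To show what coxph() is estimating, here is a minimal Python sketch of the Cox model with a single covariate, h(t | x) = h0(t)·exp(βx). It is fitted by maximizing the partial likelihood with Newton's method. exp(β) is then the hazard ratio for a one-unit increase in x.

This is an illustration under simplifying assumptions, not the survival package's algorithm:

  • ties are handled with the Breslow approximation (coxph() defaults to Efron's method and offers Breslow as an option);
  • there is no safeguarding against non-convergence;
  • the data and the function name are hypothetical.

```python
import math

def cox_fit_one_covariate(times, events, x, iters=25, tol=1e-10):
    """Maximize the Cox partial likelihood for one covariate (Breslow ties).

    Every subject with time >= t_i is in the risk set of an event at t_i.
    Returns (beta, standard_error).
    """
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue                      # censored subjects add no term
            # Weighted sums over the risk set R(t_i) = {j : t_j >= t_i}.
            a0 = a1 = a2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:
                    w = math.exp(beta * x_j)
                    a0 += w
                    a1 += w * x_j
                    a2 += w * x_j * x_j
            mean = a1 / a0
            score += x_i - mean               # first derivative of log PL
            info += a2 / a0 - mean * mean     # minus the second derivative
        step = score / info                   # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

times  = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]   # hypothetical follow-up times
events = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]     # 1 = event, 0 = censored
x      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]     # hypothetical binary covariate
beta, se = cox_fit_one_covariate(times, events, x)
print(f"beta = {beta:.3f}, SE = {se:.3f}, hazard ratio = {math.exp(beta):.2f}")
```

The risk-set sums are recomputed for every event, which keeps the sketch readable but costs O(n²) per iteration. Real implementations sort by time and accumulate the sums in one pass.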


[1] Survival analysis
[2] Cornell University stats newsletter
