Datavu: Network Analysis application in Genetic Studies

Background

SNPs (single nucleotide polymorphisms) are single-base variations in the DNA sequence within a population. For example, at the same position DNA molecule 1 has a C-G base pair and DNA molecule 2 has a T-A base pair, as shown in the figure below.

"Dna-SNP" by David Hall (Gringer) - Own work. Licensed under CC BY 2.5 via Wikimedia Commons.

The human genome contains around 3 billion base pairs like these. Variations in them affect traits such as eye color and height, and disorders such as obesity and diabetes.

Traits are affected by both genetic and environmental factors. The good news is that we can estimate what share of the variation in a certain trait is genetic; this is called heritability.

Formally, heritability is the percentage of phenotypic variation (variation in a trait) in a population that is due to genetic variation.
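In symbols, this definition corresponds to what is usually called broad-sense heritability. The notation below (H^2, V_G, V_E, V_P) is standard quantitative-genetics shorthand and is not defined anywhere in this post. The decomposition of V_P is a simplifying assumption: it treats genetic and environmental effects as independent and non-interacting.

```latex
% H^2: broad-sense heritability
% V_G: genetic variance, V_E: environmental variance, V_P: total phenotypic variance
H^2 = \frac{V_G}{V_P}, \qquad V_P = V_G + V_E
```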

This genetic variation can be a SNP or another type of variant, such as an insertion-deletion, block substitution, inversion, or copy number variant. However, SNPs are the most abundant form of variation and are easy to measure, which makes them the preferred choice.

Motivation

Traditional studies focus on the independent, additive effects of SNPs on a trait.

Although we have discovered many genes associated with certain traits and disorders, the problem of "missing heritability" is still unsolved.

The "missing heritability" problem can be defined as the fact that individual genes cannot account for much of the heritability of diseases, behaviors, and other phenotypes. [2]


To summarize, the individual additive effects of SNPs give us very little information.

Let's take a hypothetical example. If diabetes has 50 percent heritability, half of the variation in the trait should be explained by genetic factors. In reality, we could barely account for 7-8 percent by following the traditional approach.

This suggests that, rather than considering only the individual effects of genetic variations, we should look at the complex interactions between these variations and the possible effects of those interactions on traits.

Network analysis is one of the few approaches proposed to deal with this issue.

How Network analysis is used

The previous discussion gives us two hints:

We could consider SNPs as the nodes of our new network
We need to accommodate numerical values for the interaction between every pair of SNPs

So we can build a possible network as follows.

Create an n * n matrix representing the interactions between n SNPs. Each cell holds the p-value of the interaction between two SNPs with respect to the trait under observation. For a binary trait, we can form the regression equation like this:

reg(trait ~ factor1 + SNP1 + SNP2 + SNP1 * SNP2)

An important point here is that as we add more factors (covariates) in addition to the SNPs, we must be careful when interpreting the results. The SNP1 * SNP2 term of the equation accounts for the interaction.
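As a rough illustration of how one such p-value, and then the whole matrix, could be computed, here is a minimal sketch. It assumes that "reg" is a logistic regression (a common choice for a binary trait, though the post does not say so), that genotypes are coded as 0/1/2 allele counts, and that the p-value of interest is the Wald test on the SNP1 * SNP2 coefficient. The function names, the simulated data, and the `age` covariate standing in for factor1 are all hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def interaction_p_value(trait, covariate, snp1, snp2, n_iter=50, tol=1e-8):
    """Two-sided Wald p-value of the SNP1*SNP2 term in a logistic regression."""
    y = np.asarray(trait, dtype=float)
    s1 = np.asarray(snp1, dtype=float)
    s2 = np.asarray(snp2, dtype=float)
    cov = np.asarray(covariate, dtype=float)
    # Columns: intercept, covariate (factor1), SNP1, SNP2, SNP1*SNP2.
    X = np.column_stack([np.ones_like(y), cov, s1, s2, s1 * s2])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - prob))     # Newton-Raphson step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard error of the interaction coefficient at the fitted beta.
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    std_err = math.sqrt(np.linalg.inv(info)[-1, -1])
    z = beta[-1] / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))


def interaction_matrix(trait, covariate, genotypes):
    """Symmetric n x n matrix of pairwise interaction p-values (diagonal = 1)."""
    n = genotypes.shape[1]
    p_values = np.ones((n, n))
    for i, j in combinations(range(n), 2):
        p = interaction_p_value(trait, covariate, genotypes[:, i], genotypes[:, j])
        p_values[i, j] = p_values[j, i] = p
    return p_values


# Simulated data (hypothetical): 3 SNPs, with a true interaction between
# the first two and no effect of the third.
rng = np.random.default_rng(0)
n_people = 2000
genotypes = rng.integers(0, 3, size=(n_people, 3))  # 0/1/2 allele counts
age = rng.normal(0.0, 1.0, size=n_people)           # stand-in for factor1
logit = -1.0 + 0.8 * genotypes[:, 0] * genotypes[:, 1]
trait = (rng.random(n_people) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

p_values = interaction_matrix(trait, age, genotypes)
print(np.round(p_values, 4))
```

In practice a statistics package would normally do the model fitting; the hand-written Newton-Raphson loop is only there to keep the sketch self-contained. Note that the number of SNP pairs grows as n(n-1)/2, so this loop becomes expensive for large n.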

The matrix is now very similar to an adjacency matrix, which is used as the input for network generation. The only problem is that there is an edge between every pair of nodes. We can fix this by setting a threshold on the p-value and keeping only the edges whose p-value falls below it. This converts the fully connected graph into a network that helps us identify the SNPs involved in strong interactions.
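The thresholding step might look like the sketch below. The p-values are made up for illustration, the 0.05 cutoff is only a placeholder, and using unweighted (0/1) edges is an assumption; the post does not say which threshold or edge weighting to use. Because n SNPs produce n(n-1)/2 tests, a stricter cutoff that corrects for multiple testing is probably appropriate for real data.

```python
def p_values_to_adjacency(p_values, threshold=0.05):
    """Keep an edge between two SNPs only if their interaction p-value is below threshold."""
    n = len(p_values)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Skip the diagonal: a SNP does not interact with itself.
            if i != j and p_values[i][j] < threshold:
                adjacency[i][j] = 1
    return adjacency


# Hypothetical interaction p-values for four SNPs (symmetric matrix).
p_values = [
    [1.0,   0.001, 0.020, 0.004],
    [0.001, 1.0,   0.030, 0.600],
    [0.020, 0.030, 1.0,   0.450],
    [0.004, 0.600, 0.450, 1.0],
]

adjacency = p_values_to_adjacency(p_values, threshold=0.05)
for row in adjacency:
    print(row)
```

With these made-up values, the first SNP keeps edges to all three others, while the fourth SNP keeps only its edge to the first.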

Here, unlike with individual effects, we want to account for the interaction effects of SNPs, which can be done by calculating the centrality of every node (SNP). The preferred measure is eigenvector centrality.

So if we rank SNPs in descending order of eigenvector centrality, the SNPs playing a significant role in interactions will appear at the top of the list.

These SNPs might play a crucial role in determining the trait, not because of their individual effects but because of complex interactions with other SNPs.
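The ranking step can be sketched as follows. Eigenvector centrality is computed here with plain power iteration on a small, made-up adjacency matrix; the SNP labels and the graph are hypothetical, and this is only one way to compute the measure, not necessarily how it was done for this post.

```python
import math


def eigenvector_centrality(adjacency, n_iter=200, tol=1e-10):
    """Eigenvector centrality of an undirected graph via power iteration."""
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(n_iter):
        # Iterate with (A + I): it has the same eigenvectors as A, and the
        # shift avoids oscillation on bipartite graphs.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(value * value for value in updated))
        updated = [value / norm for value in updated]
        converged = max(abs(a - b) for a, b in zip(updated, scores)) < tol
        scores = updated
        if converged:
            break
    return scores


# Hypothetical thresholded network of four SNPs.
snp_names = ["SNP1", "SNP2", "SNP3", "SNP4"]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

centrality = eigenvector_centrality(adjacency)
# Rank SNPs in descending order of centrality.
ranking = sorted(zip(snp_names, centrality), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy network SNP1 ranks first because it interacts with every other SNP, and SNP4 ranks last because its only connection is to SNP1. For real data, a graph library such as NetworkX (which provides `eigenvector_centrality`) would likely be more convenient than a hand-written loop.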


Thursday, July 31, 2014
