Weighted Gene Co-Expression Network Analysis (WGCNA)
Why do we need WGCNA?
- To reduce high-dimensional data to a low-dimensional representation (modules).
- To understand the system as a whole instead of its individual parts.
- To integrate multi-scale data (miRNA, mRNA, and methylation).
Typically, when you analyse gene expression data, you apply a t-test with FDR correction and a fold-change (FC) cutoff (for example 2-fold or 4-fold) to individual genes, and only later calculate the co-expression among the genes that pass the chosen threshold. This process can miss the real dynamics of the genes. To overcome this limitation, WGCNA was formulated with the following steps:
- Construct the network
- Identify the modules
- Relate modules to external information
1) Construct the network
1.1) Calculate the co-expression (biweight midcorrelation coefficient) for each pair of genes from the expression data (microarray or RNA-seq).
1.2) Create an adjacency matrix A = [aij] to represent the network, with its elements defined as in step 1.3.
1.3) Each element of the matrix gives the connection strength between two nodes of the network: a continuous edge strength in a weighted network, or 0 or 1 in an unweighted network.
1.4) If you choose the weighted network, the co-expression values are transformed into continuous adjacency values with a signed or unsigned formula and a soft-threshold power β, commonly β = 6 for an unsigned network and β = 12 for a signed network. The author mostly prefers the signed network. (Figure 1)
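Steps 1.1 to 1.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the WGCNA package's implementation: plain Pearson correlation is swapped in for the biweight midcorrelation to keep the example short, the toy expression values are made up, and the function names `pearson` and `adjacency` are hypothetical. The sketch uses the commonly stated forms aij = |cor(xi, xj)|^β for an unsigned network and aij = ((1 + cor(xi, xj)) / 2)^β for a signed network.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Stand-in for the biweight midcorrelation; assumes non-constant profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, r))

def adjacency(expr, beta, signed=True):
    """Weighted adjacency matrix from a list of gene profiles.
    expr: one list of sample values per gene; beta: soft-threshold power."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = pearson(expr[i], expr[j])
            # Signed: map [-1, 1] to [0, 1]; unsigned: use the absolute value.
            base = (1 + r) / 2 if signed else abs(r)
            adj[i][j] = base ** beta
    return adj

# Toy data (made up): 3 genes x 5 samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],     # gene 0
    [2.0, 4.1, 5.9, 8.0, 10.1],    # gene 1, rises with gene 0
    [5.0, 4.0, 3.0, 2.0, 1.0],     # gene 2, falls as gene 0 rises
]
signed_adj = adjacency(expr, beta=12, signed=True)
unsigned_adj = adjacency(expr, beta=6, signed=False)
```

In this toy example the anti-correlated pair (gene 0 and gene 2) gets an adjacency near 0 in the signed network but near 1 in the unsigned network, which is the practical difference between the two choices.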
2) Identify the modules
2.1) Construct a hierarchical clustering (HC) tree from the network. The branches of the tree are cut automatically with the dynamic hybrid branch-cutting method; each resulting branch is considered a module and is labelled with a unique color (turquoise, blue, brown, etc.). (Figure 3)
2.2) Calculate the module eigengene (ME) for each module using singular value decomposition.
2.3) Calculate the module membership, kME = cor(x, ME), of each gene x with respect to each module, to relate genes to modules.
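Steps 2.2 and 2.3 can be illustrated with a small, dependency-free sketch. It assumes the module eigengene is taken as the first right singular vector of the standardized genes-by-samples matrix of one module, and it approximates that vector by power iteration rather than a full singular value decomposition. The function names and the toy values are hypothetical, and Pearson correlation is used for kME.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(profile):
    """Center a gene profile and scale it to unit sample variance."""
    n = len(profile)
    m = sum(profile) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in profile) / (n - 1))
    return [(v - m) / sd for v in profile]

def module_eigengene(module_expr, iters=200):
    """Approximate the first right singular vector of the standardized
    genes x samples matrix X by power iteration on X^T X."""
    X = [standardize(g) for g in module_expr]
    n_genes, n_samples = len(X), len(X[0])
    v = X[0][:]  # deterministic, non-zero starting vector
    for _ in range(iters):
        # u = X v (one score per gene), then w = X^T u (one value per sample).
        u = [sum(a * b for a, b in zip(row, v)) for row in X]
        w = [sum(X[g][s] * u[g] for g in range(n_genes)) for s in range(n_samples)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Orient the sign so the eigengene follows the average module profile.
    mean_profile = [sum(X[g][s] for g in range(n_genes)) / n_genes
                    for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-c for c in v]
    return v

# Toy module (made up): 3 co-expressed genes x 6 samples.
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 2.9, 4.2, 4.8, 6.1, 7.0],
    [0.5, 1.7, 2.4, 3.9, 4.6, 6.2],
]
me = module_eigengene(module_expr)

# Module membership of each module gene, and of one gene outside the module.
kme = [pearson(g, me) for g in module_expr]
kme_outside = pearson([6.0, 5.0, 4.0, 3.0, 2.0, 1.0], me)
```

Genes inside the toy module have kME close to 1, while the outside gene, which runs opposite to the module, has a strongly negative kME. For real data the R package WGCNA provides these calculations; the loop above is only meant to show the idea.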
3) Relate modules to external information
3.1) Annotate the module genes with information from external sources, such as gene ontology terms and t-test values.
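A minimal sketch of step 3.1, assuming the module assignment and the external annotations are available as simple dictionaries keyed by gene. All gene names, t-values, and term labels below are made-up placeholders, and the helpers `annotate_modules` and `mean_abs_t` are hypothetical. Averaging the absolute t-value per module is just one possible summary, not something prescribed by WGCNA.

```python
from collections import defaultdict

# Hypothetical inputs: module color per gene and external annotations.
module_of = {
    "geneA": "turquoise", "geneB": "turquoise",
    "geneC": "blue", "geneD": "blue",
    "geneE": "brown",
}
t_value = {"geneA": 4.0, "geneB": -2.0, "geneC": 0.5, "geneD": 1.5}
go_terms = {"geneA": ["term_1"], "geneC": ["term_2", "term_3"]}

def annotate_modules(module_of, t_value, go_terms):
    """Group genes by module and attach the external annotations."""
    table = defaultdict(list)
    for gene, module in module_of.items():
        table[module].append({
            "gene": gene,
            "t": t_value.get(gene),       # None if no t-value is available
            "go": go_terms.get(gene, []), # empty list if no term is known
        })
    return dict(table)

def mean_abs_t(rows):
    """Mean absolute t-value of a module, ignoring genes without a value."""
    values = [abs(r["t"]) for r in rows if r["t"] is not None]
    return sum(values) / len(values) if values else None

annotated = annotate_modules(module_of, t_value, go_terms)
summary = {module: mean_abs_t(rows) for module, rows in annotated.items()}
```

Sorting `summary` by value gives a rough ranking of modules by how strongly their genes respond in the external test, which can help decide which modules to inspect first.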
These are the fundamentals behind WGCNA. Building on them, you can also study module preservation across different data sets and find the key drivers in the modules of interest. Finally, you can visualize the modules in Cytoscape and other network viewers.
Advantages
1) It is robust and needs less computational power for large data sets.
2) It makes it easy to find the pathways and key genes in a large data set.
3) It is implemented in R (package WGCNA).
Disadvantages
It cannot be used for small data sets. You need a minimum of 20 samples.