- Node specific parameters filter the gene sets included in the enrichment map.
- For a gene set to be included in the enrichment map it needs to pass both p-value and q-value thresholds.
P-value
- All gene sets with a p-value with the specified threshold or below are included in the map.
FDR Q-value
- All gene sets with a q-value with the specified threshold or below are included in the map.
- Depending on the type of analysis the FDR Q-value used for filtering genesets by EM is different
- For GSEA the FDR Q-value used is 8th column in the gsea_results file and is called "FDR q-val".
- For Generic the FDR Q-value used is 4th column in the generic results file.
- For David the FDR Q-value used is 12th column in the david results file and is called "Benjamini".
- For Bingo the FDR Q-value used is 3rd column in the Bingo results file and is called "core p-value"
- An edge represents the degree of gene overlap that exists between two gene sets, A and B.
- Edge specific parameters control the number of edges that are created in the enrichment map.
- Only one coefficient type can be chosen to filter the edges.
Jaccard Coefficient
Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]
Overlap Coefficient
Overlap Coefficient = [size of (A intersect B)] / [size of (minimum( A , B))]
Combined Coefficient
- the combined coefficient is a merged version of the jacquard and overlap coefficients.
- the combined constant allows the user to modulate reciprocally the weights associated with the jacquard and overlap coefficients.
- When k = 0.5 the combined coefficient is the average between the jacquard and overlap.
Combined Constant = k Combined Coefficient = (k * Overlap) + ((1-k) * Jaccard)
P-value and FDR Thresholds
GSEA can be used with two different significance estimation settings: gene-set permutation and phenotype permutation. Gene-set permutation was used for Enrichment Map application examples.
Gene-set Permutation
Here are different sets of thresholds you may consider for gene-set permutation:
- Very permissive:
- p-value < 0.05
- FDR < 0.25
- Moderately permissive:
- p-value < 0.01
- FDR < 0.1
- Moderately conservative:
- p-value < 0.005
- FDR < 0.075
- Conservative:
- p-value < 0.001
- FDR < 0.05
For high quality, high coverage transcriptomic data, the number of enriched terms at the very conservative threshold is usually 100-250 when using gene-set permutation.
Phenotype Permutation
- Recommended:
- p-value < 0.05
- FDR < 0.25
In general, we recommend to use permissive thresholds only if your having a hard time finding any enriched terms.
Jaccard vs. Overlap Coefficient
- The Overlap Coefficient is recommended when relations are expected to occur between large-size and small-size gene-sets, as in the case of the Gene Ontology.
- The Jaccard Coefficient is recommended in the opposite case.
- When the gene-sets are about the same size, Jaccard is about the half of the Overlap Coefficient for gene-set pairs with a small intersection, whereas it is about the same as the Overlap Coefficient for gene-sets with large intersections.
- When using the Overlap Coefficient and the generated map has several large gene-sets excessively connected to many other gene-sets, we recommend switching to the Jaccard Coefficient.
Overlap Thresholds
- 0.5 is moderately conservative, and is recommended for most of the analyses.
- 0.3 is permissive, and might result in a messier map.
Jaccard Thresholds
- 0.5 is very conservative
- 0.25 is moderately conservative