integration_analysis_scripts

Scripts for multi-omics integration

Unsupervised analysis: integration_unsupervised.R

This script performs unsupervised analyses (clustering) on transformed expression data (e.g., log FPKM) and methylation beta values.

Prerequisites

This R script requires the following packages:

  • iClusterPlus
  • gplots
  • lattice
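
Before running the script, it can help to check that these packages are available. The snippet below is a minimal sketch using base R only; the variable names are illustrative and not taken from the script itself. The commented installation commands assume that iClusterPlus is installed from Bioconductor and that gplots and lattice are installed from CRAN.

```r
# Packages listed in the Prerequisites section.
required <- c("iClusterPlus", "gplots", "lattice")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Assumed installation routes (uncomment as needed):
  # install.packages("BiocManager"); BiocManager::install("iClusterPlus")
  # install.packages(c("gplots", "lattice"))
} else {
  message("All required packages are installed.")
}
```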

Usage

Rscript integration_unsupervised.R [options]
| PARAMETER | DEFAULT | DESCRIPTION |
|-----------|---------|-------------|
| -d | NULL | File with somatic mutation data |
| -C | NULL | File with copy number variation data |
| -r | NULL | File with expression data |
| -m | NULL | File with methylation data (beta values) |
| -k | 2 | Minimum number of clusters |
| -K | 6 | Maximum number of clusters |
| -c | 2 | Number of cores |
| -o | out | Output prefix |
| -h | | Show help message and exit |

For example, one can type:

Rscript integration_unsupervised.R -r expression_matrix.txt -o output/

Details

The script involves 4 steps:

  • Data transformation of methylation beta values, using the logit function
  • Clustering across a range of LASSO lambda penalties and for each number of clusters K using iClusterPlus
  • Selection of the best lambda value for each K using the BIC, and a plot of R^2 as a function of K to help choose K
  • Selection of the top features differentiating the clusters
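
The first and third steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the script's actual code: the function `logit_beta`, the clamping offset `eps`, and the toy `beta` and `bic` matrices are all hypothetical. In the real workflow the BIC values would come from the iClusterPlus tuning results rather than being typed in by hand.

```r
# Step 1: logit transform of methylation beta values.
# The offset eps is an assumption: it keeps betas of exactly 0 or 1 finite.
logit_beta <- function(beta, eps = 1e-6) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log(beta / (1 - beta))
}

# Toy beta values, for illustration only (rows = samples, columns = CpG sites).
beta <- matrix(c(0.10, 0.50, 0.90,
                 0.00, 0.25, 1.00),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("s1", "s2"), c("cg1", "cg2", "cg3")))
m <- logit_beta(beta)

# Step 3: for each K, pick the lambda candidate with the lowest BIC.
# bic is a hypothetical matrix (rows = lambda candidates, columns = K values),
# filled column by column.
bic <- matrix(c(120, 110, 115,
                 98, 101,  95),
              nrow = 3,
              dimnames = list(paste0("lambda", 1:3), c("K2", "K3")))
best_lambda <- apply(bic, 2, which.min)
```

Here `best_lambda` holds, for each K, the row index of the lambda candidate to keep; the corresponding fit would then be used to compute R^2 and to extract the top features.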

Output

  • A figure with R^2 as a function of K, and cluster memberships of each sample as a function of K

In addition, for each value of K:

  • an .RData file with clustering results
  • a heatmap with the top features for each dataset
  • a .txt file with the name of the top features for each dataset

Regression analysis for unsupervised analysis: PCA_regression.R

This script provides functions to perform regression analysis between variables (e.g., batch variables or clinical variables) and latent factors as obtained by PCA or group factor analysis.
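
As a rough illustration of this kind of analysis, the sketch below regresses the first principal components on a batch variable using base R (`prcomp` and `lm`). The functions actually provided by PCA_regression.R are not described here, so the helper `fit_one`, the simulated data, and all variable names are assumptions made for the example.

```r
set.seed(1)  # toy data only
n <- 40
batch <- factor(rep(c("A", "B"), each = n / 2))

# Simulated expression matrix (rows = samples, columns = features);
# the first 20 features carry a shift in batch B.
expr <- matrix(rnorm(n * 50), nrow = n)
expr[batch == "B", 1:20] <- expr[batch == "B", 1:20] + 3

# Latent factors: the first three principal components.
pcs <- prcomp(expr, center = TRUE, scale. = TRUE)$x[, 1:3]

# Regress one latent factor on a covariate; return R^2 and the F-test p-value.
fit_one <- function(pc, covariate) {
  fit <- summary(lm(pc ~ covariate))
  f <- fit$fstatistic
  c(r2 = fit$r.squared,
    p = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}

# One row per principal component, with columns r2 and p.
res <- t(apply(pcs, 2, fit_one, covariate = batch))
print(res)
```

A component with a high R^2 and a small p-value for a batch variable suggests that this latent factor reflects technical rather than biological variation. The same pattern applies to clinical variables or to factors from group factor analysis.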