forked from cistrome/MIRA

Python package for analysis of multiomic single cell RNA-seq and ATAC-seq.


ctheodoris/MIRA

 
 


MIRA (Probabilistic Multimodal Models for Integrated Regulatory Analysis) is a comprehensive methodology that systematically contrasts single cell transcription and accessibility to infer the regulatory circuitry driving cells along developmental trajectories.

MIRA leverages joint topic modeling of cell states and regulatory potential modeling at individual gene loci to:

  • jointly represent cell states in an efficient and interpretable latent space
  • infer high fidelity lineage trees
  • determine key regulators of fate decisions at branch points
  • expose the variable influence of local accessibility on transcription at distinct loci

See our manuscript for details.

Install

MIRA can be installed from either PyPI or conda-forge:

pip install mira-multiome

or

conda install -c conda-forge mira-multiome

Getting Started

MIRA takes as input count matrices of transcripts and accessible regions measured by single cell multimodal RNA-seq and ATAC-seq from any platform. MIRA output integrates with the AnnData data structure for interoperability with Scanpy. Initial model training is faster on GPU hardware but can also be run on CPU.

Please refer to our tutorial for an overview of analyses that can be achieved with MIRA using an example 10x Multiome embryonic brain dataset.

Gallery

With MIRA, you can analyze single cell multimodal transcriptional (RNA-seq) and accessibility (ATAC-seq) data to:

Construct biologically meaningful joint representations of cells progressing through developmental trajectories [1]:

 

Infer high fidelity lineage trees defining developmental fate decisions [1]:

 

Learn the "topics" describing cell transcriptional and accessibility states [1]:

 

Contrast transcriptional and accessibility topics on stream graphs and determine the pathways and regulators governing each cell state [1]:

 

Identify the transcription factors driving poised genes down diverging developmental paths, predict transcription factor targets via in silico deletion of putative regulatory elements, plot heatmaps of transcriptional and accessibility dynamics, and compare expression and motif scores of key factors on MIRA's joint representation [1]:

 

Explore gene expression within lineage trajectories and compare expression to motif score of key factors with stream graphs [1]:

 

Determine the transcription factors driving fate decisions at key lineage branch points [2]:

 

Elucidate genes with local chromatin accessibility-influenced transcriptional expression (LITE) versus non-local chromatin accessibility-influenced transcriptional expression (NITE) and plot "chromatin differential" to highlight cells where transcription is decoupled from shifts in local chromatin accessibility [2]:

 

Quantify NITE regulation of topics or cells across the developmental continuum to reveal how variable circuitry regulates fate commitment and terminal identity [1,2]:

 

Overall, MIRA leverages principled probabilistic cell-level topic modeling and gene-level RP modeling to expose the key regulators driving fate decisions at lineage branch points and to precisely contrast the spatiotemporal dynamics of transcription and local chromatin accessibility at unprecedented resolution to reveal the distinct circuitry regulating fate commitment versus terminal identity.

 

Methodology

MIRA Topic Model

MIRA harnesses a variational autoencoder approach to model both transcription and chromatin accessibility topics defining each cell’s identity while accounting for their distinct statistical properties and employing a sparsity constraint to ensure topics are coherent and interpretable. MIRA’s hyperparameter tuning scheme learns the appropriate number of topics needed to comprehensively yet non-redundantly describe each dataset. MIRA next combines the expression and accessibility topics into a joint representation used to calculate a k-nearest neighbors (KNN) graph. This output can then be leveraged for visualization and clustering, construction of high fidelity lineage trajectories, and rigorous topic analysis to determine regulators driving key fate decisions at lineage branch points.
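As a rough illustration of the last two steps above, the sketch below combines per-cell expression and accessibility topic compositions into one joint matrix and finds each cell's nearest neighbors. It is not MIRA's implementation: the plain concatenation, the Euclidean metric, the toy values, and the function name `joint_knn` are all assumptions made for this example.

```python
import numpy as np

def joint_knn(expr_topics, acc_topics, k=2):
    """Return, for each cell, the indices of its k nearest cells in the joint space."""
    # Assumption: the joint representation is a simple concatenation of
    # the two per-cell topic composition matrices (cells x topics).
    joint = np.hstack([expr_topics, acc_topics])
    # Pairwise Euclidean distances between all cells.
    diff = joint[:, None, :] - joint[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A cell should not be listed as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest cells, nearest first.
    return np.argsort(dist, axis=1)[:, :k]

# Toy data: 4 cells, 2 expression topics and 2 accessibility topics.
expr_topics = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
acc_topics = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
neighbors = joint_knn(expr_topics, acc_topics, k=1)
```

In this toy data, cells 0 and 1 share one state and cells 2 and 3 share another, so each cell's nearest neighbor is its partner. The brute-force distance matrix is only practical for small inputs; real datasets would call for an approximate neighbor search.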

MIRA RP Model

MIRA’s regulatory potential (RP) model integrates transcriptional and chromatin accessibility data at each gene locus to determine how regulatory elements surrounding each gene influence its expression. The regulatory influence of enhancers is modeled to decay exponentially with genomic distance at a rate learned by the MIRA RP model from the joint multimodal data. MIRA learns independent upstream and downstream decay rates and includes parameters to weigh upstream, downstream, and promoter effects. The RP of each gene is scored as the sum of the contributions of individual regulatory elements. MIRA predicts key regulators at each locus by examining transcription factor motif enrichment or occupancy (if chromatin immunoprecipitation (ChIP-seq) data are provided) within elements predicted to highly influence transcription at that locus using probabilistic in silico deletion (ISD).
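The scoring described above can be sketched as a sum of distance-weighted accessibility values. This is a minimal illustration under stated assumptions, not MIRA's code: the `exp(-distance / decay)` parameterization, the fixed promoter window, the plus-strand orientation, and every name in the block (`regulatory_potential`, `promoter_halfwidth`, the weights) are hypothetical. In MIRA the decay rates and weights are learned from data rather than passed in.

```python
import math

def regulatory_potential(peaks, tss, up_decay, down_decay,
                         w_up=1.0, w_down=1.0, w_promoter=1.0,
                         promoter_halfwidth=1500):
    """Sum distance-weighted accessibility of peaks around one gene's TSS.

    peaks: iterable of (position, accessibility) pairs for a plus-strand gene.
    """
    rp = 0.0
    for position, accessibility in peaks:
        offset = position - tss
        if abs(offset) <= promoter_halfwidth:
            # Assumption: promoter-proximal peaks get their own weight, no decay.
            rp += w_promoter * accessibility
        elif offset < 0:
            # Upstream peaks decay with the upstream rate.
            rp += w_up * accessibility * math.exp(-abs(offset) / up_decay)
        else:
            # Downstream peaks decay with the downstream rate.
            rp += w_down * accessibility * math.exp(-offset / down_decay)
    return rp

# Toy locus: one promoter peak, one upstream peak, one downstream peak.
peaks = [(100_000, 2.0), (90_000, 1.0), (120_000, 1.0)]
rp = regulatory_potential(peaks, tss=100_000, up_decay=10_000, down_decay=20_000)
```

Keeping separate upstream and downstream decay constants lets the two sides of a locus contribute asymmetrically, which is the property the paragraph above describes.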

MIRA LITE vs NITE Models

MIRA quantifies the regulatory influence of local chromatin accessibility by comparing the local RP model with a second, expanded model that augments the local RP model with genome-wide accessibility states encoded by MIRA’s chromatin accessibility topics. Genes whose expression is significantly better described by this expanded model are defined as non-local chromatin accessibility-influenced transcriptional expression (NITE) genes. Genes whose transcription is sufficiently predicted by the RP model based on local accessibility alone are defined as local chromatin accessibility-influenced transcriptional expression (LITE) genes. While LITE genes appear tightly regulated by local chromatin accessibility, the transcription of NITE genes appears to be titrated without requiring extensive local chromatin remodeling. MIRA defines the extent to which the LITE model over- or under-estimates expression in each cell as “chromatin differential”, highlighting cells where transcription is decoupled from shifts in local chromatin accessibility. MIRA examines chromatin differential across the developmental continuum to reveal how variable circuitry regulates fate commitment and terminal identity.
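One way to picture chromatin differential is as a per-cell log ratio between what the LITE model predicts and a reference expression value. The sketch below is an assumption-laden illustration rather than MIRA's definition: the choice of log2, the pseudocount, the use of a generic reference (which could stand in for observed or expanded-model expression), and the name `chromatin_differential` are all hypothetical.

```python
import math

def chromatin_differential(lite_prediction, reference_expression, pseudocount=1e-3):
    """Per-cell log2 ratio of LITE-predicted expression to a reference value.

    Positive values: the LITE model over-estimates expression in that cell.
    Negative values: the LITE model under-estimates expression in that cell.
    """
    return [
        # The pseudocount avoids taking the log of zero (assumed value).
        math.log2((lite + pseudocount) / (ref + pseudocount))
        for lite, ref in zip(lite_prediction, reference_expression)
    ]

# Toy values for one gene across three cells.
lite = [4.0, 1.0, 2.0]
ref = [1.0, 4.0, 2.0]
diff = chromatin_differential(lite, ref)
```

Here the first cell has a positive differential (local accessibility predicts more expression than the reference), the second a negative one, and the third is zero because prediction and reference agree.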

Citations

MIRA was created by researchers in the X. Shirley Liu Lab at Dana-Farber Cancer Institute. If you use MIRA in your research, we would appreciate citation of our manuscript (bibtex).

 

Public datasets used for analyses in gallery and tutorial:

  1. Ma, S. et al. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell (2020).
  2. Datasets - 10x Genomics. https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets.
