SVIME

A C++ library that implements Stochastic Variational Inference for Motif Elicitation (SVIME). It discovers an unbounded number of motifs over DNA sequences from FASTA files and produces their logos.

Installation

Simply copy the source and header files to the src folder of your project.

Dependencies

C++

Boost
Eigen
OpenMP

Python3

Matplotlib
NumPy
Pandas

Tutorial

We'll find Oct4 binding motifs in DNA sequences overlapping Oct4 ChIP-seq peaks from [1]. Download the 'oct4_sorted.fa' file from https://github.com/tahmidmehdi/svime/tree/master/data. This file contains the binding sites.

Create a project with a source file and include the following headers:

#include "svime.h"
#include "distribution.h"
#include "util.h"
#include "asa103.hpp"
#include "processFasta.h"
#include <omp.h>
#include <boost/foreach.hpp>
#include <boost/math/special_functions/digamma.hpp>
#include <Eigen/StdVector>

In your main function, call the faToMatrix function to build a mapping from each chromosome to its size (the number of 15-mers in that chromosome, based on the FASTA file). The function also writes the sequences and their genomic coordinates to .txt files in the specified output directory.

std::map<std::string, int> chrSizes = faToMatrix("/path/to/oct4_sorted.fa", 15, "/path/to/output");
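To make the notion of "size" concrete: a sequence of length n contains n - k + 1 overlapping k-mers. The sketch below counts k-mers per FASTA record under that rule. The function name countKmers is hypothetical and is not part of SVIME, and keying the map by the full header text is an assumption; how faToMatrix derives chromosome names from headers is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper, not part of SVIME: counts the k-mers contributed by
// each FASTA record, keyed by the record's header text.
std::map<std::string, int> countKmers(const std::string& fastaText, int k) {
    assert(k > 0);
    std::map<std::string, int> counts;
    std::istringstream in(fastaText);
    std::string line, header;
    std::size_t length = 0;
    // A sequence of length n contains n - k + 1 overlapping k-mers;
    // records shorter than k contribute nothing.
    auto flush = [&]() {
        if (!header.empty() && length >= static_cast<std::size_t>(k)) {
            counts[header] += static_cast<int>(length) - k + 1;
        }
        length = 0;
    };
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            flush();                  // close the previous record
            header = line.substr(1);  // start a new record
        } else {
            length += line.size();    // sequence may span several lines
        }
    }
    flush();
    return counts;
}
```

For example, a 20-base record yields 6 15-mers, while a 4-base record yields none.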

Create an array of parameters for the SVI step-size function, which is described in [2]. The first element of the array is the tau parameter and the second is the kappa parameter. Then create a pointer to the first element (tau).

float step[2] = {0, 0.5};
float* stepPtr = step;
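For reference, the step-size schedule in [2] is rho_t = (t + tau)^(-kappa), where tau (the delay) down-weights early iterations and kappa (the forgetting rate) controls how quickly older information is forgotten. The following is a minimal sketch of that formula only. The function name stepSize is hypothetical, and counting iterations from t = 1 is an assumption (it keeps tau = 0 well-defined); SVIME's internal indexing may differ.

```cpp
#include <cassert>
#include <cmath>

// rho_t = (t + tau)^(-kappa), the step-size schedule from Hoffman et al. [2].
// Assumption: iterations are counted from t = 1, so tau = 0 is well-defined.
double stepSize(int t, double tau, double kappa) {
    assert(t + tau > 0.0 && kappa > 0.0);
    return std::pow(t + tau, -kappa);
}
```

With tau = 0 and kappa = 0.5, as in the array above, the step size starts at 1 and decays as 1 / sqrt(t).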

Create a svime object. The arguments are described in the next section.

svime model = svime(15, 1, 1, 20, stepPtr, 1000, 10, 4, 42);

Fit the model and find motifs. The logos are written to /path/to/output/results.

svime::variationalDist q = model.fit_predict("/path/to/output", chrSizes, NULL);

Classes

svime

Implements SVI for Dirichlet Process Mixture of Product-Multinomials [3].
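To build some intuition for the Dirichlet Process concentration parameter, consider the standard stick-breaking construction, in which each component takes a Beta(1, alpha) fraction of the remaining probability mass. The expected prior weight of component k is then (1 / (1 + alpha)) * (alpha / (1 + alpha))^(k - 1), so larger alpha spreads mass over more components. The sketch below computes these expectations for a truncated set of components. It is illustrative only: the function name is hypothetical, and whether SVIME uses exactly this truncated stick-breaking representation is an assumption.

```cpp
#include <cassert>
#include <vector>

// Expected prior weight of each component under stick-breaking with
// v_k ~ Beta(1, alpha): E[pi_k] = (1 / (1 + alpha)) * (alpha / (1 + alpha))^(k - 1).
// Illustrative only; not taken from the SVIME source.
std::vector<double> expectedStickWeights(double alpha, int maxClusters) {
    assert(alpha > 0.0 && maxClusters > 0);
    std::vector<double> weights(maxClusters);
    double remaining = 1.0;  // expected stick length left before component k
    for (int k = 0; k < maxClusters; ++k) {
        weights[k] = remaining / (1.0 + alpha);  // expected piece broken off
        remaining *= alpha / (1.0 + alpha);      // expected piece left over
    }
    return weights;
}
```

With alpha = 1 the first three expected weights are 0.5, 0.25, and 0.125; with a larger alpha the first weight shrinks and later components receive more mass.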

Argument	Data type	Description
window	int	required. Length of motifs.
alpha	float	required. The concentration parameter (alpha) of the Dirichlet Process. Controls how readily the model creates new motifs: higher values will create more motifs.
epochs	int	required. The maximum number of epochs.
max_clusters	int	required. The maximum number of motifs the model can create.
step_pars	float*	required. Array of parameters for step-size function.
batch_size	int	optional (default: 1000). Number of window-mers in each batch.
tol	float	optional (default: 0.001). The algorithm stops when the difference between the evidence lower bounds (ELBOs) of two consecutive iterations is less than tol.
n_jobs	int	optional (default: 1). The number of threads to use.
random_state	int	optional (default: 42). Determines the initial clusters and ensures reproducibility.
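The stopping rule described for tol can be sketched as a small driver loop. This is a hypothetical illustration, not SVIME's code: runUntilConverged and its update callback (assumed to perform one iteration and return the current ELBO) are names introduced here, and comparing the absolute difference is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical driver, not SVIME's code: calls `update` (one iteration that
// returns the current ELBO) until the change between consecutive ELBOs is
// below tol or maxIters iterations have run. Returns the iterations used.
int runUntilConverged(const std::function<double()>& update, int maxIters, double tol) {
    assert(maxIters > 0 && tol >= 0.0);
    double previous = update();
    for (int iter = 1; iter < maxIters; ++iter) {
        double current = update();
        if (std::fabs(current - previous) < tol) return iter + 1;  // converged
        previous = current;
    }
    return maxIters;  // hit the iteration cap without converging
}
```

A larger tol stops earlier at the cost of a looser fit; the iteration cap plays the role that epochs plays in the table above.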

fit_predict(outDir, chrSizes, hyperparameters = NULL)

Argument	Data type	Description
outDir	string	required. Output directory.
chrSizes	map<string, int>	required. A mapping of chromosomes to their sizes.
hyperparameters	psm*	optional (default: all concentrations are set to 1). A position score matrix (psm struct) of prior concentration parameters for each base and position.
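As a rough picture of what per-base, per-position prior concentrations mean, the sketch below uses a hypothetical stand-in type (one row per position, one column per base in the order A, C, G, T). The layout of SVIME's actual psm struct is not documented here and may differ, and both function names are introduced only for illustration. With all concentrations equal to 1 the prior mean is uniform over the four bases; raising one entry shifts the prior toward that base at that position.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a position score matrix: one row per position,
// one column per base in the order A, C, G, T. SVIME's psm struct may be
// laid out differently.
using ConcentrationMatrix = std::vector<std::array<double, 4>>;

// Uniform prior: every base at every position gets concentration 1.
ConcentrationMatrix uniformPrior(int window) {
    assert(window > 0);
    return ConcentrationMatrix(window, std::array<double, 4>{1.0, 1.0, 1.0, 1.0});
}

// Mean of the Dirichlet at one position: each concentration over the row sum.
std::array<double, 4> dirichletMean(const std::array<double, 4>& conc) {
    double total = conc[0] + conc[1] + conc[2] + conc[3];
    assert(total > 0.0);
    return {conc[0] / total, conc[1] / total, conc[2] / total, conc[3] / total};
}
```

For instance, setting the A entry at a position to 5 while leaving the others at 1 gives a prior mean of 5/8 for A at that position.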

References

[1] Kopp, W. and Schulte-Sasse, R. (2017). Unsupervised learning of DNA sequence features using a convolutional restricted Boltzmann machine. bioRxiv.

[2] Hoffman, M. D. et al. (2013). Stochastic variational inference. The Journal of Machine Learning Research, 14(1), 1303-1347.

[3] Dunson, D. B. and Xing, C. (2009). Nonparametric Bayes modeling of multivariate categorical data. Journal of the American Statistical Association, 104, 1042-1051.
