-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
81 lines (56 loc) · 2.21 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# UMIcountR
<!-- badges: start -->
<!-- badges: end -->
## Molecular Spikes
For information on obtaining molecular spikes, reference fasta and more, please visit the [molecular spikes GitHub repo](https://github.com/sandberg-lab/molecularSpikes).
## Installation
You can install UMIcountR from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("cziegenhain/UMIcountR")
```
## Example
This is a basic example which shows you how to load and analyse some molecular spikes data:
```{r example}
library(UMIcountR)
## basic example code
#load reads from the provided example bam file (Smart-seq3 data)
bam_path <- system.file("extdata", "Smartseq3.TTACCTGCCAGATTCG.bam", package = "UMIcountR", mustWork = TRUE)
#in the case of the simple v1 molecular spike
spikedat <- extract_spike_dat(bam_path, match_seq_before_UMI = "GAGCCTGGGGGAACAGGTAGG", match_seq_after_UMI = "CTCGGAGGAGAAA")
#in the case of the complex molecular spikes set
data("molspike_barcodes_infos_fivep_final")
#spikedat <- extract_complex_spike_dat(bam_path, bc_df = spike_info, max_pattern_dist = 3)
```
After loading the data, we can see the data structure:
```{r spikedat}
str(spikedat)
```
Next, we can run the filtering for overrespresented spUMIs:
```{r overrepresentation}
overrep <- get_overrepresented_spikes(spikedat, readcutoff = 75)
overrep$plots[[1]]
```
You could also apply directional-adjacency error correction to the Smart-seq3 UMI in the test data:
```{r ham}
spikedat[, UB_directional := return_corrected_umi(UX, editham = 1, collapse_mode = "adjacency_directional"), by = BC]
```
To downsample the copy number of the molecular spikes in your data to a relevant "expression level" of interest, you can use the following function:
```{r subs}
spikedat_mean100 <- subsample_recompute(spikedat, mu_nSpikeUMI = 100, threads = 4)
```
## Reference
Molecular spikes: a gold standard for single-cell RNA counting
https://www.nature.com/articles/s41592-022-01446-x