---
title: "scATAC-seq Rat Metrial Glands"
author: "Ha T. H. Vu"
output: html_document
---
```{r setup, include=FALSE}
options(max.print = 75)
knitr::opts_chunk$set(
echo = TRUE,
collapse = TRUE,
comment = "#>",
fig.path = "Files/",
fig.width = 15,
prompt = FALSE,
tidy = FALSE,
message = FALSE,
warning = TRUE
)
knitr::opts_knit$set(width = 75)
```
This document describes the analysis of scATAC-seq data generated from rat metrial gland tissue on gestational days (GD) 15.5 and 19.5. <br>
Motif analysis requires a `BSgenome` package for the rat genome assembly Rnor_6.0 (rn6). We can build one by following the BSgenomeForge vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf). <br>
First, download the per-chromosome DNA sequence FASTA files from Ensembl release 98 (http://ftp.ensembl.org/pub/release-98/fasta/rattus_norvegicus/dna/). The FASTA file for each chromosome should be named `chromosomeName.fa`, for example `1.fa` for chromosome 1. <br>
It is possible to use compressed `gz` files, but here I use uncompressed `fa` files. The commands below rename and decompress the downloaded files:
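```{bash, eval=FALSE}
cd /work/LAS/geetu-lab/hhvu/project3_scATAC/rnor6-Ensembl/
# Strip the Ensembl prefix so that each file is named <chromosome>.fa.gz
for i in Rattus_norvegicus.Rnor_6.0.dna.chromosome.*.gz; do mv "$i" "${i#Rattus_norvegicus.Rnor_6.0.dna.chromosome.}"; done
# Decompress each file to <chromosome>.fa
for i in *.fa.gz; do gunzip "$i"; done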
```
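Before moving on, it may help to confirm that every expected chromosome file is in place. The following is a minimal sketch, not part of the original workflow: the helper name `missing_chromosomes` is hypothetical, and it assumes the chromosome set used in this document (1 to 20, X, and Y).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every expected chromosome FASTA
# file that is missing (or empty) in the given directory.
# Assumes chromosomes 1..20, X, and Y, as used in this document.
missing_chromosomes() {
  local dir="$1"
  local chr
  for chr in $(seq 1 20) X Y; do
    # A file counts as present only if it exists and is non-empty.
    [ -s "${dir}/${chr}.fa" ] || echo "${chr}.fa"
  done
}
```

Calling `missing_chromosomes /work/LAS/geetu-lab/hhvu/project3_scATAC/rnor6-Ensembl` prints nothing when all 22 files are present.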
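Second, create a seed file (here named `BSgenome.Rnorvegicus.Ensembl.rn6-seed` and saved in the same directory as the FASTA files) like the following: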
```
Package: BSgenome.Rnorvegicus.Ensembl.rn6
Title: Full genome sequences for Rattus norvegicus (Ensembl archive Sept. 2019)
Description: Full genome sequences for Rattus norvegicus as provided by Ensembl, Sept. 2019 version.
Version: 0.0.1
organism: Rattus norvegicus
common_name: Rat
provider: Ensembl
provider_version: rn6
release_date: Sept. 2019
release_name: rnor6
source_url: ftp://ftp.ensembl.org/pub/release-98/fasta/rattus_norvegicus/
organism_biocview: Rattus_norvegicus
BSgenomeObjname: Rnorvegicus
seqnames: c(1:20, "X", "Y")
seqs_srcdir: /work/LAS/geetu-lab/hhvu/project3_scATAC/rnor6-Ensembl
```
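A typo in a field name is easy to miss in a seed file. As a rough sanity check, the sketch below reports any field from the example above that is absent from a given seed file. The helper name `missing_seed_fields` is hypothetical, and the field list is simply copied from the example; it is not meant as an authoritative list of what `BSgenome` requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every field from the example seed file that
# is absent from the given seed file. The list mirrors the example in this
# document and is an assumption, not BSgenome's official specification.
missing_seed_fields() {
  local seed="$1"
  local field
  for field in Package Title Description Version organism common_name \
               provider provider_version release_date release_name \
               source_url organism_biocview BSgenomeObjname seqnames \
               seqs_srcdir; do
    # Each field starts a line and is immediately followed by a colon.
    grep -q "^${field}:" "$seed" || echo "$field"
  done
}
```

For a complete seed file, `missing_seed_fields BSgenome.Rnorvegicus.Ensembl.rn6-seed` prints nothing.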
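Third, forge the package source directory in R. By default, `forgeBSgenomeDataPkg()` writes the package source directory to the current working directory: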
```{r, eval=FALSE}
library(BSgenome)
forgeBSgenomeDataPkg("/work/LAS/geetu-lab/hhvu/project3_scATAC/rnor6-Ensembl/BSgenome.Rnorvegicus.Ensembl.rn6-seed")
```
Finally, on the command line, build, check, and install the package:
```{bash, eval=FALSE}
# Load the compiler, R, and the system libraries available on the cluster
module load gcc/7.3.0-xegsmw4
module load r/4.0.2-py3-icvulwq
module load gsl/2.5-fpqcpxf
module load udunits/2.2.24-yldmp4h
module load gdal/2.4.4-nw2drgf
module load geos/3.8.1-2m7gav4
# Build the source tarball in the current directory, then check and install it
R CMD build /work/LAS/geetu-lab/hhvu/rstudio/packages/BSgenome.Rnorvegicus.Ensembl.rn6
R CMD check BSgenome.Rnorvegicus.Ensembl.rn6_0.0.1.tar.gz
R CMD INSTALL BSgenome.Rnorvegicus.Ensembl.rn6_0.0.1.tar.gz
```
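`R CMD build` writes a tarball named `<Package>_<Version>.tar.gz` to the current working directory. The sketch below derives that name from the `DESCRIPTION` file of a package source directory, which may be convenient when scripting the later steps. The helper name `built_tarball_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the tarball name that `R CMD build` produces
# for a package source directory, i.e. <Package>_<Version>.tar.gz, using
# the Package and Version fields of its DESCRIPTION file.
built_tarball_name() {
  local desc="$1/DESCRIPTION"
  local pkg ver
  pkg=$(sed -n 's/^Package:[[:space:]]*//p' "$desc")
  ver=$(sed -n 's/^Version:[[:space:]]*//p' "$desc")
  echo "${pkg}_${ver}.tar.gz"
}
```

With the seed file shown earlier (version `0.0.1`), the result for the forged package directory would be `BSgenome.Rnorvegicus.Ensembl.rn6_0.0.1.tar.gz`.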
After this, the package `BSgenome.Rnorvegicus.Ensembl.rn6` will be ready to use.
```{r}
sessionInfo()
```