STRT-Seq technology and errors #12
Hi all, I was wondering if you had a chance to look into the issue I am experiencing, which is described in two posts? Thank you in advance. Looking forward to your response. |
Hi, sorry for the delayed response. I don't have much time at the moment but I have some ideas on what may be causing this. I think it is unrelated to the technology. The "proc" command is used to detect the number of cores available in order to set the default number of threads. Please try running it again with the number of threads set manually with --threads and let us know if the problem persists. I'll note that your system configuration differs from ours, as discussed in previous issues, so some dependencies may be missing. |
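A minimal sketch of the suggestion above, assuming `--threads` takes an integer value as this comment implies. The helper name `build_command` and the value of 4 threads are illustrative only; the other flags and file names are copied from the command posted in this issue.

```python
import os
import shlex


def build_command(r1, r2, technology, reference, sample_id, threads=None):
    """Build a launch_universc.sh command line with an explicit thread count."""
    if threads is None:
        # os.cpu_count() may return None on some systems, so fall back to 1.
        threads = os.cpu_count() or 1
    return [
        "launch_universc.sh",
        "-R1", r1,
        "-R2", r2,
        "-t", technology,
        "-r", reference,
        "-i", sample_id,
        "--threads", str(threads),  # assumption: the flag takes an integer
    ]


cmd = build_command(
    "SRR6026844_sra_S1_L001_R1_001.fastq",
    "SRR6026844_sra_S1_L001_R2_001.fastq",
    "strt-seq",
    "/lustre/project/m2_jgu-canshank3/Comparison/Human/HomSap_GRCh38",
    "SRR6026844",
    threads=4,  # illustrative value, not a recommendation
)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Setting the value by hand sidesteps the automatic core detection, which is the step suspected of failing here.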
This dataset appears to use SmartSeq2.
https://www.ncbi.nlm.nih.gov/sra/?term=SRR6026844
Note that our code supports the following configurations:
It is possible to support an 8bp UMI but it will require a dedicated configuration. If it is a popular protocol we can support this, but it appears to be a custom workflow used in this paper. Another possible workaround is to rename R1 and R2 (manually switch them) and run custom_8_8, which assumes R1 contains [BC][UMI].... and R2 contains transcript reads (as for 10x settings).
I'll note the paper here to investigate later in more detail: Fan X et al., "Spatial transcriptomic survey of human embryonic cerebral cortex by single-cell RNA-seq analysis.", Cell Res, 2018 Jul;28(7):730-745 |
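To illustrate the [BC][UMI] layout mentioned above, here is a rough sketch that slices a read into barcode, UMI, and remainder. It assumes an 8 bp barcode followed directly by an 8 bp UMI at the start of the read; the function name `split_barcode_umi` is hypothetical and this is not UniverSC's own code. The example string is the start of R2 read 5 from the FASTQ excerpt in this issue.

```python
def split_barcode_umi(read, barcode_len=8, umi_len=8):
    """Split a read into (barcode, umi, remainder) by fixed positions."""
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode + UMI")
    barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    remainder = read[barcode_len + umi_len:]
    return barcode, umi, remainder


# First 26 bases of R2 read 5 as posted in this issue.
read = "TCGAAGTATGGTGATATCGGAAGAGC"
barcode, umi, remainder = split_barcode_umi(read)
print(barcode, umi, remainder)
```

Whether these 16 bases really are the cell barcode and UMI depends on the protocol, so treat the split as a way to inspect reads rather than as confirmation of the layout.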
@Davidwei7 sorry for the delayed response. I've investigated the issues with this protocol and updated the source code to support it. Please note that the protocol used by Fan et al. (2018) is significantly modified from the one originally published by Islam et al. (2011).
This requires a different bioinformatics approach.
Therefore I have created separate technology settings " |
Closing this issue as this technology is now supported. Raw reads from SRR6026844 were tested without errors. Please re-open this issue or file another one if there are still problems with your environment preventing you from replicating this. |
Hi! |
v1.2.7 has been merged and released on GitHub. Docker builds are in progress and will be available soon. |
@adc0032 @Davidwei7 @kbattenb The latest version (1.2.7) passed docker builds and is now available on dockerhub: https://hub.docker.com/r/tomkellygenetics/universc/tags This version supports STRT-Seq, PIP-Seq, and VASA-Seq protocols. I have some minor changes to versioning and issues #17 or #20 under consideration but the above technologies should work resolving the above issues #12 and #16. |
Thank you for getting this updated! Should Line 3662 in 7cbd039
I don't see it included here in the STRT-Seq section. |
I think not, since UMIs are already included in the 2018 custom protocol. It may be necessary to remove the TSO sequence as described in the paper (by hard-trimming R1) if you are using (paired-end) 10x 5' scRNA chemistry settings. I don't think it is necessary to perform TSO conversion on R2 after the barcode and UMI, as you can just use 10x 3' scRNA chemistry settings, which ignore the rest of this read. |
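For reference, hard trimming here means removing a fixed number of bases from the start of every read, together with the matching quality characters. A minimal sketch follows; the function name `hard_trim` is hypothetical, and the length of 30 is an assumption taken from the 30-base prefix shared by the first three R1 reads posted in this issue, not a value confirmed by the paper.

```python
def hard_trim(record, n):
    """Drop the first n bases and qualities from one FASTQ record.

    record is a (header, sequence, plus_line, quality) tuple.
    """
    header, seq, plus, qual = record
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return (header, seq[n:], plus, qual[n:])


# Assumption: prefix shared by the first three R1 reads shown in this issue.
PREFIX = "AAGCAGTGGTATCAACGCAGAGTACATGGG"
TRIM_LEN = len(PREFIX)  # 30 bases

# Toy record: the shared prefix followed by 10 bases, with dummy qualities.
record = ("@read1", PREFIX + "GAAAAAGAGA", "+", "I" * 40)
trimmed = hard_trim(record, TRIM_LEN)
print(trimmed[1])
```

Trimming the sequence and quality strings together keeps the record valid FASTQ, which downstream tools require.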
Dear Sir/Madam,
Hope you are well.
After resolving the problem with the Docker image on my cluster, I tried my first run of launch_universc.sh with the STRT-Seq technology. I ran everything in a Singularity image converted from the Docker image.
My command is this:
launch_universc.sh -R1 SRR6026844_sra_S1_L001_R1_001.fastq -R2 SRR6026844_sra_S1_L001_R2_001.fastq -t strt-seq -r /lustre/project/m2_jgu-canshank3/Comparison/Human/HomSap_GRCh38 -i SRR6026844
Please see the below snapshots of the processes and errors:
Please see below the first few rows of the fastq files:
head -n 24 SRR6026844.sra_S1_L001_R1_001.fastq
@SRR6026844.sra.fastq.1 1 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGGAAAAAGAGAAAAGTGGAGGGATGTGTGGGCCTAGACAGGGGAAAAAGGAGAACAGGAGGCTCCAGACTGGTGAGGAAGGGGAGTGGGCTGGGCGTGCGGCTCATGCCTGTCATCCCAGC
+SRR6026844.sra.fastq.1 1 length=150
AA<<FFJJJFJJJJJJAJJJJJFA<JFJF<7FFFJJ--FJJJJA-F-AJJF<7FAA-FFJJ<AJJJFJJJ--7AJAJJFFJ<J7AJFA<FJ-7-AAJ7JF<<F7AJAAFJ7--777FJJFAA<JA-AJFAJJ-<7<7<FFAJF-FFAF7F
@SRR6026844.sra.fastq.2 2 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTCGTATCAAAGCAGAGTACATGGG
+SRR6026844.sra.fastq.2 2 length=150
AA7FAF7-FFJJJJJJFJJJJJAAJJ7FFJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJAJJJJJJJJJAJJJJJJJJJJJ-<JJ<J<<-FJA<-<--A7AFJ--7AJJ<<-FF-FJAJA-A<-7F-7AA<--7---FF-)--<AJFJJ
@SRR6026844.sra.fastq.3 3 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR6026844.sra.fastq.3 3 length=150
AA<FFF<FFFJJJJJJJFJ<JF<JFJ<A<7-FJJJJJFJFJFJJJJJJJJJJFJJJJJJJJJJJJJJJJAJFJJJJ<FJ-FJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJAJJJAJFFJJJJJFFJJJJJFJJJFFAA7FFJ
@SRR6026844.sra.fastq.4 4 length=150
TGACCTGTCCCCTCTGGCTGCCTCTGAGTCTGAATCTCCCAAAGAGAGAAACCAATTTCTAAGAGGACTGGATTGCAGAAGACTCGGGGACAACATTTGATCCAAGATCTTAAATGTTATATTGATAACCATGCTCAGCAATGAGCTATT
+SRR6026844.sra.fastq.4 4 length=150
AA<-<7AFFJAFJJJF-7<FJFFAJJA7FFFJF7FJ7JJJF7FJAJFJFF7FFFFJ-FJJJ-<A-F<JFJ-<AA7-AJJJ<FFFJFFJF<JJAJF-FJA-FJJJ7FJJJJF<7A<FJFFJ-<AAJJJ7F7<F<FF7-<7<-<FAF<FAFJ
@SRR6026844.sra.fastq.5 5 length=150
GGAAGGAAGGAAGAAAGAAAGAAAGATAGAGAGAGAGAGAGAGAGAGAAAGATAGAGAGAAATAAAGAAACAAAGAAAGAAAGAAAGAAAGAAAGAAAAAAAAAGAAAAATACAAAAAAAAAAATTCACTTAACTCAGGGGTTCGGAGAT
+SRR6026844.sra.fastq.5 5 length=150
-A-FFF77F-F<<F<JJAF<AJFFJ----<<FJAFFF<FF<FJF<<AFJJJ<<FJJFJFJJ7-F<JAJJA---7A<7AF-<7-777AJAFJJ<-AJJ-F-<-A---A---77F7--7F<FAF<A-------7--7--A--))-)7)---7
@SRR6026844.sra.fastq.6 6 length=150
CCTCCAGATACCACTGAGCCTCTTGCCCATGATTCAGAGCTTTCAAGGATAGGCTTTATTCTGCAAGCAATCAAATAATAAATCTATTCTGCTGAGAGATCACAAAAAAAAAAAAAAAAAAAAAAAAAAACCTATTTGCTGATGAGATCA
+SRR6026844.sra.fastq.6 6 length=150
AA<7-7<<FFJJFFFJJJJJJJ<FJF<J<JFJJJJJJJJJJJJJFJJJFA7F<FJJJFFFJJAJFJ<FF-FA77JA-AJFFJFFA7-FA-FJJJ-AFJ----<F-<AJJA<<--AF-7AFA--AF-A<--7-7-AA<---7---7---7-
head -n 24 SRR6026844_sra_S1_L001_R2_001.fastq
@SRR6026844.sra.fastq.1 1 length=150
NAGGTGCATTCGCCCTCCGTAGAAATCCATGCCAAGTACGCTCCTTCCATTGATTTTCTTGGATCGGGTGTGCACCGCGTAGCTCAGCATGGCAAGTCTGTGTAGTCCGTGGACCCGCCAGGACCCCCCGCCGCACGAGACGCAATACGT
+SRR6026844.sra.fastq.1 1 length=150
#AAA--A--777-A---AA--7------7-7))--)---7)-7----7--------7--77-----7))-))7-7)<--))77-7--)---))--)----7-----7-))7))-)))))-))))-))))-)-)-)))))-))))7---7-
@SRR6026844.sra.fastq.2 2 length=150
NTATGACTCCACCCCTCAGAGAGGAGGAGGCGACGGGGACAACAACTCACAGAGAGCAAAGTCCGTGGCAACCACCCCGTCTGCGGAGAGCAGGTCCGACCCTACTAGACGAGAGACAACGAACGCCGGACCGCACAATGGCGAGAGCTA
+SRR6026844.sra.fastq.2 2 length=150
#<A-----A---7A--A--<-7---))7)7--7)-)---77<F-7-7--7---7-77------77--)))-7)--))))))))-))7)7))-)))))))))))-)----7--)-)----------)))))))))7-)<----))-)))-7
@SRR6026844.sra.fastq.3 3 length=150
NCCACATATAGGGAAACATTTTAATTCTTAGTTATTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTCATTTTTTTTATTT
+SRR6026844.sra.fastq.3 3 length=150
#AAAA-F-A<-A--7----------7--------7------77--F<----7<-<7A<A--AA7-A-----F-<<-<-7--7--7-<77A<7A7-A<-A-F--<A<<-------------A<7-F7-------7----7----7---7--
@SRR6026844.sra.fastq.4 4 length=150
NACTTCGATATAAGATTTTTTTTTTTATTTATTACTCAAAGTTTAGAACATTTTATTAAAGTACAAAAATGTTAGAATTTAGCTAATAGAAAAACATAGTAAATATTTAAAAAAACGCTTATAAAATTACTCAAGGCACCCACAGAAAAC
+SRR6026844.sra.fastq.4 4 length=150
#AAFFAJ-FAJFJJJJ----------------------A---7-77-FFF----7--<-A-7A<--7<----A-A7<---77<-7<-77<FJ77-<-7--AAF-A-------AFJF<---7-<7--7-<----7--)-))-))))7-7--
@SRR6026844.sra.fastq.5 5 length=150
TCGAAGTATGGTGATATCGGAAGAGCTTCGAGTACGTAAATAGTGTAGATCTCGGTTGTCGTCTTATCATTAAAAAAACATTTCTTACTTTTCTCTCTTCGCACACCTCACTTCCTCGCTATATTGCTTCCTCCCTTCCGGGGACAGACC
+SRR6026844.sra.fastq.5 5 length=150
--AF-FF<--7F7-A---<-7--------7-7---7---7-------7----<-)--)---------7---<------------------------7----)-))))-)------7-<))----7----7)--7<)-7<))----))--)
@SRR6026844.sra.fastq.6 6 length=150
TATCAGCAAATAGGGTTTTTTTTTATTATTTTATTTTTTTTTTGATCCCTGAGGAGAATAGCGTTCATATGTGAGTTCTGGCAGAACAAAGGCTAACCTTGAAAGGCCTGTTATCTGGGACAGAAGCCCAGGAGTGCTCGTGTCTGTACC
+SRR6026844.sra.fastq.6 6 length=150
AAAAAFF<FF-FJ-F----------------------------77---)-))7A-7<7----------7------------777)----7A--------7-<AA7<-)7-)-----77)))))----7)))-)7-))-)7)7-)------
Do you have any idea why the process was not completed?
I also have some information regarding the fastq file, which I am sharing here in case it helps us resolve the problem I am facing.
Firstly, the sequence structure of the fastq file is this:
Secondly, the more detail on how the author analysed their data is in their github (link: [https://github.com/zorrodong/HECA/tree/master/scRNA-seq_pipeline_hg38]).
I am not sure whether I am correct in thinking this:
-b barcode_96_8bp.txt (barcode_96_8bp.txt is found on their GitHub page: [https://github.com/zorrodong/HECA/blob/master/scRNA-seq_pipeline_hg38/barcode_96_8bp.txt]).
custom_8_8 in my command? Does the current UniverSC support this setting?
umi_tools in the pipeline to first extract UMIs and barcodes from the raw fastq file; do I need to do this first before using UniverSC? (The authors' pipeline is this:)
I am terribly sorry for giving so much information on my issue. I am quite new to complex bioinformatics problems and want to use your software for integrative analysis. I have three datasets, generated with BD Rhapsody, 10x Genomics, and STRT-Seq (described above), and the STRT-Seq fastq file described in this post is the main reference data we are comparing against, so I want to do this correctly.
Thanks for developing this tool, and looking forward to your response.
David
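On the whitelist question above, one quick sanity check before a full run is to see what fraction of reads begin with a whitelisted barcode. The sketch below assumes the whitelist file holds one barcode per line and that the barcode is the first 8 bases of the read; both are assumptions, the function names are hypothetical, and the barcodes and reads shown are made-up placeholders rather than the contents of barcode_96_8bp.txt.

```python
def load_whitelist(lines):
    """Collect non-empty, stripped lines into a set of barcodes."""
    return {line.strip() for line in lines if line.strip()}


def whitelist_fraction(reads, whitelist, barcode_len=8):
    """Return the fraction of reads whose leading bases are in the whitelist."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if read[:barcode_len] in whitelist)
    return hits / len(reads)


# Placeholder barcodes for illustration only.
whitelist = load_whitelist(["AAAACCCC\n", "GGGGTTTT\n", "\n"])

# Placeholder reads: the first two start with a whitelisted barcode.
reads = [
    "AAAACCCCTTTTGGGGACGT",
    "GGGGTTTTACGTACGTAAAA",
    "CCCCAAAATTTTGGGGACGT",
    "NNNNNNNNACGTACGTACGT",
]
fraction = whitelist_fraction(reads, whitelist)
print(fraction)
```

With a real file, `load_whitelist` can be given an open file handle instead of a list. A low fraction on one read and a high fraction on its mate would suggest which file carries the barcodes.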