STRT-Seq technology and errors #12

Davidwei7 · 2023-04-03T22:20:51Z

Dear Sir/Madam,
Hope you are well.
After resolving the problem with the Docker image on my cluster, I tried my first run of launch_universc.sh with the STRT-Seq technology. I ran everything in a Singularity image converted from the Docker image.
My command is this:
launch_universc.sh -R1 SRR6026844_sra_S1_L001_R1_001.fastq -R2 SRR6026844_sra_S1_L001_R2_001.fastq -t strt-seq -r /lustre/project/m2_jgu-canshank3/Comparison/Human/HomSap_GRCh38 -i SRR6026844

Please see the below snapshots of the processes and errors:

Please see below the first few rows of the fastq files:

head -n 24 SRR6026844_sra_S1_L001_R1_001.fastq
@SRR6026844.sra.fastq.1 1 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGGAAAAAGAGAAAAGTGGAGGGATGTGTGGGCCTAGACAGGGGAAAAAGGAGAACAGGAGGCTCCAGACTGGTGAGGAAGGGGAGTGGGCTGGGCGTGCGGCTCATGCCTGTCATCCCAGC
+SRR6026844.sra.fastq.1 1 length=150
AA<<FFJJJFJJJJJJAJJJJJFA<JFJF<7FFFJJ--FJJJJA-F-AJJF<7FAA-FFJJ<AJJJFJJJ--7AJAJJFFJ<J7AJFA<FJ-7-AAJ7JF<<F7AJAAFJ7--777FJJFAA<JA-AJFAJJ-<7<7<FFAJF-FFAF7F
@SRR6026844.sra.fastq.2 2 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTCGTATCAAAGCAGAGTACATGGG
+SRR6026844.sra.fastq.2 2 length=150
AA7FAF7-FFJJJJJJFJJJJJAAJJ7FFJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJAJJJJJJJJJAJJJJJJJJJJJ-<JJ<J<<-FJA<-<--A7AFJ--7AJJ<<-FF-FJAJA-A<-7F-7AA<--7---FF-)--<AJFJJ
@SRR6026844.sra.fastq.3 3 length=150
AAGCAGTGGTATCAACGCAGAGTACATGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR6026844.sra.fastq.3 3 length=150
AA<FFF<FFFJJJJJJJFJ<JF<JFJ<A<7-FJJJJJFJFJFJJJJJJJJJJFJJJJJJJJJJJJJJJJAJFJJJJ<FJ-FJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJAJJJAJFFJJJJJFFJJJJJFJJJFFAA7FFJ
@SRR6026844.sra.fastq.4 4 length=150
TGACCTGTCCCCTCTGGCTGCCTCTGAGTCTGAATCTCCCAAAGAGAGAAACCAATTTCTAAGAGGACTGGATTGCAGAAGACTCGGGGACAACATTTGATCCAAGATCTTAAATGTTATATTGATAACCATGCTCAGCAATGAGCTATT
+SRR6026844.sra.fastq.4 4 length=150
AA<-<7AFFJAFJJJF-7<FJFFAJJA7FFFJF7FJ7JJJF7FJAJFJFF7FFFFJ-FJJJ-<A-F<JFJ-<AA7-AJJJ<FFFJFFJF<JJAJF-FJA-FJJJ7FJJJJF<7A<FJFFJ-<AAJJJ7F7<F<FF7-<7<-<FAF<FAFJ
@SRR6026844.sra.fastq.5 5 length=150
GGAAGGAAGGAAGAAAGAAAGAAAGATAGAGAGAGAGAGAGAGAGAGAAAGATAGAGAGAAATAAAGAAACAAAGAAAGAAAGAAAGAAAGAAAGAAAAAAAAAGAAAAATACAAAAAAAAAAATTCACTTAACTCAGGGGTTCGGAGAT
+SRR6026844.sra.fastq.5 5 length=150
-A-FFF77F-F<<F<JJAF<AJFFJ----<<FJAFFF<FF<FJF<<AFJJJ<<FJJFJFJJ7-F<JAJJA---7A<7AF-<7-777AJAFJJ<-AJJ-F-<-A---A---77F7--7F<FAF<A-------7--7--A--))-)7)---7
@SRR6026844.sra.fastq.6 6 length=150
CCTCCAGATACCACTGAGCCTCTTGCCCATGATTCAGAGCTTTCAAGGATAGGCTTTATTCTGCAAGCAATCAAATAATAAATCTATTCTGCTGAGAGATCACAAAAAAAAAAAAAAAAAAAAAAAAAAACCTATTTGCTGATGAGATCA
+SRR6026844.sra.fastq.6 6 length=150
AA<7-7<<FFJJFFFJJJJJJJ<FJF<J<JFJJJJJJJJJJJJJFJJJFA7F<FJJJFFFJJAJFJ<FF-FA77JA-AJFFJFFA7-FA-FJJJ-AFJ----<F-<AJJA<<--AF-7AFA--AF-A<--7-7-AA<---7---7---7-

head -n 24 SRR6026844_sra_S1_L001_R2_001.fastq
@SRR6026844.sra.fastq.1 1 length=150
NAGGTGCATTCGCCCTCCGTAGAAATCCATGCCAAGTACGCTCCTTCCATTGATTTTCTTGGATCGGGTGTGCACCGCGTAGCTCAGCATGGCAAGTCTGTGTAGTCCGTGGACCCGCCAGGACCCCCCGCCGCACGAGACGCAATACGT
+SRR6026844.sra.fastq.1 1 length=150
#AAA--A--777-A---AA--7------7-7))--)---7)-7----7--------7--77-----7))-))7-7)<--))77-7--)---))--)----7-----7-))7))-)))))-))))-))))-)-)-)))))-))))7---7-
@SRR6026844.sra.fastq.2 2 length=150
NTATGACTCCACCCCTCAGAGAGGAGGAGGCGACGGGGACAACAACTCACAGAGAGCAAAGTCCGTGGCAACCACCCCGTCTGCGGAGAGCAGGTCCGACCCTACTAGACGAGAGACAACGAACGCCGGACCGCACAATGGCGAGAGCTA
+SRR6026844.sra.fastq.2 2 length=150
#<A-----A---7A--A--<-7---))7)7--7)-)---77<F-7-7--7---7-77------77--)))-7)--))))))))-))7)7))-)))))))))))-)----7--)-)----------)))))))))7-)<----))-)))-7
@SRR6026844.sra.fastq.3 3 length=150
NCCACATATAGGGAAACATTTTAATTCTTAGTTATTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTCATTTTTTTTATTT
+SRR6026844.sra.fastq.3 3 length=150
#AAAA-F-A<-A--7----------7--------7------77--F<----7<-<7A<A--AA7-A-----F-<<-<-7--7--7-<77A<7A7-A<-A-F--<A<<-------------A<7-F7-------7----7----7---7--
@SRR6026844.sra.fastq.4 4 length=150
NACTTCGATATAAGATTTTTTTTTTTATTTATTACTCAAAGTTTAGAACATTTTATTAAAGTACAAAAATGTTAGAATTTAGCTAATAGAAAAACATAGTAAATATTTAAAAAAACGCTTATAAAATTACTCAAGGCACCCACAGAAAAC
+SRR6026844.sra.fastq.4 4 length=150
#AAFFAJ-FAJFJJJJ----------------------A---7-77-FFF----7--<-A-7A<--7<----A-A7<---77<-7<-77<FJ77-<-7--AAF-A-------AFJF<---7-<7--7-<----7--)-))-))))7-7--
@SRR6026844.sra.fastq.5 5 length=150
TCGAAGTATGGTGATATCGGAAGAGCTTCGAGTACGTAAATAGTGTAGATCTCGGTTGTCGTCTTATCATTAAAAAAACATTTCTTACTTTTCTCTCTTCGCACACCTCACTTCCTCGCTATATTGCTTCCTCCCTTCCGGGGACAGACC
+SRR6026844.sra.fastq.5 5 length=150
--AF-FF<--7F7-A---<-7--------7-7---7---7-------7----<-)--)---------7---<------------------------7----)-))))-)------7-<))----7----7)--7<)-7<))----))--)
@SRR6026844.sra.fastq.6 6 length=150
TATCAGCAAATAGGGTTTTTTTTTATTATTTTATTTTTTTTTTGATCCCTGAGGAGAATAGCGTTCATATGTGAGTTCTGGCAGAACAAAGGCTAACCTTGAAAGGCCTGTTATCTGGGACAGAAGCCCAGGAGTGCTCGTGTCTGTACC
+SRR6026844.sra.fastq.6 6 length=150
AAAAAFF<FF-FJ-F----------------------------77---)-))7A-7<7----------7------------777)----7A--------7-<AA7<-)7-)-----77)))))----7)))-)7-))-)7)7-)------

Do you have any idea why the process was not completed?

I also have some information about the fastq files, which I am sharing here in case it helps resolve the problem I am facing.

Firstly, the sequence structure of the fastq file is this:

Secondly, more detail on how the authors analysed their data is available on their GitHub (link: [https://github.com/zorrodong/HECA/tree/master/scRNA-seq_pipeline_hg38]).

I am not sure whether my thinking on the following points is correct:

The read structure of these fastq files is the opposite of what UniverSC assumes by default, so do I need to swap read 1 and read 2 before running the command?
The fastq files use their own custom barcodes, so do I need to provide a list of barcodes with -b barcode_96_8bp.txt? (barcode_96_8bp.txt is found on their GitHub page: [https://github.com/zorrodong/HECA/blob/master/scRNA-seq_pipeline_hg38/barcode_96_8bp.txt]).
The read structure of these fastq files is in line with strt-seq, but the barcode length is 8 bp and the UMI length is 8 bp, so do I need to use custom_8_8 in my command? Does UniverSC currently support this setting?
The authors of this dataset also used umi_tools in their pipeline to first extract UMIs and barcodes from the raw fastq files. Do I need to do this before using UniverSC? (The authors' pipeline is this: [image not preserved])

I am terribly sorry for giving so much information on my issue. I am quite new to complex bioinformatics problems and want to use your software for integrative analysis. I have three datasets, generated with BD Rhapsody, 10x Genomics, and STRT-Seq (described above). Because the STRT-Seq fastq files described in this post are the main reference data we are comparing against, I want to do this correctly.

Thanks for developing this tool, and looking forward to your response.

David


Davidwei7 · 2023-04-14T11:31:04Z

Hi all, I was wondering whether you have had a chance to look into the issue I am experiencing, which is described in my two posts. Thank you in advance. Looking forward to your response.
Best Wishes,
David

TomKellyGenetics · 2023-04-18T02:30:39Z

Hi, sorry for the delayed response. I don't have much time at the moment but I have some ideas on what may be causing this. I think it is unrelated to the technology. The "proc" command is used to detect the number of cores available to set the default number of threads.

Please try running it again with the number of threads set manually with --threads and let us know if the problem persists. I'll note that your system configuration differs from ours, as discussed in previous issues, so some dependencies may be missing.
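As a rough illustration of the suggestion above, the thread count can be determined once and passed explicitly instead of relying on automatic detection. This is only a sketch: it assumes the detection step relies on a tool such as nproc (the reply names it only as "proc"), and detect_cores is a hypothetical helper, not part of UniverSC.

```shell
#!/bin/sh
# Hypothetical helper (not part of UniverSC): print the number of online CPU
# cores, falling back to 1 when neither nproc nor getconf can report it.
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  else
    echo 1
  fi
}

threads=$(detect_cores)
echo "threads to request: ${threads}"
```

The value could then be passed as --threads "$threads" together with the -R1, -R2, -t, -r and -i arguments from the original command. The exact spelling of the option should be checked against the help output of the installed version.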

TomKellyGenetics · 2023-05-25T05:21:32Z

This dataset appears to use SmartSeq2.

Construction protocol: A modified Smart-seq2 protocol was applied for single-cell RNA-seq. Briefly, a single cell was picked into the lysis buffer by mouth pipette. The reverse transcription reaction was performed with 24 oligo (dT) primer anchored with the 8 bp cell specific barcode, and also with 8 bp unique molecular identifiers (UMIs).

https://www.ncbi.nlm.nih.gov/sra/?term=SRR6026844
https://trace.ncbi.nlm.nih.gov/Traces/?run=SRR6026844

Note that our code supports the following configurations:
launch_universc.sh: STRT-Seq (6 bp barcode, no UMI): strt-seq
launch_universc.sh: STRT-Seq-C1 (8 bp barcode, 5 bp UMI): strt-seq-c1
launch_universc.sh: STRT-Seq-2i (13 bp barcode, 6 bp UMI): strt-seq-2i

It is possible to support an 8bp UMI but it will require a dedicated configuration. If it is a popular protocol we can support this but it appears to be a custom workflow used in this paper. Another possible workaround is to rename R1 and R2 (manually switch them) and run custom_8_8 which assumes R1 contains [BC][UMI].... and R2 contains transcript reads (as for 10x settings).
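The renaming workaround can be sketched as follows. This is an illustration under assumptions, not a tested recipe: swap_read_names is a hypothetical helper (not part of UniverSC), and it uses symlinks in a separate directory so the original files stay untouched.

```shell
#!/bin/sh
# Hypothetical helper (not part of UniverSC): create symlinks in OUTDIR so that
# the original R2 file is exposed under the R1 file name and vice versa.
# OUTDIR must differ from the directory holding the original files.
# Usage: swap_read_names R1.fastq R2.fastq OUTDIR
swap_read_names() {
  srn_r1=$1; srn_r2=$2; srn_out=$3
  if [ ! -f "$srn_r1" ] || [ ! -f "$srn_r2" ]; then
    echo "swap_read_names: input FASTQ not found" >&2
    return 1
  fi
  mkdir -p "$srn_out" || return 1
  # Resolve absolute paths so the symlinks remain valid from any directory.
  srn_abs1="$(cd "$(dirname "$srn_r1")" && pwd)/$(basename "$srn_r1")"
  srn_abs2="$(cd "$(dirname "$srn_r2")" && pwd)/$(basename "$srn_r2")"
  ln -s "$srn_abs2" "$srn_out/$(basename "$srn_r1")" &&
    ln -s "$srn_abs1" "$srn_out/$(basename "$srn_r2")"
}
```

For example, after running swap_read_names SRR6026844_sra_S1_L001_R1_001.fastq SRR6026844_sra_S1_L001_R2_001.fastq swapped, the files in swapped/ could be given to launch_universc.sh as -R1 and -R2 with -t custom_8_8 and the same -r and -i values as in the original command. Inside a container, the symlink targets must also be visible at the same paths; copying the files instead avoids that concern at the cost of disk space.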

I'll note the paper here to investigate later in more detail:

Fan X et al., "Spatial transcriptomic survey of human embryonic cerebral cortex by single-cell RNA-seq analysis.", Cell Res, 2018 Jul;28(7):730-745. https://doi.org/10.1038/s41422-018-0053-3

TomKellyGenetics · 2023-09-19T00:24:55Z

@Davidwei7 sorry for the delayed response. I've investigated the issues with these protocols and updated the source code to support them.

Please note that this protocol by Fan et al. (2018) is significantly modified from the protocol originally published by Islam et al. (2011).

We modified the STRT-seq method for amplification of single-cell transcriptomes by changing the reverse transcription primer, the induced cell barcode, and the unique molecular identifier (UMI).

This requires a different bioinformatics approach.

Raw reads were first segregated based on the cell-specific barcode information in read 2 of the pair-ended reads. Then, sequences in read 1 were trimmed with customized scripts to remove the TSO sequence, the polyA tail sequence and sequences with low-quality bases (N > 10%) or contaminated with adapters. Subsequently, the stripped read 1 sequences were aligned to the hg19 human reference genome.

Therefore I have created separate technology settings "strt-seq" for the original protocol and "strt-seq-2018" for the custom version. I've pushed this new configuration to the "dev" branch so it is possible to update to the development version to try it. There are minor changes to the source code so I expect it will run without errors. I've tested it on raw SRA data in FASTQ format from both publications and confirmed it created Cell Ranger compatible files.
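To make the read layout concrete: the SRA description quoted earlier gives an 8 bp cell barcode and an 8 bp UMI, and the quoted methods place the barcode in read 2. A minimal sketch for inspecting the first 16 bases of read 2 is below. It assumes the barcode comes first and the UMI follows immediately, and that the FASTQ uses four lines per record. split_barcode_umi is a hypothetical helper, not part of UniverSC, and the demo record is read 5 of R2 from the first post, truncated to 20 bases.

```shell
#!/bin/sh
# Hypothetical helper (not part of UniverSC): for each 4-line FASTQ record on
# stdin, print the read name, the leading barcode and the following UMI,
# separated by tabs. Lengths default to 8 and 8 (assumed layout).
# Usage: split_barcode_umi [BARCODE_LEN] [UMI_LEN] < reads.fastq
split_barcode_umi() {
  awk -v bc="${1:-8}" -v umi="${2:-8}" '
    NR % 4 == 1 { name = substr($1, 2) }   # header line, leading "@" dropped
    NR % 4 == 2 { print name "\t" substr($0, 1, bc) "\t" substr($0, bc + 1, umi) }'
}

# Demo on one truncated record taken from the R2 excerpt in the first post.
split_barcode_umi 8 8 <<'EOF'
@SRR6026844.sra.fastq.5
TCGAAGTATGGTGATATCGG
+
--AF-FF<--7F7-A---<-
EOF
```

For this record the sketch reports TCGAAGTA as the candidate barcode and TGGTGATA as the candidate UMI. Comparing such candidates against barcode_96_8bp.txt would be one way to check whether the assumed layout holds for this dataset.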


TomKellyGenetics · 2023-09-19T00:34:24Z

Closing this issue as this technology is now supported. Raw reads from SRR6026844 were tested without errors. Please re-open or file another issue if there are still problems with your environment preventing you from replicating this.

adc0032 · 2024-02-07T21:00:38Z

Hi!
Has this been added to the main branch and incorporated into the Docker image? I don't have an option for strt-seq-2018 in my Docker installation.

TomKellyGenetics · 2024-02-07T21:26:00Z

v1.2.7 has been merged and released on GitHub. Docker builds are in progress and will be available soon.

TomKellyGenetics · 2024-02-08T07:33:38Z

@adc0032 @Davidwei7 @kbattenb The latest version (1.2.7) passed docker builds and is now available on dockerhub: https://hub.docker.com/r/tomkellygenetics/universc/tags
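One quick way to check whether a given installation already includes the new setting is to search the launcher script for the technology name. A minimal sketch, assuming the name appears literally in launch_universc.sh; script_mentions_technology is a hypothetical helper, not part of UniverSC.

```shell
#!/bin/sh
# Hypothetical helper (not part of UniverSC): succeed if the script at SCRIPT
# contains the literal string TECH; return 2 if the script cannot be read.
# Usage: script_mentions_technology SCRIPT TECH
script_mentions_technology() {
  if [ ! -r "$1" ]; then
    echo "script_mentions_technology: cannot read $1" >&2
    return 2
  fi
  grep -F -q -e "$2" "$1"
}
```

For example: script_mentions_technology "$(command -v launch_universc.sh)" strt-seq-2018 && echo "strt-seq-2018 found". A match only shows that the name occurs in the script, not that the setting runs correctly in that environment.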

This version supports the STRT-Seq, PIP-Seq, and VASA-Seq protocols. I have some minor changes to versioning and issues #17 and #20 under consideration, but the technologies listed above should work, which resolves issues #12 and #16.

adc0032 · 2024-02-14T15:28:18Z

@TomKellyGenetics

Thank you for getting this updated!

Should strt-seq-2018 still be undergoing file format conversion in line 3138?

universc/launch_universc.sh, line 3662 in 7cbd039:

#STRT-Seq

I don't see it included here in the STRT-Seq section.

TomKellyGenetics · 2024-02-24T07:14:26Z

I think not, since UMIs are already included in the 2018 custom protocol. It may be necessary to remove the TSO sequence as described in the paper (by hard trimming R1) if you are using (paired-end) 10x 5' scRNA chemistry settings. I don't think it is necessary to perform TSO conversion on R2 after the barcode and UMI, since you can use the 10x 3' scRNA chemistry settings, which ignore the rest of this read.
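A sketch of the R1 trimming step is below, with two caveats. First, the first three R1 reads posted at the top of this thread begin with AAGCAGTGGTATCAACGCAGAGTACATGGG; treating that prefix as the TSO is an assumption, not something confirmed in this thread. Second, this is a variation on hard trimming: instead of cutting a fixed number of bases from every read, it removes the prefix only from reads that start with it, since reads 4 to 6 in the excerpt do not. trim_read_prefix is a hypothetical helper, not part of UniverSC, and it expects four lines per FASTQ record.

```shell
#!/bin/sh
# Hypothetical helper (not part of UniverSC): remove PREFIX from the start of
# each read that begins with it, trimming the quality string by the same
# length. Reads that do not start with PREFIX pass through unchanged.
# Usage: trim_read_prefix PREFIX < in.fastq > out.fastq
trim_read_prefix() {
  awk -v prefix="$1" '
    BEGIN { n = length(prefix) }
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 { s = $0; next }
    NR % 4 == 3 { plus = $0; next }
    NR % 4 == 0 {
      q = $0
      if (n > 0 && substr(s, 1, n) == prefix) {
        s = substr(s, n + 1)
        q = substr(q, n + 1)
      }
      print header; print s; print plus; print q
    }'
}
```

For example: trim_read_prefix AAGCAGTGGTATCAACGCAGAGTACATGGG < SRR6026844_sra_S1_L001_R1_001.fastq > trimmed_R1.fastq. Reads that consist only of the prefix come out empty and may need to be filtered together with their mates before alignment.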

TomKellyGenetics mentioned this issue Sep 6, 2023: Issue with 5' technologies with single-ends #16 (Closed)

TomKellyGenetics added a commit that referenced this issue Sep 18, 2023: adds custom parameters (STRT-Seq-2018 technology) for Fan et al., (2018) https://doi.org/10.1038/s41422-018-0053-3 #12 (4ace971)

TomKellyGenetics added a commit to TomKellyGenetics/universc that referenced this issue Sep 19, 2023: adds custom parameters (STRT-Seq-2018 technology) for Fan et al., (2018) https://doi.org/10.1038/s41422-018-0053-3 minoda-lab#12 (848e490)

TomKellyGenetics closed this as completed Sep 19, 2023
