Running Flair with many samples #396

shenglin-liu · 2024-12-13T09:57:54Z

I have read the paper (https://doi.org/10.1038/s41467-020-15171-6) and the manual (https://flair.readthedocs.io/en/latest/) and I still have a question about: collapse

Dear Flair developers,

Thank you very much for providing this tool. I have a question concerning how to run flair collapse.

I have 167 samples (nanopore; transcriptome; human), each with a fastq file of 20-40 GB (unzipped). After running flair correct, concatenating the corrected bed files, and splitting them by chromosome, the resulting bed files range from 240 MB to 24 GB. What would be the best way to run collapse in this case? Do I really need to provide all 167 fastq files? What resources (cores, memory, time) should I expect to need?

For the record, I tried running collapse with the following command, providing all 167 fastq files and allocating 16 cores and 200 GB of memory. It stopped after 30 hours because it exceeded the memory limit.
flair collapse -t 16 -o $out -g $ref --gtf $gtf -q $bed -r $fas --stringent --check_splice --generate_map --annotation_reliant generate

Thank you for your help.

Best regards,
Shenglin

cafelton · 2024-12-13T19:02:41Z

Sorry you're having this issue. We're working on a new version that has better parallelization and uses less memory. If you want the most complete transcriptome, you do need to run all 167 files together. In that case, split the bed by chromosome, then split the fastq by chromosome and run collapse per chromosome (see #391).
With so many files, if you're OK with not getting every low-expression isoform, you can also run collapse on each file individually (not split by chromosome) and then combine the transcriptomes using the collapse_bed_files.py script linked below (currently on a testing branch of FLAIR; you can download it and add it to your install).
This is still in testing, but it sounds perfect for your issue, so I wanted to offer it as a solution.
Code: https://github.com/BrooksLabUCSC/flair/blob/flair-fusion/src/flair/collapse_bed_files.py
Documentation: https://github.com/BrooksLabUCSC/flair/blob/flair-fusion/docs/source/scripts.rst
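
For readers following the first option (split the bed by chromosome, then split the fastq to match), here is a minimal Python sketch of the fastq-splitting step. It is not part of FLAIR. The function names `read_names_by_chrom` and `filter_fastq` are hypothetical. It assumes that column 4 of the corrected bed file holds a read name that matches the FASTQ read ID exactly and that the FASTQ uses plain 4-line records, so check both against your own files before relying on it.

```python
import io
from collections import defaultdict


def read_names_by_chrom(bed_lines):
    """Map each chromosome (BED column 1) to the set of read names (BED column 4).

    Assumption: the name column matches the FASTQ read ID exactly.
    """
    names = defaultdict(set)
    for line in bed_lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        names[fields[0]].add(fields[3])
    return names


def filter_fastq(fastq_handle, wanted, out_handle):
    """Copy 4-line FASTQ records whose read ID is in `wanted`.

    Assumption: every record is exactly four lines (no wrapped sequences).
    Returns the number of records written.
    """
    written = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        # The read ID is the first whitespace-delimited token after '@'.
        read_id = header[1:].split()[0]
        if read_id in wanted:
            out_handle.write(header + seq + plus + qual)
            written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory example; real inputs would be opened files.
    bed = [
        "chr1\t100\t200\tread_a\n",
        "chr2\t50\t80\tread_b\n",
        "chr1\t300\t400\tread_c\n",
    ]
    fastq = io.StringIO(
        "@read_a desc\nACGT\n+\nIIII\n"
        "@read_b\nGG\n+\nII\n"
        "@read_c\nTT\n+\nII\n"
    )
    by_chrom = read_names_by_chrom(bed)
    chr1_out = io.StringIO()
    print(filter_fastq(fastq, by_chrom["chr1"], chr1_out))  # prints 2
```

Streaming the FASTQ one record at a time keeps memory use tied to the size of the read-name set for one chromosome, not to the size of the fastq files.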

cafelton added the duplicate, enhancement, Documentation, and mod; collapse labels on Dec 13, 2024
