
Running Flair with many samples #396

Open
shenglin-liu opened this issue Dec 13, 2024 · 1 comment
Labels: Documentation, duplicate, enhancement, mod; collapse

Comments

@shenglin-liu

I have read the paper (https://doi.org/10.1038/s41467-020-15171-6) and the manual (https://flair.readthedocs.io/en/latest/), and I still have a question about:

  • collapse

Dear Flair developers,

Thank you very much for providing this tool. I have a question concerning how to run flair collapse.

I have 167 samples (nanopore; transcriptome; human), each with a FASTQ file of 20-40 GB (unzipped). After running flair correct, concatenating the corrected BED files, and splitting the result by chromosome, I have BED files ranging from 240 MB to 24 GB. What would be the best way to run collapse in this case? Do I really need to feed in all 167 FASTQ files? How many resources (cores, memory, time) should I expect it to need?
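A minimal Python sketch of the split-by-chromosome step, assuming the concatenated corrected reads are in a standard tab-separated BED file with the chromosome name in the first column. The function name `split_bed_by_chrom` and the `<prefix>.<chrom>.bed` output naming are hypothetical, not part of FLAIR. It streams the input line by line, so memory use stays flat even for a file in the tens of gigabytes.

```python
import os
from typing import Dict, TextIO


def split_bed_by_chrom(bed_path: str, out_dir: str, prefix: str = "corrected") -> Dict[str, str]:
    """Write each record of a BED file to <out_dir>/<prefix>.<chrom>.bed.

    Hypothetical helper: streams the input, so memory does not grow with
    file size. Returns a mapping from chromosome name to output path.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}
    paths: Dict[str, str] = {}
    try:
        with open(bed_path) as bed:
            for line in bed:
                # Skip blank lines and header/track/comment lines.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                chrom = line.split("\t", 1)[0]  # BED column 1 = chromosome
                if chrom not in handles:
                    paths[chrom] = os.path.join(out_dir, f"{prefix}.{chrom}.bed")
                    handles[chrom] = open(paths[chrom], "w")
                handles[chrom].write(line)
    finally:
        # Close every per-chromosome output, even if parsing failed midway.
        for handle in handles.values():
            handle.close()
    return paths
```

The sketch keeps one output file open per chromosome; with an assembly that has many unplaced contigs, this can approach the per-process open-file limit.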

For the record, I tried running collapse with the following command, allocating 16 cores and 200 GB of memory and feeding in all 167 FASTQ files. It stopped after 30 hours because it exceeded the memory limit.

```shell
flair collapse -t 16 -o $out -g $ref --gtf $gtf -q $bed -r $fas --stringent --check_splice --generate_map --annotation_reliant generate
```

Thank you for your help.

Best regards,
Shenglin

@cafelton
Collaborator

Sorry you're having this issue; we're working on a new version with better parallelization and lower memory use. If you want the most complete transcriptome, you do need to run all 167 files together. In that case, split the BED file by chromosome, then split the FASTQ files by chromosome, and run collapse separately on each chromosome (see #391).
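One way to do the FASTQ split, sketched in Python under two assumptions to check against your own files: the fourth (name) column of the corrected BED holds each read ID exactly as it appears in the FASTQ header (up to the first whitespace), and the FASTQ uses plain four-line records. The function names and output naming are hypothetical. It handles one sample's BED/FASTQ pair at a time, so only that sample's read IDs are held in memory, and it appends to the per-chromosome outputs so that repeated calls pool reads across samples.

```python
import gzip
import os
from typing import Dict, Set, TextIO


def _open_text(path: str) -> TextIO:
    # Read gzip-compressed input transparently if the name ends in .gz.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_chroms_from_bed(bed_path: str) -> Dict[str, Set[str]]:
    """Map read ID (assumed BED column 4) -> chromosomes (BED column 1)."""
    read_to_chroms: Dict[str, Set[str]] = {}
    with _open_text(bed_path) as bed:
        for line in bed:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            read_to_chroms.setdefault(fields[3], set()).add(fields[0])
    return read_to_chroms


def split_fastq_by_chrom(bed_path: str, fastq_path: str, out_dir: str,
                         prefix: str = "reads") -> Dict[str, int]:
    """Append each FASTQ record to <out_dir>/<prefix>.<chrom>.fastq for every
    chromosome its read ID is assigned to in bed_path.

    Hypothetical helper. Reads absent from the BED file are skipped. Outputs
    are opened in append mode, so start from an empty out_dir and call this
    once per sample. Returns the number of records written per chromosome.
    """
    read_to_chroms = read_chroms_from_bed(bed_path)
    os.makedirs(out_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}
    counts: Dict[str, int] = {}
    try:
        with _open_text(fastq_path) as fq:
            while True:
                header = fq.readline()
                if not header:
                    break  # end of file
                # Assumes four-line records: header, sequence, '+', qualities.
                record = header + fq.readline() + fq.readline() + fq.readline()
                # Read ID = header text after '@' up to the first whitespace.
                read_id = header[1:].split()[0]
                for chrom in read_to_chroms.get(read_id, ()):
                    if chrom not in handles:
                        path = os.path.join(out_dir, f"{prefix}.{chrom}.fastq")
                        handles[chrom] = open(path, "a")
                    handles[chrom].write(record)
                    counts[chrom] = counts.get(chrom, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```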
With so many files, if you're OK with missing some low-expression isoforms, you can instead run collapse on each file individually (without splitting by chromosome) and then combine the resulting transcriptomes using this script (currently on a testing branch of FLAIR; you can download it and add it to your install).
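For the per-sample route, a small Python sketch that only builds one `flair collapse` command per sample, reusing the flags from the command in the question; it does not run anything. The `(name, BED, FASTQ)` tuples, the file names, and the function name are hypothetical, and `-o` is treated as a per-sample output prefix. The merge step itself uses collapse_bed_files.py, whose options are described in the linked scripts.rst and are not reproduced here.

```python
import os
import shlex
from typing import List, Sequence, Tuple


def per_sample_collapse_commands(
    samples: Sequence[Tuple[str, str, str]],  # (sample name, corrected BED, FASTQ)
    genome: str,
    gtf: str,
    out_dir: str,
    threads: int = 16,
) -> List[List[str]]:
    """Build one `flair collapse` argv per sample (hypothetical helper).

    Flags are copied from the command in the question; nothing is executed.
    out_dir must already exist, since -o is used as an output prefix.
    """
    commands = []
    for name, bed, fastq in samples:
        commands.append([
            "flair", "collapse",
            "-t", str(threads),
            "-o", os.path.join(out_dir, name),
            "-g", genome,
            "--gtf", gtf,
            "-q", bed,
            "-r", fastq,
            "--stringent", "--check_splice", "--generate_map",
            "--annotation_reliant", "generate",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names for two samples.
    samples = [
        ("sample01", "sample01.corrected.bed", "sample01.fastq"),
        ("sample02", "sample02.corrected.bed", "sample02.fastq"),
    ]
    for cmd in per_sample_collapse_commands(samples, "GRCh38.fa", "annotation.gtf", "collapse_out"):
        print(shlex.join(cmd))
```

Each printed command can be submitted as its own cluster job, or run in order with `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes it easy to fit either setup.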
This is still in testing, but it sounds perfect for your issue, so I wanted to offer it as a solution.
Code: https://github.com/BrooksLabUCSC/flair/blob/flair-fusion/src/flair/collapse_bed_files.py
Documentation: https://github.com/BrooksLabUCSC/flair/blob/flair-fusion/docs/source/scripts.rst

@cafelton added the duplicate, enhancement, Documentation, and mod; collapse labels on Dec 13, 2024