error: Can't open path/to/data/input_R1.fastq.gz_sample.fastq.gz #13

Open
morien opened this issue May 2, 2019 · 6 comments


morien commented May 2, 2019

I've got the above error (I modified the path and file name to simplify the question) with a set of data straight from the sequencing facility. If there are no reads associated with a particular barcode, could that cause this error?

The error occurs after IDEMP has finished parsing the fastq files.

Here's my input command:
/git/idemp/idemp -b data/metadata/demultiplexing_file.txt -I1 data/raw_data/input_I1.fastq.gz -R1 data/raw_data/input_R1.fastq.gz -R2 data/raw_data/input_R2.fastq.gz -m 1 -o data/raw_data/demultiplexed/

I am having trouble imagining why this is happening. I have used IDEMP successfully many times before with the same kind of experiment.

yhwu (Owner) commented May 15, 2019

I don't think so. Without data, it's hard to know what's going on.

morien (Author) commented May 16, 2019

Okay, I'd like to figure out the issue. I'll try to make a small dataset that recreates the problem. The data are too large to share practically right now.
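
In case it helps anyone trying the same thing, here's a rough sketch of how I'd subsample the inputs to build a test set (the read count and output paths are placeholders; only `zcat`, `head`, and `gzip` are assumed):

```
# Take the first 100,000 reads (4 lines per read = 400,000 lines) from
# each input; the three files stay in sync because fastq reads appear
# in the same order in I1, R1, and R2
mkdir -p data/test
for f in input_I1 input_R1 input_R2; do
  zcat data/raw_data/${f}.fastq.gz | head -n 400000 | gzip \
    > data/test/${f}.fastq.gz
done
```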

catalicu commented Nov 2, 2020

Was this resolved? I am getting the same error and cannot figure out what is wrong.

morien (Author) commented Nov 2, 2020

I wasn't able to re-create the issue with a small dataset. Since last year, some of my research group have noticed that this only happens when the dataset has a large number of samples, somewhere north of 250. I'm not sure whether the developers will have any idea about that, but that's what I and others have observed "in the wild".

So, for datasets with huge numbers of samples, we usually just break the samples into two chunks (sketched below), or use a different demultiplexing tool.
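
A minimal sketch of the two-chunk approach, assuming the -b file is a plain barcode-to-sample mapping with one sample per line (as in the command above), and that reads whose barcodes are missing from the current chunk go to idemp's unassigned output rather than causing an error:

```
# Split the barcode->sample mapping into two halves
n=$(wc -l < data/metadata/demultiplexing_file.txt)
split -l $(( (n + 1) / 2 )) data/metadata/demultiplexing_file.txt chunk_

# Run idemp once per half, writing each half to its own output directory
for chunk in chunk_*; do
  mkdir -p data/raw_data/demultiplexed_${chunk}
  /git/idemp/idemp -b ${chunk} \
    -I1 data/raw_data/input_I1.fastq.gz \
    -R1 data/raw_data/input_R1.fastq.gz \
    -R2 data/raw_data/input_R2.fastq.gz \
    -m 1 -o data/raw_data/demultiplexed_${chunk}/
done
```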

catalicu commented Nov 2, 2020

That sounds about right; I have 350 samples. I'll try what you suggested. Thank you so much for your quick reply!

yhwu (Owner) commented Nov 3, 2020

Thanks for offering the workaround! This program opens multiple files for writing: if you have 100 samples, it writes 100 files at the same time. You may have reached that limit. You can use `ulimit -n` to check; on my machine it is only 256. If you are able to raise that number, I bet it would work. Otherwise, @morien's workaround is easy to implement, and it would give you the correct files.
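
For example (a sketch; the exact limit you need depends on how many output files are kept open at once, likely on the order of one or two per sample):

```
# Check the soft limit on open file descriptors in the current shell
ulimit -n        # e.g. 256

# Check the hard limit; the soft limit can be raised up to this value
# without root privileges
ulimit -Hn

# Raise the soft limit for this session, then run idemp as usual.
# With ~350 samples, 1024 leaves headroom for R1+R2 outputs plus the
# inputs and standard descriptors.
ulimit -n 1024
```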
