NA annotations in pau table (drosophila, dm6) #15
Comments
In the APA_ID, the NA values usually contain an Ensembl gene ID. So this looks like a possible issue during the |
Hi @kcha, thanks for the quick reply. I checked the agreement of gene IDs between In case you want to reproduce anything: I used the ensembl-91, dm6 annotation. Below are the examples for the requested files:
Here is a sample of the
List of chromosome names in the
For our internal annotations we remove the |
Hi @mirax87, thanks for sharing the information. Can you share what your |
Here are the commands, following the guidelines.
Both log files are empty: no warnings and no errors. |
The command looks good to me so I'm not exactly sure what the issue is. Would you be able to send me It's been great hearing from users building their own libraries, but I don't have much experience working with other species outside of human and mouse! |
I have been using an older version, which was working fine. I have not investigated #13 as the user did not provide any further information. Perhaps with the data you e-mailed me I can finally resolve it. |
I tried downgrading the CRAN package data.table to versions 1.11.0 and 1.10.0. This had no effect on the number of NA entries in APA_ID. |
Hi @kcha, your latest update fixing
On top of that, I used data.table 1.12.2, so that version works for me. I also noticed that transcript IDs are now separated by commas, but that is, if anything, a separate issue. Thank you for your time and energy looking into it! From my side, it's okay to close this issue now. |
Hi @mirax87, thanks for sending some sample data for me to investigate and resolve the issue. That was really helpful! As we discussed offline, the issue was related to an apostrophe in the database file and |
After successfully running the QAPA workflow for Drosophila (Ensembl release 91, dm6) and quantifying mRNA-seq data with Salmon (please note, no 3'-seq is involved), I found that the PAU table (output of qapa quant) contains a substantial number of NA entries in the APA_ID column, which I interpret as 3'UTRs discarded by QAPA.
Short version:
After double-checking some of the NA entries, I found genes that contain only unambiguous UTR3s, and I can't explain why they were discarded.
Thus, my questions:
Long version:
The output of qapa build, i.e. the total number of UTR3s before filtering:
Number of UTR3s assessed and the number of NA entries:
The Num_Events column shows, for every NA entry, the total number of NA entries, i.e. 7294.
Checking some NA entries in IGV, I found, next to some ambiguous cases (3'UTRs overlapping other transcripts, regions contained within introns, ...), unambiguous tandem 3'UTRs without overlaps or other ambiguities that are nonetheless discarded.
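For anyone who wants to check their own PAU table for the same symptom, here is a minimal sketch that counts NA entries in the APA_ID column. It assumes the table is tab-separated with a header row and that missing IDs appear as the literal string NA (or an empty field). The function name count_na_apa_ids, the Gene column, and the sample rows are made up for illustration and are not part of QAPA.

```python
import csv
import io


def count_na_apa_ids(handle, column="APA_ID", na_token="NA"):
    """Return (na_rows, total_rows) for a tab-separated table with a header row.

    Assumption: a missing ID is written as the literal string "NA" or left empty.
    """
    # QUOTE_NONE keeps apostrophes and quotes as literal characters,
    # so a stray quote character cannot merge several rows into one field.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    na_rows = 0
    total_rows = 0
    for row in reader:
        total_rows += 1
        value = (row.get(column) or "").strip()
        if value == "" or value == na_token:
            na_rows += 1
    return na_rows, total_rows


# Toy table with hypothetical values; replace with open("<your PAU table>") in practice.
sample = (
    "APA_ID\tGene\tNum_Events\n"
    "FBgn0000001_1_P\tgeneA\t2\n"
    "NA\tgeneB\t3\n"
    "NA\tgeneC\t3\n"
)

na_rows, total_rows = count_na_apa_ids(io.StringIO(sample))
print(f"{na_rows} of {total_rows} rows have NA in APA_ID")
```

Comparing this count against the number of UTR3s reported by the build step gives a quick sense of how many entries lost their annotation.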