build library for hg38 #22

Closed
biofisherman opened this issue Aug 2, 2019 · 10 comments

Comments

@biofisherman

Hi there,

When I tried to build the library for hg38, I downloaded all the related data as instructed in the protocol.
Then I used the following command to build it:
qapa build -N --db ensembl_identifiers.txt gencode.basic.txt > output.utrs.bed

I found that the fourth column contains "hg19". When I used this file to extract sequences with the following command: qapa fasta -f Homo_sapiens.GRCh38.dna.primary_assembly.fa output_utrs.bed output_sequences.fa
I got an empty output_sequences.fa file.

In addition, I tried to build the library using the new poly_A.bed downloaded from https://polyasite.unibas.ch/atlas#2 (hg38), with the command qapa build --db ensembl_identifiers.txt -o clusters.bed gencode_hg38.txt > output_utrs_2.bed

It returned the following error message:


***** ERROR: Requested column 4, but database file /tmp/pybedtools.11y88d0u.tmp only has fields 1 - 0.

Please advise.

Thanks,

Weiyan

@kcha
Collaborator

kcha commented Aug 8, 2019

Does your Homo_sapiens.GRCh38.dna.primary_assembly.fa file contain chromosome IDs that match those inside your output_utrs.bed file? It could be that it wasn't able to retrieve your sequences because the chromosome IDs do not match. The "hg19" text is just a string descriptor that QAPA guesses based on Ensembl ID. It doesn't actually consider it when building the library. You can make it print "hg38" by setting the option --species=hg38.
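
One way to check this is to compare the sequence names in the FASTA headers with the chromosome column of the BED file. The sketch below uses hypothetical helper names (it is not part of QAPA) and works on in-memory lines; the toy records assume the usual naming difference between Ensembl FASTA files (`1`, `X`, `MT`) and UCSC-style names (`chr1`, `chrX`, `chrM`) as used in GENCODE tables downloaded from UCSC.

```python
def fasta_seq_names(fasta_lines):
    """Return the sequence names (first token after '>') found in FASTA header lines."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    }


def bed_chroms(bed_lines):
    """Return the chromosome names (column 1) of BED lines, skipping blank, comment and track lines."""
    chroms = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chroms.add(line.split("\t", 1)[0])
    return chroms


def missing_chroms(fasta_lines, bed_lines):
    """BED chromosomes with no FASTA sequence of the same name; nothing can be extracted for these."""
    return sorted(bed_chroms(bed_lines) - fasta_seq_names(fasta_lines))


if __name__ == "__main__":
    # Toy data for illustration only.
    ensembl_fasta = [">1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF", "ACGT"]
    ucsc_fasta = [">chr1", "ACGT"]
    bed = ["chr1\t100\t200\tutr_a\t0\t+"]
    print(missing_chroms(ensembl_fasta, bed))
    print(missing_chroms(ucsc_fasta, bed))
```

With the Ensembl-style header the BED chromosome `chr1` is reported as missing, while the UCSC-style header matches, which is consistent with the later report in this thread that the UCSC hg38.fa worked.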

For the new polyAsite BED file, try using the option -p instead of -o to properly parse the file.

@biofisherman
Author

Thank you very much for your reply.

  1. I used the hg38.fa downloaded from UCSC and it worked.
  2. When I use the option -p, it still does not work. Is it okay if I ignore the polyA annotation file and just run qapa build -N --db ensembl_identifiers.txt gencode.basic.txt > output.utrs.bed? Also, would it be possible for you to provide an hg38 version of the file, since most data are now based on hg38?
  3. For the deltaPPAU analysis between samples, how did you calculate this value? Was it based on the mean of PAU or of TPM?

Best,

Weiyan

@kcha
Collaborator

kcha commented Aug 24, 2019

Yes, you can skip the polyA annotation file step. I don't have an hg38 version at this time.

For deltaPPAU, it was based on the median of PAU.
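
As a worked illustration of a median-based deltaPPAU, the sketch below takes the difference between the median PAU of two groups of samples for a single 3' UTR isoform. The subtraction order (group B minus group A), the 0-100 PAU scale, the function name and the values are assumptions for illustration, not QAPA's own code.

```python
from statistics import median


def delta_ppau(pau_group_a, pau_group_b):
    """Median PAU of group B minus median PAU of group A (hypothetical helper).

    Assumption: plain medians across replicates and the B-minus-A order are
    illustrative choices, not taken from QAPA's source.
    """
    return median(pau_group_b) - median(pau_group_a)


if __name__ == "__main__":
    # Hypothetical PAU values (0-100) for one poly(A) site across three replicates per group.
    control = [40.0, 55.0, 50.0]
    treated = [70.0, 62.0, 80.0]
    print(delta_ppau(control, treated))  # 20.0
```

Using the median rather than the mean limits the influence of a single outlying replicate: replacing 62.0 with 5.0 in the treated group leaves its median at 70.0, while its mean drops from about 70.7 to about 51.7.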

@biofisherman
Author

@kcha Thank you very much for all your help.

@kcha
Collaborator

kcha commented Sep 5, 2019

Re-opening this issue as I will try to look into adding support for the hg38 version of polyAsite in a future release.

kcha reopened this on Sep 5, 2019
@imcoleman

I am having a different issue when trying to build an hg38 polyA site annotation. I used the MySQL commands as provided to get the Ensembl gene metadata table from BioMart, the GENCODE gene prediction annotation table, and the GENCODE polyA sites track, along with the polyA site annotation from PolyASite. However, when I run the build command to create the 3' UTR library, the output file (output_utrs.bed) is empty.
#---------
qapa build --db ~/qapa/hg38/ensembl_identifiers.txt -g ~/qapa/hg38/gencode.polyA_sites.bed -p ~/qapa/hg38/clusters.hg38.bed ~/qapa/hg38/gencode.basic.txt > ~/qapa/hg38/output_utrs.bed
#---------
This is the final error:
[qapa] Annotating 3' UTRs
[qapa] Error: invalid literal for int() with base 10: '0.2690'
[qapa] Finished!

I tried replacing some of the int() calls with float() in the annotate.py script, but that did not resolve the issue.
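
The error means a field that the parser expected to hold an integer contains a decimal value, which can happen when a file with a different column layout is passed to code written for an older one. The hypothetical helper below (not part of QAPA) reports which columns of a BED-like file parse as integers, floats or text, so two versions of an annotation file can be compared before running `qapa build`. It makes no assumption about which column QAPA actually reads, and the example record is invented apart from the value `0.2690` taken from the error above.

```python
def classify(field):
    """Return 'int', 'float' or 'str' according to how a text field parses."""
    try:
        int(field)
        return "int"
    except ValueError:
        pass
    try:
        float(field)
        return "float"
    except ValueError:
        return "str"


def column_types(bed_lines):
    """Map each 1-based column index to the set of types seen in that column.

    An empty result means the input had no data lines at all.
    """
    types = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        for index, field in enumerate(line.rstrip("\n").split("\t"), start=1):
            types.setdefault(index, set()).add(classify(field))
    return types


if __name__ == "__main__":
    # Hypothetical record; only '0.2690' comes from the error message in this thread.
    records = ["chr1\t14361\t14362\tsite_a\t0.2690\t-"]
    for index, seen in sorted(column_types(records).items()):
        print(index, sorted(seen))
    try:
        int("0.2690")
    except ValueError as err:
        print("ValueError:", err)  # invalid literal for int() with base 10: '0.2690'
```

An empty result also corresponds to the earlier bedtools message about a file that "only has fields 1 - 0", which suggests the intermediate file was empty. If the column order itself changed between versions, swapping int() for float() in one place may not be enough on its own.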

@kcha
Copy link
Collaborator

kcha commented Sep 27, 2019

Hi @imcoleman, there is an issue with using PolyASite version 2 because the file format changed. This will be addressed in an upcoming release that I'm slowly working on. Thanks!

kcha mentioned this issue on Oct 2, 2019
@kcha
Copy link
Collaborator

kcha commented Oct 2, 2019

Support for PolyASite version 2 is now available. Please upgrade to the latest release (v1.3.0). Thanks!

@imcoleman
Copy link

@kcha Thanks! I will try out v1.3.0.

@imcoleman

@kcha Just a quick comment that v1.3.0 has resolved the issue and I was able to process several groups of samples, human and mouse. Thanks!
