How to get softmasked genome as output #166
Comments
Hello, yes! This functionality can be achieved using EDTA/util/make_masked.pl.
Please try it out and let me know if you have any questions.
Best,
Shujun
…On Thu, Feb 25, 2021 at 3:22 AM romseg ***@***.***> wrote:
Dear author,
Is it possible to get a softmasked genome instead of the default hardmasked one?
Softmasking is sometimes required or recommended as input by other
annotators (other than Maker) or mapping programs, so it would be very
useful to have this option. If this option is not currently
available in Braker, I would appreciate your suggestions on how to
convert the hardmasked file to a softmasked one. Thanks!
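For the conversion asked about here, one generic approach (not part of EDTA, and not what make_masked.pl does) is to compare the hardmasked sequence against the original unmasked one and lowercase every base that was replaced by a mask character. The sketch below assumes both sequences have identical length and coordinates; the function name `soften_hardmask` and the `mask_chars` parameter are hypothetical.

```python
def soften_hardmask(unmasked: str, hardmasked: str, mask_chars: str = "NnXx") -> str:
    """Lowercase the bases of `unmasked` wherever `hardmasked` shows a mask character.

    Assumption: both strings are the same sequence record, so positions line up 1:1.
    """
    if len(unmasked) != len(hardmasked):
        raise ValueError("sequences must have the same length")
    out = []
    for base, masked in zip(unmasked, hardmasked):
        # Treat a position as hardmasked only when the masked copy has a mask
        # character where the original has a real base; native N gaps are kept.
        if masked in mask_chars and base not in mask_chars:
            out.append(base.lower())
        else:
            out.append(base)
    return "".join(out)
```

In practice this would be applied record by record to the two FASTA files. It only works when the unmasked genome is still available, because hardmasking discards the original bases.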
|
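The usage for 'make_masked.pl' is:
Usage: perl make_masked.pl -genome unmasked_genome.fa [options]
-rmout [file] Required. The repeatmasker.out file
But I don't have the 'repeatmasker.out' file. Can I use the hardmasked EDTA output file 'genome.fa.new.masked' instead? Thanks for your help! |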
You can find the RepeatMasker .out file in the anno folder.
Shujun
…On Fri, Feb 26, 2021 at 2:00 PM romseg ***@***.***> wrote:
The usage for 'make_masked.pl' is:
Usage: perl make_masked.pl -genome unmasked_genome.fa [options]
-rmout [file] Required. The repeatmasker.out file
But I don't have the 'repeatmasker.out' file. Can I use the hardmasked
EDTA output file 'genome.fa.new.masked' instead?
Thanks for your help!
|
Oh, I see. I believe it is this one: 'genome.fa.mod.EDTA.RM.out'. I will give it a try. Thanks for your help! :) Rom |
Hi Shujun, It did its job, but in addition to softmasking all sequences that were hardmasked in the original 'genome.fa.mod.MAKER.masked' (99 Mbp), 'make_masked.pl' with 'genome.fa.mod.EDTA.RM.out' softmasked an extra ~50 Mbp (149 Mbp in total). It softmasked extra short fragments and in many cases extended the previously hardmasked fragments. I can't tell what these extra softmasked sequences are. I am wondering why there is a difference and which masked version would be more useful for gene annotation (with Maker and/or Braker). At first glance, the softmasked version generated with RM.out seems more complete (149 Mbp). Thanks! Best, |
Hi Rom,
The MAKER.masked file was lightly (under) masked to avoid masking genic
regions. Like you observed, short TEs won't be masked due to their close
distance to genes. If you use this file to perform gene predictions, you
will likely get some TEs in your results. Please check out the output
section of the manual for more info.
Best,
Shujun
…On Wed, Mar 3, 2021 at 6:15 AM romseg ***@***.***> wrote:
Hi Shujun,
It did its job, but in addition to softmasking all sequences that was
hardmasked in the original 'genome.fa.mod.MAKER.masked' (99Mbp), '
make_masked.pl' with 'genome.fa.mod.EDTA.RM.out' softmasked extra ~50Mbp
(149Mbp). It softmasked extra short fragments and in many cases amplified
the previously hardmasked fragments. I can't tell what these extra
softmasked sequences are. I am wondering why the difference and which
masking file version would be more useful for genome gene annotation (with
Maker and/or Braker). At first glance the softmasked version generated with
RM.out would seem more complete (149Mbp). Thanks!
Best,
Rom
|
Hi Shujun, That makes sense. It is good to avoid masking genic regions, especially for annotation. One final question on this masking topic: in the stats of my sum file, 256191256 bp [256 Mbp] (51.54% of the total length) is reported as bpMasked (please see below), since these bases were identified as TEs. This number is higher than the number of hardmasked bp in the MAKER.masked file (99 Mbp) or in the softmasked file I produced with the 'make_masked.pl' script (149 Mbp). Is this difference also meant to avoid masking genic regions? At first glance, going from 256 to 99 Mbp seems a big reduction, but maybe I am not interpreting the results reported in the sum file correctly. I would be grateful to have your thoughts. Thanks!
The best, |
Hi Rom, The sum file contains every sequence that EDTA considers to be a TE. The MAKER.masked file is a subset of the sum file, which was produced by
Best, |
Hi Shujun, It worked pretty well! Masking with genome.fa.mod.EDTA.anno/genome.fa.mod.EDTA.TEanno.out and the suggested parameters produced 254947490 softmasked bp, which is very close to the 256191256 bpMasked reported in the sum file of my genome. It is good to have all these masking alternatives for downstream processing. Thanks for the assistance and for designing EDTA! It is a great program that makes research so much easier. All my questions were answered and this thread can be closed. The best, |
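Totals like the ones compared in this thread (99 Mbp hardmasked, 149 Mbp and ~255 Mbp softmasked) can be checked by counting characters in the masked FASTA. The following is a minimal sketch, not an EDTA utility. The function name `count_masking` is hypothetical, and it assumes lowercase bases mean softmasked while N/X in either case mean hardmasked, so native assembly gaps are counted as hardmasked too.

```python
from typing import Dict, Iterable


def count_masking(lines: Iterable[str]) -> Dict[str, int]:
    """Count total, softmasked, and hardmasked bases in FASTA-formatted lines."""
    counts = {"total": 0, "soft": 0, "hard": 0}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        counts["total"] += len(line)
        for ch in line:
            if ch in "NnXx":
                counts["hard"] += 1  # mask characters (also matches native gaps)
            elif ch.islower():
                counts["soft"] += 1  # lowercase base, i.e. softmasked
    return counts
```

A file handle can be passed directly, for example `count_masking(open("genome.softmasked.fa"))`, where the file name is only a placeholder.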
I ran make_masked.pl and all the output files are empty. Has anyone encountered this and can suggest the cause? Thank you very much. |
@FengjuanjuanCMS you may need to check the repeatmasker output file provided to the |
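@oushujun
Hi Shujun,
I wonder whether the softmasked genome can be used directly for subsequent gene structure annotation if I use the following command to softmask my genome:
perl ../util/make_masked.pl -genome genome.fa -minlen 80 -hardmask 0 -t 2 -rmout genome.fa.mod.EDTA.anno/genome.fa.mod.EDTA.TEanno.out
Do I need to consider simple repeat sequences? In addition, are telomere sequences also softmasked by the above command? |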
Mostly just TEs. For gene annotation purposes, you may want to unmask shorter
TEs (e.g., <500 bp) to preserve the gene space. Check out the wiki.
Shujun
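The idea of leaving short TEs unmasked can be illustrated with a small sketch that keeps only annotation intervals at or above a length cutoff and softmasks those. This is not the make_masked.pl implementation. It assumes the standard whitespace-separated RepeatMasker .out layout (sequence name, begin, and end in columns 5 to 7, 1-based inclusive), and the names `read_rm_intervals`, `softmask`, and `min_len` are hypothetical. Whether the `-minlen` flag in the command quoted in this thread filters in exactly this way is not confirmed here.

```python
from typing import Dict, Iterable, List, Tuple


def read_rm_intervals(lines: Iterable[str], min_len: int = 500) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (begin, end) intervals per sequence, dropping hits shorter than min_len."""
    intervals: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        seq_id, begin, end = fields[4], int(fields[5]), int(fields[6])
        if end - begin + 1 >= min_len:
            intervals.setdefault(seq_id, []).append((begin, end))
    return intervals


def softmask(seq: str, spans: Iterable[Tuple[int, int]]) -> str:
    """Lowercase the given 1-based, inclusive spans of seq."""
    chars = list(seq)
    for begin, end in spans:
        chars[begin - 1:end] = [c.lower() for c in chars[begin - 1:end]]
    return "".join(chars)
```

Raising `min_len` leaves more short hits unmasked, which trades TE coverage for preserved gene space, in line with the advice above.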
…On Wed, Jan 10, 2024 at 9:03 PM wanjie ***@***.***> wrote:
@oushujun <https://github.com/oushujun>
Hi Shujun,
I wonder whether the softmasked genome can be used directly for subsequent gene
structure annotation if I use the following command to softmask my genome:
perl ../util/make_masked.pl -genome genome.fa -minlen 80 -hardmask 0 -t 2 -rmout genome.fa.mod.EDTA.anno/genome.fa.mod.EDTA.TEanno.out
Do I need to consider simple repeat sequences? In addition, are
telomere sequences also softmasked by the above command?
|
1.
2. “4. Low-threshold TE masking: $genome.mod.MAKER.masked. This is a genome file with only long TEs (>=1 kb) being masked. You may use this for de novo gene annotations. In practice, this approach will reduce overmasking for genic regions, which can improve gene prediction quality. However, initial gene models should contain TEs and need further filtering.”
3.
From the above information, I think the following code is appropriate if I want to get a softmasked genome for further de novo gene structure annotation:
|
Hi Shujun, |