Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Giraffe-produced GAM cannot be converted to GAF or surjected #3356

Closed
briannadon opened this issue Jul 28, 2021 · 4 comments
Closed

Giraffe-produced GAM cannot be converted to GAF or surjected #3356

briannadon opened this issue Jul 28, 2021 · 4 comments

Comments

@briannadon
Copy link

1. What were you trying to do?

  1. Create a graph from multiple alignment of 100 reference bacterial genomes using pggb. This completes succesfully and vg validate says the graph is fine, as does manual inspection.
  2. Create giraffe indices from that GFA graph to allow for mapping. This appears to complete successfully.
  3. Using giraffe, map simulated reads from 5 of the reference genomes back to the graph to determine accuracy - the question is do reads from e.g. "reference_genome_A" map back to the nodes that correspond to the "reference_genome_A" path in the graph. Again, completes successfully and very quickly.
  4. Convert the GAM from step 3 into GAF, or vg surject the GAM to inspect alignments. This fails. The GAM looks fine and can be vg view'd just fine.

2. What did you want to happen?
Step 4 to complete successfully.

3. What actually happened?
running my command e.g.:
vg convert -G sample_A_alignment.gam 100_genome_graph.hg
Throws:
terminate called after throwing an instance of 'std::out_of_range'
terminate called recursively (repeats many times)
what(): at: key not present
along with some segmentation fault errors.

Occasionally, it will also complain that a certain node is out of range or not in the graph. It appears maybe vg convert is looking for a node mentioned in an alignment that's not actually in the graph?

One of my alternative thoughts was to try to surject the alignment to get the answers i wanted, but vg surject threw very similar errors:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Attempted to get handle for node 1987909 not present in graph

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
(This should be correct, hard to tell which stacktrace is right because I'm running on a cluster)

Crash report for vg v1.33.0 "Moscona"
Stack trace (most recent call last) in thread 22732:
#17   Object "", at 0xffffffffffffffff, in
#16   Object "/home/brian.nadon/local/vg", at 0x1c4f502, in __clone
#15   Object "/home/brian.nadon/local/vg", at 0x12a3098, in start_thread
#14   Object "/home/brian.nadon/local/vg", at 0x1b888bd, in gomp_thread_start
#13   Object "/home/brian.nadon/local/vg", at 0xb31bc3, in void vg::io::for_each_parallel_impl<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.0]
#12   Object "/home/brian.nadon/local/vg", at 0x1b8b197, in gomp_team_barrier_wait_end
#11   Object "/home/brian.nadon/local/vg", at 0x1b8285b, in gomp_barrier_handle_tasks
#10   Object "/home/brian.nadon/local/vg", at 0xb31541, in void vg::io::for_each_parallel_impl<vg::Alignment>(std::istream&, std::function<void (vg::Alignment&, vg::Alignment&)> const&, std::function<void (vg::Alignment&)> const&, std::function<bool ()> const&, unsigned long) [clone ._omp_fn.1]
#9    Object "/home/brian.nadon/local/vg", at 0xb3225d, in main_surject(int, char**)::{lambda(vg::Alignment&)#7}::operator()(vg::Alignment&) const
#8    Object "/home/brian.nadon/local/vg", at 0x4fd1fd, in vg::Surjector::surject(vg::Alignment const&, std::unordered_set<handlegraph::path_handle_t, std::hash<handlegraph::path_handle_t>, std::equal_to<handlegraph::path_handle_t>, std::allocator<handlegraph::path_handle_t> > const&, bool, bool) const [clone .cold]
#7    Object "/home/brian.nadon/local/vg", at 0x1b98fb9, in _Unwind_Resume
#6    Object "/home/brian.nadon/local/vg", at 0x1b985ca, in _Unwind_RaiseException_Phase2
#5    Object "/home/brian.nadon/local/vg", at 0x1ad6d23, in __gxx_personality_v0
#4    Object "/home/brian.nadon/local/vg", at 0x1b765e8, in __cxa_call_terminate
#3    Object "/home/brian.nadon/local/vg", at 0x1ad736b, in __cxxabiv1::__terminate(void (*)())
#2    Object "/home/brian.nadon/local/vg", at 0x1ad8929, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/home/brian.nadon/local/vg", at 0x57c853, in abort
#0    Object "/home/brian.nadon/local/vg", at 0x12a647b, in raise

5. What data and command can the vg dev team use to make the problem happen?
Can't share data as it's too large. My script/workflow is below.

vg convert -a -g ${gname}.gfa > ${gname}.hg
vg index -x ${gname}.xg ${gname}.hg
vg autoindex -w giraffe -t 48 -T /local/scratch -g mygraph.gfa -p mygraph_giraffe
vg giraffe -t 48 -H mygraph.giraffe.giraffe.gbwt -g mygraph.giraffe.gg -m mygraph.giraffe.min -d mygraph.giraffe.dist -f simulated_reads1 -f simulated_reads2 > $gam
vg convert -G $gam $vg > $gaf
#that's where it fails

6. What does running vg version say?

vg version v1.33.0 "Moscona"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon
@jltsiren
Copy link
Contributor

The graph you use for mapping (mygraph.giraffe.gg) is not the same graph as the one you use in vg convert (is it ${gname}.xg or ${gname}.hg?). If there are nodes longer than ~1000 bp, vg autoindex automatically chops them and creates new shorter nodes that do not exist in the original graph.

You should be able to fix this by dropping the initial vg convert and vg index commands and adding option --request XG to the vg autoindex command.

@briannadon
Copy link
Author

That makes sense. The graph is the same, I just tried to edit the variables to be more readable but clearly missed some - they're all the same in the actual script. I didn't realize autoindex could be asked to do that, so I'll try that.

@briannadon
Copy link
Author

Per your advice, the new XG index does allow for successful conversion and surjection, so thanks for the recommendation. However, I do have a related question that's a bit less technical: does autoindex work/allow for creating giraffe haplotype indices directly from large, multiple-alignment-derived graphs with paths as opposed to haplotypes? I'm hoping to use the embedded genomes (i.e. paths) as the haplotypes with giraffe.

@jltsiren
Copy link
Contributor

jltsiren commented Aug 3, 2021

vg autoindex does not support that, and I don't know if we are going to add the support in the future. It's difficult to determine reliably whether the paths in the graph / P-lines in the GFA are reference paths or aligned sequences, because the file formats do not store such semantics. We currently assume that the paths are reference paths, and you can provide alternate interpretations in manual index construction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants