Long Read Giraffe #3700
…to alignment elaboration
Merge commit 'bffdd27a300a2669df6469025b5660077666def8' into lr-giraffe
This is updating
The
I managed to crash this on read pair
The build doesn't work on the macOS 11 image because of missing Protobuf symbols.
It could be that there's somehow a Protobuf version mismatch, even though we should be using an OS-version-specific cache here. We're supposedly building against Protobuf 3.21.2 (which Brew is calling 21.2), but Protobug 3.21.3 came out yesterday, and the same day someone on StackOverflow started reporting this error
Protobug = Freudian slip?
I think we hit protocolbuffers/protobuf#9947, where this symbol doesn't appear in release (…). I think that maybe in the last couple of days the bad Protobuf releases hit Homebrew, everybody suddenly cared, and the bug was actually fixed. The 3.21.3 release is supposed to fix this problem, so we need Protobuf 3.21.3, or else an old ~3.19 one as in tuplex/tuplex#119. According to Homebrew/homebrew-core#106252, 3.21.3 is now in Homebrew, so I think I might just need to rerun?
I'm breaking off the Mac CI changes into #3708, since they seem to actually be hard and the macOS 10.15 brownout has been stopped.
Changelog Entry
To be copied to the draft changelog by merger:
- vg giraffe can now use --align-from-chains to do long-read alignments.
- make bin/unittest/<test_file_name> builds a dynamically-linked binary for just one file of unit tests, for faster iteration.

Description
This is my sketch for integrating @xchang1's distance index 2 and @StephenHwang's minimizer selection for long reads, with some chaining logic that uses @jltsiren's WFA-against-GBWT aligner to connect the minimizers together.
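A minimal sketch of what connecting the minimizers together could look like, assuming a linear reference string as a stand-in for the graph and a hypothetical GapAligner callback in place of the WFA-against-GBWT aligner. Seed, GapAligner, and connect_all_seeds are illustrative names, not vg's API. This toy version connects every consecutive pair of hits in read order and rejects the whole chain rather than skipping a hit that does not fit.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical seed: a minimizer hit of `length` bases at read offset
// `read_pos`, matched at offset `ref_pos` in a linear stand-in for the graph.
struct Seed {
    int read_pos;
    int ref_pos;
    int length;
};

// Hypothetical gap aligner: returns a cost for aligning read_gap to ref_gap,
// or std::nullopt if it gives up (for example, because a score cap was hit).
using GapAligner =
    std::function<std::optional<int>(const std::string&, const std::string&)>;

// Connect every consecutive pair of seeds (sorted by read offset) with the
// gap aligner. No seed is ever skipped: if a pair overlaps, runs backwards in
// the reference, or cannot be aligned, the whole chain is rejected.
std::optional<int> connect_all_seeds(std::vector<Seed> seeds,
                                     const std::string& read,
                                     const std::string& ref,
                                     const GapAligner& align_gap) {
    std::sort(seeds.begin(), seeds.end(),
              [](const Seed& x, const Seed& y) { return x.read_pos < y.read_pos; });
    int total_gap_cost = 0;
    for (std::size_t i = 1; i < seeds.size(); ++i) {
        const Seed& left = seeds[i - 1];
        const Seed& right = seeds[i];
        const int read_start = left.read_pos + left.length;
        const int ref_start = left.ref_pos + left.length;
        if (right.read_pos < read_start || right.ref_pos < ref_start) {
            return std::nullopt;  // not colinear; this sketch refuses to skip
        }
        std::optional<int> gap_cost = align_gap(
            read.substr(read_start, right.read_pos - read_start),
            ref.substr(ref_start, right.ref_pos - ref_start));
        if (!gap_cost) return std::nullopt;  // one failed gap sinks the chain
        total_gap_cost += *gap_cost;
    }
    return total_gap_cost;
}
```

Any function with the GapAligner shape can be plugged in, including one that returns std::nullopt when the gap's alignment score would exceed a cap.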
The chaining logic is somewhat trivial and refuses to skip even a single minimizer hit, and the WFA-against-GBWT alignment is being limited to at most a relatively small score even if the sequence being aligned is quite long.
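To illustrate what limiting the alignment to a maximum score means, here is a sketch of a score-capped wavefront alignment. It swaps in plain unit-cost edit distance between two strings for the gap-affine, graph-based alignment the real aligner does, and capped_edit_distance is a hypothetical name. The wavefront for score s records, per diagonal k = j - i, the furthest row reachable with at most s edits. Each wavefront covers at most 2s + 1 diagonals, so capping s bounds how many diagonals the search ever tracks, however long the sequences are.

```cpp
#include <algorithm>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Returns the unit-cost edit distance between a and b if it is at most
// max_score, or std::nullopt once the score would exceed max_score.
std::optional<int> capped_edit_distance(const std::string& a,
                                        const std::string& b,
                                        int max_score) {
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    const int final_k = m - n;            // diagonal holding the end (n, m)
    const int width = 2 * max_score + 3;  // diagonals -(max+1) .. +(max+1)
    const int base = max_score + 1;       // vector index of diagonal 0
    constexpr int NONE = -1;              // diagonal not reached yet

    // Slide along diagonal k from row i while the characters match.
    auto extend = [&](int k, int i) {
        int j = i + k;
        while (i < n && j < m && a[i] == b[j]) { ++i; ++j; }
        return i;
    };

    std::vector<int> wf(width, NONE);
    wf[base] = extend(0, 0);
    for (int s = 0; s <= max_score; ++s) {
        if (std::abs(final_k) <= s && wf[base + final_k] == n) return s;
        if (s == max_score) break;  // cap reached: give up
        std::vector<int> next(width, NONE);
        for (int k = -(s + 1); k <= s + 1; ++k) {
            int best = NONE;
            // Accept a candidate row only if (i, i + k) is inside the matrix.
            auto offer = [&](int i) {
                if (i >= 0 && i <= n && i + k >= 0 && i + k <= m) best = std::max(best, i);
            };
            const int same = wf[base + k];       // diagonal k at score s
            const int below = wf[base + k - 1];  // diagonal k - 1 at score s
            const int above = wf[base + k + 1];  // diagonal k + 1 at score s
            if (same != NONE) { offer(same); offer(same + 1); }  // keep, or substitute
            if (below != NONE) offer(below);      // insertion: j advances
            if (above != NONE) offer(above + 1);  // deletion: i advances
            if (best != NONE) next[base + k] = extend(k, best);
        }
        wf = std::move(next);
    }
    return std::nullopt;
}
```

For example, capped_edit_distance("ACGT", "AGGT", 2) finds the single substitution at score 1, while capped_edit_distance("ACGT", "TTTT", 1) gives up, because that pair needs 3 edits.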
Using this, I managed to get through 10k reads from the HiFi read set we've been working with in about 15 minutes. Most of the reads took about 0.02 thread-seconds each, but the slowest 50 were:
@xchang1 Does this need to be updated with more of your DI2 code before it gets merged?
@jltsiren I touched a bunch of the data structures in the WFAExtender trying to improve the algorithmics and address where profiling saw all my time going. In addition to just coalescing multiple graph nodes into one WFANode, I changed from a sorted list to a hashtable to hold the wavefronts on each WFANode (for O(1) insert/lookup when it gets huge), and I changed the way WFAPoint stores its path to try and avoid constantly allocating and deallocating deque stuff in the std::stack that was there before (which was taking up almost all the time according to my profile). Have I messed anything up that you want me to try and roll back?
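The two data-structure changes described above can be sketched roughly as follows. This is not the actual WFAExtender code: Point, Wavefront, NodeWavefronts, and PathStack are hypothetical stand-ins, keying the wavefronts by score is an assumption, and the path is simplified to a stack of node IDs. NodeWavefronts keeps one node's wavefronts in a std::unordered_map for average O(1) insert and lookup. PathStack backs the path with a std::vector that is cleared and reused, instead of the std::deque that std::stack uses by default.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a wavefront point and a wavefront.
struct Point {
    int32_t diagonal;
    int32_t offset;
};
using Wavefront = std::vector<Point>;

// Per-node wavefront storage keyed by score. A hash table gives average O(1)
// insert and lookup, where a sorted list needs a scan (or a search plus
// shifting elements) that gets slower as the number of scores grows.
class NodeWavefronts {
public:
    // Returns the wavefront for this score, creating an empty one if needed.
    Wavefront& at_score(int32_t score) { return by_score_[score]; }

    // Returns nullptr if no wavefront exists for this score.
    const Wavefront* find(int32_t score) const {
        auto it = by_score_.find(score);
        return it == by_score_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return by_score_.size(); }

private:
    std::unordered_map<int32_t, Wavefront> by_score_;
};

// A path stack backed by one contiguous std::vector. std::deque allocates in
// fixed-size blocks as it grows (and with libstdc++ even an empty one
// allocates), so creating and discarding deque-backed stacks is costly.
// Reusing this vector means push/pop usually stop allocating once it has
// grown to the typical path depth.
class PathStack {
public:
    void push(uint32_t node) { nodes_.push_back(node); }
    void pop() { nodes_.pop_back(); }
    uint32_t top() const { return nodes_.back(); }
    bool empty() const { return nodes_.empty(); }
    std::size_t depth() const { return nodes_.size(); }
    // clear() keeps the allocated capacity on common implementations.
    void reset() { nodes_.clear(); }

private:
    std::vector<uint32_t> nodes_;
};
```

A std::stack<uint32_t, std::vector<uint32_t>> would give the same contiguous storage; the wrapper class only exists to expose a reset() that empties the stack without throwing away its buffer.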