The general setup is that we have a query and want to count the occurrence of all k-mers in all bins.
native: uses bulk_contains and then the subscript operator to increase the counters of a counting vector.
iterator: uses bulk_contains and then iterates over the binning bitvector to increase the counters of a counting vector.
get_int: uses bulk_contains and then accesses batches of 64 bits via get_int to increase the counters of a counting vector.
data: uses bulk_contains and then accesses batches of 64 bits via data to increase the counters of a counting vector.
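To make the difference between per-bit and batched access concrete, here is a minimal sketch. It does not use the SeqAn3 API: `packed_bits`, `count_per_bit` and `count_batched` are hypothetical names, and the bit layout (bit i lives in bit i % 64 of word i / 64) is an assumption. The `get_int` member only mimics the requested behaviour of returning the bits [idx, idx + len) as one integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the binning bitvector (not the SeqAn3 type):
// bit i of the vector is bit (i % 64) of word (i / 64).
struct packed_bits
{
    std::vector<std::uint64_t> words;
    std::size_t size; // number of valid bits, i.e. number of bins

    // Per-bit access, comparable to the subscript operator.
    bool operator[](std::size_t idx) const
    {
        return (words[idx / 64] >> (idx % 64)) & 1u;
    }

    // Returns the bits [idx, idx + len) as one integer; bit 0 of the result is bit idx.
    std::uint64_t get_int(std::size_t idx, std::size_t len) const
    {
        assert(len >= 1 && len <= 64 && idx + len <= size);
        std::size_t const word = idx / 64;
        std::size_t const offset = idx % 64;
        std::uint64_t result = words[word] >> offset;
        if (offset + len > 64) // the range spills into the next word
            result |= words[word + 1] << (64 - offset);
        if (len < 64)          // mask off everything beyond len bits
            result &= (std::uint64_t{1} << len) - 1;
        return result;
    }
};

// "native"-style counting: one subscript call per bin.
void count_per_bit(packed_bits const & hits, std::vector<std::uint32_t> & counts)
{
    for (std::size_t bin = 0; bin < hits.size; ++bin)
        counts[bin] += hits[bin];
}

// "get_int"-style counting: fetch up to 64 bins at once, then walk the set bits.
void count_batched(packed_bits const & hits, std::vector<std::uint32_t> & counts)
{
    for (std::size_t bin = 0; bin < hits.size; bin += 64)
    {
        std::size_t const len = (hits.size - bin < 64) ? hits.size - bin : 64;
        std::uint64_t batch = hits.get_int(bin, len);
        // The inner loop stops as soon as no set bits remain in the batch.
        for (std::size_t i = 0; batch != 0; ++i, batch >>= 1)
            counts[bin + i] += static_cast<std::uint32_t>(batch & 1u);
    }
}
```

Both functions produce the same counters. The batched variant touches each 64-bit word once and skips the remainder of a word when no set bits are left, which is presumably where the benchmarked difference comes from; a real implementation would likely use a count-trailing-zeros instruction instead of shifting bit by bit.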
Acceptance Criteria
The seqan3::interleaved_bloom_filter exposes functionality similar to get_int.
Tasks
Adapt seqan3::interleaved_bloom_filter::binning_bitvector to allow access of multiple bits.
Definition of Done
Implementation and design approved
Unit tests pass
Test coverage = 100%
Microbenchmarks added and/or affected microbenchmarks < 5% performance drop
API documentation added
Tutorial/teaching material added
Test suite compiles in less than 30 seconds (on travis)
Changelog entry added
rrahn added the needs refinement label (A story that was not discussed and/or estimated by the team yet but is planned for upcoming sprints.) on Apr 30, 2020
And seqan3::binning_bit_vector::operator[] is inefficient for sequential access patterns, which is bad for counting.
Proposals:
expose get_int: gives one integer, e.g. 0b1010, where each bit position indicates whether a k-mer was present or not.
expose data: similar to get_int, but gives access to the whole raw data.
Add a new data structure "counting_vector" (this is currently the only use case) which has an operator+= overload for binning_bit_vector that can do the counting efficiently.
We implement a counting_vector (the name is up for discussion) which can handle this use case more efficiently.
Add a documentation note for seqan3::binning_bit_vector::operator[] that it is bad for sequential access patterns and that, if someone has a use case other than counting, we can provide a better abstraction.
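The counting_vector proposal could look roughly like the sketch below. This is only an illustration under assumptions, not the agreed design: `bit_result` is a hypothetical stand-in for the binning bit vector that exposes its raw 64-bit words, and the interface of `counting_vector` (constructor, `operator[]`, `size`) is invented here for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the binning bit vector returned by bulk_contains.
struct bit_result
{
    std::vector<std::uint64_t> words; // raw words; bit b of word w belongs to bin w * 64 + b
    std::size_t bins;                 // number of bins (valid bits)
};

// Hypothetical counting_vector: one counter per bin.
class counting_vector
{
public:
    explicit counting_vector(std::size_t bins) : counts(bins, 0) {}

    // Adds 1 to the counter of every bin whose bit is set, one 64-bit word at a time.
    counting_vector & operator+=(bit_result const & hits)
    {
        assert(hits.bins == counts.size());
        for (std::size_t w = 0; w * 64 < hits.bins; ++w)
        {
            std::uint64_t word = hits.words[w];
            // Stop early once the word has no set bits left.
            for (std::size_t bin = w * 64; word != 0 && bin < hits.bins; ++bin, word >>= 1)
                counts[bin] += static_cast<std::uint32_t>(word & 1u);
        }
        return *this;
    }

    std::uint32_t operator[](std::size_t bin) const { return counts[bin]; }
    std::size_t size() const { return counts.size(); }

private:
    std::vector<std::uint32_t> counts;
};
```

With such a type, counting over a query would reduce to one `counts += result;` per k-mer, and the word-wise access stays an implementation detail instead of becoming public API of the bit vector.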
rrahn added the ready to tackle label (This story was discussed and can be immediately tackled.) and removed the needs refinement label (A story that was not discussed and/or estimated by the team yet but is planned for upcoming sprints.) on Jul 6, 2020
Description
Provide functionality similar to get_int for the seqan3::interleaved_bloom_filter: it returns a uint64_t that includes the bits [idx, idx + len). This provides an efficient way to access multiple bits of the seqan3::interleaved_bloom_filter.
I already did a benchmark of the different ways one might access the IBF:
The format of the benchmark names is method / number of bins / size per bin. All of them use 2 hash functions and sequences of length 1'000.
Links: