Check interval start to avoid overflowing bin numbers #1774

Merged: 1 commit, on May 2, 2024

Conversation

jmarshall
Member

As noted on samtools/samtools#2032 (which includes test queries exhibiting the problem):

Additionally with this data file samtools goes into an infinite(?) loop when given [a particular location / query interval]: […]

This can be traced to the do … while loop in hts_itr_query(), which loops perhaps forever through negative bin numbers. Possibly this code should check for a beg value massively beyond the maximum position covered by the index's bins and sidestep most of its processing.

This PR checks start positions of query intervals against the maximum position representable in the index's geometry, to avoid negative bin numbers and the resulting infinite loops in the do...while loop. It's effectively the beg equivalent of #1595.
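To illustrate why an unchecked start position is a problem, here is a minimal sketch of the leaf-bin arithmetic, computed in 64 bits so that out-of-range values stay visible. The `sketch_*` names are hypothetical and are not htslib functions; the geometry assumed is the usual binning scheme in which each level splits a bin into 8 children, with BAI-style values `min_shift = 14` and `n_lvls = 5` used as the example.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not htslib's API. */

/* Number of the first bin on `level` (level 0 is the single root bin). */
static int64_t sketch_bin_first(int level)
{
    return (((int64_t)1 << (3 * level)) - 1) / 7;
}

/* Total number of bins on levels 0..n_lvls; valid bins are 0..total-1. */
static int64_t sketch_total_bins(int n_lvls)
{
    return sketch_bin_first(n_lvls + 1);
}

/* Leaf-level bin that would hold the 0-based position `beg` (beg >= 0).
   Kept in 64 bits here; narrowing the result to a 32-bit int is where
   a huge `beg` could stop being representable. */
static int64_t sketch_leaf_bin(int64_t beg, int min_shift, int n_lvls)
{
    return sketch_bin_first(n_lvls) + (beg >> min_shift);
}
```

With this geometry the last valid leaf bin is 37448, reached at position 2^29 - 1. A start position of exactly 2^29 already yields bin 37449, one past the end, and a start such as 2^62 yields a value that no longer fits in a 32-bit integer at all.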

I considered implementing the “massively beyond” part by, e.g., adding 256, as various parts of the code do for somewhat different reasons. However, tracing the guarded code in hts_itr_query() and hts_itr_multi_bam() (and looking at the checks on data getting into the index at all) suggests that the result would be a finished / unchanged iterator for beg immediately past maxpos anyway, so having the bound exactly at maxpos is fine.

Introduces hts_bin_maxpos() and hts_idx_maxpos(), and uses them wherever the maxpos calculation appears. I've left the latter private, at least for now.
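For readers unfamiliar with the maxpos calculation, the following is a sketch of what such a helper might compute. It assumes the usual binning-index geometry (smallest bins span 2^min_shift bases, and each of the n_lvls levels above them multiplies the span by 8). The name `sketch_bin_maxpos` is hypothetical, and this is not presented as the body of hts_bin_maxpos().

```c
#include <stdint.h>

/* Hypothetical illustration, not htslib's implementation.
   Returns the first position NOT covered by an index with the given
   geometry, i.e. covered positions are 0 .. maxpos-1. */
static int64_t sketch_bin_maxpos(int min_shift, int n_lvls)
{
    return (int64_t)1 << (min_shift + 3 * n_lvls);
}
```

For the BAI-style geometry (`min_shift = 14`, `n_lvls = 5`) this gives 2^29 = 536870912; adding one more level gives 2^32.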

This also changes the existing end checks to <= as end is exclusive -- note it is used as end-1 in the code guarded by the checks. In practice this probably won't make much difference to anything.
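The difference between the two bounds can be sketched as follows. This is an assumption-laden illustration with hypothetical `sketch_*` names, not the checks as written in the PR: it treats beg as an inclusive 0-based start and end as an exclusive end, so the last base touched is end-1.

```c
#include <stdint.h>

/* Hypothetical illustration, not the code from this PR. */

/* beg is inclusive: the first base queried is beg itself, so it must lie
   strictly below maxpos to fall inside the index's geometry. */
static int sketch_beg_in_range(int64_t beg, int64_t maxpos)
{
    return beg >= 0 && beg < maxpos;
}

/* end is exclusive: the last base queried is end-1, so end == maxpos is
   still acceptable (end-1 == maxpos-1 is the last covered position). */
static int sketch_end_in_range(int64_t end, int64_t maxpos)
{
    return end <= maxpos;
}
```

Under these assumptions an interval ending exactly at maxpos passes the end check, while one starting exactly at maxpos fails the beg check.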

Check start positions of query intervals against the maximum position
representable in the index's geometry, to avoid negative bin numbers
and the resulting infinite loops in the do...while loop.

Introduce hts_bin_maxpos() and hts_idx_maxpos(), and use them wherever the
maxpos calculation appears. (Leave the latter private, at least for now.)

Also change the existing end checks to <= as end is exclusive -- note it
is used as end-1 in the code guarded by the checks.
@jkbonfield
Contributor

Thanks John.

@jkbonfield jkbonfield merged commit 9a99a1d into samtools:develop May 2, 2024
9 checks passed
@jmarshall jmarshall deleted the check-beg branch May 8, 2024 07:51