Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better readInt take two #31

Merged
merged 1 commit into from
Oct 5, 2020

Conversation

vdukhovni
Copy link
Contributor

@vdukhovni vdukhovni commented Sep 30, 2020

This version still outperforms the original by ~4-5x, and now implements
correct overflow checks for the full range of Int values.

A "reasonable" number of leading zeros is supported after an optional
explicit sign, but once ~32k of leading zeros have been seen, no further
chunks will be processed and the read fails (returns Nothing as the
Maybe Int payload).

It also adds skipSomeWS which skips a "reasonable" quantity of
leading whitespace before giving up if more than ~32k of whitespace
is encountered.

The new w8IsSpace function is somewhat out of place here, but it
is more efficient than any stock versions I know of in any other
module (barring writing some FFI code for this). It could be
kept internal, with folks who want to roll their own skipping of
leading whitespace also providing their own implementation of the
predicate...

[ EDIT: got rid of the readIntSkip function, turns out just providing skipSomeWS separately actually works just as well, with no need to provide the composition. ]

@vdukhovni
Copy link
Contributor Author

Squashed the fixup (older Preludes don't have (<>) built-in). So the CI should now be green.

So please let me know which of #29 or #31 (this PR) you'd like to progress.

CC: @chessai, @hsyl20, @Bodigrim, @sjakobi, @cartazio

@sjakobi
Copy link

sjakobi commented Sep 30, 2020

@vdukhovni Would you mind pinging me less often about things related to streaming-bytestring? I'm currently not very interested in this package. Thanks! :)

@vdukhovni
Copy link
Contributor Author

@vdukhovni Would you mind pinging me less often about things related to streaming-bytestring? I'm currently not very interested in this package. Thanks! :)

Sure (sorry about a final notification this will cause). The reason I thought this is relevant is that there's an open discussion re readInt wrt. bytestring and overflow detection, ... so I thought there'd be some relevance. Apologies. Will honour your request going forward...

Data/ByteString/Streaming/Char8.hs Show resolved Hide resolved
Data/ByteString/Streaming/Char8.hs Show resolved Hide resolved
-- on a never-ending stream of whitespace from an untrusted source.
-- For unconditional dropping of all leading whitespace, use `dropWhile`
-- with a suitable predicate.
skipSomeWS :: Monad m => ByteString m r -> ByteString m r

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar with streaming-bytestring, but do you need to export this function? Is it expected to be generally useful?

Copy link
Contributor Author

@vdukhovni vdukhovni Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that skipping whitespace is generally useful, and limiting the quantity skipped makes it an interesting new element in the toolkit. Perhaps @fosskers can comment on whether skipSomeWS makes sense to export or not. It is needed for bulk-reading a stream of whitespace-separated integer data, for example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that detecting whitespace is tricky to get right (see w8IsSpace), I think it's good for us to export skipSomeWS.

Data/ByteString/Streaming/Char8.hs Show resolved Hide resolved
tests/Test.hs Show resolved Hide resolved
-- on a never-ending stream of whitespace from an untrusted source.
-- For unconditional dropping of all leading whitespace, use `dropWhile`
-- with a suitable predicate.
skipSomeWS :: Monad m => ByteString m r -> ByteString m r
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that detecting whitespace is tricky to get right (see w8IsSpace), I think it's good for us to export skipSomeWS.

Data/ByteString/Streaming/Char8.hs Outdated Show resolved Hide resolved
This version outperforms the original by ~4-5x, and implements correct
overflow checks for the full range [minBound..maxBound] of `Int` values.

A "reasonable" number of leading zeros is supported after an optional
explicit sign, but once ~32k (defaultChunkSize) of leading zeros have
been seen, no further chunks will be processed and the read fails
(returns `Nothing` as the `Maybe Int` payload).

It also adds a `skipSomeWS` function, which skips a "reasonable"
quantity of leading whitespace, but stops at the first chunk boundary,
having consumed at least ~32k (defaultChunkSize) whitespace bytes.
@vdukhovni
Copy link
Contributor Author

Many thanks! Just pushed a squashed version.

@vdukhovni
Copy link
Contributor Author

Many thanks! Just pushed a squashed version.

It is the same as what you reviewed, just squashed into one commit. So I think this is ready to be merged...

@fosskers fosskers merged commit 6f111ab into haskell-streaming:master Oct 5, 2020
@vdukhovni
Copy link
Contributor Author

Thanks!

@fosskers
Copy link
Collaborator

fosskers commented Oct 5, 2020

Prepping for release soon, once I get mine merged in too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants