Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Add new encoding runlength v3 using Longs instead of Integers #408

Closed
wants to merge 150 commits into from

Conversation

astubbs
Copy link
Contributor

@astubbs astubbs commented Sep 7, 2022

From the javadoc:

Uses {@link Long} for it's underlying structure (in order to support output encoding option using Longs), in order to support the worst case scenarios without running out of run length, as {@link Integer#MAX_VALUE} of 2,147,483,647 (2 billion records) is not completely inconceivable, whereas {@link Long#MAX_VALUE} of 9,223,372,036,854,775,807 (~9 quintillion / 9 x 10^18) is.

Checklist

  • Documentation (if applicable)
  • Changelog

Blocked by:

dependabot bot and others added 30 commits May 20, 2022 16:33
Bumps `testcontainers.version` from 1.17.1 to 1.17.2.

Updates `testcontainers` from 1.17.1 to 1.17.2
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/master/CHANGELOG.md)
- [Commits](testcontainers/testcontainers-java@1.17.1...1.17.2)

Updates `kafka` from 1.17.1 to 1.17.2
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/master/CHANGELOG.md)
- [Commits](testcontainers/testcontainers-java@1.17.1...1.17.2)

Updates `junit-jupiter` from 1.17.1 to 1.17.2
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/master/CHANGELOG.md)
- [Commits](testcontainers/testcontainers-java@1.17.1...1.17.2)

Updates `postgresql` from 1.17.1 to 1.17.2
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/master/CHANGELOG.md)
- [Commits](testcontainers/testcontainers-java@1.17.1...1.17.2)

---
updated-dependencies:
- dependency-name: org.testcontainers:testcontainers
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:kafka
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:postgresql
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [versions-maven-plugin](https://github.com/mojohaus/versions-maven-plugin) from 2.10.0 to 2.11.0.
- [Release notes](https://github.com/mojohaus/versions-maven-plugin/releases)
- [Changelog](https://github.com/mojohaus/versions-maven-plugin/blob/master/ReleaseNotes.md)
- [Commits](mojohaus/versions@versions-maven-plugin-2.10.0...versions-maven-plugin-2.11.0)

---
updated-dependencies:
- dependency-name: org.codehaus.mojo:versions-maven-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps `vertx.version` from 4.3.0 to 4.3.1.

Updates `vertx-web-client` from 4.3.0 to 4.3.1

Updates `vertx-junit5` from 4.3.0 to 4.3.1

---
updated-dependencies:
- dependency-name: io.vertx:vertx-web-client
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: io.vertx:vertx-junit5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [postgresql](https://github.com/pgjdbc/pgjdbc) from 42.3.5 to 42.4.0.
- [Release notes](https://github.com/pgjdbc/pgjdbc/releases)
- [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md)
- [Commits](pgjdbc/pgjdbc@REL42.3.5...REL42.4.0)

---
updated-dependencies:
- dependency-name: org.postgresql:postgresql
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
- CVE-2020-8908: Files::createTempDir local information disclosure vulnerability #4011
- Only used transitively from tests, and is a deprecated function
- google/guava#4011
…rs.version-1.17.2' into improvements/version-bumps
…ql-postgresql-42.4.0' into improvements/version-bumps
….mojo-versions-maven-plugin-2.11.0' into improvements/version-bumps
New issues with WireMocks dep on Jetty BOM ~v9
New issues with WireMocks dep on Jetty BOM ~v9
…that

Especially given we also use dependabot. If anyone is concerned to that level, they can use the plugin in their end user applications.
…that

Especially given we also use dependabot. If anyone is concerned to that level, they can use the plugin in their end user applications.
…iremock fat dependency (slf4j/logback collisions)

Upgrades Wiremock from 2.33.2 to 2.34.0
# Conflicts:
#	parallel-consumer-core/src/main/java/io/confluent/csid/utils/MathUtils.java
#	parallel-consumer-core/src/main/java/io/confluent/csid/utils/Range.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/OffsetSimultaneousEncoder.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/RunLengthEncoder.java
# Conflicts:
#	parallel-consumer-core/src/main/java/io/confluent/csid/utils/Range.java
@astubbs astubbs requested a review from rkolesnev October 21, 2022 13:17
Copy link
Contributor Author

@astubbs astubbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

....

/**
* @author Antony Stubbs
*/
// todo rename - resolve name against RunLengthEncoder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RunLengthEncodingUtils or maybe consolidate into the RunLenghtEncoder if it really is just internal implementation methods?

@@ -112,11 +113,12 @@ static HighestOffsetAndIncompletes runLengthDecodeToIncompletes(OffsetEncoding e
// decodes incompletes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 109, 110 above:
Any reason only unrolling v1 and v2 buffers but not the new v3 one?

            v1ShortBuffer.rewind();
            v2IntegerBuffer.rewind();

if (runLength != shortCastRunlength)
throw new RunlengthV1EncodingNotSupported(msg("Runlength too long for Short ({} cast to {})", runLength, shortCastRunlength));
runLengthEncodedByteBuffer.putShort(shortCastRunlength);
}
case v2 -> {
runLengthEncodedByteBuffer.putInt(runLength);
runLengthEncodedByteBuffer.putInt(Math.toIntExact(runLength));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new exception type? RunlengthV2EncodingNotSupported or align both to use ArithmeticException - although i would prefer less generic exception... either Generic EncodingNotSupported with params for encoding type (i.e. bitset / runlength) and version (v1,v2 etc)
or as you have now - set of specific exceptions...

# Conflicts:
#	parallel-consumer-core/src/main/java/io/confluent/csid/utils/Range.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/BitSetEncoder.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/ByteBufferEncoder.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/OffsetBitSet.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/OffsetEncoder.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/OffsetSimultaneousEncoder.java
#	parallel-consumer-core/src/main/java/io/confluent/parallelconsumer/offsets/RunLengthEncoder.java
#	parallel-consumer-core/src/test/java/io/confluent/parallelconsumer/ParallelEoSStreamProcessorTest.java
#	parallel-consumer-core/src/test/java/io/confluent/parallelconsumer/offsets/RunLengthEncoderTest.java
@what-the-diff
Copy link

what-the-diff bot commented Nov 21, 2022

  • Range.java
    • Added a new constructor for the class that takes in long as parameter instead of int, and changed all other constructors to take in long parameters too.
  • BitSetEncoder.java
  • Change the underlying data structure from Integer to Long
  • Add a new version of RunLengthEncoder (v3) which uses longs for it's encoding, and add tests for this

@eddyv
Copy link
Member

eddyv commented Jun 15, 2023

Closing - Stale.

@eddyv eddyv closed this Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants