
v2.4.0

@emasab released this on 07 May 12:47 (commit a4bf487).

librdkafka v2.4.0 is a feature release:

  • KIP-848: The Next Generation of the Consumer Rebalance Protocol.
    Early Access: This should be used only for evaluation and must not be used in production. Features and contract of this KIP might change in the future (#4610).
  • KIP-467: Augment ProduceResponse error messaging for specific culprit records (#4583).
  • KIP-516: Continue partial implementation by adding a metadata cache by topic id
    and updating the topic id corresponding to the partition name (#4676).
  • Upgrade OpenSSL to v3.0.12 (while building from source) with various security fixes;
    check the OpenSSL release notes for details.
  • Integration tests can be started in KRaft mode and run against any
    GitHub Kafka branch other than the released versions.
  • Fix pipeline inclusion of static binaries (#4666).
  • Fix to main loop timeout calculation leading to a tight loop for a
    max period of 1 ms (#4671).
  • Fix a bug causing duplicate message consumption from a stale
    fetch start offset in some particular cases (#4636).
  • Fix to metadata cache expiration on full metadata refresh (#4677).
  • Fix for a wrong error returned on full metadata refresh before joining
    a consumer group (#4678).
  • Fix to metadata refresh interruption (#4679).
  • Fix for an undesired partition migration with stale leader epoch (#4680).
  • Fix hang in cooperative consumer mode if an assignment is processed
    while closing the consumer (#4528).

Upgrade considerations

  • With KIP-467, INVALID_MSG (Java: CorruptRecordException) will
    be retried automatically. INVALID_RECORD (Java: InvalidRecordException), instead,
    is not retriable and will be set only on the records that caused the
    error. The rest of the records in the batch will fail with the new error code
    _INVALID_DIFFERENT_RECORD (Java: KafkaException) and can be retried manually,
    depending on the application logic (#4583).
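A minimal sketch of how an application might classify these three outcomes when a delivery report comes back with an error. The record_err_t and record_action_t enums and classify_delivery_error() are hypothetical stand-ins for the error codes named above (the real codes are rd_kafka_resp_err_t values declared in rdkafka.h); they are kept local so the example compiles without librdkafka.

```c
#include <assert.h>

/* Hypothetical stand-ins for the error codes named above; the real
 * codes are rd_kafka_resp_err_t values declared in rdkafka.h. */
typedef enum {
    REC_ERR_NONE,
    REC_ERR_INVALID_MSG,              /* Java: CorruptRecordException */
    REC_ERR_INVALID_RECORD,           /* Java: InvalidRecordException */
    REC_ERR__INVALID_DIFFERENT_RECORD /* Java: KafkaException */
} record_err_t;

/* What the application should do with a record, per the KIP-467 rules. */
typedef enum {
    REC_ACTION_NONE,         /* delivered: nothing to do */
    REC_ACTION_CLIENT_RETRY, /* retriable: the client retries it automatically */
    REC_ACTION_DROP,         /* culprit record: not retriable */
    REC_ACTION_APP_RETRY     /* failed only because of its batch: may be re-produced */
} record_action_t;

static record_action_t classify_delivery_error(record_err_t err) {
    switch (err) {
    case REC_ERR_INVALID_MSG:
        /* Retried by the client itself, within its configured retry limits. */
        return REC_ACTION_CLIENT_RETRY;
    case REC_ERR_INVALID_RECORD:
        /* Set only on the records that caused the error: never retry. */
        return REC_ACTION_DROP;
    case REC_ERR__INVALID_DIFFERENT_RECORD:
        /* Other records of the same batch: retrying is an application choice. */
        return REC_ACTION_APP_RETRY;
    default:
        return REC_ACTION_NONE;
    }
}
```

In a real producer this decision would sit in the delivery report callback, so that only the _INVALID_DIFFERENT_RECORD failures are re-produced and the culprit INVALID_RECORD messages are logged or dead-lettered instead.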

Early Access

KIP-848: The Next Generation of the Consumer Rebalance Protocol

  • With this new protocol the role of the Group Leader (a member) is removed and
    the assignment is calculated by the Group Coordinator (a broker) and sent
    to each member through heartbeats.

    The feature is still not production-ready.
    It's possible to try it in a non-production environment.

    A guide is available
    with considerations and steps to follow to test it (#4610).

Fixes

General fixes

  • Issues: confluentinc/confluent-kafka-go#981.
    In the librdkafka release pipeline, a static build containing libsasl2
    could be chosen instead of the alternative one without it.
    That caused the libsasl2 dependency to be required in confluent-kafka-go
    v2.1.0-linux-musl-arm64 and v2.3.0-linux-musl-arm64.
    Solved by correctly excluding the binary configured with that library
    when targeting a static build.
    Happening since v2.0.2, on the specified platforms,
    when using static binaries (#4666).
  • Issues: #4684.
    When the main thread loop was awakened less than 1 ms
    before the expiration of a timeout, it was serving with a zero timeout,
    leading to increased CPU usage until the timeout was reached.
    Happening since 1.x.
  • Issues: #4685.
    Metadata cache was cleared on full metadata refresh, leading to unnecessary
    refreshes and occasional UNKNOWN_TOPIC_OR_PART errors. Solved by updating
    the cache for existing or hinted entries instead of clearing them.
    Happening since 2.1.0 (#4677).
  • Issues: #4589.
    A metadata call before a member joins the consumer group
    could lead to an UNKNOWN_TOPIC_OR_PART error. Solved by updating
    the consumer group following a metadata refresh only in safe states.
    Happening since 2.1.0 (#4678).
  • Issues: #4577.
    Metadata refreshes without partition leader change could lead to a loop of
    metadata calls at fixed intervals. Solved by stopping metadata refresh when
    all existing metadata is non-stale. Happening since 2.3.0 (#4679).
  • Issues: #4687.
    A partition migration could happen, using stale metadata, when the partition
    was undergoing validation and being retried because of an error.
    Solved by doing a partition migration only with a non-stale leader epoch.
    Happening since 2.1.0 (#4680).
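The last two fixes above (#4679 and #4680) both hinge on telling stale metadata apart from current metadata. The following is a minimal sketch, assuming metadata is stale when its leader epoch is older than the epoch the partition already uses; partition_t, apply_metadata() and refresh_needed() are hypothetical illustrations, not librdkafka code, whose real bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified partition view; not librdkafka's internal struct. */
typedef struct {
    int32_t leader_id;    /* broker currently leading the partition */
    int32_t leader_epoch; /* leader epoch currently in use */
    bool refresh_wanted;  /* an error asked for fresh metadata */
} partition_t;

/* Apply one metadata entry for a partition.
 * Returns true if the partition migrated to a different leader. */
static bool apply_metadata(partition_t *p, int32_t md_leader_id,
                           int32_t md_leader_epoch) {
    /* Stale: older epoch than the one in use. Never migrate on it
     * (#4680) and keep waiting for fresher metadata. */
    if (md_leader_epoch < p->leader_epoch)
        return false;

    /* Non-stale metadata satisfies the refresh even when the leader
     * did not change, so the refresh loop can stop (#4679). */
    p->refresh_wanted = false;
    p->leader_epoch = md_leader_epoch;

    if (md_leader_id == p->leader_id)
        return false; /* same leader: nothing to migrate */
    p->leader_id = md_leader_id;
    return true;
}

/* Keep scheduling metadata refreshes only while some partition still
 * waits for non-stale metadata. */
static bool refresh_needed(const partition_t *parts, size_t cnt) {
    for (size_t i = 0; i < cnt; i++)
        if (parts[i].refresh_wanted)
            return true;
    return false;
}
```

The design point is that "refresh satisfied" is tied to receiving a non-stale epoch rather than to observing a leader change; requiring a leader change would keep the refresh loop running at fixed intervals whenever the leader stays the same.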

Consumer fixes

  • Issues: #4686.
    In case of a subscription change, a consumer using the cooperative assignor
    could resume fetching from a previous position.
    The same could happen when resuming a partition that wasn't paused.
    Fixed by ensuring that a resume operation is completely a no-op when
    the partition isn't paused.
    Happening since 1.x (#4636).
  • Issues: #4527.
    When using the cooperative assignor, if an assignment was received while closing the consumer,
    the consumer could get stuck in state WAIT_ASSIGN_CALL while the assign call was converted to
    a full unassign. Solved by changing the state from WAIT_ASSIGN_CALL to WAIT_UNASSIGN_CALL
    while doing this conversion.
    Happening since 1.x (#4528).
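Both consumer fixes reduce to a guard and a state transition. The sketch below illustrates them under stated assumptions: toppar_t, toppar_pause(), toppar_resume(), join_state_t and convert_assign_to_unassign() are hypothetical, and the model where resume restores a position saved at pause time is an assumption chosen to show how a non-paused resume could rewind the fetch position. Only the WAIT_ASSIGN_CALL and WAIT_UNASSIGN_CALL state names come from the notes above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified partition state; not librdkafka's struct. */
typedef struct {
    bool paused;
    int64_t fetch_pos;        /* next offset to fetch */
    int64_t paused_fetch_pos; /* position saved at pause time (assumed model) */
} toppar_t;

static void toppar_pause(toppar_t *p) {
    if (p->paused)
        return;
    p->paused = true;
    p->paused_fetch_pos = p->fetch_pos;
}

/* #4636: resuming a partition that is not paused is a complete no-op,
 * so an old saved position can never overwrite the current one. */
static void toppar_resume(toppar_t *p) {
    if (!p->paused)
        return;
    p->paused = false;
    p->fetch_pos = p->paused_fetch_pos;
}

/* Hypothetical subset of consumer group join states; STEADY is
 * illustrative only. */
typedef enum {
    JOIN_STATE_STEADY,
    JOIN_STATE_WAIT_ASSIGN_CALL,
    JOIN_STATE_WAIT_UNASSIGN_CALL
} join_state_t;

/* #4528: when closing converts a pending assign into a full unassign,
 * the state must follow; otherwise the consumer keeps waiting in
 * WAIT_ASSIGN_CALL for an assign call that never comes. */
static join_state_t convert_assign_to_unassign(join_state_t st) {
    return st == JOIN_STATE_WAIT_ASSIGN_CALL ? JOIN_STATE_WAIT_UNASSIGN_CALL
                                             : st;
}
```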

Checksums

Release asset checksums:

  • v2.4.0.zip SHA256 24b30d394fc6ce5535eaa3c356ed9ed9ae4a6c9b4fc9159c322a776786d5dd15
  • v2.4.0.tar.gz SHA256 d645e47d961db47f1ead29652606a502bdd2a880c85c1e060e94eea040f1a19a