Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bulk-cdk-toolkit-extract-cdc: API changes #52040

Merged
merged 8 commits into from
Jan 30, 2025

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Jan 20, 2025

What

This PR makes substantial changes to the API of the extract-cdc toolkit of the Bulk CDK. Thanks @stephane-airbyte for the actionable feedback.

This PR also adds richer data types to better handle invalid debezium states, which can trigger resetting the sync.

How

The DebeziumInput and DebeziumState types are gone, DebeziumOperations methods are adjusted accordingly and broken down into smaller pieces. Changes then propagate to CdcPartitionsCreator and CdcPartitionReader, and then further down to the tests.

Review guide

Follow the order outlined in the previous paragraph.

User Impact

None.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Jan 20, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 30, 2025 6:33pm

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues CDK Connector Development Kit connectors/source/mysql labels Jan 20, 2025
@postamar postamar changed the base branch from master to postamar/source-mysql-pin-cdk-version January 20, 2025 21:28
@@ -113,8 +138,4 @@ class CdcPartitionsCreator<T : Comparable<T>>(
log.info { "Current position '$lowerBound' does not exceed target position '$upperBound'." }
return listOf(partitionReader)
}

companion object {
var CDCNeedsRestart: Boolean = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been replaced by an AtomicBoolean in the cdc partitions creator factory class.

@@ -298,7 +302,6 @@ class CdcPartitionReader<T : Comparable<T>>(
}

enum class CloseReason(val message: String) {
TIMEOUT("timed out"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wasn't being used

startingOffset = syntheticOffset
startingSchemaHistory = null
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff here is noisy, as it should be, but the changes really aren't that substantial.

)

/** Debezium Engine output, other than records of course. */
data class DebeziumState(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two types just weren't used that much and it seems that getting rid of them does improve things.


data class AbortDebeziumWarmStartState(val reason: String) : InvalidDebeziumWarmStartState

data class ResetDebeziumWarmStartState(val reason: String) : InvalidDebeziumWarmStartState
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This abort vs reset failure handling is going to be present in every connector with CDC if it isn't the case already, so it makes sense to recognize it in the CDK. Previously OffsetInvalidNeedsResyncIllegalStateException was used as a stopgap but using exceptions for control flow can be confusing.

@postamar postamar force-pushed the postamar/bulk-cdk-extract-cdc-improvements branch from 37e4428 to 1565204 Compare January 20, 2025 21:50
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Jan 20, 2025
@postamar postamar marked this pull request as ready for review January 21, 2025 17:37
@postamar postamar requested a review from a team as a code owner January 21, 2025 17:37
Base automatically changed from postamar/source-mysql-pin-cdk-version to master January 21, 2025 18:59
@postamar postamar merged commit 7c28d4e into master Jan 30, 2025
26 checks passed
@postamar postamar deleted the postamar/bulk-cdk-extract-cdc-improvements branch January 30, 2025 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit connectors/source/mysql
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants