Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load CDK: Expanded namespace coverage #52074

Merged
merged 1 commit into from
Feb 4, 2025

Conversation

johnny-schmidt
Copy link
Contributor

What

Adds tests for

  • null namespaces
  • special characters/operators in namespaces

@edgao the null namespaces break glue as well. Not sure if that's a possible scenario for you

Copy link

vercel bot commented Jan 22, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 4, 2025 5:09pm

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frifriSF59 is adding glue support for null namespace in #52068, so this is actually great timing

changes make sense, I think we probably need more functionality in the test to make this usable for most of our destinations though

@@ -460,12 +460,14 @@ abstract class BasicFunctionalityIntegrationTest(
)
val stream1 = makeStream(randomizedNamespace + "_1")
val stream2 = makeStream(randomizedNamespace + "_2")
val streamWithNullNamespace = makeStream(null, randomizedNamespace + "_stream")
runSync(
configContents,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be some sort of setDefaultNamespace(config, randomizedNamespace + "_default_namespace")-ish thing

(s3 supports null namespaces out of the box, but db/dw destinations have a config option to set the default schema/database/whatever)

Copy link
Contributor Author

@johnny-schmidt johnny-schmidt Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the default namespace is set by the destination when the provided namespace is null?

If so, it should be null here, but we'll need a DefaultNamespaceProvider to do the test data diffing?

Something like
dumpAndDiffRecords(..., streamWithNullNamespace.copy(descriptor = streamWithNullNamespace.descriptor.copy(namespace = defaultNamespaceProvider.get(randomizedNamespace)), ...)?

Copy link
Contributor

@edgao edgao Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep it's done by the destination connector. e.g. in destination-postgres, the stream namespace turns into a postgres schema. So there's a "default schema" config option, for when a stream has null namespace

Not sure I see the need for a DefaultNamespaceProvider though - i.e. couldn't we hardcode it in the test? (setDefaultNamespace maybe being an overridable function, or an injected object, or something)

val stream = DestinationStream(Descriptor(namespace = null, name = "test_stream"), ...)
runSync(
  config = setDefaultNamespace(config, randomizedNamespace + "_default_namespace"),
  stream = stream,
  ...
)
dumpAndDiff(
  stream.copy(
    descriptor = stream.descriptor.copy(namespace = randomizedNamespace + "_default_namespace")
  ), ...
)

(also setDefaultNamespace maybe can happen during the test class's init block, to avoid people forgetting to do it)

@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/missing-test-cases-reenable-funky branch from 60d2d64 to 41a3ca8 Compare January 23, 2025 18:21
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from b7cf463 to b1e58e8 Compare January 23, 2025 18:23
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from b1e58e8 to 403abba Compare January 23, 2025 18:42
Base automatically changed from jschmidt/load-cdk/missing-test-cases-reenable-funky to master January 23, 2025 18:49
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from 403abba to a694da5 Compare January 23, 2025 19:03
@johnny-schmidt johnny-schmidt enabled auto-merge (squash) January 23, 2025 19:03
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from a694da5 to 7792525 Compare February 3, 2025 17:34
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test needs to not change the stream name? otherwise lgtm

(also had a question, looking for your thoughts on how to

@@ -466,12 +469,14 @@ abstract class BasicFunctionalityIntegrationTest(
)
val stream1 = makeStream(randomizedNamespace + "_1")
val stream2 = makeStream(randomizedNamespace + "_2")
val streamWithDefaultNamespace = makeStream(null, randomizedNamespace + "_stream")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to change the stream name? I thought the intent of this test was to test that the destination can handle streams with different namespace but the same name, i.e. they should all use test_stream as the name

@@ -53,6 +54,9 @@ abstract class S3V2WriteTest(
expectedRecordMapper,
additionalMicronautEnvs = S3V2Destination.additionalMicronautEnvs,
micronautProperties = S3V2TestUtils.assumeRoleCredentials.asMicronautProperties(),
defaultNamespaceProvider = object: DefaultNamespaceProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: for now - I'll probably do some refactoring in this region later

I.e. as part of https://github.com/airbytehq/airbyte-internal-issues/issues/11543, I can probably combine the config-modifier with this thing. And then we would have two impls provided by the CDK, rather than one per destination. Curious what you think of approximately this:

class NullNamespaceIsValid {
  fun setDefaultNs(config, randomNs) {}
  fun getDefaultNs(randomNs) = null
}

abstract class NullNamespaceIsInvalid {
  const val DEFAULT_NS_PREFIX = "_defaultns"
  // contract is to set the default namespace to randomNs + DEFAULT_NS_PREFIX
  abstract fun setDefaultNs(config, randomNs)
  fun getDefaultNs(randomNs) = randomNs + DEFAULT_NS_PREFIX
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why two? What's gained by this versus having one method that's just saying "tell us what to do when the namespace is null?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b/c the missing functionality is telling the connector what to do when the namespace is null. I.e. being able to modify the config.

and at that point, I think there's exactly two valid behaviors?

  1. the s3 way (default NS is just null, and modifying the config is a noop)
  2. the db/dw way (default NS is based on the randomized NS, and modifying the config involves doing some JSON munging)

@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from 7792525 to 228f44d Compare February 3, 2025 22:31
@johnny-schmidt johnny-schmidt enabled auto-merge (squash) February 3, 2025 22:34
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from 228f44d to 9f028fe Compare February 4, 2025 16:45
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from 9f028fe to 8780708 Compare February 4, 2025 16:56
@johnny-schmidt johnny-schmidt force-pushed the jschmidt/load-cdk/tests-for-null-namespaces branch from 8780708 to 4069e84 Compare February 4, 2025 16:56
@johnny-schmidt johnny-schmidt merged commit 51a07bb into master Feb 4, 2025
30 checks passed
@johnny-schmidt johnny-schmidt deleted the jschmidt/load-cdk/tests-for-null-namespaces branch February 4, 2025 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants