vdk-oracle: ingestion #2907

Closed · antoniivanov opened this issue Nov 15, 2023 · 2 comments · Fixed by #2927
antoniivanov (Collaborator) commented Nov 15, 2023

Support job_input.send_object_for_ingestion(method="oracle") and send_tabular_data_for_ingestion(method="oracle").

If the user passes method="oracle", the data should be inserted into the pre-configured Oracle instance.
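A minimal sketch of the intended usage inside a data job's run function (the payload and table name are illustrative; the point is routing via the existing method parameter of the ingestion API):

    from vdk.api.job_input import IJobInput


    def run(job_input: IJobInput):
        payload = {"id": 1, "name": "example"}

        # Selecting method="oracle" should route the payload to the
        # pre-configured Oracle instance instead of the default target.
        job_input.send_object_for_ingestion(
            payload=payload,
            destination_table="example_table",
            method="oracle",
        )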

So what needs to be done:

DeltaMichael added the story (Task for an Epic) label on Nov 17, 2023
DeltaMichael (Contributor) commented Nov 22, 2023

Environment

Functional test run in PyCharm in debug mode.
Oracle Autonomous DB in Oracle Cloud.

Load test results

Records      Time       Memory
100 000      10.4 s     30 MB
1 000 000    60.76 s    250 MB
10 000 000   568.58 s   2.5 GB

A simple ingestion job was used for testing; runtime and memory were measured with the time and tracemalloc Python modules (a sketch of the measurement wrapper follows the job code below).

    import datetime
    from decimal import Decimal

    from vdk.api.job_input import IJobInput


    def run(job_input: IJobInput):
        # Payload covering the basic Python types exercised by the test.
        payload_with_types = {
            "str_data": "string",
            "int_data": 12,
            "float_data": 1.2,
            "bool_data": True,
            # "timestamp_data": datetime.datetime.fromtimestamp(1700554373),
            # "decimal_data": Decimal("0.1"),
        }

        # Send 10 000 000 objects, varying one field per record.
        for i in range(10_000_000):
            payload = payload_with_types.copy()
            payload["int_data"] = i
            job_input.send_object_for_ingestion(
                payload=payload, destination_table="test_table"
            )
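The measurement harness is not shown in the issue; a minimal sketch of how time and tracemalloc could be wrapped around the loop (this wrapper is an assumption, not the code actually used):

    import time
    import tracemalloc

    tracemalloc.start()
    start = time.perf_counter()

    # ... run the ingestion loop above ...

    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    print(f"time: {elapsed:.2f}s, peak memory: {peak / (1024 * 1024):.1f} MB")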

DeltaMichael (Contributor) commented Nov 22, 2023

Note that the above is not the worst-case scenario. The worst case would be ingesting on the order of one million objects that share a schema but carry randomized subsets of its keys, i.e. each object gets a random number of keys from the schema. For example, with a five-key schema only 20% of objects might carry all five keys, while the rest carry between one and four. Ingestion rows with the same keyset are batched together, so randomized keysets split the data into more batches and therefore more queries, which should theoretically be slower. A sketch of generating such payloads is below.
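A hypothetical generator for such randomized-key payloads (the schema dict and the uniform size distribution are illustrative):

    import random

    SCHEMA = {
        "str_data": "string",
        "int_data": 12,
        "float_data": 1.2,
        "bool_data": True,
        "extra_data": "more",
    }


    def random_keyset_payload() -> dict:
        # Pick the keyset size uniformly from 1..5: only ~20% of payloads
        # carry all five keys, matching the scenario described above.
        size = random.randint(1, len(SCHEMA))
        keys = random.sample(sorted(SCHEMA), size)
        return {k: SCHEMA[k] for k in keys}

Feeding these payloads through send_object_for_ingestion would produce many distinct keysets, so rows would be split into many small batches and the ingestion would issue many more queries.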
