vdk-oracle: ingestion #2907

Closed · antoniivanov opened this issue Nov 15, 2023 · 2 comments · Fixed by #2927
antoniivanov (Collaborator) commented Nov 15, 2023

Support job_input.send_object_for_ingestion(method="oracle") and send_tabular_data_for_ingestion(method="oracle").

If the user passes method="oracle", the data should be inserted into the pre-configured Oracle instance.
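A minimal sketch of the intended usage inside a data job's run function (the payload and table name are illustrative; the point is routing via the existing method parameter of the ingestion API):

    from vdk.api.job_input import IJobInput


    def run(job_input: IJobInput):
        payload = {"id": 1, "name": "example"}

        # Selecting method="oracle" should route the payload to the
        # pre-configured Oracle instance instead of the default target.
        job_input.send_object_for_ingestion(
            payload=payload,
            destination_table="example_table",
            method="oracle",
        )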

So what needs to be done:

DeltaMichael added the story (Task for an Epic) label on Nov 17, 2023
DeltaMichael (Contributor) commented Nov 22, 2023

Environment

Functional test run in PyCharm in debug mode.
Oracle Autonomous DB in Oracle Cloud.

Load test results

Records      Time       Memory
100 000      10.4 s     30 MB
1 000 000    60.76 s    250 MB
10 000 000   568.58 s   2.5 GB

A simple ingestion job was used for testing; runtime and memory were measured with the time and tracemalloc Python modules (a sketch of the measurement wrapper follows the job code below).

    import datetime
    from decimal import Decimal

    from vdk.api.job_input import IJobInput


    def run(job_input: IJobInput):
        # Payload covering the basic Python types exercised by the test.
        payload_with_types = {
            "str_data": "string",
            "int_data": 12,
            "float_data": 1.2,
            "bool_data": True,
            # "timestamp_data": datetime.datetime.fromtimestamp(1700554373),
            # "decimal_data": Decimal("0.1"),
        }

        # Send 10 000 000 objects, varying one field per record.
        for i in range(10_000_000):
            payload = payload_with_types.copy()
            payload["int_data"] = i
            job_input.send_object_for_ingestion(
                payload=payload, destination_table="test_table"
            )
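The measurement harness is not shown in the issue; a minimal sketch of how time and tracemalloc could be wrapped around the loop (this wrapper is an assumption, not the code actually used):

    import time
    import tracemalloc

    tracemalloc.start()
    start = time.perf_counter()

    # ... run the ingestion loop above ...

    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    print(f"time: {elapsed:.2f}s, peak memory: {peak / (1024 * 1024):.1f} MB")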

DeltaMichael (Contributor) commented Nov 22, 2023

Note that the above is not the worst-case scenario. The worst case would be ingesting on the order of one million objects that share a schema but carry randomized subsets of its keys, i.e. each object gets a random number of keys from the schema. For example, with a five-key schema only 20% of objects might carry all five keys, while the rest carry between one and four. Ingestion rows with the same keyset are batched together, so randomized keysets split the data into more batches and therefore more queries, which should theoretically be slower. A sketch of generating such payloads is below.
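A hypothetical generator for such randomized-key payloads (the schema dict and the uniform size distribution are illustrative):

    import random

    SCHEMA = {
        "str_data": "string",
        "int_data": 12,
        "float_data": 1.2,
        "bool_data": True,
        "extra_data": "more",
    }


    def random_keyset_payload() -> dict:
        # Pick the keyset size uniformly from 1..5: only ~20% of payloads
        # carry all five keys, matching the scenario described above.
        size = random.randint(1, len(SCHEMA))
        keys = random.sample(sorted(SCHEMA), size)
        return {k: SCHEMA[k] for k in keys}

Feeding these payloads through send_object_for_ingestion would produce many distinct keysets, so rows would be split into many small batches and the ingestion would issue many more queries.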
