We still want to be able to batch queries using `cursor.executemany()`. This is not feasible when the key sets differ, because we substitute into an insert query with a static number of columns, e.g.

```python
f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({', '.join([':' + str(i + 1) for i in range(len(columns))])})"
```
We call this for each ingestion payload. This means the payload should be uniform, i.e. every row (object) should have the same keys and the same number of values. However, this contradicts our desired use case. We've solved this by further batching the payload by key set and then doing a separate `cursor.executemany()` call for each key set.
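A minimal sketch of the current grouping approach (the helper name and structure are illustrative, not the actual vdk-oracle code): rows are grouped by the `frozenset` of their keys, and one `executemany()` is issued per group.

```python
from collections import defaultdict


def ingest_by_keyset(cursor, table_name, rows):
    # Group rows by their key set; building the frozenset is O(n)
    # in the number of columns for every row.
    groups = defaultdict(list)
    for row in rows:
        groups[frozenset(row)].append(row)

    # One executemany() call per distinct key set.
    for keyset, group in groups.items():
        columns = sorted(keyset)
        placeholders = ", ".join(f":{i + 1}" for i in range(len(columns)))
        query = (
            f"INSERT INTO {table_name} ({', '.join(columns)}) "
            f"VALUES ({placeholders})"
        )
        cursor.executemany(query, [[row[c] for c in columns] for row in group])
```

With three rows spanning two distinct key sets, this produces two `executemany()` calls instead of one.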
There are a few drawbacks to this approach.

- We create a `frozenset` for each row. Depending on the number of columns, this could be a problem, because it's an O(n) operation per row.
- Frozen sets are hashable, but there may be a cheaper grouping key, e.g. a sorted tuple of column names. This should be researched further.
- We might be constrained to linear time, because batching requires visiting every row at least once. Making the data uniform may not require inspecting every value of every row, so there is room for optimization.
Proposed solution
A good alternative approach might be to get the sum of all key sets in the payload (all possible columns). Then, for each row, if there are missing keys, we just set them to null and do a single cursor.executemany().
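The proposed alternative could be sketched as follows (the function name is illustrative, not the actual vdk-oracle code): take the union of all key sets in the payload, pad missing values with `None` (bound as SQL `NULL`), and issue a single `executemany()` call. This assumes the payload is non-empty.

```python
def ingest_padded(cursor, table_name, rows):
    # Union of all key sets in the payload = all possible columns.
    all_columns = sorted(set().union(*(row.keys() for row in rows)))
    placeholders = ", ".join(f":{i + 1}" for i in range(len(all_columns)))
    query = (
        f"INSERT INTO {table_name} ({', '.join(all_columns)}) "
        f"VALUES ({placeholders})"
    )
    # row.get(col) is None for missing keys, which binds as NULL.
    cursor.executemany(
        query, [[row.get(c) for c in all_columns] for row in rows]
    )
```

This trades a small amount of padding work per row for a single batched statement regardless of how many distinct key sets the payload contains.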
Acceptance criteria

- Decide if this is worth optimizing
- Implement the optimization
- Measure the results
## Why?
In order to support more use cases, VDK should support connecting to and
ingesting into an Oracle database.
## What?
Add an Oracle plugin. The plugin supports simple queries, CLI queries, and
ingestion.
## How was this tested?
Local functional tests; CI tests are part of a separate task.
## What kind of change is this?
Feature/non-breaking
## Follow-up
[Set up testcontainers for
CI](#2928)
[Support type inference when
ingesting](#2929)
[Support passing math.nan and None for
ingestion](#2930)
[Optimize batching of payload rows with different
keysets](#2931)
[ORA-01002: fetch out of sequence error in _cache_tables when some rows
fail to
ingest](#2932)
[Further load
testing](#2933)
[Investigate possible
segfaults](#2934)
Signed-off-by: Dilyan Marinov <[email protected]>
Co-authored-by: Antoni Ivanov <[email protected]>
Overview

Prerequisites: #2933

Benchmark values

Use case: payload objects with different key sets.
https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-plugins/vdk-oracle/tests/jobs/oracle-ingest-job-different-payloads-no-table/10_ingest.py#L6