Source file: fix csv schema discovery #15870

davydov-d · 2022-08-23T07:03:19Z

What

https://github.com/airbytehq/alpha-beta-issues/issues/174
When discovering the schema of a CSV file, the connector iterates over dataframes and maps each column to its type. The problem is that the type is overwritten on every iteration, so the final schema reflects only the types of the last dataframe.

How

Do not ignore any dataframes. Never narrow a column's data type; only widen it.
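
A minimal sketch of the "only widen" idea, not the connector's actual code. The names `WIDENING_ORDER`, `widen`, and `discover_schema` are hypothetical, and the ordering integer < number < string is an assumption made for illustration. Each chunk is represented as a plain dict of column name to type name, standing in for the dtypes of one dataframe.

```python
# Hypothetical widening order (an assumption for illustration): a type later
# in the list can represent every value of the types before it.
WIDENING_ORDER = ["integer", "number", "string"]


def widen(current, new):
    """Return the wider of two type names; `current` is None for a new column."""
    if current is None:
        return new
    return max(current, new, key=WIDENING_ORDER.index)


def discover_schema(chunks):
    """Merge per-chunk column types without ever narrowing a column.

    `chunks` is an iterable of dicts mapping column name -> type name,
    one dict per dataframe read from the CSV file.
    """
    schema = {}
    for chunk_types in chunks:
        for column, type_name in chunk_types.items():
            # Keep the widest type seen so far instead of overwriting it.
            schema[column] = widen(schema.get(column), type_name)
    return schema


chunks = [
    {"id": "integer", "price": "number"},
    {"id": "integer", "price": "integer"},  # the last chunk looks narrower
]
schema = discover_schema(chunks)
print(schema)  # {'id': 'integer', 'price': 'number'}
```

With plain overwriting, `price` would end up as `integer` because the last chunk happens to contain only whole numbers; merging through `widen` keeps it as `number`.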

davydov-d · 2022-08-23T07:04:13Z

/test connector=connectors/source-file

✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/2909581827
Python tests coverage:

Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/client.py       270     38    86%
source_file/source.py        51     27    47%
---------------------------------------------
TOTAL                       323     65    80%
Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/source.py        51     17    67%
source_file/client.py       270    116    57%
---------------------------------------------
TOTAL                       323    133    59%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       121     25    79%   21-23, 29-31, 36-43, 48-61, 208-216
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     17    78%   15-16, 24-30, 47-54, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       110     48    56%   23-26, 32, 36, 39-64, 67-69, 72-74, 77-79, 82-84, 87-89, 92-110, 144-146
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1322    463    65%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:60: Skipping TestIncremental.test_two_sequential_reads because not found in the config
=================== 26 passed, 1 skipped in 62.65s (0:01:02) ===================

davydov-d · 2022-08-23T09:12:49Z

/publish connector=connectors/source-file

🕑 Publishing the following connectors:
connectors/source-file
https://github.com/airbytehq/airbyte/actions/runs/2910308711

Connector	Did it publish?	Were definitions generated?
connectors/source-file	✅	✅

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

lmossman · 2022-08-23T21:38:11Z

@davydov-d since this connector is used in cloud, we also need to publish the cloud version of it (i.e. with the -secure suffix on the end, similar to here). Otherwise this breaks the cloud build when trying to update the OSS dependency to the latest version. I'll kick off this publish below.

lmossman · 2022-08-23T21:38:25Z

/publish connector=connectors/source-file-secure

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

lmossman · 2022-08-23T22:02:33Z

Actually I had to also bump the source-file-secure version to match the source-file version, so I opened a separate PR to do that here: #15896

davydov-d · 2022-08-24T03:51:15Z

@lmossman thank you so much, sorry I missed that!

lmossman · 2022-08-24T18:37:04Z

No problem! It's really easy to miss - definitely a flaw in our current image setup that we will hopefully be addressing soon

* #174 source file: fix csv schema discovery
* #174 source file: upd changelog
* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>

#174 source file: fix csv schema discovery

590f829

github-actions bot added the area/connectors (Connector related issues) and area/documentation (Improvements or additions to documentation) labels Aug 23, 2022

octavia-squidington-iii added the connectors/source/file label Aug 23, 2022

#174 source file: upd changelog

a973b35

davydov-d self-assigned this Aug 23, 2022

davydov-d requested review from midavadim and lazebnyi August 23, 2022 07:04

grubberr approved these changes Aug 23, 2022

auto-bump connector version [ci skip]

371dce1

davydov-d merged commit 81bfb5c into master Aug 23, 2022

davydov-d deleted the ddavydov/#174-source-file-numeric-value-not-recognized branch August 23, 2022 09:38

lmossman mentioned this pull request Aug 23, 2022

Bump source-file-secure version and publish #15896

Merged

octavia-squidington-iii mentioned this pull request Aug 23, 2022

Bump Airbyte version from 0.40.1 to 0.40.2 #15909

Closed

octavia-squidington-iii mentioned this pull request Aug 24, 2022

Bump Airbyte version from 0.40.1 to 0.40.2 #15931

Merged
