Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source file: fix csv schema discovery #15870

Merged

Conversation

davydov-d
Copy link
Collaborator

What

https://github.com/airbytehq/alpha-beta-issues/issues/174
When trying to discover the schema of a csv file, the connector iterates over dataframes and maps columns to its types. The problem is the type is overwritten every iteration, so the final schema is equivalent to the last dataframe types

How

Do not ignore dataframes. Do not narrow data types, make them only wider

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Aug 23, 2022
@davydov-d
Copy link
Collaborator Author

davydov-d commented Aug 23, 2022

/test connector=connectors/source-file

🕑 connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/2909581827
✅ connectors/source-file https://github.com/airbytehq/airbyte/actions/runs/2909581827
Python tests coverage:

Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/client.py       270     38    86%
source_file/source.py        51     27    47%
---------------------------------------------
TOTAL                       323     65    80%
Name                      Stmts   Miss  Cover
---------------------------------------------
source_file/__init__.py       2      0   100%
source_file/source.py        51     17    67%
source_file/client.py       270    116    57%
---------------------------------------------
TOTAL                       323    133    59%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       121     25    79%   21-23, 29-31, 36-43, 48-61, 208-216
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     17    78%   15-16, 24-30, 47-54, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       110     48    56%   23-26, 32, 36, 39-64, 67-69, 72-74, 77-79, 82-84, 87-89, 92-110, 144-146
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1322    463    65%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/source_acceptance_test/plugin.py:60: Skipping TestIncremental.test_two_sequential_reads because not found in the config
=================== 26 passed, 1 skipped in 62.65s (0:01:02) ===================

@davydov-d davydov-d self-assigned this Aug 23, 2022
@davydov-d
Copy link
Collaborator Author

davydov-d commented Aug 23, 2022

/publish connector=connectors/source-file

🕑 Publishing the following connectors:
connectors/source-file
https://github.com/airbytehq/airbyte/actions/runs/2910308711


Connector Did it publish? Were definitions generated?
connectors/source-file

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@davydov-d davydov-d merged commit 81bfb5c into master Aug 23, 2022
@davydov-d davydov-d deleted the ddavydov/#174-source-file-numeric-value-not-recognized branch August 23, 2022 09:38
@lmossman
Copy link
Contributor

@davydov-d since this connector is used in cloud, we also need to publish the cloud version of this (i.e. with the -secure suffix on the end, similar to here) otherwise this breaks the cloud build when trying to update the OSS dependency to the latest version. I'll kick off this publish below

@lmossman
Copy link
Contributor

lmossman commented Aug 23, 2022

/publish connector=connectors/source-file-secure

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@lmossman
Copy link
Contributor

Actually I had to also bump the source-file-secure version to match the source-file version, so I opened a separate PR to do that here: #15896

@davydov-d
Copy link
Collaborator Author

@lmossman thank you so much, sorry I missed that!

@lmossman
Copy link
Contributor

No problem! It's really easy to miss - definitely a flaw in our current image setup that we will hopefully be addressing soon

rodireich pushed a commit that referenced this pull request Aug 25, 2022
* #174 source file: fix csv schema discovery

* #174 source file: upd changelog

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants