-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Dataset API add save
method
#180
feat: Dataset API add save
method
#180
Conversation
8cd49e7
to
96ecbdc
Compare
Hello, do you know if there is documentation that needs to be updated apart from the docstring of the class ? |
Hello @McDonnellJoseph, thanks for this contribution! In this case "documentation" only refers to the docstrings, yes. The CI is failing because some lines are not covered by tests:
|
Thanks for pointing this out, I just added full test coverage ! |
I haven't changed any code since last commit which passed CI. |
Otherwise I think this is ready for review |
@McDonnellJoseph it's an intermittent issue with unknown root cause (see kedro-org/kedro#2570), I restarted it but I'm confident the tests will pass. |
save
method
@astrojuanlu Yes that's what I thought but I don't think contributors can restart the tests. Anyway, I think the code is good for review now, got full test coverage + some better documentation for the new save feature and how it works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, it looks pretty solid to me! Just thinking top of my head, does anyone ever use this for POST request? It's seem possible but I find it a bit weird. Maybe the right thing to do is GET
for load
and PUT
DELETE
POST
for save
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to only enable GET
for load
and POST
PUT
for save
. In fact we have already refactor some of the arguments in #181 (It was done in kedro develop but we forgot to sync it).
So we should have load_args and save_args, which make the dataset more consistent with other datasets too.
Addition - I would suggest to do the refactoring after kedro-org/kedro#2570 is merged as it changes the argument a bit. |
Hello, maybe I misunderstood but don't you mean pr #184 ? |
@McDonnellJoseph Sorry about that, you are right! #184 is merged now, please update this branch. There are quite are few changes because of the change in arguments. Let me know if you get stuck. |
Hello again, I added your suggested modifications to the code and I also merged the recent modifications made to the APIDataSet with my work. |
Hello, I'm still waiting for a review and would be happy to make a contribution to the project. Is something missing in the PR to get a review ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @McDonnellJoseph ! I've left some comments. I'll be happy to approve when those are addressed 🙂
855912a
to
b396ad3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@McDonnellJoseph Thanks for persevering on this. I missed the Github notification, feel free to @me or find me on Slack if I haven't responded for a long time.
b396ad3
to
dab7584
Compare
Looks like unsigned commit is 07eb097, which is already in |
Thanks ! I got lost with the rebase which is the reason behind the large amount of commits for the PR |
Signed-off-by: jmcdonnell <[email protected]>
Signed-off-by: jmcdonnell <[email protected]>
Signed-off-by: jmcdonnell <[email protected]>
Signed-off-by: jmcdonnell <[email protected]>
Signed-off-by: jmcdonnell <[email protected]>
134ab8a
to
0e35eef
Compare
Git magic
Git magic worked ! Thanks a lot, won't the commits be squashed once the branch is merged ? |
👽 git magic is indeed miraculous! Yes, the commits will be squashed on merge. But to achieve that, even with this nice linearized history, first you'll need to fix the current conflicts. |
9f37087
to
f8631cc
Compare
save
methodsave
method
Well done @McDonnellJoseph, thanks for persevering! 🙌🏽 I think this is ready for another round of reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the PR is looking better now.
If the DCO fails, I suggested just leave it until everything is reviewed, as this is a little bit hard to review now since all previous review comments are gone.
In general it's good to have documentation consistent with other dataset, I would suggest to have a look at the popular one (i.e. CSVDataSet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the PR is looking better now.
If the DCO fails, I suggested just leave it until everything is reviewed, as this is a little bit hard to review now since all previous review comments are gone.
In general it's good to have documentation consistent with other dataset, I would suggest to have a look at the popular one (i.e. CSVDataSet).
Do you think there is somewhere more appropriate in the file for the explanation paragraph on how the save method works ? Or should I just dump it altogether ? @noklam |
f8631cc
to
1f0b3de
Compare
I think it's fine to leave it within the docstring, @stichbury for comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done @McDonnellJoseph!
Signed-off-by: jmcdonnell <[email protected]>
Congrats @McDonnellJoseph and thank you for your contribution! 🙌🏽 |
* [FEAT] add save method to APIDataset Signed-off-by: jmcdonnell <[email protected]> * [ENH] create save_args parameter for api_dataset Signed-off-by: jmcdonnell <[email protected]> * [ENH] add tests for socket + http errors Signed-off-by: <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [ENH] check save data is json Signed-off-by: <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [FIX] clean code Signed-off-by: jmcdonnell <[email protected]> * [ENH] handle different data types Signed-off-by: jmcdonnell <[email protected]> * [FIX] test coverage for exceptions Signed-off-by: jmcdonnell <[email protected]> * [ENH] add examples in APIDataSet docstring Signed-off-by: jmcdonnell <[email protected]> * sync APIDataSet from kedro's `develop` (kedro-org#184) * Update APIDataSet Signed-off-by: Nok Chan <[email protected]> * Sync ParquetDataSet Signed-off-by: Nok Chan <[email protected]> * Sync Test Signed-off-by: Nok Chan <[email protected]> * Linting Signed-off-by: Nok Chan <[email protected]> * Revert Unnecessary ParquetDataSet Changes Signed-off-by: Nok Chan <[email protected]> * Sync release notes Signed-off-by: Nok Chan <[email protected]> --------- Signed-off-by: Nok Chan <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [FIX] remove support for delete method Signed-off-by: jmcdonnell <[email protected]> * [FIX] lint files Signed-off-by: jmcdonnell <[email protected]> * [FIX] fix conflicts Signed-off-by: jmcdonnell <[email protected]> * [FIX] remove fail save test Signed-off-by: jmcdonnell <[email protected]> * [ENH] review suggestions Signed-off-by: jmcdonnell <[email protected]> * [ENH] fix tests Signed-off-by: jmcdonnell <[email protected]> * [FIX] reorder arguments Signed-off-by: jmcdonnell <[email protected]> --------- Signed-off-by: jmcdonnell <[email protected]> Signed-off-by: <[email protected]> Signed-off-by: Nok Chan <[email protected]> Co-authored-by: jmcdonnell <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Tom Kurian <[email protected]>
* Fix links on GitHub issue templates (#150) Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * add spark_stream_dataset.py Signed-off-by: Tingting_Wan <[email protected]> * Migrate most of `kedro-datasets` metadata to `pyproject.toml` (#161) * Include missing requirements files in sdist Fix gh-86. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Migrate most project metadata to `pyproject.toml` See kedro-org/kedro#2334. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Move requirements to `pyproject.toml` Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * restructure the strean dataset to align with the other spark dataset Signed-off-by: Tingting_Wan <[email protected]> * adding README.md for specification Signed-off-by: Tingting_Wan <[email protected]> * Update kedro-datasets/kedro_datasets/spark/spark_stream_dataset.py Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * rename the dataset Signed-off-by: Tingting_Wan <[email protected]> * resolve comments Signed-off-by: Tingting_Wan <[email protected]> * fix format and pylint Signed-off-by: Tingting_Wan <[email protected]> * Update kedro-datasets/kedro_datasets/spark/README.md Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * add unit tests and SparkStreamingDataset in init.py Signed-off-by: Tingting_Wan <[email protected]> * add unit tests Signed-off-by: Tingting_Wan <[email protected]> * update test_save Signed-off-by: Tingting_Wan <[email protected]> * Upgrade Polars (#171) * Upgrade Polars Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Update Polars to 0.17.x --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * if release is failed, it return exit code and fail the CI (#158) Signed-off-by: Tingting_Wan <[email protected]> * Migrate `kedro-airflow` to static metadata (#172) * Migrate kedro-airflow to static metadata See kedro-org/kedro#2334. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Add explicit PEP 518 build requirements for kedro-datasets Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Typos Co-authored-by: Merel Theisen <[email protected]> Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Remove dangling reference to requirements.txt Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * Migrate `kedro-telemetry` to static metadata (#174) * Migrate kedro-telemetry to static metadata See kedro-org/kedro#2334. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * ci: port lint, unit test, and e2e tests to Actions (#155) * Add unit test + lint test on GA * trigger GA - will revert Signed-off-by: Ankita Katiyar <[email protected]> * Fix lint Signed-off-by: Ankita Katiyar <[email protected]> * Add end to end tests * Add cache key Signed-off-by: Ankita Katiyar <[email protected]> * Add cache action Signed-off-by: Ankita Katiyar <[email protected]> * Rename workflow files Signed-off-by: Ankita Katiyar <[email protected]> * Lint + add comment + default bash Signed-off-by: Ankita Katiyar <[email protected]> * Add windows test Signed-off-by: Ankita Katiyar <[email protected]> * Update workflow name + revert changes to READMEs Signed-off-by: Ankita Katiyar <[email protected]> * Add kedro-telemetry/RELEASE.md to trufflehog ignore Signed-off-by: Ankita Katiyar <[email protected]> * Add pytables to test_requirements remove from workflow Signed-off-by: Ankita Katiyar <[email protected]> * Revert "Add pytables to test_requirements remove from workflow" This reverts commit 8203daa. * Separate pip freeze step Signed-off-by: Ankita Katiyar <[email protected]> --------- Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * Migrate `kedro-docker` to static metadata (#173) * Migrate kedro-docker to static metadata See kedro-org/kedro#2334. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Address packaging warning Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Fix tests Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Actually install current plugin with dependencies Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * Introdcuing .gitpod.yml to kedro-plugins (#185) Currently opening gitpod will installed a Python 3.11 which breaks everything because we don't support it set. This PR introduce a simple .gitpod.yml to get it started. Signed-off-by: Tingting_Wan <[email protected]> * sync APIDataSet from kedro's `develop` (#184) * Update APIDataSet Signed-off-by: Nok Chan <[email protected]> * Sync ParquetDataSet Signed-off-by: Nok Chan <[email protected]> * Sync Test Signed-off-by: Nok Chan <[email protected]> * Linting Signed-off-by: Nok Chan <[email protected]> * Revert Unnecessary ParquetDataSet Changes Signed-off-by: Nok Chan <[email protected]> * Sync release notes Signed-off-by: Nok Chan <[email protected]> --------- Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * add spark_stream_dataset.py Signed-off-by: Tingting_Wan <[email protected]> * restructure the strean dataset to align with the other spark dataset Signed-off-by: Tingting_Wan <[email protected]> * adding README.md for specification Signed-off-by: Tingting_Wan <[email protected]> * Update kedro-datasets/kedro_datasets/spark/spark_stream_dataset.py Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * rename the dataset Signed-off-by: Tingting_Wan <[email protected]> * resolve comments Signed-off-by: Tingting_Wan <[email protected]> * fix format and pylint Signed-off-by: Tingting_Wan <[email protected]> * Update kedro-datasets/kedro_datasets/spark/README.md Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> * add unit tests and SparkStreamingDataset in init.py Signed-off-by: Tingting_Wan <[email protected]> * add unit tests Signed-off-by: Tingting_Wan <[email protected]> * update test_save Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * formatting Signed-off-by: Tingting_Wan <[email protected]> * lint Signed-off-by: Tingting_Wan <[email protected]> * lint Signed-off-by: Tingting_Wan <[email protected]> * lint Signed-off-by: Tingting_Wan <[email protected]> * update test cases Signed-off-by: Tingting_Wan <[email protected]> * add negative test Signed-off-by: Tingting_Wan <[email protected]> * remove code snippets fpr testing Signed-off-by: Tingting_Wan <[email protected]> * lint Signed-off-by: Tingting_Wan <[email protected]> * update tests Signed-off-by: Tingting_Wan <[email protected]> * update test and remove redundacy Signed-off-by: Tingting_Wan <[email protected]> * linting Signed-off-by: Tingting_Wan <[email protected]> * refactor file format Signed-off-by: Tom Kurian <[email protected]> * fix read me file Signed-off-by: Tom Kurian <[email protected]> * docs: Add community contributions (#199) * Add community contributions Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Use newer link to docs Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * adding test for raise error Signed-off-by: Tingting_Wan <[email protected]> * update test and remove redundacy Signed-off-by: Tingting_Wan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * linting Signed-off-by: Tingting_Wan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * refactor file format Signed-off-by: Tom Kurian <[email protected]> * fix read me file Signed-off-by: Tom Kurian <[email protected]> * adding test for raise error Signed-off-by: Tingting_Wan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * fix readme file Signed-off-by: Tom Kurian <[email protected]> * fix readme Signed-off-by: Tom Kurian <[email protected]> * fix conflicts Signed-off-by: Tom Kurian <[email protected]> * fix ci erors Signed-off-by: Tom Kurian <[email protected]> * fix lint issue Signed-off-by: Tom Kurian <[email protected]> * update class documentation Signed-off-by: Tom Kurian <[email protected]> * add additional test cases Signed-off-by: Tom Kurian <[email protected]> * add s3 read test cases Signed-off-by: Tom Kurian <[email protected]> * add s3 read test cases Signed-off-by: Tom Kurian <[email protected]> * add s3 read test case Signed-off-by: Tom Kurian <[email protected]> * test s3 read Signed-off-by: Tom Kurian <[email protected]> * remove redundant test cases Signed-off-by: Tom Kurian <[email protected]> * fix streaming dataset configurations Signed-off-by: Tom Kurian <[email protected]> * update streaming datasets doc Signed-off-by: Tingting_Wan <[email protected]> * resolve comments re documentation Signed-off-by: Tingting_Wan <[email protected]> * bugfix lint Signed-off-by: Tingting_Wan <[email protected]> * update link Signed-off-by: Tingting_Wan <[email protected]> * revert the changes on CI Signed-off-by: Nok Chan <[email protected]> * test(docker): remove outdated logging-related step (#207) * fixkedro- docker e2e test Signed-off-by: Nok Chan <[email protected]> * fix: add timeout to request to satisfy bandit lint --------- Signed-off-by: Nok Chan <[email protected]> Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * ci: ensure plugin requirements get installed in CI (#208) * ci: install the plugin alongside test requirements * ci: install the plugin alongside test requirements * Update kedro-airflow.yml * Update kedro-datasets.yml * Update kedro-docker.yml * Update kedro-telemetry.yml * Update kedro-airflow.yml * Update kedro-datasets.yml * Update kedro-airflow.yml * Update kedro-docker.yml * Update kedro-telemetry.yml * ci(telemetry): update isort config to correct sort * Don't use profile ¯\_(ツ)_/¯ Signed-off-by: Deepyaman Datta <[email protected]> * chore(datasets): remove empty `tool.black` section * chore(docker): remove empty `tool.black` section --------- Signed-off-by: Deepyaman Datta <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * ci: Migrate the release workflow from CircleCI to GitHub Actions (#203) * Create check-release.yml * change from test pypi to pypi * split into jobs and move version logic into script * update github actions output * lint * changes based on review * changes based on review * fix script to not append continuously * change pypi api token logic Signed-off-by: Tom Kurian <[email protected]> * build: Relax Kedro bound for `kedro-datasets` (#140) * Less strict pin on Kedro for datasets Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * ci: don't run checks on both `push`/`pull_request` (#192) * ci: don't run checks on both `push`/`pull_request` * ci: don't run checks on both `push`/`pull_request` * ci: don't run checks on both `push`/`pull_request` * ci: don't run checks on both `push`/`pull_request` Signed-off-by: Tom Kurian <[email protected]> * chore: delete extra space ending check-release.yml (#210) Signed-off-by: Tom Kurian <[email protected]> * ci: Create merge-gatekeeper.yml to make sure PR only merged when all tests checked. (#215) * Create merge-gatekeeper.yml * Update .github/workflows/merge-gatekeeper.yml --------- Co-authored-by: Sajid Alam <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * ci: Remove the CircleCI setup (#209) * remove circleci setup files and utils * remove circleci configs in kedro-telemetry * remove redundant .github in kedro-telemetry * Delete continue_config.yml * Update check-release.yml * lint * increase timeout to 40 mins for docker e2e tests Signed-off-by: Tom Kurian <[email protected]> * feat: Dataset API add `save` method (#180) * [FEAT] add save method to APIDataset Signed-off-by: jmcdonnell <[email protected]> * [ENH] create save_args parameter for api_dataset Signed-off-by: jmcdonnell <[email protected]> * [ENH] add tests for socket + http errors Signed-off-by: <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [ENH] check save data is json Signed-off-by: <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [FIX] clean code Signed-off-by: jmcdonnell <[email protected]> * [ENH] handle different data types Signed-off-by: jmcdonnell <[email protected]> * [FIX] test coverage for exceptions Signed-off-by: jmcdonnell <[email protected]> * [ENH] add examples in APIDataSet docstring Signed-off-by: jmcdonnell <[email protected]> * sync APIDataSet from kedro's `develop` (#184) * Update APIDataSet Signed-off-by: Nok Chan <[email protected]> * Sync ParquetDataSet Signed-off-by: Nok Chan <[email protected]> * Sync Test Signed-off-by: Nok Chan <[email protected]> * Linting Signed-off-by: Nok Chan <[email protected]> * Revert Unnecessary ParquetDataSet Changes Signed-off-by: Nok Chan <[email protected]> * Sync release notes Signed-off-by: Nok Chan <[email protected]> --------- Signed-off-by: Nok Chan <[email protected]> Signed-off-by: jmcdonnell <[email protected]> * [FIX] remove support for delete method Signed-off-by: jmcdonnell <[email protected]> * [FIX] lint files Signed-off-by: jmcdonnell <[email protected]> * [FIX] fix conflicts Signed-off-by: jmcdonnell <[email protected]> * [FIX] remove fail save test Signed-off-by: jmcdonnell <[email protected]> * [ENH] review suggestions Signed-off-by: jmcdonnell <[email protected]> * [ENH] fix tests Signed-off-by: jmcdonnell <[email protected]> * [FIX] reorder arguments Signed-off-by: jmcdonnell <[email protected]> --------- Signed-off-by: jmcdonnell <[email protected]> Signed-off-by: <[email protected]> Signed-off-by: Nok Chan <[email protected]> Co-authored-by: jmcdonnell <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * ci: Automatically extract release notes for GitHub Releases (#212) * ci: Automatically extract release notes Signed-off-by: Ankita Katiyar <[email protected]> * fix lint Signed-off-by: Ankita Katiyar <[email protected]> * Raise exceptions Signed-off-by: Ankita Katiyar <[email protected]> * Lint Signed-off-by: Ankita Katiyar <[email protected]> * Lint Signed-off-by: Ankita Katiyar <[email protected]> --------- Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * feat: Add metadata attribute to datasets (#189) * Add metadata attribute to all datasets Signed-off-by: Ahdra Merali <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * feat: Add ManagedTableDataset for managed Delta Lake tables in Databricks (#206) * committing first version of UnityTableCatalog with unit tests. This datasets allows users to interface with Unity catalog tables in Databricks to both read and write. Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * renaming dataset Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * adding mlflow connectors Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * fixing mlflow imports Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * cleaned up mlflow for initial release Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * cleaned up mlflow references from setup.py for initial release Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * fixed deps in setup.py Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * adding comments before intiial PR Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * moved validation to dataclass Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * bug fix in type of partition column and cleanup Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * updated docstring for ManagedTableDataSet Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * added backticks to catalog Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * fixing regex to allow hyphens Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/test_requirements.txt Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Jannic <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * adding backticks to catalog Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> * Require pandas < 2.0 for compatibility with spark < 3.4 Signed-off-by: Jannic Holzer <[email protected]> * Replace use of walrus operator Signed-off-by: Jannic Holzer <[email protected]> * Add test coverage for validation methods Signed-off-by: Jannic Holzer <[email protected]> * Remove unused versioning functions Signed-off-by: Jannic Holzer <[email protected]> * Fix exception catching for invalid schema, add test for invalid schema Signed-off-by: Jannic Holzer <[email protected]> * Add pylint ignore Signed-off-by: Jannic Holzer <[email protected]> * Add tests/databricks to ignore for no-spark tests Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <[email protected]> * Update kedro-datasets/kedro_datasets/databricks/managed_table_dataset.py Co-authored-by: Nok Lam Chan <[email protected]> * Remove spurious mlflow test dependency Signed-off-by: Jannic Holzer <[email protected]> * Add explicit check for database existence Signed-off-by: Jannic Holzer <[email protected]> * Remove character limit for table names Signed-off-by: Jannic Holzer <[email protected]> * Refactor validation steps in ManagedTable Signed-off-by: Jannic Holzer <[email protected]> * Remove spurious checks for table and schema name existence Signed-off-by: Jannic Holzer <[email protected]> --------- Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> Co-authored-by: Danny Farah <[email protected]> Co-authored-by: Danny Farah <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * docs: Update APIDataset docs and refactor (#217) * Update APIDataset docs and refactor * Acknowledge community contributor * Fix more broken doc Signed-off-by: Nok Chan <[email protected]> * Lint Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Fix release notes of upcoming kedro-datasets --------- Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Co-authored-by: Juan Luis Cano Rodríguez <[email protected]> Co-authored-by: Jannic <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * feat: Release `kedro-datasets` version `1.3.0` (#219) * Modify release version and RELEASE.md Signed-off-by: Jannic Holzer <[email protected]> * Add proper name for ManagedTableDataSet Signed-off-by: Jannic Holzer <[email protected]> * Update kedro-datasets/RELEASE.md Co-authored-by: Juan Luis Cano Rodríguez <[email protected]> * Revert lost semicolon for release 1.2.0 Signed-off-by: Jannic Holzer <[email protected]> --------- Signed-off-by: Jannic Holzer <[email protected]> Co-authored-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * docs: Fix APIDataSet docstring (#220) * Fix APIDataSet docstring Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Add release notes Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> * Separate [docs] extras from [all] in kedro-datasets Fix gh-143. Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * Update kedro-datasets/tests/spark/test_spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * Update kedro-datasets/kedro_datasets/spark/spark_streaming_dataset.py Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * Update kedro-datasets/setup.py Co-authored-by: Deepyaman Datta <[email protected]> Signed-off-by: Tom Kurian <[email protected]> * fix linting issue Signed-off-by: Tom Kurian <[email protected]> --------- Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Tingting_Wan <[email protected]> Signed-off-by: Juan Luis Cano Rodríguez <[email protected]> Signed-off-by: Ankita Katiyar <[email protected]> Signed-off-by: Nok Chan <[email protected]> Signed-off-by: Tom Kurian <[email protected]> Signed-off-by: Deepyaman Datta <[email protected]> Signed-off-by: Merel Theisen <[email protected]> Signed-off-by: jmcdonnell <[email protected]> Signed-off-by: <[email protected]> Signed-off-by: Ahdra Merali <[email protected]> Signed-off-by: Danny Farah <[email protected]> Signed-off-by: Jannic Holzer <[email protected]> Co-authored-by: Juan Luis Cano Rodríguez <[email protected]> Co-authored-by: Tingting Wan <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Co-authored-by: Deepyaman Datta <[email protected]> Co-authored-by: Nok Lam Chan <[email protected]> Co-authored-by: Ankita Katiyar <[email protected]> Co-authored-by: Juan Luis Cano Rodríguez <[email protected]> Co-authored-by: Tom Kurian <[email protected]> Co-authored-by: Sajid Alam <[email protected]> Co-authored-by: Merel Theisen <[email protected]> Co-authored-by: McDonnellJoseph <[email protected]> Co-authored-by: jmcdonnell <[email protected]> Co-authored-by: Ahdra Merali <[email protected]> Co-authored-by: Jannic <[email protected]> Co-authored-by: Danny Farah <[email protected]> Co-authored-by: Danny Farah <[email protected]> Co-authored-by: kuriantom369 <[email protected]>
Description
We sometimes want to save some data with a REST Api (say with Django Rest Framework), most of the time issuing a POST request. So far, the APIDataSet is read only, with the _save method raising a DataSetError, we'd want to extend it to make a request.
Closes #166
Development notes
If applied, this commit will:
Checklist
RELEASE.md
file