Forward args to _get_remote_config() and honour core/no_scm if present #10719
base: main
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff             @@
##             main   #10719      +/-   ##
==========================================
+ Coverage   90.68%   91.07%   +0.39%
==========================================
  Files         504      504
  Lines       39795    39965     +170
  Branches     3141     3158      +17
==========================================
+ Hits        36087    36400     +313
+ Misses       3042     2938     -104
+ Partials      666      627      -39
dvc/repo/open_repo.py
Outdated
# It seems some tests might be passing a 'config' key that is not a dict
if not isinstance(user_config, dict):
    user_config = {}
Looking into this, some tests send kwargs = {'config': None, ...}; this safeguard protects against that case.
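The safeguard under review can be sketched in isolation as follows. This is only an illustration: `normalize_user_config` is a hypothetical helper name and is not part of dvc; the only behaviour taken from the diff is the `isinstance` check that falls back to an empty dict.

```python
from typing import Any


def normalize_user_config(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return the caller's config as a dict (hypothetical helper, not dvc API).

    Any non-dict value, including None or a missing key, becomes {}.
    """
    user_config = kwargs.get("config")
    if not isinstance(user_config, dict):
        user_config = {}
    return user_config
```

With this shape, callers that pass `config=None` and callers that omit `config` entirely are treated the same way.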
dvc/repo/open_repo.py
Outdated
if no_scm_flag is not None:
    # Honour specific SCM treatment if requested in the call
    repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
AFAIR, Repo(config=...) should just work.
- repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
+ repo = Repo(url, config=kwargs.get("config"))
I don't want to specialize core.no_scm in any way or handle it.
I agree that it doesn't feel ideal to handle core.no_scm itself. Your solution makes sense and it works for my specific use case, but it triggers other errors in the dvc test suite, which makes me think that other, non-core.no_scm configuration options are being used that _get_remote_config() doesn't like.
(I went with the core.no_scm-specific approach to highlight the need.)
These are the errors I get when using repo = Repo(url, config=kwargs.get("config")):
FAILED tests/func/test_import.py::test_import_no_hash[files1-expected_info_calls1] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
FAILED tests/func/repro/test_repro_pull.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports_mixed - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/repro/test_repro.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_import.py::test_import_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir_to_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_rev - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_import.py::test_pull_imported_stage - dvc.exceptions.CheckoutError: Checkout failed for following targets:
FAILED tests/func/test_import.py::test_pull_import_no_download - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rodrigo.goya/pytest-17/popen-gw13/test_pull_import_no_download0/.dvc/cache/fs/local/e3501e821bcee8f40107794afbe767d1/.F0VDC8H_fGFnvnEcrt...
FAILED tests/func/test_import.py::test_pull_import_no_download_rev_lock - dvc.exceptions.DownloadError: 1 files failed to download
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir/] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_wildcard_imported_directory_stage - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir123',)
FAILED tests/func/test_update.py::test_update_import[True] - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_pull_non_workspace - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_import_with_jobs - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir1',)
91cf291 to f598ecd
f598ecd to 652560a
@skshetry, OK, I finally got some time to track down the errors. I took your suggestion of using something closer to
I tracked the errors that were triggered by that call to the way that
Since the function of
I modified the code to do the above, and added some comments to the code for future reference. I also added a related test to the test suite. Please let me know what you think.
Note: FYI, I've been seeing some flakiness in the test suite from what seems to be a race condition in the logs, where a message of
Great research, @rgoya. I have taken a quick look at the PR and it looks good. Please give me some time (maybe this week) to look at this more closely.
This is a proposed fix for #10608; the code here makes steps 9 and 10 described in the issue work.
Summary:
This change allows a user to access the dvc information in an environment that is disconnected from the original Git backend (e.g. in a deployed container, see #10608), by using something like:
Description:
Mainly, a call to dvc/repo/open_repo.py:open_repo(url, *args, **kwargs) may contain a parameter config in **kwargs. With this config, a user might indicate they do not want to access the repo with Git support, by using config={"core": {"no_scm": True}}.
During the execution of dvc/repo/open_repo.py:open_repo(), there is a call to a function dvc/repo/open_repo.py:_get_remote_config() that returns the remote configuration ({"core": {"remote"}}). This is then merged into the user-provided config parameter before calling Repo(url, *args, **kwargs).
dvc/repo/open_repo.py:_get_remote_config(), in turn, does a quick Repo() call to get the remote configuration. However, it does not use any of the parameters requested via dvc/repo/open_repo.py:open_repo() and thus relies entirely on the contents of .dvc/config. This means that even if the user requested no SCM support, it will try to look for a Git repo if .dvc/config says so, and fail if it does not find it.
This PR modifies dvc/repo/open_repo.py:_get_remote_config() to receive *args, **kwargs and honour the request to use or ignore Git support when accessing the dvc repo.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
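The behaviour described in the PR description can be illustrated with a small, self-contained sketch. Everything here is a stand-in and an assumption: `StubRepo` is not dvc's `Repo`, `get_remote_config` is only an analogue of `_get_remote_config()`, and the remote name "storage" and the `GIT_AVAILABLE` flag are made up to simulate a container without a Git backend.

```python
from typing import Any, Optional

GIT_AVAILABLE = False  # assumption: simulates an environment without Git


class StubRepo:
    """Stand-in for dvc's Repo; it only merges config and checks for Git."""

    def __init__(self, url: str, config: Optional[dict] = None) -> None:
        # Made-up values standing in for what .dvc/config would provide.
        merged = {"core": {"no_scm": False, "remote": "storage"}}
        for section, values in (config or {}).items():
            merged.setdefault(section, {}).update(values)
        # Without no_scm, the stub insists on a Git repo, like the failure described.
        if not merged["core"]["no_scm"] and not GIT_AVAILABLE:
            raise RuntimeError("Git repository not found")
        self.url = url
        self.config = merged


def get_remote_config(url: str, *args: Any, **kwargs: Any) -> dict:
    """Hypothetical analogue of _get_remote_config() that forwards the caller's config.

    *args is accepted for signature parity but unused by the stub.
    """
    user_config = kwargs.get("config")
    if not isinstance(user_config, dict):
        user_config = {}
    repo = StubRepo(url, config=user_config)
    return {"core": {"remote": repo.config["core"]["remote"]}}
```

In this sketch, calling `get_remote_config("repo", config={"core": {"no_scm": True}})` succeeds even though `GIT_AVAILABLE` is false, while omitting the config raises, which mirrors the difference between forwarding and ignoring the caller's arguments.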