remove find implementation #285
@skshetry this implementation is most likely needed (it most likely creates a cache of the top-level folders and their IDs). By the way, I think we need to update it to handle the new cache format; it might be broken, not sure.
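To make the API-request argument concrete, here is a minimal sketch of why caching folder IDs matters. It assumes a hypothetical `FakeDrive` stub in which every `list_children(folder_id)` call counts as one API request; `FakeDrive`, `resolve_uncached` and `CachedResolver` are illustrative names, not PyDrive2 APIs, and the sketch ignores Drive details such as multiple files sharing the same name.

```python
from typing import Dict, List, Set, Tuple


class FakeDrive:
    """Hypothetical Drive stub (not a PyDrive2 class).

    Maps a folder ID to its children as (name, child_id) pairs; every
    list_children() call is counted as one API request.
    """

    def __init__(self, tree: Dict[str, List[Tuple[str, str]]]):
        self.tree = tree
        self.calls = 0

    def list_children(self, folder_id: str) -> List[Tuple[str, str]]:
        self.calls += 1
        return self.tree.get(folder_id, [])


def resolve_uncached(drive: FakeDrive, root_id: str, path: str) -> str:
    """Walk from the root on every lookup: one request per path component."""
    current_id = root_id
    for part in path.split("/"):
        current_id = dict(drive.list_children(current_id))[part]
    return current_id


class CachedResolver:
    """Remembers every path -> ID it learns; each folder is listed at most once."""

    def __init__(self, drive: FakeDrive, root_id: str):
        self.drive = drive
        self.ids: Dict[str, str] = {"": root_id}
        self.listed: Set[str] = set()

    def resolve(self, path: str) -> str:
        current = ""
        for part in path.split("/"):
            child = f"{current}/{part}" if current else part
            if child not in self.ids and current not in self.listed:
                # Cache miss: list the parent once and remember all its children.
                for name, child_id in self.drive.list_children(self.ids[current]):
                    key = f"{current}/{name}" if current else name
                    self.ids[key] = child_id
                self.listed.add(current)
            current = child
        return self.ids[current]  # raises KeyError if the path does not exist


tree = {
    "root": [("data", "id-data")],
    "id-data": [("a", "id-a"), ("b", "id-b"), ("c", "id-c")],
}
paths = ["data/a", "data/b", "data/c"]

plain = FakeDrive(tree)
for p in paths:
    resolve_uncached(plain, "root", p)
print(plain.calls)  # 6: two requests per lookup

cached_drive = FakeDrive(tree)
resolver = CachedResolver(cached_drive, "root")
for p in paths:
    resolver.resolve(p)
print(cached_drive.calls)  # 2: "root" and "data" are each listed once
```

The saving grows with depth and with the number of sibling lookups, which is why the request count is the metric worth comparing before removing such a cache.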
Shouldn't that happen as part of line 465 in b6c83b2?
Since
Yup, agreed.
Again, all the changes to the cache structure can and should be done without implementing (we will only need to implement
I don't understand the implications of this. If you have time to research it (compare the number of API requests, performance, and multithreading safety), that would be great; or at least give some explanation of why we had this. Without that, let's not merge this, please. We need to test it on the new cache format and fix it if needed.
Some context that I remember. In Google Drive:
Thus it was important to do:
And, again, we need to review it for the new structure; otherwise it can indeed be quite broken for DVC.
I see that in line 495 in a5dc1d9, which should be implemented in lines 440 to 444 in a5dc1d9.
The patch now caches the ID of folders on … Regarding multithreading, it should be safe (thanks to the GIL).
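One nuance on the GIL argument: in CPython with the GIL, a single dict read or write is atomic, so a shared ID cache will not be corrupted, but a check-then-fetch sequence is not atomic, and two threads can both miss and issue the same request. A minimal sketch of closing that gap with a lock follows; `FakeDrive` and `ThreadSafeIdCache` are hypothetical names, not PyDrive2 code.

```python
import threading
from typing import Dict, List, Tuple


class FakeDrive:
    """Hypothetical Drive stub (not a PyDrive2 class) that counts list requests."""

    def __init__(self, tree: Dict[str, List[Tuple[str, str]]]):
        self.tree = tree
        self.calls = 0
        self._calls_lock = threading.Lock()

    def list_children(self, folder_id: str) -> List[Tuple[str, str]]:
        with self._calls_lock:
            self.calls += 1
        return self.tree.get(folder_id, [])


class ThreadSafeIdCache:
    """Lazily caches child IDs per folder; each folder is listed at most once."""

    def __init__(self, drive: FakeDrive):
        self.drive = drive
        self.children: Dict[str, Dict[str, str]] = {}
        self._lock = threading.Lock()

    def child_id(self, folder_id: str, name: str) -> str:
        with self._lock:
            # The check, the request and the store happen under one lock, so two
            # threads cannot both miss and list the same folder.
            if folder_id not in self.children:
                self.children[folder_id] = dict(self.drive.list_children(folder_id))
            return self.children[folder_id][name]


drive = FakeDrive({"id-data": [("a", "id-a"), ("b", "id-b"), ("c", "id-c")]})
cache = ThreadSafeIdCache(drive)
results: List[str] = []


def worker(name: str) -> None:
    results.append(cache.child_id("id-data", name))  # list.append is atomic in CPython


threads = [threading.Thread(target=worker, args=(n,)) for n in "abcabcab"]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(drive.calls)  # 1
```

Holding one lock across the request serializes all listings; per-folder locks would let unrelated folders be listed in parallel, at the cost of more bookkeeping.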
Note that it is broken for DVC right now.
The failure in Python 3.7 on macOS is unrelated; see actions/setup-python#682. It is also fixed in fsspec; see fsspec/filesystem_spec#1295. See PyDrive2/.github/workflows/test.yml, line 27 in a5dc1d9.
Just to be on the same page: even with this patch, it's broken for the new DVC. E.g., here, when we initialize, we cache root IDs, but we don't cache all of them now. Upstream, we should also be creating them once, etc. As for this patch: how do we use (or how did we use)
I got confirmation from a user that it's broken for the legacy ODB. I need to investigate that.
It gets cached when … (see line 430 in f976876).
This is where we use
It seems that, as always, gdrive is very involved, and the state of pydrive2 and related stuff is not the best. Given that pydrive2 also uses a legacy API version that might get dropped in the future, I just want to say, for the record, that dropping gdrive is also an option we should consider here if we can't fix it quickly. It is great for onboarding, but it probably doesn't contribute anything for customers, and we need to carefully consider whether we can spend time on this. I'm not saying we should drop it now, just that we should keep it as a possible solution.
This
The things that are potentially broken or not optimal are that we don't pre-cache … Otherwise, I don't see why it would be broken. Were you able to reproduce it? If so, I can run it and probably fix it quickly.
This does cache IDs lazily now, during
This is the one that is broken at this time, and this PR avoids that for simplicity.
I have an alternative patch that caches
@skshetry, was the user's problem reproduced? Can we start with that, please? If there is a way to reproduce it, there is probably a simple fix. The number of API calls is critical for GDrive in DVC, hence this optimization. Unless we have a really good reason for this, let's not try to simplify it for now, please.
@shcheklein, there were two issues:
Closing in favour of #286.
fsspec by default provides a find implementation that is built on top of ls and info. Looking at the current implementation, it did not seem to be recursive (maybe I am wrong here?), and it looked complicated enough that I'd rather simplify, default to fsspec, and specialize only if needed.