Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce synchronous I/O when checking ImageCache.isCached #1480

Merged
merged 2 commits into from
Jul 5, 2020

Conversation

raja-baz
Copy link
Contributor

We use ImageCache.isCached on the main thread in our application and we noticed that it would sporadically cause the main thread to hang on I/O. Some investigation showed that this was due to 2 things.

  1. A straight-up bug when building the file URL in DiskStorage which uses a variant of URLByAppendingPathComponent: that does disk I/O to figure out if the path it's building is for a directory or a file. In our case we know it's a file and this is completely unnecessary and we can simply use URLByAppendingPathComponent:isDirectory: and avoid this.
  2. While KingFisher's ultimate source-of-truth on whether an image is cached or not is whether the file exists on disk, it is possible to cheaply put an in-memory hash table check before doing so that filters out most unnecessary trips to the file system without changing the underlying semantics. Note that this should be carefully reviewed before merging as I am unsure if that structure needs to be synchronized(it currently isn't).

raja-baz added 2 commits June 30, 2020 11:09
For more details see the discussion on:

https://developer.apple.com/documentation/foundation/nsurl/1410614-urlbyappendingpathcomponent

Since we know that we're looking for a file and a trailing slash will
never be necessary, we can safely use the more specific version of the function
This is a low-cost best-effort strategy that triest to reduce as much
as possible the need for I/O to check if something is cached. A file
name we've never seen can just be skipped immediately. Something that
got added and was removed later would still end up hitting the disk to
figure that out, but that's not different from before.

Unsure about the exact multi-threading semantics of that Set though,
we might need to do some king of synchronization
@onevcat
Copy link
Owner

onevcat commented Jul 2, 2020

@raja-baz Thanks for it.

A straight-up bug when building the file URL in DiskStorage which uses a variant of URLByAppendingPathComponent: that does disk I/O

Oh, I never thought that it would be an expensive operation. I will check it.

While KingFisher's ultimate source-of-truth on whether an image is cached or not is whether the file exists on disk, it is possible to cheaply put an in-memory hash table check before doing so that filters out most unnecessary trips to the file system without changing the underlying semantics.

It is a good idea to create a cache table in memory to prevent checking the file system. But the implementation in your PR would bring quite a few false positive, which is unacceptable. Think the case that disk cache is purged by the system (it is in the cache folder so that is totally possible) after the cache table being created. The cache system would think that the cache is on disk, but finally, nothing is found.

Anyway, I will consider whether there is a better way to improve the performance here. Thank you for your valuable suggestions!

@raja-baz
Copy link
Contributor Author

raja-baz commented Jul 2, 2020

My design goal with the cache was to do a best-effort approach that changes as little and as low-risk as possible while being strictly better in terms of performance.

A false positive means we check the file system unnecessarily before finding out that the image isn't cached. However, right now that's the case for all operations. So it is strictly better than before with no visible downside. Unless I'm missing something?

Basically: I'm okay with false positive (avoidable I/O that we didn't avoid) but absolutely must never have a false negative (bug which causes network fetch when we could've loaded from disk). And I wanted to achieve that with as little change and intrusiveness in the code

@raja-baz
Copy link
Contributor Author

raja-baz commented Jul 2, 2020

PS: I even had "maybe" in the name to indicate that this could have false positives

@onevcat
Copy link
Owner

onevcat commented Jul 2, 2020

Ah! I got your idea. It seems that I misunderstood your implementation. It makes sense to add these as a "precondition" check for the actual disk file checking.

I will take a detailed look at these again.

@onevcat onevcat mentioned this pull request Jul 5, 2020
@onevcat onevcat merged commit 6969b57 into onevcat:master Jul 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants