Make getAvailableSpace return available space instead of max requested space #773

Open
deividasstr wants to merge 1 commit into main

Conversation

deividasstr

Hey, I'm fixing a bug where disk persistence is not used if the requested max cache size (set via DiskBufferingConfiguration.builder().setMaxCacheSize) is less than roughly 7 MB. The default of 60 MB sounds like a lot.

This log gave me a confusing hint: Insufficient folder cache size: -715243, it must be at least: 1048576.

The issue lies in the method fixed here, getAvailableSpace, which returns the available space only when the max needed space is higher than the available space.

It seems the fixed behavior is the intended one, as DiskManager, which uses this method, names the returned value availableCacheSize.
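
For context, a rough sketch of the behavior being changed, with assumed signatures (the real method lives in DiskManager and may look different):

```java
import java.io.File;

// Assumed signatures for illustration only; not the actual DiskManager code.
final class AvailableSpaceSketch {
    // Behavior as currently described: the requested max wins unless the device
    // has less free space than that, i.e. effectively min(usable, requested).
    static long before(File dir, long maxSpaceNeeded) {
        long usable = dir.getUsableSpace();
        return usable < maxSpaceNeeded ? usable : maxSpaceNeeded;
    }

    // Behavior the PR title proposes: report the space that is actually available.
    static long after(File dir, long maxSpaceNeeded) {
        return dir.getUsableSpace();
    }
}
```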

Screenshot of my debug breakpoint (requesting a max cache size of 1_000_000 bytes): [screenshot omitted]

From the numbers there, with the current implementation and effectively unlimited available disk space, more than 6_000_000 bytes have to be requested as the max cache size for if (calculatedSize < maxCacheFileSize) to be false (and for persistence to be enabled).
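
Working the numbers from the log and the screenshot (the per-signal math below is inferred from the observed values, not copied from the DiskManager source):

```java
public class CacheSizeMath {
    public static void main(String[] args) {
        long maxCacheSize = 1_000_000;      // requested via setMaxCacheSize
        long maxCacheFileSize = 1_048_576;  // 1 MB max per cache file
        long signalTypes = 3;               // spans, logs and metrics each get a folder

        // Per-signal folder size as it appears to be computed:
        long calculatedSize = maxCacheSize / signalTypes - maxCacheFileSize;
        System.out.println(calculatedSize); // -715243, the value from the log message

        // The validation requires room for at least one full cache file per folder,
        // so persistence stays disabled until maxCacheSize >= 6 * maxCacheFileSize (~6.3 MB).
        System.out.println(calculatedSize < maxCacheFileSize); // true -> persistence disabled
    }
}
```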

While writing this I got confused: why is there a minimum limit for the cache at all? Why does each signal's folder have to be bigger than maxCacheFileSize, with another maxCacheFileSize subtracted from its size, when CacheStorage already allocates the requested max bytes?

@deividasstr deividasstr requested a review from a team as a code owner January 28, 2025 15:32

linux-foundation-easycla bot commented Jan 28, 2025

CLA Signed: the committers listed above are authorized under a signed CLA.

@breedx-splk
Contributor

@deividasstr Thanks for the contribution! That error message sure is confusing and maybe even a little embarrassing. 😁 We will take a look at the PR soon, but in the meantime, can you please sign the contributor license agreement (CLA)? Thanks again!

@deividasstr
Author

Hey @breedx-splk, I have signed the CLA.

So to sum up, this PR has turned into a question: is a minimum cache size required or not?

If yes, I can update the methods and docs in a separate PR.
If no, please let me know how to proceed. Maybe my proposed changes aren't required and something around if (calculatedSize < maxCacheFileSize) needs to be adjusted instead.

@breedx-splk
Contributor

Hey @deividasstr. I thought @LikeTheSalad was going to comment on this, but I think the overall reason we tried to limit the amount of disk used was to be a good citizen on the device. If the instrumentation were to get too aggressive with disk usage (due to being offline, or for whatever other reason, such as excessive telemetry), it could limit the app's own ability to function. We tried to find a reasonable compromise that lets the app keep using the cache while the instrumentation also uses part of it.

The idea was to eventually drop telemetry or stop capturing data if the device is too limited...that way the app is unharmed.

is a minimum cache size required or not?

I think there is. Something too small would prevent us from correctly buffering even a few spans or events to disk. Curious how @LikeTheSalad thinks we should proceed.

@LikeTheSalad
Contributor

LikeTheSalad commented Jan 31, 2025

Hi @deividasstr, thank you for your PR!

Regarding a minimum cache size requirement: at first I thought the same as @breedx-splk, that it would be useful to ensure there is some space for the disk buffering functionality to work properly. The way the cache works here is that each signal has a folder in which we store multiple files of at most 1 MB each (iirc), though they can be smaller than that. So it made sense to try to ensure that there's at least 1 MB available per signal, plus 1 MB to account for the data that is read-only, so it doesn't get cleared before being exported.

However, after taking another look at it, and after adding some extra validations for when a file is corrupted, it seems that a minimum cache size might not be needed after all. If something goes wrong, the data will be sent to the exporter right away, which is what would happen anyway if disk buffering can't be properly configured because the max folder size was computed as zero, which can happen because of this validation.
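
A minimal sketch of that fallback, assuming a hypothetical setup helper rather than the actual wiring in this repo:

```java
import io.opentelemetry.sdk.trace.export.SpanExporter;

// If disk buffering can't be set up (for example the computed folder size is zero),
// telemetry is handed straight to the delegate exporter instead of being buffered.
final class ExporterSetupSketch {
    static SpanExporter chooseExporter(SpanExporter delegate) {
        try {
            return createDiskBufferedExporter(delegate); // hypothetical helper
        } catch (Exception e) {
            return delegate; // export right away, as described above
        }
    }

    // Placeholder standing in for the real disk buffering setup.
    static SpanExporter createDiskBufferedExporter(SpanExporter delegate) throws Exception {
        throw new UnsupportedOperationException("illustrative placeholder");
    }
}
```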

So to summarise, unless I missed something important, it seems like we can remove this validation altogether.

We still need to try to use as few resources as possible from the host app, though, so I don't like the idea of using the whole available disk space as the max cache size for the APM use case. I'm not sure what the best approach is here; one option could be to avoid executing this method and instead always use the getUsableSpace() approach, then take a fraction of what it returns.
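
A minimal sketch of that "fraction of usable space" idea; the helper name and the 10% factor are placeholders, not an agreed-upon design:

```java
import java.io.File;

// Cap the cache at a fraction of what the device currently has free,
// so the instrumentation never claims the whole disk.
final class UsableSpaceSketch {
    static long availableCacheSize(File cacheDir, long requestedMaxCacheSize) {
        long fractionOfFreeSpace = cacheDir.getUsableSpace() / 10; // ~10% of free disk
        return Math.min(requestedMaxCacheSize, fractionOfFreeSpace);
    }
}
```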

@deividasstr
Author

deividasstr commented Feb 3, 2025

Thanks for being so involved in the discussion!

My suggestion would be:

  • Remove the minimum size requirement (this validation).
  • Reduce the default cache size from 60 MB to 10 MB? Datadog's default is ~12 MB, Firebase's is 10 MB.
  • As a developer, I would like to see how often events are lost due to insufficient storage (and whether that's caused by a general storage shortage or because the max cache size was insufficient). Are there logs of persistence failures? Maybe even a callback for such cases could be useful (a rough shape is sketched below).
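
Purely illustrative shape for such a callback; nothing like this exists in the disk buffering artifact today, and every name here is hypothetical:

```java
public interface DiskBufferingListener {
    // Invoked when telemetry could not be persisted (cache full, disk error, ...).
    void onTelemetryDropped(String signalType, int droppedItems, Throwable cause);
}
```

Hypothetical wiring could look like DiskBufferingConfiguration.builder().setDiskBufferingListener(listener), but no such builder method exists yet.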

@deividasstr
Author

What do you think @LikeTheSalad @breedx-splk?

@LikeTheSalad
Contributor

What do you think @LikeTheSalad @breedx-splk?

I agree with the points you mentioned earlier; I just wanted to give people some time to suggest other ideas in case we missed something, though that doesn't seem to be the case. I've added this topic to tomorrow's Android SIG meeting to discuss the default cache size value, and if no other concerns are raised on the call, I think we can move forward with these changes. For your third item, I like the idea of adding a callback for those cases; that's something I can add to the disk buffering artifact.
