Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Onyx optimizations #88

Closed
wants to merge 12 commits into from

Conversation

kidroca
Copy link
Contributor

@kidroca kidroca commented Jul 14, 2021

cc @marcaaron @tgolen

Details

This PR contains a few optimizations and refactors

  1. The first commit has and updated Onyx.printMetrics function. You can use it as a base to compare before/after results. It prints a general table on the console which is easy to copy and put in excel.
  2. Fix: some connections happen even before Onyx.init is called. This should not be allowed as the initialKeyStates are applied during init. Added a deferred tasks that would queue incoming connections and only make them after init, this also gives the rest of the app init some breathing space
  3. Perf: Update set to check with cache and skip writing to storage. Cache reflects what's already on disc, if we have the exact value in cache it means it didn't change and we don't need to make a write
  4. Perf: Updated cache clearing strategy. It's a mix of LRU and active connections. The cache works in the following way: each time a connection is terminated we run a cleanCache function. The function removes dated items from cache, if the item is still being used by a component it's kept in cache. By default it will try to keep up to ~150 keys in cache, can be a bit more due to active connections to old items
  5. Refactor: merge was refactored so that it's promises are resolved when a queue for a key is done. This helps with measuring the function more accurately
  6. Perf: Updated defaultKeyStates handling. Merging these keys one by one was taking a significant amount of time, even though they are only 4. Instead of writing them to storage, we read data from storage and merge it along with the defaults to the cache. Then we never allow for the defaultKeys to be removed from cache

Related Issues

Expensify/App#2667

Automated Tests

Added and updated tests related to changes in cache handling.
The rest of the changes are covered by the existing tests

Linked PRs

TBD

Benchmark results

There are overall improvements that can be seen in the stats here
https://docs.google.com/spreadsheets/d/1kHRvFL1ITbr4p9TzT_nsmtvCY4KoPwKCXp84FTCsGzk/edit?usp=sharing

Android
Before Updates After Updates
Total Onyx Time 55.06sec 19.67sec
Heavy reading ends at 8.27sec 5.44sec
iOS
Before Updates After Updates
Total Onyx Time 16.37sec 8.74sec
Heavy reading ends at 2.96sec 2.98sec
  • Total: the total time spent by Onyx functions data fetching/writing
  • Heavy reading: the point since app launch at which hard reads from disk are over
  • The link above contains device information and how the tests were performed

kidroca added 12 commits July 14, 2021 14:32
`isCollectionKey(mapping.key)` and `isSafeEvictionKey(mapping.key)` are not working
correctly in this case since neither `onyxKeys` or `safeEvictionKeys` are set

This also helps other logic running during init, as connections would wait a bit
and allow the rest of the app to initialize
…AccessedKeys

It seems `recentlyAccessedKeys` were not allow to contain non safe eviction keys
because of the `evictStorageAndRetry` logic
But the logic there can be update to filter out "unsafe" keys

Then `recentlyAccessedKeys` can do what it was meant to do and be a list of
accessed keys sorted in access order

This allows to clean dated keys from cache

Keys are added to the recent access list during retrieval instead of during connect
All read request happen through that place

Added a configurable limit of 150 keys in cache
This looks like a sane value from observations during chat browsing

Dated items with active connections aren't removed from cache for better optimization

The behavior can be switched off by providing maxCachedKeysCount=0
Then the cache will be handled as before -> remove keys that we're no longer connected to
The reading times do improve but the overall performance gets worse

It seems too much data is batched at once
when Onyx sends it to the subscribers there's a brief period where
the app is unresponsive
Instead of resolving ahead of time `Promise.resolve()` - capture the write promise
This way the promise resolves one time - when the queue for this key finish
The main benefit here is that the true timing of the merge is captured
The variable `defaultKeyStates` is already preserved in memory
for the life of the application, there's no need to keep it
in hard storage as well

This not only saves us a write, but instead we do a multiGet
to merge any data from storage to our defaults
This well we've prefilled the cache with some storage data
that is immediately needed
@kidroca kidroca requested a review from a team as a code owner July 14, 2021 18:13
@MelvinBot MelvinBot requested review from robertjchen and removed request for a team July 14, 2021 18:14
@kidroca
Copy link
Contributor Author

kidroca commented Jul 14, 2021

There's not much to be seen in a visual comparison. I expected to see the LHN much faster, but it seems Onyx is not the main bottleneck ATM

SBS Before/After Android Physical Device

Before_After_SbS.mp4
  • I've trimmed the videos to the point where the JS starts to run, otherwise there is like a ~10sec. splash screen
  • This should be faster with a prod build, but I don't have the signing certificate needed to make one

Overall I can feel the app is a bit smoother
There's a noticeable improvement on iOS as well, at one point I've removed the full screen loader, to test a bit and it wasn't even needed - chats rendered instantly

cc @quinthar
Benchmark info attached in the main description

@kidroca
Copy link
Contributor Author

kidroca commented Jul 14, 2021

Here's the E.cash PR where you can build and try out the changes: Expensify/App#4040 on E.cash

Comment on lines -157 to +154
if (isCollectionKey(key) || !isSafeEvictionKey(key)) {
if (isCollectionKey(key)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was updated to allow to keep more items in the access list
To preserve the same functionality the check was moved here: Line 481

const keyForRemoval = _.find(recentlyAccessedKeys, key => !evictionBlocklist[key]);
const keyForRemoval = _.find(
recentlyAccessedKeys,
key => !evictionBlocklist[key] && isSafeEvictionKey(key),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isSafeEvictionKey was moved here to allow to use recentlyAccessedKeys to actually put all recently accessed keys to that list

@marcaaron
Copy link
Contributor

Hey @kidroca, Thanks for the changes + metrics! Super exciting stuff!

Before diving into any reviews I'd like to focus on one change at a time to make them easier to evaluate. It seems like there's a mix of refactoring, cleanup, and performance stuff going on - which is great to see, but makes it difficult to get a sense for the impact of each individual change + also harder still to test and track any regressions if they pop up.

Let's create PRs for each of these individual changes, review each one in isolation, and discuss anything that requires further explanation in that PR. It would also be good to prioritize the PRs based on which ones have the largest impacts on performance so we can get those out sooner than later.


Specific Notes / Questions:

The first commit has and updated Onyx.printMetrics function. You can use it as a base to compare before/after results. It prints a general table on the console which is easy to copy and put in excel.

Refactor: merge was refactored so that it's promises are resolved when a queue for a key is done. This helps with measuring the function more accurately

Sounds like these two could maybe be moved into a separate PR that intends to clean up the timing stuff. Also sounds like this would be a lower priority since there is no direct impact on performance.

Fix: some connections happen even before Onyx.init is called. This should not be allowed as the initialKeyStates are applied during init. Added a deferred tasks that would queue incoming connections and only make them after init, this also gives the rest of the app init some breathing space

Can you elaborate on breathing space? Is that another way of saying there was a performance improvement associated with this?

Perf: Update set to check with cache and skip writing to storage. Cache reflects what's already on disc, if we have the exact value in cache it means it didn't change and we don't need to make a write

Slightly confused about this. If we have a value in the cache why would we assume that we don't need to make a write? To clarify, are you proposing we remove the ability to call Onyx.set() and overwrite a value?

Perf: Updated cache clearing strategy. It's a mix of LRU and active connections. The cache works in the following way: each time a connection is terminated we run a cleanCache function. The function removes dated items from cache, if the item is still being used by a component it's kept in cache. By default it will try to keep up to ~150 keys in cache, can be a bit more due to active connections to old items

How is this related to performance? Seems like maybe it will help with memory consumption? I think this would be a good change to evaluate separate from the other changes so we can look at impacts on memory. But let me know if there's some other benefit here.

Perf: Updated defaultKeyStates handling. Merging these keys one by one was taking a significant amount of time, even though they are only 4. Instead of writing them to storage, we read data from storage and merge it along with the defaults to the cache. Then we never allow for the defaultKeys to be removed from cache

This would also be good to see as a separate PR. I'm not sure what the default behavior of this is now, but surprised that it is taking up a significant amount of time since, as you mentioned, there are only 4 keys.

@kidroca
Copy link
Contributor Author

kidroca commented Jul 15, 2021

@marcaaron

Specific Notes / Questions:

The first commit has and updated Onyx.printMetrics function. You can use it as a base to compare before/after results. It prints a general table on the console which is easy to copy and put in excel.

Refactor: merge was refactored so that it's promises are resolved when a queue for a key is done. This helps with measuring the function more accurately

Sounds like these two could maybe be moved into a separate PR that intends to clean up the timing stuff. Also sounds like this would be a lower priority since there is no direct impact on performance.

These are things that would help anyone making benchmarks and using the Onyx.printMetrics() function
If I'm the only one that's using it I'm fine to just have this change live locally and not be committed
But if more people use it, it would be best to merge this first so the same print format can be used on the before/after comparisons

Fix: some connections happen even before Onyx.init is called. This should not be allowed as the initialKeyStates are applied during init. Added a deferred tasks that would queue incoming connections and only make them after init, this also gives the rest of the app init some breathing space

Can you elaborate on breathing space? Is that another way of saying there was a performance improvement associated with this?

Yes, this carries performance improvements

  1. Postponing the connections until after Onyx.init finishes improved loading
  2. Also it gives a better batching opportunity. Since all the connections that happen will wait for init, when that happens they would be batched together (by AsyncStorage) and resolve with one multiGet (if we implement this)

Perf: Update set to check with cache and skip writing to storage. Cache reflects what's already on disc, if we have the exact value in cache it means it didn't change and we don't need to make a write

Slightly confused about this. If we have a value in the cache why would we assume that we don't need to make a write? To clarify, are you proposing we remove the ability to call Onyx.set() and overwrite a value?

Quite a few times we're trying to "overwrite" something that didn't change, the most notable one is modal where I would see 19 consecutive calls to write false and on top of that the value in storage is already false

When we set something we cache the updated value

function set(key, val) {
// Adds the key to cache when it's not available
cache.set(key, val);

Even if this happens via other means like mergeCollection - if cache has a value it's always the most recent up to date value
So when we try to write a value, but we have a cached key and the exact same value in cache we can skip the write
c5a4ba6
I've tested with a log in the condition here and very often we try to write the same value for modal, network, iou, session

Perf: Updated cache clearing strategy. It's a mix of LRU and active connections. The cache works in the following way: each time a connection is terminated we run a cleanCache function. The function removes dated items from cache, if the item is still being used by a component it's kept in cache. By default it will try to keep up to ~150 keys in cache, can be a bit more due to active connections to old items

How is this related to performance? Seems like maybe it will help with memory consumption? I think this would be a good change to evaluate separate from the other changes so we can look at impacts on memory. But let me know if there's some other benefit here.

This helps on a few fronts

  1. it helps with the optimization above - having more items in cache, we have more knowledge of what's already written on disk and prevent unnecessary writes
  2. Switching between the recently used chats happen faster as the information is still in cache

I've disabled clearing cache entirely and it improved speeds, then I came up with this LRU update, which gives pretty much the same performance while still clears cache when it can

Perf: Updated defaultKeyStates handling. Merging these keys one by one was taking a significant amount of time, even though they are only 4. Instead of writing them to storage, we read data from storage and merge it along with the defaults to the cache. Then we never allow for the defaultKeys to be removed from cache

This would also be good to see as a separate PR. I'm not sure what the default behavior of this is now, but surprised that it is taking up a significant amount of time since, as you mentioned, there are only 4 keys.

Yes I was surprised too, but it was a repeating pattern from the numerous benches that I did (Android) - these 4 keys would start fetching at 1.5sec after launch and resolve at the 3rd or 4th second after init - ~2sec to retrieve them
I think it's down to the fact that many thing are running through the bridge at that time
With the update I'm proposing it takes about 0.6sec and only one trip through the bridge


I'm ready to start opening PRs, just need to know whether I should start with the Onyx.printMetrics() changes (1st item in my response) or skip it altogether

@marcaaron
Copy link
Contributor

These are things that would help anyone making benchmarks and using the Onyx.printMetrics() function
If I'm the only one that's using it I'm fine to just have this change live locally and not be committed
But if more people use it, it would be best to merge this first so the same print format can be used on the before/after comparisons

Good point, let's start with that then!

Yes, this carries performance improvements Postponing the connections until after Onyx.init finishes improved loading Also it gives a better batching opportunity. Since all the connections that happen will wait for init, when that happens they would be batched together (by AsyncStorage) and resolve with one multiGet (if we implement this)

Ok I'm not too sure I get what we want to do here yet, but sounds like it should be it's own PR and we can evaluate the impact on app start time?

Quite a few times we're trying to "overwrite" something that didn't change, the most notable one is modal where I would see 19 consecutive calls to write false and on top of that the value in storage is already false

Nice. I agree it doesn't make sense to write data that hasn't changed. The part I was missing from this proposal is how should we tell the difference between the data that has changed and the data that has not changed? Looking at the commit you shared and it seems like we'd use _.isEqual() for that.

This makes sense to me, but it seems like there might be some cases where we know this data has changed and are going to perform an additional check on it now? e.g. merging some large JSON blob would now require that we perform this equality check on it in addition to saving it to storage.

it helps with the optimization above - having more items in cache, we have more knowledge of what's already written on disk and prevent unnecessary writes

Ok so sounds like we can maybe start with that first optimization and then add this one in after?

Yes I was surprised too, but it was a repeating pattern from the numerous benches that I did (Android) - these 4 keys would start fetching at 1.5sec after launch and resolve at the 3rd or 4th second after init - ~2sec to retrieve them I think it's down to the fact that many thing are running through the bridge at that time. With the update I'm proposing it takes about 0.6sec and only one trip through the bridge

Nice. It sounds like maybe this improvement can be combined with the deferred connection logic since they are both focusing on the app start time?

Let me know if you are good with this game plan...

PR1 - Add the new metrics improvements so that we can use them in the benchmarking the rest of the changes
PR2 - Focus on defaultKeyStates handling and postpone connections until after Onyx.init finishes (benchmark app start time)
PR3 - Prevent unnecessary writes when data has not changed by adding _.isEqual()
PR4 - Improve cache clearing logic so that data is retained longer and more unnecessary writes can be prevented

@kidroca
Copy link
Contributor Author

kidroca commented Jul 15, 2021

Let me know if you are good with this game plan...

Yes that works

e.g. merging some large JSON blob would now require that we perform this equality check on it in addition to saving it to storage.

The check happens really fast - we're dealing with objects available in memory, and we're not computing the full diff between the objects the check is over as soon the first difference is found.
Maybe we can find some information on how big a JSON object needs to be in order to negatively affects this, but IMO for data in memory it shouldn't be a problem.
Writing times improved (Android) like 3x for the worst case - 984ms down to 342ms, and 6x for the average case 384ms down to 64ms, but what's better is that we're removing this noise from the bridge channel and other stuff can run faster too
iOS improved as well - from 98ms on average down to 29ms

@marcaaron
Copy link
Contributor

Sounds great, thanks

@kidroca kidroca mentioned this pull request Aug 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants