Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AsyncIO support for tuning readahead_size by block cache lookup #11936

Closed

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented Oct 9, 2023

Summary: Add support for tuning of readahead_size by block cache lookup for async_io.

Design/ Implementation -

BlockBasedTableIterator.cc -

BlockCacheLookupForReadAheadSize callback API lookups in the block cache and tries to reduce the start
and end offset passed. This function looks into the block cache for the blocks between start_offset
and end_offset and add all the handles in the queue.

It then iterates from the end in the handles to find first miss block and update the end offset to that block.
It also iterates from the start and find first miss block and update the start offset to that block.

_read_curr_block_ argument : True if this call was due to miss in the cache and caller wants to read that block 
                             synchronously.
                             False if current call is to prefetch additional data in extra buffers 
                            (due to ReadAsync call in FilePrefetchBuffer)

In case there is no data to be read in that callback (because of upper_bound or all blocks are in cache),
it updates start and end offset to be equal and that FilePrefetchBuffer interprets that as 0 length to be read.

FilePrefetchBuffer.cc -

FilePrefetchBuffer calls the callback - ReadAheadSizeTuning and pass the start and end offset to that
callback to get updated start and end offset to read based on cache hits/misses.

  1. In case of Read calls (when offset passed to FilePrefetchBuffer is on cache miss and that data needs to be read), read_curr_block is passed true.
  2. In case of ReadAsync calls, when buffer is all consumed and can go for additional prefetching, the start offset passed is the initial end offset of prev buffer (without any updated offset based on cache hit/miss).

Foreg. if following are the data blocks with cache hit/miss and start offset
and Read API found miss on DB1 and based on readahead_size (50) it passes end offset to be 50.
[DB1 - miss- 0 ] [DB2 - hit -10] [DB3 - miss -20] [DB4 - miss-30] [DB5 - hit-40]
[DB6 - hit-50] [DB7 - miss-60] [DB8 - miss - 70] [DB9 - hit - 80] [DB6 - hit 90]

  • For Read call - updated start offset remains 0 but end offset updates to DB4, as DB5 is in cache.

  • Read calls saves initial end offset 50 as that was meant to be prefetched.

  • Now for next ReadAsync call - the start offset will be 50 (previous buffer initial end offset) and based on readahead_size, end offset will be 100

  • On callback, because of cache hits - callback will update the start offset to 60 and end offset to 80 to read only 2 data blocks (DB7 and DB8).

  • And for that ReadAsync call - initial end offset will be set to 100 which will again used by next ReadAsync call as start offset.

  • initial_end_offset_ in BufferInfo is used to save the initial end offset of that buffer.

  • If let's say DB5 and DB6 overlaps in 2 buffers (because of alignment), prev_buf_end_offset is passed to make sure already prefetched data is not prefetched again in second buffer.

Test Plan:

  • Ran crash_test several times.
  • New unit tests added.

Reviewers:

Subscribers:

Tasks:

Tags:

@akankshamahajan15 akankshamahajan15 force-pushed the async_support branch 4 times, most recently from 429aeff to 710a47b Compare October 24, 2023 17:28
@akankshamahajan15 akankshamahajan15 marked this pull request as ready for review October 24, 2023 17:33
@akankshamahajan15 akankshamahajan15 changed the title [WIP] Add AsyncIO support for tuning readahead_size by block cache lookup Add AsyncIO support for tuning readahead_size by block cache lookup Oct 24, 2023
@akankshamahajan15 akankshamahajan15 force-pushed the async_support branch 7 times, most recently from 6725798 to 7a31e73 Compare November 1, 2023 20:41
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

2 similar comments
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A couple of minor comments.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 merged this pull request in c77b50a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants