Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for asynchronous dmu operations #10303

Closed
wants to merge 1 commit into from

Conversation

mattmacy
Copy link
Contributor

@mattmacy mattmacy commented May 7, 2020

The only user visible change for existing code is the removal of DMU_READ_NO_PREFETCH
and the replacement of DMU_READ_PREFETCH and DMU_READ_NO_DECRYPT

typedef enum {
       DMU_CTX_FLAG_READ       = 1 << 1,
       DMU_CTX_FLAG_UIO        = 1 << 2,
       DMU_CTX_FLAG_PREFETCH   = 1 << 3,
       DMU_CTX_FLAG_NO_HOLD    = 1 << 4,
       DMU_CTX_FLAG_SUN_PAGES  = 1 << 5,
       DMU_CTX_FLAG_NOFILL     = 1 << 6,
       DMU_CTX_FLAG_ASYNC      = 1 << 7,
       DMU_CTX_FLAG_NODECRYPT  = 1 << 8,
       DMU_CTX_WRITER_FLAGS    = DMU_CTX_FLAG_SUN_PAGES,
       DMU_CTX_READER_FLAGS    = DMU_CTX_FLAG_PREFETCH
} dmu_ctx_flag_t;

There are two new data structures:

  • dmu_ctx_t maintains dmu context during operations
  • dmu_buf_set_t maintains references to dbufs and is used for dbuf read completions.

For most purposes the new functions of interest are:

typedef void (*dmu_ctx_cb_t)(struct dmu_ctx *);
typedef uint64_t (*dmu_buf_transfer_cb_t)(struct dmu_buf_set *, dmu_buf_t *,
    uint64_t, uint64_t);


/* Initialize dmu context */
int dmu_ctx_init(dmu_ctx_t *dc, struct dnode *dn, objset_t *os,
    uint64_t object, uint64_t offset, uint64_t size, void *data_buf, void *tag,
    dmu_ctx_flag_t flags);

/* execute operation */
int dmu_issue(dmu_ctx_t *dc);

/* release local hold on dmu context */
void dmu_ctx_rele(dmu_ctx_t *dc);

/*
 *  Set completion function to be called when _all_ operations have been 
 *  completed.
 */
void dmu_ctx_set_complete_cb(dmu_ctx_t *dc, dmu_ctx_cb_t cb);

/*
 *  Set custom data transfer operation. See zvol, spa_checkpoint, dmu_redact,
 *  and dmu_read_pages for examples
 *  
 */
void dmu_ctx_set_buf_set_transfer_cb(dmu_ctx_t *dc, dmu_buf_set_cb_t cb);

Two helper functions have been added to simplify the most straightforward case.

/* 
* Takes an uninitialized dmu context and callback. Callback will be called when
* read completes.
*/ 
int dmu_read_async(dmu_ctx_t *dc, objset_t *os, uint64_t object,
    uint64_t offset, uint64_t size, void *buf, uint32_t flags,
    dmu_ctx_cb_t done_cb);
/* 
* Takes an uninitialized dmu context and callback. Callback will be called when
* write completes.
*/ 
int dmu_write_async(dmu_ctx_t *dc, objset_t *os, uint64_t object,
    uint64_t offset, uint64_t size, void *buf, dmu_tx_t *tx,
    dmu_ctx_cb_t done_cb);

These two functions are used to exercise the async completion logic in ztest.

Read and write in zvol on both FreeBSD and Linux no longer require a dedicated thread for async semantics. This could be later extended to support delete.

Platforms with VFS interfaces that accept completion routines can use the new interfaces to eliminate the need for a dedicated kernel thread to support posix AIO on ZFS.

Some further implementation details:
Internally, dmu_buf_set_t is used for managing state and tracking completions. The dbs_holds field is used as a count of pending read operations that need to be completed before the data transfer callback can be executed. The completions are handled by dmu_buf_set_rele when dbs_holds goes to zero it calls the data transfer callback. The initial set up for this occurs in dbuf_hold_impl. When dbuf_hold_impl is called in dmu_buf_set_setup_buffers, we now also pass it a dmu_buf_set_t. If the dbuf in question is not DB_CACHED it will add it to a list attached to the dbuf for dmu_buf_set_rele to be called at read completion time, otherwise it release the hold immediately.

As far as this change goes, only L0 buffer reads are asynchronous. dmu_ctx_setup_tx and dbuf_hold_{...} still issue synchronous reads. In #10317 I extended this to restart dmu_issue calls in dbuf_read completions if dbuf_hold_level_async (new wrapper for dbuf_hold_impl) would block. It would be straightforward to do the same for dmu_ctx_setup_tx in a future PR.

Authored-by: Will Andrews [email protected]
Co-authored-by: Matt Macy [email protected]
Signed-off-by: Matt Macy [email protected]

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@mattmacy mattmacy force-pushed the projects/async_dmu-pr branch 2 times, most recently from b9b53bb to 1610845 Compare May 8, 2020 02:31
@codecov-io
Copy link

codecov-io commented May 8, 2020

Codecov Report

Merging #10303 into master will decrease coverage by 0.08%.
The diff coverage is 87.02%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10303      +/-   ##
==========================================
- Coverage   79.67%   79.58%   -0.09%     
==========================================
  Files         390      390              
  Lines      123336   123867     +531     
==========================================
+ Hits        98263    98580     +317     
- Misses      25073    25287     +214     
Flag Coverage Δ
#kernel 79.82% <86.82%> (-0.35%) ⬇️
#user 66.22% <75.11%> (+0.09%) ⬆️
Impacted Files Coverage Δ
include/sys/dbuf.h 100.00% <ø> (ø)
module/os/linux/zfs/zfs_acl.c 57.44% <ø> (ø)
module/os/linux/zfs/zfs_vnops.c 69.79% <ø> (+0.05%) ⬆️
module/zfs/bpobj.c 86.86% <ø> (-4.29%) ⬇️
module/zfs/bptree.c 92.70% <ø> (ø)
module/zfs/dsl_bookmark.c 88.80% <ø> (ø)
module/zfs/space_map.c 93.06% <ø> (-0.25%) ⬇️
module/zfs/vdev.c 90.36% <ø> (+0.10%) ⬆️
module/zfs/vdev_indirect_births.c 89.53% <ø> (ø)
module/zfs/vdev_indirect_mapping.c 99.03% <ø> (ø)
... and 84 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fcf824...378e6a1. Read the comment docs.

@mattmacy mattmacy force-pushed the projects/async_dmu-pr branch 17 times, most recently from 474c4de to c730c91 Compare May 10, 2020 23:16
@mattmacy mattmacy changed the title add preliminary async dmu support add support for asynchronous dmu operations May 11, 2020
@amotin
Copy link
Member

amotin commented May 11, 2020

Good to see it resurrected after so many years. I am still reviewing it.

@mattmacy mattmacy force-pushed the projects/async_dmu-pr branch 2 times, most recently from 378e6a1 to 916bff6 Compare May 11, 2020 08:04
@mattmacy mattmacy force-pushed the projects/async_dmu-pr branch 2 times, most recently from 1ec0c79 to ba8a896 Compare May 13, 2020 06:28
while (n && uio->uio_resid) {
cnt = MIN(iov->iov_len, n);
switch (uio->uio_segflg) {
case UIO_SYSSPACE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just have ASSERT(uio->uio_segflag == UIO_SYSSPACE)?

mutex_exit(&tq->tq_lock);
tq->tq_dtor(tq);
mutex_enter(&tq->tq_lock);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason, since this is all new code and the dtor callback would be new, to not simply have it deal with any locking if it needs to?

@kithrup
Copy link
Contributor

kithrup commented May 15, 2020

Other than my questions above, it looks ok to me.

@mattmacy mattmacy force-pushed the projects/async_dmu-pr branch from ba8a896 to b1dcf7f Compare May 18, 2020 23:01
@codecov-commenter
Copy link

codecov-commenter commented May 19, 2020

Codecov Report

Merging #10303 into master will decrease coverage by 8.25%.
The diff coverage is 81.92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10303      +/-   ##
==========================================
- Coverage   79.24%   70.99%   -8.26%     
==========================================
  Files         390      387       -3     
  Lines      123336   123455     +119     
==========================================
- Hits        97737    87643   -10094     
- Misses      25599    35812   +10213     
Flag Coverage Δ
#kernel 70.92% <80.72%> (-8.99%) ⬇️
#user 60.09% <74.36%> (-4.08%) ⬇️
Impacted Files Coverage Δ
include/sys/dbuf.h 100.00% <ø> (ø)
module/os/linux/zfs/zfs_acl.c 54.81% <ø> (-2.63%) ⬇️
module/os/linux/zfs/zfs_vnops.c 51.83% <ø> (-18.07%) ⬇️
module/zfs/bpobj.c 75.87% <ø> (-15.29%) ⬇️
module/zfs/bptree.c 88.54% <ø> (-4.17%) ⬇️
module/zfs/dmu_recv.c 58.77% <0.00%> (-17.62%) ⬇️
module/zfs/dsl_bookmark.c 75.03% <ø> (-13.77%) ⬇️
module/zfs/spa_checkpoint.c 30.23% <0.00%> (-63.56%) ⬇️
module/zfs/spa_history.c 85.97% <0.00%> (-7.70%) ⬇️
module/zfs/space_map.c 75.24% <ø> (-18.07%) ⬇️
... and 194 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b29e31d...b1dcf7f. Read the comment docs.

@mattmacy mattmacy closed this May 27, 2020
@mattmacy mattmacy mentioned this pull request May 27, 2020
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants