-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a crash when attempting to read a block pointer with no valid DVAs #17078
base: master
Are you sure you want to change the base?
Conversation
If a block pointer isn't embedded yet doesn't have any valid DVAs, that's a data corruption bug. zdb should be able to handle the situation gracefully. Issue openzfs#17077 Sponsored by: ConnectWise Signed-off-by: Alan Somers <[email protected]>
Now instead of crashing when attempting to read the corrupt block pointer, ZFS will return ECKSUM, in a stack that looks like this: ``` none:set-error zfs.ko`arc_read+0x1d82 zfs.ko`dbuf_read+0xa8c zfs.ko`dmu_buf_hold_array_by_dnode+0x292 zfs.ko`dmu_read_uio_dnode+0x47 zfs.ko`zfs_read+0x2d5 zfs.ko`zfs_freebsd_read+0x7b kernel`VOP_READ_APV+0xd0 kernel`vn_read+0x20e kernel`vn_io_fault_doio+0x45 kernel`vn_io_fault1+0x15e kernel`vn_io_fault+0x150 kernel`dofileread+0x80 kernel`sys_read+0xb7 kernel`amd64_syscall+0x424 kernel`0xffffffff810633cb ``` This patch should hopefully also prevent such corrupt block pointers from being written to disk in the first place. Fixes openzfs#17077 Sponsored by: ConnectWise Signed-off-by: Alan Somers <[email protected]>
@@ -695,7 +694,8 @@ typedef struct blkptr { | |||
BP_GET_BYTEORDER(bp) == 0 ? "BE" : "LE", \ | |||
BP_IS_GANG(bp) ? "gang" : "contiguous", \ | |||
BP_GET_DEDUP(bp) ? "dedup" : "unique", \ | |||
copyname[copies], \ | |||
copies >= 0 && copies <= 3 ? copyname[copies] : \ | |||
"invalid", \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks confusing. I suppose we can get out of 0-3 range only due to a bug in this code itself.
module/zfs/zio.c
Outdated
@@ -1213,6 +1213,11 @@ zfs_blkptr_verify(spa_t *spa, const blkptr_t *bp, | |||
} | |||
return (errors == 0); | |||
} | |||
if (unlikely(!BP_IS_EMBEDDED(bp) && !BP_IS_HOLE(bp) && | |||
BP_GET_NDVAS(bp) <= 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!BP_IS_EMBEDDED()
check here is not needed, since it is checked above. But the other two checks seem to not cover all possibilities, for example, if second DVA is not empty, but has zero asize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what circumstances can a DVA legitimately be non empty but with zero asizse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably can't, but your code won't notice it if at least one DVA is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying that my code will reject the BP if, for example, DVA 0 has no asize but DVA1 does have an asize? That's deliberate. Look for example at vdev_mirror_map_init
. It also uses the BP_GET_NDVAS
function, so it will ignore DVA1 if DVA0 is invalid. I suspect that many other places do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we are on the same sheet still. I am saying that your code will report as valid a block pointer with zero asize in DVA0, but non-zero in DVA1. From your comment about BP_GET_NDVAS()
used in vdev_mirror_map_init()
seems like it should be considered invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so the opposite of what I thought you meant.
Auditing all callers of BP_GET_NDVAS finds that they all act like this: they expect DVAs to always be assigned in order. That is, dva[0] will always be valid if dva[1] is, and dva[1] will always be valid if dva[2] is. So maybe we should change BP_GET_NDVAS globally? Something like this:
#define BP_GET_NDVAS(bp) \
(BP_IS_EMBEDDED(bp) ? 0 : \
!DVA_GET_ASIZE(&(bp)->blk_dva[0]) ? 0 : \
!DVA_GET_ASIZE(&(bp)->blk_dva[1]) ? 1 : \
BP_IS_ENCRYPTED(bp) ? 2 : \
!DVA_GET_ASIZE(&(bp)->blk_dva[2]) ? 2 : 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also wondering why it is not so. So may be. But I think it is orthogonal to the zfs_blkptr_verify()
, which could be even more strict.
Every non-embedded, non-hole BP should have at least one valid DVA, and it must be the first one. Signed-off-by: Alan Somers <[email protected]>
This saves us the trouble of acquring the spa config lock, only to do nothing with it. Signed-off-by: Alan Somers <[email protected]>
Motivation and Context
Somehow ZFS wrote to my pool a block pointer that is neither embedded nor a hole, yet contains no DVAs with a non-zero asize. That should be impossible, but it happened. This PR will cause ZFS to return
ECKSUM
when attempting to read from that block pointer, rather than crashing. This PR will also fix zdb so it can display such block pointers. Finally, I believe that this PR will prevent such block pointers from being written to disk in the first place, by triggering a panic inzfs_blkptr_verify
in the write path.Description
zfs_blkptr_verify
to ensure that every block pointer is either embedded, hole, or has at least one valid DVA.How Has This Been Tested?
Tested on FreeBSD 15.0-CURRENT by trying to read from a suitably corrupted dataset.
Types of changes
Checklist:
Signed-off-by
.