ocfs2: clear zero in unaligned direct IO
The unused portion of a partially written fs-block-sized block is not
zeroed in an unaligned append direct write. This can lead to serious data
inconsistencies.

Ocfs2 manages disk space in units of clusters (for example, 1M). A partial
write into a cluster changes the cluster state from UN-WRITTEN to WRITTEN.
VFS (function dio_zero_block) then does not zero the rest of the block,
because ocfs2_dio_wr_get_block does not set the bh's state to NEW when we
write to a WRITTEN cluster. For example, if the cluster size is 1M, the
file size is 8k and we direct write from 14k to 15k, then 12k~14k and
15k~16k will contain dirty data.
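
The byte ranges in the example above can be reproduced with a small userspace sketch. It assumes a 4k fs block size, which the message does not state but which the 12k~16k range implies. The names `byte_range` and `unwritten_edges` are hypothetical and are not part of ocfs2 or the VFS; the function only shows which parts of the enclosing block a write leaves untouched, that is, the parts that would need zeroing when the block is treated as NEW.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). Hypothetical helper type. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * For a write covering [pos, pos + count), compute the parts of the
 * enclosing fs blocks that the write itself does not touch:
 *   head: from the start of the first block up to pos
 *   tail: from the end of the write up to the end of the last block
 * blocksize must be a power of two.
 */
static void unwritten_edges(uint64_t pos, uint64_t count, uint64_t blocksize,
			    struct byte_range *head, struct byte_range *tail)
{
	uint64_t end = pos + count;

	assert(count > 0);
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);

	head->start = pos & ~(blocksize - 1);	/* round pos down */
	head->end = pos;
	tail->start = end;
	tail->end = (end + blocksize - 1) & ~(blocksize - 1);	/* round end up */
}
```

Calling `unwritten_edges(14 * 1024, 1024, 4096, &head, &tail)` yields the head range 12k~14k and the tail range 15k~16k, the two regions the message says will contain dirty data.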

We have to deal with two cases:
1. The starting position of the direct write is outside the file.
2. The starting position of the direct write is located in the file.

We need to set the bh's state to NEW in the first case.  In the second
case, we need to map twice, because the bh's state should be set to NEW for
the area outside the file but not for the area inside the file.
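
The two cases can be modelled in userspace as follows. This is a simplified sketch, not the kernel code: `map_decision` and `decide_mapping` are hypothetical names, the cluster-size clamp and the `c_needs_zero` path of ocfs2_dio_wr_get_block are left out, and a non-empty file is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of one simulated get_block call. Hypothetical type. */
struct map_decision {
	uint32_t len;		/* bytes mapped by this call */
	bool set_new;		/* would the buffer be flagged NEW? */
};

/*
 * iblock:  first block of the requested mapping
 * len:     requested mapping length in bytes
 * i_size:  current file size in bytes (assumed > 0 in this sketch)
 * blkbits: log2 of the fs block size
 */
static struct map_decision decide_mapping(uint64_t iblock, uint32_t len,
					  uint64_t i_size, unsigned blkbits)
{
	struct map_decision d = { .len = len, .set_new = false };
	uint64_t endblk, lastblk;

	assert(i_size > 0 && len > 0);

	endblk = (i_size - 1) >> blkbits;	/* last block inside the file */
	lastblk = iblock + ((uint64_t)(len - 1) >> blkbits);

	/* Case 2: starts inside the file but runs past it, so stop at
	 * endblk. The caller maps the remainder in a second call. */
	if (iblock <= endblk && lastblk > endblk)
		d.len = (uint32_t)((endblk - iblock + 1) << blkbits);

	/* Case 1: starts outside the file, so the buffer is NEW. */
	if (iblock > endblk)
		d.set_new = true;

	return d;
}
```

With an 8k file and 4k blocks (`blkbits` of 12), a request starting at block 3 comes back flagged NEW, while a 12k request starting at block 1 is shortened to 4k and not flagged, leaving blocks 2 and 3 for a second call that is then flagged NEW.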

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia Guo <[email protected]>
Reviewed-by: Yiwen Jiang <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
guojia1992 authored and broonie committed Sep 26, 2019
1 parent 9238db8 commit a35cd61
Showing 1 changed file with 21 additions and 1 deletion.
22 changes: 21 additions & 1 deletion fs/ocfs2/aops.c
@@ -2146,13 +2146,30 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
struct ocfs2_dio_write_ctxt *dwc = NULL;
struct buffer_head *di_bh = NULL;
u64 p_blkno;
-loff_t pos = iblock << inode->i_sb->s_blocksize_bits;
+unsigned i_blkbits = inode->i_sb->s_blocksize_bits;
+loff_t pos = iblock << i_blkbits;
+sector_t endblk = (i_size_read(inode) - 1) >> i_blkbits;
unsigned len, total_len = bh_result->b_size;
int ret = 0, first_get_block = 0;

len = osb->s_clustersize - (pos & (osb->s_clustersize - 1));
len = min(total_len, len);

+/*
+* bh_result->b_size is count in get_more_blocks according to write
+* "pos" and "end", we need map twice to return different buffer state:
+* 1. area in file size, not set NEW;
+* 2. area out file size, set NEW.
+*
+*                 iblock    endblk
+* |--------|---------|---------|---------
+* |<-------area in file------->|
+*/
+
+if ((iblock <= endblk) &&
+((iblock + ((len - 1) >> i_blkbits)) > endblk))
+len = (endblk - iblock + 1) << i_blkbits;
+
mlog(0, "get block of %lu at %llu:%u req %u\n",
inode->i_ino, pos, len, total_len);

@@ -2236,6 +2253,9 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
if (desc->c_needs_zero)
set_buffer_new(bh_result);

+if (iblock > endblk)
+set_buffer_new(bh_result);
+
/* May sleep in end_io. It should not happen in a irq context. So defer
* it to dio work queue. */
set_buffer_defer_completion(bh_result);