
Optimization: asynchronous writeback #150

Closed
wants to merge 7 commits

Conversation

sourcejedi
Contributor

fsync() after each segment write is suboptimal :). It means you stop (CPU) processing to wait for the physical disk write. And the default segment size is 5 MB. (I noticed bup avoids this issue by writing pack files of 1 GB by default :).

Improvements will vary depending on disk/CPU speed. (I guess the worst case is when they are evenly matched.)

  • Writing 65 MB on a SheevaPlug "NAS" went from 47s to 45s.
  • 920 MB on a desktop HDD (read from an SSD) went from 68s to 45s.
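The idea above can be sketched as a background writeback thread: the main thread hands off (path, data) pairs and keeps doing CPU work while the thread performs the write and the fsync. This is a minimal illustration, not the PR's actual code; all names here are hypothetical.

```python
import os
import queue
import tempfile
import threading

def writeback(channel):
    # Drain (path, data) pairs from the channel; None is the shutdown sentinel.
    while True:
        item = channel.get()
        if item is None:
            return
        path, data = item
        with open(path, 'wb') as fd:
            fd.write(data)
            fd.flush()
            # The wait for the physical disk write now happens off the
            # main thread, so it can overlap with CPU work there.
            os.fsync(fd.fileno())

channel = queue.Queue()
writer = threading.Thread(target=writeback, args=(channel,))
writer.start()

tmpdir = tempfile.mkdtemp()
for i in range(3):
    # The main thread can keep compressing/hashing the next segment
    # while the previous one is still being flushed to disk.
    channel.put((os.path.join(tmpdir, 'seg%d' % i), b'x' * 5 * 1024))
channel.put(None)   # tell the thread to shut down
writer.join()       # all writes (and fsyncs) have completed here
```

With fsync on the main thread, throughput is bounded by CPU time plus disk time per segment; with the handoff it is bounded by whichever of the two is larger, which matches the observation that the worst case for the serial version is when they are evenly matched.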

TypeError: __init__() got an unexpected keyword argument 'daemon'
My beautiful code didn't include flush to start with, and certainly not
fadvise.  Now it needs a new name and verbose comments.  Such is life.

No semantic change here.  I did move the flush into WritebackThread.
However it will happen in exactly the same sequence (and still on the same
thread as the writes, which sounds good).
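The `TypeError` above is the Python 2 incompatibility: `threading.Thread` only grew a `daemon` keyword argument in Python 3.3. A hedged sketch of the usual workaround (the helper name is hypothetical, not borg's code):

```python
import threading

ran = []

def make_writeback_thread(target):
    # On Python 3.3+ you could write Thread(target=target, daemon=True),
    # but on Python 2 that keyword raises:
    #   TypeError: __init__() got an unexpected keyword argument 'daemon'
    # Setting the attribute after construction works on both.
    t = threading.Thread(target=target)
    t.daemon = True
    return t

t = make_writeback_thread(lambda: ran.append(True))
t.start()
t.join()
```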
@codecov-io

Current coverage is 84.94%

Merging #150 into master will increase coverage by +0.03% as of b9cbcdd

@@            master    #150   diff @@
======================================
  Files           28      28       
  Stmts         5003    5052    +49
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           4248    4291    +43
  Partial          0       0       
- Missed         755     761     +6

Review entire Coverage Diff as of b9cbcdd


Uncovered Suggestions

  1. +0.83% via borg/fuse.py#124...165
  2. +0.67% via borg/xattr.py#199...232
  3. +0.57% via borg/xattr.py#166...194
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

@ThomasWaldmann
Member

Not sure whether I should / would accept this code.
While it's doing the right thing, it is only a fraction of what the (still experimental) code in the multithreading branch does (see my repo). So maybe that code should rather be refined further / tested more, to solve a lot of scheduling problems, not just this one.

@sourcejedi
Contributor Author

I see your point :). This is clearly incremental. E.g. I don't know whether the code would get refactored when expanding the use of threading, so that all the threading code can use the same primitives and style.

I do think the concept is right. E.g. you should still do this if you were writing multiple segment files in parallel, you shouldn't just assume their blocking IO will be fortuitously out-of-sync.

@sourcejedi
Contributor Author

Sorry, I meant to also say I will look at your threading code.

@ThomasWaldmann
Member

For the master branch (not multithreading): could we get a similar speedup with less code change by going from 5 MB to e.g. 50 MB segments?

self.channel.put(None) # tell thread to shutdown


class Channel(object):
Member

hmm, couldn't we just use a normal Queue with length 1 and just signal "task done" when we really have done it?
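The suggestion above can be sketched with the stdlib `queue.Queue` (a rough stand-in, not borg's actual `Channel` code): a queue of length 1 bounds how far the producer can run ahead, and `task_done()`/`join()` signal that the write really finished.

```python
import queue
import threading

written = []
q = queue.Queue(maxsize=1)   # at most one pending segment in flight

def worker():
    while True:
        item = q.get()
        if item is None:     # shutdown sentinel
            q.task_done()
            return
        written.append(item) # stand-in for write + flush + fsync
        q.task_done()        # "task done": the write has really happened

t = threading.Thread(target=worker)
t.daemon = True
t.start()

for seg in ('seg0', 'seg1'):
    q.put(seg)   # blocks if the previous segment is still being written
q.put(None)
q.join()         # returns only after every task_done(), i.e. all data written
```

This gives the same handoff-and-completion semantics as a hand-rolled channel, with the blocking and signalling handled by the stdlib.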

@ThomasWaldmann
Member

Just as a side note:

I am currently doing a large-scale backup for testing. The fsync somehow slows it down unnecessarily.

So to speed it up, I just removed the fsync in my local version, it's now 2x faster.

@ThomasWaldmann
Member

I am rejecting this for now and for master.

Multithreading code should go into multithreading branch (if not already implemented there) and will need a lot of testing before it gets merged into master.

anarcat added a commit to anarcat/borg that referenced this pull request Apr 3, 2016
3 participants