
Optimization: asynchronous writeback #150

Closed
wants to merge 7 commits

Conversation

sourcejedi
Contributor

fsync() after each segment write is suboptimal :). It means you stop (CPU) processing to wait for the physical disk write. And the default segment size is 5 MB. (I noticed bup avoids this issue by writing pack files of 1 GB by default :).

Improvements will vary depending on disk/CPU speed. (I guess the worst case is when they are evenly matched.)

  • Writing 65 MB on a SheevaPlug "NAS" went from 47s to 45s.
  • 920 MB on a desktop HDD (read from an SSD) went from 68s to 45s.
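The idea above can be sketched as a background writeback thread: the main thread hands off (path, data) pairs and keeps doing CPU work while the thread performs the write and the fsync. This is a minimal illustration, not the PR's actual code; all names here are hypothetical.

```python
import os
import queue
import tempfile
import threading

def writeback(channel):
    # Drain (path, data) pairs from the channel; None is the shutdown sentinel.
    while True:
        item = channel.get()
        if item is None:
            return
        path, data = item
        with open(path, 'wb') as fd:
            fd.write(data)
            fd.flush()
            # The wait for the physical disk write now happens off the
            # main thread, so it can overlap with CPU work there.
            os.fsync(fd.fileno())

channel = queue.Queue()
writer = threading.Thread(target=writeback, args=(channel,))
writer.start()

tmpdir = tempfile.mkdtemp()
for i in range(3):
    # The main thread can keep compressing/hashing the next segment
    # while the previous one is still being flushed to disk.
    channel.put((os.path.join(tmpdir, 'seg%d' % i), b'x' * 5 * 1024))
channel.put(None)   # tell the thread to shut down
writer.join()       # all writes (and fsyncs) have completed here
```

With fsync on the main thread, throughput is bounded by CPU time plus disk time per segment; with the handoff it is bounded by whichever of the two is larger, which matches the observation that the worst case for the serial version is when they are evenly matched.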

TypeError: __init__() got an unexpected keyword argument 'daemon'
My beautiful code didn't include flush to start with, and certainly not
fadvise.  Now it needs a new name and verbose comments.  Such is life.

No semantic change here.  I did move the flush into WritebackThread.
However it will happen in exactly the same sequence (and still on the same
thread as the writes, which sounds good).
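The `TypeError` above is the Python 2 incompatibility: `threading.Thread` only grew a `daemon` keyword argument in Python 3.3. A hedged sketch of the usual workaround (the helper name is hypothetical, not borg's code):

```python
import threading

ran = []

def make_writeback_thread(target):
    # On Python 3.3+ you could write Thread(target=target, daemon=True),
    # but on Python 2 that keyword raises:
    #   TypeError: __init__() got an unexpected keyword argument 'daemon'
    # Setting the attribute after construction works on both.
    t = threading.Thread(target=target)
    t.daemon = True
    return t

t = make_writeback_thread(lambda: ran.append(True))
t.start()
t.join()
```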
@codecov-io

Current coverage is 84.94%

Merging #150 into master will increase coverage by +0.03% as of b9cbcdd

@@            master    #150   diff @@
======================================
  Files           28      28       
  Stmts         5003    5052    +49
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           4248    4291    +43
  Partial          0       0       
- Missed         755     761     +6

Review entire Coverage Diff as of b9cbcdd


Uncovered Suggestions

  1. +0.83% via borg/fuse.py#124...165
  2. +0.67% via borg/xattr.py#199...232
  3. +0.57% via borg/xattr.py#166...194
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

@ThomasWaldmann
Member

Not sure whether I should / would accept this code.
While it's doing the right thing, it is only a fraction of what the (still experimental) code in the multithreading branch does (see my repo). So maybe that code should rather be refined further / tested more, to solve a lot of scheduling problems, not just this one.

@sourcejedi
Contributor Author

I see your point :). This is clearly incremental. E.g. I don't know whether the code would get refactored when expanding the use of threading, so that all the threading code can use the same primitives and style.

I do think the concept is right. E.g. you should still do this if you were writing multiple segment files in parallel, you shouldn't just assume their blocking IO will be fortuitously out-of-sync.

@sourcejedi
Contributor Author

Sorry, I meant to also say I will look at your threading code.

@ThomasWaldmann
Member

For the master branch (not multithreading): could we get a similar speedup with less code change by going from 5 MB to e.g. 50 MB segments?

self.channel.put(None) # tell thread to shutdown


class Channel(object):
Member

hmm, couldn't we just use a normal Queue with length 1 and just signal "task done" when we really have done it?
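The suggestion above can be sketched with the stdlib `queue.Queue` (a rough stand-in, not borg's actual `Channel` code): a queue of length 1 bounds how far the producer can run ahead, and `task_done()`/`join()` signal that the write really finished.

```python
import queue
import threading

written = []
q = queue.Queue(maxsize=1)   # at most one pending segment in flight

def worker():
    while True:
        item = q.get()
        if item is None:     # shutdown sentinel
            q.task_done()
            return
        written.append(item) # stand-in for write + flush + fsync
        q.task_done()        # "task done": the write has really happened

t = threading.Thread(target=worker)
t.daemon = True
t.start()

for seg in ('seg0', 'seg1'):
    q.put(seg)   # blocks if the previous segment is still being written
q.put(None)
q.join()         # returns only after every task_done(), i.e. all data written
```

This gives the same handoff-and-completion semantics as a hand-rolled channel, with the blocking and signalling handled by the stdlib.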

@ThomasWaldmann
Member

Just as a side note:

I am currently doing a large-scale backup for testing. The fsync somehow slows it down unnecessarily.

So to speed it up, I just removed the fsync in my local version, it's now 2x faster.

@ThomasWaldmann
Member

I am rejecting this for now and for master.

Multithreading code should go into multithreading branch (if not already implemented there) and will need a lot of testing before it gets merged into master.

anarcat added a commit to anarcat/borg that referenced this pull request Apr 3, 2016
3 participants