-
-
Notifications
You must be signed in to change notification settings - Fork 756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: asynchronous writeback #150
Conversation
fsync() after each segment write is suboptimal :). It means you stop (cpu) processing to wait for the physical disk write. And the default segment size is 5MB. (I noticed bup avoids this issue by writing pack files of 1GB by default :). Improvements will vary depending disk/cpu speed (I guess the worst case was when they were evenly matched). Writing 65M on SheevaPlug "NAS" went from 47s to 45s. 920M on desktop HDD (read from SSD) went from 68s to 45s
TypeError: __init__() got an unexpected keyword argument 'daemon'
My beautiful code didn't include flush to start with, and certainly not fadvise. Now it needs a new name and verbose comments. Such is life. No semantic change here. I did move the flush into WritebackThread. However it will happen in exactly the same sequence (and still on the same thread as the writes, which sounds good).
63d6130
to
f7ff045
Compare
Current coverage is
|
Not sure whether I should / would accept this code. |
I see your point :). This is clearly incremental. E.g. I don't know whether the code would get refactored when expanding the use of threading, so all the threading code can use the same primitives and style. I do think the concept is right. E.g. you should still do this if you were writing multiple segment files in parallel, you shouldn't just assume their blocking IO will be fortuitously out-of-sync. |
Sorry, I meant to also say I will look at your threading code. |
for master branch (not multithreading), could we get a similar speedup with less code change by going from 5MB to e.g. 50MB? |
self.channel.put(None) # tell thread to shutdown | ||
|
||
|
||
class Channel(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, couldn't we just use a normal Queue with length 1 and just signal "task done" when we really have done it?
Just as a side note: I am currently doing a large-scale backup for testing. The fsync somehow slows it down unnecessarily. So to speed it up, I just removed the fsync in my local version, it's now 2x faster. |
I am rejecting this for now and for master. Multithreading code should go into multithreading branch (if not already implemented there) and will need a lot of testing before it gets merged into master. |
fsync() after each segment write is suboptimal :). It means you stop (cpu) processing to wait for the physical disk write. And the default segment size is 5MB. (I noticed bup avoids this issue by writing pack files of 1GB by default :).
Improvements will vary depending disk/cpu speed. (I guess the worst case was when they were evenly matched).