-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the cold startup of RaftEngine when enable-log-recycle
== true
#278
Conversation
Details: This commit optimize the startup of Engine when enable-log-recycle == true, by preparing a bunch of logs named with special suffix ".fakelog", so-called "Fake logs", to curtail the side effect when there is no enough stale files for log recycling. Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Codecov ReportBase: 97.74% // Head: 97.73% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #278 +/- ##
==========================================
- Coverage 97.74% 97.73% -0.01%
==========================================
Files 30 30
Lines 11287 11411 +124
==========================================
+ Hits 11032 11153 +121
- Misses 255 258 +3
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Signed-off-by: Lucasliang <[email protected]>
@tabokie PTAL, thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing multiple log queues is too ugly. I'm thinking something like this:
// builder.rs
let suffix = "append.reserved";
let reserved_files = ReservedFiles::scan_from_path(fs, path, suffix)?;
if capacity > active_files.len() + reserved_files.len() {
reserved_files.generate_some(..);
}
let pipe = SinglePipe::build(reserved_files, active_files);
// pipe.rs
fn purge() {
for f in purged_files {
if f.format == V2 {
self.reserved_files.add(f.handle);
}
}
self.reserved_files.truncate(self.capacity - self.active_files.len());
}
…en starting in cold. Signed-off-by: Lucasliang <[email protected]>
@tabokie PTAL, thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned before, keeping obsolete files across restart (i.e. do not delete them when Pipe is dropped) is of higher priority. Which means all obsolete files (not just "fake" logs) will be managed by reserved_files
. They are renamed (this detail is preferably put inside ReservedFiles
) before retired so that we won't waste time recovering them later.
…ion into ActiveFileCollection & StaleFileCollection. Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
@tabokie PTAL, thx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this change is so difficult for me to review... Maybe part of it has to do with the interactions between structures are not properly encapsulated. E.g. prepare_dummy_logs_for_recycle
with so many arguments, concat
that accepts files without checking if they have the proper names.
To improve, I think you can just keep thing even simpler, make FileCollection
as simple as possible (as simple as a Vec):
pub struct FileList { seq, files };
impl FileList {
fn append(&mut self, rhs: &mut FileList ) { ... }
// Different than `VecDeque`, this function takes seqno as splitting point, and returns the first half.
fn split_front(&mut self, seq: u64) -> Self {}
// Caller must guarantee the file has the right seqno.
fn push_back(&mut self, file) {..}
fn pop_front(&mut self) -> Option<FileWithFormat> {...};
fn span(&self) -> (usize, usize);
fn len(&self) -> usize;
fn get(&self, seq: usize) -> Option<&FileWithFormat> {...}
}
Then all the complicated logic can be put in-place, or wrapped into a bigger struct:
pub struct FileCollection { capacity, active_files: Mutex<FileList>, stale_files: Mutex<FileList>, fs }
impl FileCollection {
fn rotate(&self) -> FileHandle {
let f = self.stale_files.pop_front().or_else(|| fs.create(self.active_files.span().1 + 1));
self.active_files.lock().push_back(f);
f.handle
}
fn purge(&self, seq: usize) {
let stale_files = self.active_files.split_front(seq);
for f in stale_files {
if f.version > 1 {
self.fs.reuse(f, new_stale_file_name(self.stale_files.span().1 + 1));
self.stale_files.lock().push_back(f);
}
}
}
fn initialize(&mut self) {
for _ in self.stale_files.len()..capacity {
let f = self.fs.create(new_stale_file_name(self.stale_files.span().1 + 1));
self.stale_files.push_back(f);
}
}
fn get(&self, seq: usize) -> Option<FileHandle> {
self.active_files.lock().get(seq)
}
}
Thx for your advices. struct FileCollection {
stale_files: FileList<F>,
active_files: FileList<F>,
...
} is that: And the confusing parts, such as |
@LykxSassinator It doesn't contradict with that I said.
The code snippet I wrote refers to the way to "wrapped into a bigger struct". It can be done as "put in-place" as well. |
Yes, i got what u meaned. I will polish the current codes. |
+ Generalize previous StaleFileCollection and ActiveFileCollection by FileList; + Encapsulate all operations to file into FileCollection. Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
ping @tabokie PTAL, thx |
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
ping @tabokie PTAL, thx. |
Signed-off-by: tabokie <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of cleaning up the abstraction myself, it grows to be, somewhat superfluous IMO.
And by the way, #278 (comment) is not addressed. When you use &mut self
, there's no need to lock again, because mutable reference guarantees exclusive access.
Thx for your suggestions. I'll take some time to comprehend the newly introduced refactoring works. |
Signed-off-by: Lucasliang <[email protected]>
src/file_pipe_log/pipe_builder.rs
Outdated
@@ -429,10 +429,13 @@ impl<F: FileSystem> DualPipesBuilder<F> { | |||
fn initialize_files(&mut self) -> Result<()> { | |||
let now = Instant::now(); | |||
let target_file_size = self.cfg.target_file_size.0 as usize; | |||
let mut target = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return early if !prefill_for_recycle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not recommended.
When !prefill_for_recycle
, it should check whether there still exists remained recycled_files
and clear them if exists. And by the way, the later processing when !prefill_for_recycle
just do several checking jobs, not costing much on CPU or other resources.
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
ping @tabokie PTAL, thx. |
src/file_pipe_log/pipe.rs
Outdated
seq, | ||
let (len, purged_files) = { | ||
let mut files = self.active_files.write(); | ||
let off = (file_seq - files[0].seq) as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_seq < files[0].seq
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @LykxSassinator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch has already been tested in test_pipe_log
:
...
// cannot purge active file
assert!(pipe_log.purge_to(FileId { queue, seq: 4 }).is_err());
...
And more boundary test cases have been supplemented into test_pipe_log_with_recycle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this in mind. Line coverage can only be useful when there's something not covered. What's covered is far from enough and cannot be used as a proof of software quality. (MC/DC coverage, on the other hand, is a much better criterion.) Whenever there's a bug, it's vital to keep the condition covered before fixing it.
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs changelog under "New Features", rest looks good.
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
ping @tabokie, thx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//
src/file_pipe_log/pipe.rs
Outdated
seq, | ||
let (len, purged_files) = { | ||
let mut files = self.active_files.write(); | ||
let off = (file_seq - files[0].seq) as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping again.
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
Signed-off-by: Lucasliang <[email protected]>
…cle` == true (tikv#278)" This reverts commit 75d6d6e. Signed-off-by: tabokie <[email protected]>
Discription
This commit optimize the startup of Engine when
enable-log-recycle
== true, by preparing a bunch of logs named with special suffix ".fakelog", so-called "Fake logs", to curtail the side effect when there is no enough stale files for log recycling.Related Issue: close #277
Signed-off-by: Lucasliang [email protected]