-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
record restic backup progress in PodVolumeBackup #1821
Conversation
Still WIP, but wanted some feedback on the approach I'm taking here with the callback (updateFunc) function to update the progress from outside the controller. https://github.com/fusor/velero/pull/4/files implemented the RunBackup function in the controller package which avoids the need for the callback, but I'm thinking that it will be more easily testable this way, albeit a bit more confusing? |
dcce611
to
e8a6a99
Compare
|
||
// RunBackup runs a `restic backup` command and watches the output to provide | ||
// progress updates to the caller. | ||
func RunBackup(backupCmd *Command, log logrus.FieldLogger, updateFunc func(velerov1api.PodVolumeOperationProgress)) (string, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation largely follows https://github.com/fusor/velero/pull/4/files. The main differences are:
- we keep track of the number of bytes we've read so we can skip lines we've read when scanning through
- we quit the progress checking goroutine as soon as the backup command finishes to avoid it having to wait for another cycle to find that the backup is 100% complete (a ticker is used instead of a sleep so we don't block for the quit signal). The controller takes care of updating the progress to 100% once the backup has completed successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with this is that if a backup completes too quickly, the progress will never get set. This can easily happen for smaller volumes or if only a few files have changed in an incremental backup. I'm now thinking that this function should call updateFunc
one last time with the final progress before returning, and we don't need to set the progress to 100% in the controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approach LGTM - one question so far
Signed-off-by: Adnan Abdulhussein <[email protected]>
I've been trying a different approach here in order to address #1821 (comment), but I'm seeing something strange. The
I'm trying to read this line after the command has finished to extract Might need some extra eyes on this next week @skriss. |
Just ran a {"message_type":"status","seconds_elapsed":9,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"status","seconds_elapsed":10,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"status","seconds_elapsed":11,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"summary","files_new":428,"files_changed":0,"files_unmodified":0,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":428,"tree_blobs":1,"data_added":2168388,"total_files_processed":428,"total_bytes_processed":2168012,"total_duration":11.171735739,"snapshot_id":"4e9479ef"}
{"message_type":"summary","files_new":428,"files_changed":0,"files_unmodified":0,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":428,"tree_blobs":1,"data_added":2168388,"total_files_processed":428,"total_bytes_processed":2168012,"total_duration":11.171735739,"snapshot_id":"4e9479ef"}
{"message_type":"status","seconds_elapsed":11,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012} So duplicate "summary" lines, and a "status" line after the "summary" ones. Seems like there's probably some bugs in the status reporter in restic. |
Okay I've filed restic/restic#2389 upstream to see if we can get help on this issue. In the meantime, we can workaround this by looking for the summary line instead of just trying to get the last line. |
Signed-off-by: Adnan Abdulhussein <[email protected]>
e8a6a99
to
8e6cc51
Compare
Signed-off-by: Adnan Abdulhussein <[email protected]>
This is now ready for review, I'll work on the restore progress and CLI changes in a separate PR |
Signed-off-by: Adnan Abdulhussein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -188,7 +188,8 @@ func describePodVolumeRestores(d *Describer, restores []v1.PodVolumeRestore, det | |||
restoresByPod := new(volumesByPod) | |||
|
|||
for _, restore := range restoresByPhase[phase] { | |||
restoresByPod.Add(restore.Spec.Pod.Namespace, restore.Spec.Pod.Name, restore.Spec.Volume) | |||
// TODO(adnan): replace last parameter with progress from status (#1749) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will address this in the PR for restore progress
Looks good on a quick pass - I'll do some testing and another pass later on today. |
Signed-off-by: Adnan Abdulhussein <[email protected]>
Signed-off-by: Adnan Abdulhussein <[email protected]>
pkg/restic/exec_commands.go
Outdated
|
||
cmd.Wait() | ||
close(quit) | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this wait group, do we? I think the "quit" logic could be simplified - after L120, call ticker.Stop()
(you'll need to move ticker
's declaration to outside the goroutine), then close(quit)
. The select
case that receives from quit
can just return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's mostly left over from the additional goroutines we had for copying stdout/stderr. One issue is that if we're in the middle of the ticker.C case and it takes some time to complete, then we might call updateFunc
in the goroutine with BytesDone < TotalBytes
after our call to it at the end of this function. If we wait for the goroutine to finish, we know this will definitely not happen.
That said, the stuff in the ticker.C
case should be pretty fast for backup, for restore I think we'll need the waitgroup since calculating the size of the volume can take an indeterminate amount of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's a good point. I guess the other way you could do this is to send on the quit
channel rather than closing it, since that would block until it's received from, but I'm fine with this as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skriss ah good point, I didn't think of that. I like the idea of removing the unnecessary waitgroup, so will update this to do that - thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Adnan Abdulhussein <[email protected]>
Signed-off-by: Adnan Abdulhussein <[email protected]>
Signed-off-by: Adnan Abdulhussein <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 great comments, and tests.
This PR modifies the way we run
restic backup
to include reporting the progress of the backup.Signed-off-by: Adnan Abdulhussein [email protected]