
record restic backup progress in PodVolumeBackup #1821

Merged
merged 9 commits into vmware-tanzu:master from 1749-restic-progress on Sep 10, 2019

Conversation

@prydonius (Contributor):

This PR modifies the way we run restic backup to include reporting the progress of the backup.

Signed-off-by: Adnan Abdulhussein [email protected]

@prydonius (Contributor, Author):

Still WIP, but I wanted some feedback on the approach I'm taking here: a callback (updateFunc) that updates the progress from outside the controller. https://github.com/fusor/velero/pull/4/files implemented the RunBackup function in the controller package, which avoids the need for the callback, but I think the callback version will be more easily testable, albeit a bit more confusing.
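
A minimal sketch of the callback wiring, for illustration only (the controller method and the patchPodVolumeBackup helper are assumptions, not this PR's actual code):

// Sketch: the controller builds the callback and hands it to restic.RunBackup,
// so the restic package stays free of controller logic.
func (c *podVolumeBackupController) runBackup(
	req *velerov1api.PodVolumeBackup,
	backupCmd *restic.Command,
	log logrus.FieldLogger,
) error {
	updateFunc := func(progress velerov1api.PodVolumeOperationProgress) {
		// hypothetical patch helper; the real controller would persist the CR here
		req.Status.Progress = progress
		if err := c.patchPodVolumeBackup(req); err != nil {
			log.WithError(err).Error("error updating PodVolumeBackup progress")
		}
	}
	_, _, err := restic.RunBackup(backupCmd, log, updateFunc)
	return err
}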

@prydonius force-pushed the 1749-restic-progress branch from dcce611 to e8a6a99 on August 29, 2019 00:56

// RunBackup runs a `restic backup` command and watches the output to provide
// progress updates to the caller.
func RunBackup(backupCmd *Command, log logrus.FieldLogger, updateFunc func(velerov1api.PodVolumeOperationProgress)) (string, string, error) {
@prydonius (Contributor, Author) commented on the diff:

This implementation largely follows https://github.com/fusor/velero/pull/4/files. The main differences are:

  • we keep track of the number of bytes we've read, so we can skip lines we've already read when re-scanning the output
  • we quit the progress-checking goroutine as soon as the backup command finishes, so it doesn't have to wait for another cycle to discover that the backup is 100% complete (a ticker is used instead of a sleep so the goroutine can still react to the quit signal); the controller takes care of updating the progress to 100% once the backup has completed successfully (see the sketch below)
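
Roughly, the pattern looks like this (the interval constant and the latestProgress helper are placeholders, not this PR's exact code):

quit := make(chan struct{})
var wg sync.WaitGroup
wg.Add(1)
go func() {
	defer wg.Done()
	ticker := time.NewTicker(backupProgressCheckInterval)
	defer ticker.Stop()
	for {
		select {
		case <-quit:
			// backup command finished; stop polling immediately
			return
		case <-ticker.C:
			// hypothetical helper: scan only the output bytes we haven't
			// read yet and return the newest status line's progress, if any
			if progress, ok := latestProgress(); ok {
				updateFunc(progress)
			}
		}
	}
}()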

@prydonius (Contributor, Author):

One issue with this is that if a backup completes too quickly, the progress will never get set. This can easily happen for smaller volumes or if only a few files have changed in an incremental backup. I'm now thinking that this function should call updateFunc one last time with the final progress before returning, and we don't need to set the progress to 100% in the controller.
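
Something like this, as a sketch (lastProgress stands in for whatever was captured from the most recent status line):

cmd.Wait()
close(quit)
wg.Wait()

// emit one final update so backups that finish between ticks still record
// progress; the controller then no longer needs to force 100% itself
lastProgress.BytesDone = lastProgress.TotalBytes
updateFunc(lastProgress)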

@prydonius requested review from skriss, nrb and carlisia on August 29, 2019 20:55
@skriss (Contributor) left a comment:

approach LGTM - one question so far

@prydonius (Contributor, Author):

I've been trying a different approach here in order to address #1821 (comment), but I'm seeing something strange. The restic backup --json command finishes with a summary line, something like:

{"message_type":"summary","files_new":0,"files_changed":0,"files_unmodified":10,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":0,"tree_blobs":0,"data_added":0,"total_files_processed":10,"total_bytes_processed":21043871744,"total_duration":0.363935634,"snapshot_id":"5e764a0b"}

I'm trying to read this line after the command has finished, to extract total_bytes_processed and use it for a final update of the backup progress. This is consistently the last line when I run restic backup --json manually, both on my local machine and when exec'ing into the Pod. In the controller, however, it is sometimes the last line and other times the penultimate line (followed by another "status" line). Could this be because the last two lines are written to the buffer at around the same time, so their order can vary?

Might need some extra eyes on this next week @skriss.
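
A sketch of the decode being attempted (the struct and helper are illustrative, with only the fields needed; the JSON keys match the summary line above):

// Sketch: decode the summary line and emit the final progress update.
type backupSummary struct {
	MessageType         string `json:"message_type"`
	TotalBytesProcessed int64  `json:"total_bytes_processed"`
	SnapshotID          string `json:"snapshot_id"`
}

func updateFromSummary(line string, updateFunc func(velerov1api.PodVolumeOperationProgress)) error {
	var summary backupSummary
	if err := json.Unmarshal([]byte(line), &summary); err != nil {
		return err // the line we grabbed wasn't the JSON summary after all
	}
	updateFunc(velerov1api.PodVolumeOperationProgress{
		TotalBytes: summary.TotalBytesProcessed,
		BytesDone:  summary.TotalBytesProcessed,
	})
	return nil
}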

@skriss (Contributor) commented on Sep 3, 2019:

Just ran a restic backup cmd locally using restic v0.9.5 - got the following output at the end:

{"message_type":"status","seconds_elapsed":9,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"status","seconds_elapsed":10,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"status","seconds_elapsed":11,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}
{"message_type":"summary","files_new":428,"files_changed":0,"files_unmodified":0,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":428,"tree_blobs":1,"data_added":2168388,"total_files_processed":428,"total_bytes_processed":2168012,"total_duration":11.171735739,"snapshot_id":"4e9479ef"}
{"message_type":"summary","files_new":428,"files_changed":0,"files_unmodified":0,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":428,"tree_blobs":1,"data_added":2168388,"total_files_processed":428,"total_bytes_processed":2168012,"total_duration":11.171735739,"snapshot_id":"4e9479ef"}
{"message_type":"status","seconds_elapsed":11,"percent_done":1,"total_files":428,"files_done":428,"total_bytes":2168012,"bytes_done":2168012}

So duplicate "summary" lines, and a "status" line after the "summary" ones. Seems like there are probably some bugs in restic's status reporter.

@prydonius (Contributor, Author):

Okay, I've filed restic/restic#2389 upstream to see if we can get help with this issue. In the meantime, we can work around it by looking for the summary line instead of just taking the last line.
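
A sketch of that workaround (the function name is illustrative):

import (
	"bufio"
	"encoding/json"
	"errors"
	"strings"
)

// getSummaryLine scans every line of restic's JSON output and returns the one
// whose message_type is "summary", rather than assuming it comes last.
func getSummaryLine(output string) (string, error) {
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := scanner.Text()
		var msg struct {
			MessageType string `json:"message_type"`
		}
		if err := json.Unmarshal([]byte(line), &msg); err != nil {
			continue // skip any non-JSON noise
		}
		if msg.MessageType == "summary" {
			return line, nil
		}
	}
	return "", errors.New("summary line not found in restic output")
}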

@prydonius force-pushed the 1749-restic-progress branch from e8a6a99 to 8e6cc51 on September 3, 2019 23:07
@prydonius marked this pull request as ready for review on September 3, 2019 23:58
@prydonius (Contributor, Author):

This is now ready for review; I'll work on the restore progress and CLI changes in a separate PR.

@prydonius (Contributor, Author) left a comment:

Added display of the backup progress when running velero backup describe --details:

(demo GIF: velero-backup-progress-demo)

@@ -188,7 +188,8 @@ func describePodVolumeRestores(d *Describer, restores []v1.PodVolumeRestore, det
	restoresByPod := new(volumesByPod)

	for _, restore := range restoresByPhase[phase] {
		restoresByPod.Add(restore.Spec.Pod.Namespace, restore.Spec.Pod.Name, restore.Spec.Volume)
		// TODO(adnan): replace last parameter with progress from status (#1749)
@prydonius (Contributor, Author) commented on the diff:

will address this in the PR for restore progress

@skriss (Contributor) commented on Sep 4, 2019:

Looks good on a quick pass - I'll do some testing and another pass later on today.

Adnan Abdulhussein added 2 commits September 4, 2019 13:31
Signed-off-by: Adnan Abdulhussein <[email protected]>

cmd.Wait()  // wait for the restic backup command to exit
close(quit) // signal the progress goroutine to stop
wg.Wait()   // wait for the progress goroutine to finish
@skriss (Contributor) commented on the diff:

I don't think we need this wait group, do we? I think the "quit" logic could be simplified - after L120, call ticker.Stop() (you'll need to move ticker's declaration to outside the goroutine), then close(quit). The select case that receives from quit can just return.

@prydonius (Contributor, Author):

Yeah, it's mostly left over from the additional goroutines we had for copying stdout/stderr. One issue is that if we're in the middle of the ticker.C case and it takes some time to complete, we might call updateFunc in the goroutine with BytesDone < TotalBytes after the final call to updateFunc at the end of this function. If we wait for the goroutine to finish, we know that definitely won't happen.

That said, the work in the ticker.C case should be pretty fast for backup; for restore I think we'll need the waitgroup, since calculating the size of the volume can take an indeterminate amount of time.

@skriss (Contributor):

OK, that's a good point. I guess the other way you could do this is to send on the quit channel rather than closing it, since a send would block until the goroutine receives it, but I'm fine with this as-is.
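
To illustrate the difference (sketch only):

// an unbuffered send blocks until the goroutine actually receives from quit,
// which also guarantees any in-flight ticker.C case has already finished
quit <- struct{}{}

// versus: close returns immediately, so it needs wg.Wait() to know the
// goroutine is done
close(quit)
wg.Wait()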

@prydonius (Contributor, Author):

@skriss ah good point, I didn't think of that. I like the idea of removing the unnecessary waitgroup, so will update this to do that - thanks!

@skriss (Contributor) left a comment:

LGTM. @nrb @carlisia PTAL!

Signed-off-by: Adnan Abdulhussein <[email protected]>
Signed-off-by: Adnan Abdulhussein <[email protected]>
Signed-off-by: Adnan Abdulhussein <[email protected]>
@prydonius (Contributor, Author):

addressed all the review comments here - PTAL @nrb @carlisia!

@carlisia (Contributor) left a comment:

👍 great comments, and tests.

@carlisia merged commit 1e182e5 into vmware-tanzu:master on Sep 10, 2019
@prydonius deleted the 1749-restic-progress branch on September 10, 2019 18:38