Snapshotting should wait for user-callback to prepare/mount target filesystem before throwing errors #1142
Comments
perhaps the code should be aware of whether the backup has been invoked by a user or is the result of an unattended schedule, only asking for user input if there is a user likely to be present (i.e. not in the case of scheduled backup).
The user may be present even in the case of scheduled backup (for example if they backup hourly).
Yes, this is why I felt this needed more discussion. Perhaps an |
@kortschak @colintedford A new maintenance team is forming and we are reviewing all issues. Is this problem still relevant for you? Can you reproduce it with a newer or the latest release? Did you find a solution? Tag: Bug This depends on a refactoring of the whole logging mechanism. If I understand the implications correctly, this is not a blocking or heavy bug. The bug can't be fixed in isolation. Because of that I vote to close. Either way, the logging mechanism will be recreated using Python's own
This is still relevant. I run the version at #1143 and have not moved from there because of the impact that that bug has on my use. I don't see any changes between that branch and master that would affect the behaviour here (the relevant code in config.py has been touched, but not in a way that would fix this). All the requirements to fix the issue are described in the issue, but depend on design decisions, which is why I did not send a fix. I don't think that the bug should be closed until the issue no longer exists, and I would say that there is a fundamental issue of logging non-error states as errors due to invalid logic, not a logging issue. |
Please let me paraphrase this to see if I understand it.
Does your script wait until the NAS has booted (e.g. by checking via ping)? Looking at the timestamps of your log output, it doesn't wait.
What is the "impact" in your use-case? It sounds like you can't do backups without that PR? Am I right? |
Not relevant to me now because I'm not currently using backintime. I'm glad to hear about the team, tho. |
Correct.
This is the case. There is a callback to this program https://github.com/kortschak/bit-user-callback/blob/master/bit-user-callback.go using the following configuration
You can see here that the wake code waits 10 minutes for the NAS to come up before failing out (this is this loop). The NAS takes less than 2 minutes to boot.
I think that is correct; this issue is over a year old, so the details are fuzzy.
Error logs result in system notifications. BIT without this change can cause significant spamming of notifications that makes the Gnome notification bar useless. In pathological cases this causes Gnome to become unresponsive; that doesn't happen in this case, but can when other errors occur. This is a different (but related) and bigger issue that may be fixed by your plans to refactor the logging system if you keep track of the number of ERROR level log messages that you send.
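To illustrate the idea of keeping track of the number of ERROR level messages, here is a minimal sketch built on Python's standard logging module. It is not BIT's current logging code; the class name `CappedNotificationHandler`, the `notify` callable, and the default limit of 3 are all assumptions made for the example.

```python
import logging


class CappedNotificationHandler(logging.Handler):
    """Forward at most `limit` records to `notify`, then only count them."""

    def __init__(self, notify, limit=3, level=logging.ERROR):
        super().__init__(level=level)
        self.notify = notify  # hypothetical callable that shows a notification
        self.limit = limit    # maximum number of individual notifications
        self.seen = 0         # ERROR records seen so far

    def emit(self, record):
        self.seen += 1
        if self.seen <= self.limit:
            self.notify(self.format(record))
        elif self.seen == self.limit + 1:
            # One last notice, then stay quiet; the log file still has everything.
            self.notify("Further errors suppressed; see the log for details.")
```

Every record still reaches whatever file or syslog handlers are attached to the same logger; only the desktop-facing channel is capped.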
Maybe my next question comes from my not being deep enough into the BIT code. When your user-callback waits 10 minutes, the NAS is guaranteed to be up. Then why does the error message happen? Doesn't BIT wait until the end of the user-callback script?
There are two issues here. The first is shown in the screenshot and is a result of the code in the first block; this ~always happens and is not an error, but spams the notifications. I don't know why it fails to find what it is looking for, but my first guess would be that while the NAS is up and responding to HTTP requests, perhaps the FS is slow to respond early on. The second is the error logging, and I haven't seen that for over a year because I run a version that fixes it; I don't recall the details of when and why it happens.
Now we come closer.
I understand and agree.
That is the important question for me. In my understanding (without reviewing the BIT code) this error shouldn't happen because your user-callback brings up the NAS and waits long enough.
Dan, thanks for your patience and clarity in dealing with this issue, and taking the time to explain your troubles and your proposed solutions. First off, let's agree that (as you explained above), you're dealing with two separate annoyances:
1. Spurious ERRORs in syslog
There are errors thrown by the Lines 1356 to 1357 in 88d19d4
which cause an annoyance in your logs, because they indicate a problem that resolves itself after a few seconds. That is discussed at length in #1143 (comment). It's not what this Issue is about. Moreover, it becomes irrelevant, once we fix …:
2. Spurious desktop notification
You're receiving a desktop notification that says
If you agree, I think the title of this Issue should be: … and that would be our new starting point for eliminating the annoyance for you and, hopefully, others :)
//cc @aryoda, who is looking into improving user-callback functionality. There's a long-ish discussion here, but you really only need this summary from my above comment:
@emtiu I saw some (just merged) PRs for this but I think they "only" fix the notification spamming, right?
So if there is still work to do I need to understand what exactly.
TL;DR
What exactly should be changed now from your user point of view (after the above PRs are already merged in)? I see only one thing (but I jumped onto this issue very late):
backintime/common/snapshots.py Lines 672 to 691 in b4dd2d1
Anything missing?
Details
The race condition can only happen if two or more processes are working in parallel, e.g.
I assume this is a sync call (waits for the NAS being up). I have not checked your Go code (cool script BTW) in detail.
I think if the user-callback for WOL is a sync call and there could be a race between
I think this could be fixed in the user-mount by also waiting for the
If BiT calls the user-callback to ask for a mount (reason 7 signal), it should rely on an available mount to do further processing.
The additional 30 secs timeout for BiT-side mounting of ssh (or whatever is configured in the profile) comes on top.
I assume with "user input" you mean the "user notification" in your Gnome notification bar when logged in
If really something different pops up requiring user interaction please provide more details or a screen shot here. I fully agree that a (head-less) scheduled backup should only write to logs and send user notifications only at the end (and only if a user is logged in). |
Yes. You can see the core logic for that here https://github.com/kortschak/bit-user-callback/blob/5fc406cfd52bb6279884127e2c10f4726c63c7cf/bit-user-callback.go#L335-L361 The basis for assessing whether the NAS is up is that it responds with a 200 to a GET on the web portal for the NAS. A reasonable human would expect that a success here would mean the fs was up, ... but software. The remainder is there to ensure that the device is on the correct wireless network, and for config handling.
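For illustration, the readiness check could combine the HTTP probe with a filesystem probe, since a 200 from the web portal evidently does not guarantee that the share is usable. This is a minimal Python sketch and not the logic of bit-user-callback (which is written in Go); `http_ok`, `wait_until_ready`, and the URL and path in the usage note are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probes, timeout=600.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every probe in `probes` returns True, or time out."""
    deadline = clock() + timeout
    while True:
        if all(probe() for probe in probes):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A callback could then wait on both conditions, for example `wait_until_ready([lambda: http_ok("http://nas.example/"), lambda: os.path.isdir("/mnt/nas/backintime")])` after `import os`, and only report success once the directory is actually visible.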
This is correct.
Yes.
Partially. Logs should be written, but user notification absolutely should not be used in a headless situation. The logs that should be written should be at the appropriate level (I think this is fixed now). |
Yes, in 100 % headless situations a user notification is simply not possible and BiT offers the notification plugin for this headless use case (eg. to send emails).
If a cron job is started under a certain user account and the user is logged-in in a desktop environment when the job finishes I would not consider this as a headless situation (eg. Ubuntu informs me via notifications - when I am logged-in - that some scheduled updates were not possible - eg. snap security updates).
IMHO it would be very helpful for me as BiT user if scheduled backup jobs (in user's
Anyhow these are the options I see:
A) Implementing a "cron-zero-notification" change:
Note: I was surprised that the GUI-plugin backintime/common/pluginmanager.py Lines 206 to 210 in b4dd2d1
B) Alternative: Implementing a "if cron then notify only about final backup status" change:
@kortschak @emtiu @buhtzz Any opinions? |
Yeah, I should clarify, notifications are reasonable, but expecting user interaction is not. The final outcome notification looks like the best approach to me. I agree that it is easily possible for the user to not notice that a backup has failed otherwise. |
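A rough sketch of option B (notify only about the final backup status when started by cron) could look like the following. This is not BiT's plugin API; `SnapshotNotifier`, its `send` callable, and the `scheduled` flag are hypothetical names used only to show the shape of the behaviour.

```python
class SnapshotNotifier:
    """Collect progress messages; notify the desktop only when allowed."""

    def __init__(self, send, scheduled):
        self.send = send            # hypothetical callable that shows a notification
        self.scheduled = scheduled  # True when the run was started by cron
        self.progress = []          # intermediate messages, kept for the log

    def message(self, text):
        """Record an intermediate message; show it only for manual runs."""
        self.progress.append(text)
        if not self.scheduled:
            self.send(text)

    def finish(self, ok, summary):
        """Always report the final outcome, scheduled or not."""
        prefix = "Snapshot succeeded" if ok else "Snapshot FAILED"
        self.send(f"{prefix}: {summary}")
```

With `scheduled=True`, an intermediate message such as a wait for the snapshot folder never reaches the desktop, but a failed backup still produces exactly one notification.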
@aryoda wrote:
Yes, this is what I consider the core of this issue. If this was done correctly, @kortschak would not be seeing the unwelcome notifications. But other considerations regarding notifications also come into play: @aryoda wrote:
Yes, this problem (silent failure) has been mentioned in #450, and I've also experienced it myself. Different Issue ;) @aryoda wrote:
I think this is what #850 also wishes for. On the other hand, users might wish for that notification: It would remind them to connect a drive (for which they have 30 seconds), then run a snapshot, and then the drive might be removed again.
Yes, users might very well want to be notified of the final result of a snapshot by desktop notification, but currently, this does not happen. The "Saving config file" etc. notifications are "BiT-internal" (shown in the GUI and the tray icon), but not as desktop notifications. In summary, a lot could be done to improve notifications. It would probably be best to have per-profile configuration options like:
But this would require a major reworking, including the integration of the
For the moment, I think several other issues and bugs are more important. But this Issue gives a good summary.
I agree putting this issue into the backlog (esp. since the plugin and notification systems need a complete overhaul...) |
THX, I didn't realize that only selected messages are sent as desktop notifications and just checked why... The backintime/qt/plugins/notifyplugin.py Lines 48 to 50 in b4dd2d1
I have a Back in Time profile that stores to a NAS that is powered down for most of the time. The NAS is set to wake with a WOL packet that is sent by a user-callback which waits after the WOL packet has been sent then returns a success status if the NAS comes up within the timeout. This all works very nicely since #654 was fixed.
However, the backup to the NAS is scheduled for the middle of the night and often in the morning I see a notification like so
despite the backup having succeeded. This is due to this
backintime/common/snapshots.py
Lines 669 to 674 in 9310acc
backintime
and the remote being mounted by the system since the user-callback waits until the NAS is up before it returns.
backintime iterates a test for the presence of the remote mount each second, but first notifies the user that the remote mount is not present.
In looking into this issue, I found that backintime also logs an ERROR to syslog for checks where the remote mount is not present (actually it checks whether it is not a directory):
backintime/common/config.py
Lines 1356 to 1360 in 9310acc
ISTM that the check here should be a silent check for whether the file exists (if it doesn't exist after the 30 seconds of checking have elapsed, it does finally error out) and, only if it does exist, then check that it's a directory, erroring out if not. The fix for this is simple and I will send a PR.
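A minimal sketch of the check described above, assuming a plain polling loop. This is not the code in config.py or the PR; `wait_for_directory` and its parameters are hypothetical, with the 30 second timeout and one second interval taken from the description.

```python
import os
import time


def wait_for_directory(path, timeout=30.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll silently until `path` exists, then validate it once.

    Returns (ok, message). Nothing is logged while polling; the caller
    decides how to report the single final outcome.
    """
    deadline = clock() + timeout
    while not os.path.exists(path):
        if clock() >= deadline:
            return False, f"{path} did not appear within {timeout:g} s"
        sleep(interval)
    # The path exists, so a non-directory here is a genuine error.
    if not os.path.isdir(path):
        return False, f"{path} exists but is not a directory"
    return True, f"{path} is available"
```

The caller would log at ERROR level only when `ok` is False, so a mount that shows up after a few seconds leaves no error behind.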
The fix for the underlying issue is less clear to me; ISTM that perhaps the code should be aware of whether the backup has been invoked by a user or is the result of an unattended schedule, only asking for user input if there is a user likely to be present (i.e. not in the case of scheduled backup).
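One possible heuristic for telling an unattended run from a user-invoked one is sketched below. It is only a guess at how this could be done, not existing backintime behaviour: the function name `likely_interactive` and the `BIT_SCHEDULED` marker variable are hypothetical, and a user may of course be present during a scheduled run as well.

```python
import os
import sys


def likely_interactive(environ=None, stdin=None):
    """Guess whether a person started this run (heuristic, not proof)."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    # Hypothetical marker that a scheduler entry could set explicitly.
    if environ.get("BIT_SCHEDULED") == "1":
        return False
    has_tty = stdin is not None and stdin.isatty()
    has_display = bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    return has_tty or has_display
```

An explicit marker set by the scheduler is more reliable than inferring from the environment, which is why it is checked first.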