-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/handle sigterm #3730
Feature/handle sigterm #3730
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made two comments related to one single return where ok
was false. Let's discuss before merging.
Also, I rebased to latest develop
branch.
@@ -154,11 +171,11 @@ func (opts *ScrapeOptions) HandleScrape() error { | |||
|
|||
} else { | |||
// Consilidate a chunk (if possible). Only quit on catostrophic errors. Report and sleep otherwise. | |||
if err, ok = bm.Consolidate(blocks); !ok || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ok
here is false
in the case were we've consoldiated and tried to write the files which means the chunk may be partial. (Here: https://github.com/TrueBlocks/trueblocks-core/blob/develop/src/apps/chifra/internal/scrape/scrape_consolidate.go#L112).
In this case, we quit, but this no longer happens.
See my note below on that line.
@@ -109,7 +104,7 @@ func (bm *BlazeManager) Consolidate(blocks []base.Blknum) (error, bool) { | |||
publisher := base.ZeroAddr | |||
var chunk index.Chunk | |||
if report, err := chunk.Write(chain, publisher, chunkPath, appMap, nAppearances); err != nil { | |||
return err, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one place where the forever loop in the caller would have quit. We tried to write the file, it failed, but the file may still exist and be partial. In the C++ code, we would have been writing to a temporary file and only after it completed would be move it to the actual chunk. We don't do that here. (That's probably a bug.)
I think there may be two things to do:
- Can we cancel the context here so that the forever loop exits?
- Can we write to a temporary file and only if the write suceeds, move the temporary file on to the chunk?
Let's discuss
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should bring back the ok
. That would be the first step and later we can add a temporary file.
Or, and I think it'd be the most idiomatic way, we have a custom error type, like CriticalScraperError
. If we see this kind of error, we have to abort. This pattern could be used elsewhere, too and it's easier to integrate than changing method definitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of a custom error, but I question whether we're actually doing the right thing in the previous code.
For all other paths, we "clean up" (I think it's called removeEphemeralFiles
). This means remove the stage. In this case, the error happened during creation of the chunk, so I think this is why it's different. In the previous code, we quit, but I don't think that's necessarily the right thing to do. We know the error happened during writing of the chunk, so unlike all other errors, we fear that the file may be incomplete.
Instead of quiting, we could perhaps just delete the chunk file (and allow the cleaning of ephemeral files to continue as normal).
Also, it's curious to me that we don't have a similar test during the writing of the bloom file.
Also, we should double check that the backup file mechanism
isn't already in place.
251e4b3
to
09d2dc3
Compare
@tjayrush I fixed the code accordingly to your feedback. There's now a new class of error, |
I'll review it and merge it tomorrow morning. In the meantime, this is the mechanism I was talking about where a backup file is created and replaces the failed write: https://github.com/TrueBlocks/trueblocks-core/blob/master/src/apps/chifra/pkg/index/chunk_write.go#L83. Also, I was worried that we were only returning Any error from chunk.Write will cause a |
Yes, that's correct, any error from I've seen the backup mechanism you mentioned. Let's keep in mind that it's there. |
This changes the scraper code, so that when user presses Ctrl-C, we print a message, finish the current job and exit. If the code is in a loop, it will break it.
The idea is to use the cleanup mechanism we already have.