Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/handle sigterm #3730

Merged
merged 8 commits into from
Jun 7, 2024
Merged

Feature/handle sigterm #3730

merged 8 commits into from
Jun 7, 2024

Conversation

dszlachta
Copy link
Contributor

This changes the scraper code, so that when user presses Ctrl-C, we print a message, finish the current job and exit. If the code is in a loop, it will break it.

The idea is to use the cleanup mechanism we already have.

@dszlachta dszlachta requested a review from tjayrush May 31, 2024 11:53
Copy link
Member

@tjayrush tjayrush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made two comments related to one single return where ok was false. Let's discuss before merging.

Also, I rebased to latest develop branch.

@@ -154,11 +171,11 @@ func (opts *ScrapeOptions) HandleScrape() error {

} else {
// Consilidate a chunk (if possible). Only quit on catostrophic errors. Report and sleep otherwise.
if err, ok = bm.Consolidate(blocks); !ok || err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ok here is false in the case were we've consoldiated and tried to write the files which means the chunk may be partial. (Here: https://github.com/TrueBlocks/trueblocks-core/blob/develop/src/apps/chifra/internal/scrape/scrape_consolidate.go#L112).

In this case, we quit, but this no longer happens.

See my note below on that line.

@@ -109,7 +104,7 @@ func (bm *BlazeManager) Consolidate(blocks []base.Blknum) (error, bool) {
publisher := base.ZeroAddr
var chunk index.Chunk
if report, err := chunk.Write(chain, publisher, chunkPath, appMap, nAppearances); err != nil {
return err, false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one place where the forever loop in the caller would have quit. We tried to write the file, it failed, but the file may still exist and be partial. In the C++ code, we would have been writing to a temporary file and only after it completed would be move it to the actual chunk. We don't do that here. (That's probably a bug.)

I think there may be two things to do:

  1. Can we cancel the context here so that the forever loop exits?
  2. Can we write to a temporary file and only if the write suceeds, move the temporary file on to the chunk?

Let's discuss

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I should bring back the ok. That would be the first step and later we can add a temporary file.
Or, and I think it'd be the most idiomatic way, we have a custom error type, like CriticalScraperError. If we see this kind of error, we have to abort. This pattern could be used elsewhere, too and it's easier to integrate than changing method definitions

Copy link
Member

@tjayrush tjayrush Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a custom error, but I question whether we're actually doing the right thing in the previous code.

For all other paths, we "clean up" (I think it's called removeEphemeralFiles). This means remove the stage. In this case, the error happened during creation of the chunk, so I think this is why it's different. In the previous code, we quit, but I don't think that's necessarily the right thing to do. We know the error happened during writing of the chunk, so unlike all other errors, we fear that the file may be incomplete.

Instead of quiting, we could perhaps just delete the chunk file (and allow the cleaning of ephemeral files to continue as normal).

Also, it's curious to me that we don't have a similar test during the writing of the bloom file.

Also, we should double check that the backup file mechanism isn't already in place.

@dszlachta dszlachta force-pushed the feature/handle_sigterm branch from 251e4b3 to 09d2dc3 Compare June 6, 2024 14:54
@dszlachta
Copy link
Contributor Author

@tjayrush I fixed the code accordingly to your feedback. There's now a new class of error, criticalError and it's returned from Consolidate if chunk write fails. When we get it, we remove both index and bloom files and then do the same thing like when a user hits Ctrl-C: break from the loop, cleanup and exit.

@tjayrush
Copy link
Member

tjayrush commented Jun 6, 2024

I'll review it and merge it tomorrow morning.

In the meantime, this is the mechanism I was talking about where a backup file is created and replaces the failed write: https://github.com/TrueBlocks/trueblocks-core/blob/master/src/apps/chifra/pkg/index/chunk_write.go#L83.

Also, I was worried that we were only returning criticalError when writing the index data, but this line looks like it would also fail when writing the Bloom filter: https://github.com/TrueBlocks/trueblocks-core/blob/master/src/apps/chifra/pkg/index/chunk_write.go#L122

Any error from chunk.Write will cause a criticalError, I think. Is that right?

@dszlachta
Copy link
Contributor Author

Yes, that's correct, any error from chunk.Write will cause a criticalError.

I've seen the backup mechanism you mentioned. Let's keep in mind that it's there.

@tjayrush tjayrush merged commit 4c1243d into develop Jun 7, 2024
3 checks passed
@tjayrush tjayrush deleted the feature/handle_sigterm branch June 7, 2024 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants