TTreeProcessorMP processes events multiple times when there are more threads than entries #15425
Comments
Not sure if it's a bug or a feature... With TTreeProcessorMP, if the number of files to process is larger than the number of worker processes, the process is invoked once per file. Otherwise entries are divided equally between workers.
you'll see:
But I might be wrong and @pcanal, @dpiparo, or @guitargeek can correct me...
But isn't the report about the opposite case, i.e. when there are more workers than events to process?
100 ROOT files with 16 threads, so more files than workers. Or did I miss something?
It's about having more threads than events in a file (there's one event per file). As mentioned above, it seems like the process is getting invoked once per thread per file, but if there aren't enough events, the extra threads seem to reprocess an old event. This is what we noticed with the much more complex code that led us to identify this: certain events were getting processed and histogrammed multiple times in cases like this. We were able to work around this by checking whether the same event is being processed multiple times in a row.
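The workaround mentioned above could look roughly like the following. This is only a minimal sketch: the thread does not say how the original code identified an event, so the `RepeatGuard` class and the choice of a (file name, entry number) pair as the key are assumptions made for illustration. It uses plain C++ with no ROOT dependency.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not part of ROOT): remembers the last
// (file, entry) pair it was given and flags an immediate repeat.
class RepeatGuard {
public:
    // Returns true if this (file, entry) pair should be processed,
    // false if it is identical to the pair from the previous call.
    bool shouldProcess(const std::string &file, long long entry) {
        auto current = std::make_pair(file, entry);
        if (last_ && *last_ == current) {
            return false;  // same event twice in a row: skip it
        }
        last_ = std::move(current);
        return true;
    }

private:
    std::optional<std::pair<std::string, long long>> last_;
};
```

In a selector, a guard like this would be consulted at the top of Process() and the call would return early when shouldProcess() is false. It only catches repeats that arrive back to back in the same worker, which matches the workaround as described in the comment above.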
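In this case, the behaviour is confirmed:
#include <ROOT/RDataFrame.hxx>
#include <ROOT/TTreeProcessorMP.hxx>
#include <TH1F.h>
#include <TSelector.h>

// Write a tree "t" with 3 entries to f.root.
void fill(){
   ROOT::RDataFrame(3).Define("a", [](){return 1;}).Snapshot("t", "f.root");
}

// Selector that fills one histogram entry per Process() call,
// so the number of histogram entries equals the number of calls.
class TestSelector : public TSelector {
public:
   TH1F *h = nullptr;
   virtual void SlaveBegin(TTree*) {
      h = new TH1F("h", "h", 8, -1, 1);
      h->SetDirectory(nullptr);
   }
   virtual bool Process(Long64_t) {
      h->Fill(0);
      return true;
   }
   virtual void SlaveTerminate() {
      GetOutputList()->Add(h);
   }
};

void b(){
   //fill();
   // 12 workers for a tree with only 3 entries.
   ROOT::TTreeProcessorMP pool(12);
   TestSelector sel;
   auto h = pool.Process("f.root", sel);
   h->At(0)->Print();
}
prints
TH1.Print Name = h, Entries= 11, Total sum= 11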
When processing trees with fewer entries than workers with TTreeProcessorMP, some entries were processed multiple times because of a mistake in the algorithm calculating the event ranges. Thanks to @hageboeck for the help with the output management of the test. Fixes root-project#15425
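To illustrate what a range calculation that avoids this problem can look like, here is a minimal sketch in plain C++. It is not the ROOT implementation and not the code from the fix; the function name `splitEntries` and the `EntryRange` alias are hypothetical. The idea is simply to never create more ranges than there are entries, so that no entry can be assigned twice.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open entry range [first, second).
using EntryRange = std::pair<std::int64_t, std::int64_t>;

// Illustrative only: split nEntries into at most nWorkers contiguous,
// non-overlapping ranges that together cover every entry exactly once.
std::vector<EntryRange> splitEntries(std::int64_t nEntries, unsigned nWorkers) {
    std::vector<EntryRange> ranges;
    if (nEntries <= 0 || nWorkers == 0) {
        return ranges;  // nothing to distribute
    }
    // Cap the number of ranges at the number of entries, so that
    // surplus workers get no range instead of a duplicated one.
    const std::int64_t nRanges = std::min<std::int64_t>(nEntries, nWorkers);
    const std::int64_t base = nEntries / nRanges;   // minimum range size
    const std::int64_t extra = nEntries % nRanges;  // first `extra` ranges get one more
    std::int64_t begin = 0;
    for (std::int64_t i = 0; i < nRanges; ++i) {
        const std::int64_t end = begin + base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

With 3 entries and 12 workers this yields three single-entry ranges, and the remaining nine workers receive nothing to process.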
Thanks a lot for the report. Indeed it was a problem, which is hopefully fixed by #16147. I will backport the change to 6.32, too.
Reopening until the backport (BP) is merged into the 6.32 branch.
BP also merged.
Description
TTreeProcessorMP processes events multiple times when you read multiple files using TChain and some of the files have fewer entries than there are threads. This has been observed using TChain and TSelector, but I haven't tried it with any other way of using TTreeProcessorMP.
The attached reproducer minimally shows this bug by showing that the Process method is run more often than it should be. Another observation that is not shown in the reproducer is that the additional events are repeats of existing events. In the code where I first observed this, we frequently have files where one thread is processing multiple events (roughly 5-10), and most of them are real events, with the last few being repeats. It also seems like it always tries to process at least 15 events for each file if you use 16 threads, even if those events don't exist; if there are more than 15 events, then it seems to behave as expected.
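The two counts reported in this thread (1500 Process calls for 100 single-entry files with 16 threads, and 11 histogram entries for one 3-entry file with 12 workers) are both consistent with "number of workers minus one" calls per undersized file. The small check below only verifies that arithmetic; the formula is a guess fitted to the two reported numbers, not a statement about the ROOT internals, and the function names are hypothetical.

```cpp
// Assumed pattern (fitted to the reported numbers, not taken from ROOT):
// a file with fewer entries than workers triggers (nWorkers - 1) calls.
constexpr long long callsPerSmallFile(unsigned nWorkers) {
    return static_cast<long long>(nWorkers) - 1;
}

// Total calls for nFiles such files processed with nWorkers workers.
constexpr long long totalCalls(unsigned nFiles, unsigned nWorkers) {
    return nFiles * callsPerSmallFile(nWorkers);
}

// 100 single-entry files, 16 threads: reported count is 1500.
static_assert(totalCalls(100, 16) == 1500, "matches the reproducer count");
// One 3-entry file, 12 workers: reported histogram entries are 11.
static_assert(totalCalls(1, 12) == 11, "matches the confirmed macro output");
```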
Reproducer
mp_bug.zip
Run:
This will create 100 ROOT files with 1 entry each, called files/f_0###.root. It will then use TTreeProcessorMP with 16 threads to read through the files and count the number of events read. This should be 100; instead it is 1500.
ROOT version
v6.28/06 and 6.26/10
Installation method
Built from source
Operating system
Ubuntu 22.04
Additional context
No response