-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Patch missing cols #158
Patch missing cols #158
Conversation
|
||
if( cnt == 0 ){ // enter type only for first occurence | ||
auto col = evt->getCollection( name ) ; | ||
typeMap.insert( std::make_pair( name, col->getTypeName() ) ) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typeMap.insert( std::make_pair( name, col->getTypeName() ) ) ; | |
const auto [it, inserted] = typeMap.emplace( name, col->getTypeName() ) ; |
slightly less cluttered and easier to read IMHO. (Plus we save a few "copies", aka moves, by constructing in-place).
Checking the return type here would give us the possibility to catch (the unlikely) case where collections with different types are stored under the same name in different events. (Would potentially make it possible to get rid of the cntMap
(if the main purpose is error checking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always happy to learn more modern C++ - the above used to be the 'canonical' way ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: the counting is crucial here - as in the end I need to compare the count to the total number of events read. Could possibly merge both maps into one, though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow why that comparison is strictly necessary. The main purpose is to collect all names (and types) of the collections stored in all events in the files, which we do by filling the typeMap
. Having a counter for each collection just would give us the possibility of also displaying information on which collections were missing in some events, right?
auto it_c = cntMap.begin(); | ||
auto it_t = typeMap.begin(); | ||
for( ; it_c != cntMap.end() ; it_c++, it_t++ ){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work as intended? I.e. do the keys of both maps match in every iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why they wouldn't - but can check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put the creation of the name to type map into a function and simply call that in check_missing_cols
? That would make it much easier to use the same functionality in e.g. a standalone converter.
std::map<std::string,unsigned> cntMap ; | ||
std::map<std::string,std::string> typeMap ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::map<std::string,unsigned> cntMap ; | |
std::map<std::string,std::string> typeMap ; | |
std::unordered_map<std::string,unsigned> cntMap ; | |
std::unordered_map<std::string,std::string> typeMap ; |
For most use cases we are probably at a number of collection names, where we can profit from using a hash map. This will almost definitely break the loop over both maps using the iterators below.
Yes, that's the plan - working on it ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of my comments is suggestions to sprinkle around some const
s, which should also make it possible to use this on multiple threads once configured. Also some &
s to avoid unnecessary copies.
As we just discussed, making the Set
a vector
instead of a set
would probably make sense, since the main purpose is to iterate in the end, with the downside that manipulating the internal _patchCols
during configuration is a bit more work, because merging vectors is not as straight forward.
*/ | ||
void addPatchCollections(Vector cols){ | ||
for(const auto& p : cols) | ||
_patchCols.push_back( p ) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will now no longer work as before with the set
, as it is now possible to add the same collections multiple times, so this would have to be something like
_patchCols.push_back( p ) ; | |
if (const auto it = std::find(_patchCols.begin(), _patchCols.end(), p); it == _patchCols.end()) { | |
_patchCols.push_back( p ) ; | |
} |
/** Add a collection with (name,type) that should be added to events in patchEvent(). | ||
*/ | ||
void addPatchCollection(const std::string name, std::string type){ | ||
_patchCols.push_back( {name, type} ) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_patchCols.push_back( {name, type} ) ; | |
addPatchCollections({{name, type}}); |
Not sure if this initializer list works, but to avoid having to duplicate the deduplication logic, I would just forward this to the overload taking the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really would want to put such a check in, I'd do it the other way round, i.e. check in the individual addPatchCollection
for duplicates and use this in the other function. But I think this is really overdoing it, as duplicates would not cause any harm (except tiny extra runtime). Much more critical would be to check for viable LCIO type names as that would cause crashes. Not sure we need this for this 'expert' tool, though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason I pointed this out is the changed behavior wrt the previous version that had a set
. In any case, this will most likely only be called once in any case, and with an empty _patchCols
as well, so I agree, deduplicating here is probably not necessary.
The LCIO type check is something we could/should actually do (also to catch typos in steering files), I think. It depends a bit on whether the crash is an exception or a segmentation fault in the collection creation factory. The latter might be hard to diagnose if there is no backtrace. The former, I would consider "good enough" for this, as we will have something printed to the screen with enough information, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess any container will do here, as all we need here is to get all elements. I had put in the vector as you suggested for further downstream use in the converter. Thinking about the type checking, again: this also is not needed or even useful to have here. We should get neither crash or exception, as the type string is really just used as 'information' and no code should depend on it. For empty collection no one will (or should) access this ever. Collections with incorrect type might get lost in routines that list/dump collections as LCTOOLS::dumpEventDetailed
- but no crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the 'WIP', so from my side it would be good to go.
Co-authored-by: Thomas Madlener <[email protected]>
4c6ef27
to
b8e04ef
Compare
BEGINRELEASENOTES
CheckCollections
that allows to parse lcio files for collections that are not present in every event and to patch such events with empty collections of the given (Name,Type) for further processingcheck_missing_cols
that checks for collections that are not in every event in a set of files and prints a summary to stdout:usage: check_missing_cols <input-file1> [[input-file2],...]
patch_missing_cols
that creates a copy of the input file with the same set of collections in all events:usage: patch_missing_cols <input-file> <output-file>
EDM4hep
ENDRELEASENOTES