Issue reading TTrees with different branch name and "leaf" name #8

Open
jcatmore opened this issue Apr 29, 2018 · 3 comments

@jcatmore

Hi Dan,

I've encountered an issue when using ttree2hdf5 (AnalysisBase-21.2.20) on flat n-tuple files where the branch name and leaf name are different, e.g.:

root [4] TMVA_tree->Print()
******************************************************************************
*Tree    :TMVA_tree : Tre to be used for TMVA                                *
*Entries : 18375215 : Total =      4497409573 bytes  File  Size = 4495473044 *
*        :          : Tree compression factor =   1.00                       *
******************************************************************************
*Br    0 :lep_pT1   : b_lep_pT1/F                                            *
*Entries : 18375215 : Total  Size=   73729165 bytes  File Size  =   73682797 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    1 :lep_pT2   : b_lep_pT2/F                                            *
*Entries : 18375215 : Total  Size=   73729165 bytes  File Size  =   73682797 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    2 :lep_E1    : b_lep_E1/F                                             *
*Entries : 18375215 : Total  Size=   73726858 bytes  File Size  =   73680494 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    3 :lep_E2    : b_lep_E2/F                                             *
*Entries : 18375215 : Total  Size=   73726858 bytes  File Size  =   73680494 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    4 :lep_eta1  : b_lep_eta1/F                                           *
*Entries : 18375215 : Total  Size=   73731472 bytes  File Size  =   73685100 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*
*Br    5 :lep_eta2  : b_lep_eta2/F                                           *
*Entries : 18375215 : Total  Size=   73731472 bytes  File Size  =   73685100 *
*Baskets :     2303 : Basket Size=      32000 bytes  Compression=   1.00     *
*............................................................................*

As you can see the leaf names have this "b_" prepended to them (to be honest I don't know why they were written like this, but this is what we have in our hands).

The converter seems to get tripped up by this, e.g.

ttree2hdf5 --in-file TMVA_tree.root --out-file TMVA_tree.h5 --tree-name TMVA_tree --verbose --print-interval
tree: TMVA_tree
adding TMVA_tree.root
found b_lep_eta1
Error in <TTree::SetBranchStatus>: unknown branch -> b_lep_eta1
Error in <TChain::SetBranchAddress>: unknown branch -> b_lep_eta1
found b_lep_eta2
Error in <TTree::SetBranchStatus>: unknown branch -> b_lep_eta2
Error in <TChain::SetBranchAddress>: unknown branch -> b_lep_eta2
found b_lep_pT1
Error in <TTree::SetBranchStatus>: unknown branch -> b_lep_pT1
Error in <TChain::SetBranchAddress>: unknown branch -> b_lep_pT1
found b_lep_pT2
Error in <TTree::SetBranchStatus>: unknown branch -> b_lep_pT2
Error in <TChain::SetBranchAddress>: unknown branch -> b_lep_pT2
...
...

The converter then runs over the file and produces an output file of the expected size, but when inspected with h5py the contents appear to be just the values of uninitialized variables (i.e. all information is lost). I confirmed that, with the same release, a TTree whose branch and leaf names are the same works fine, and the expected values can be extracted from the h5 file.

Do you know if there is some way around this, or is the solution that whatever package wrote that n-tuple needs to be re-written so that it makes the branch and the leaf name the same?

Thanks and best wishes,

James

@dguest
Owner

dguest commented Apr 30, 2018

Hi @jcatmore,

To be perfectly honest I'm not really much of a ROOT expert, so I don't really understand the difference between a tree and a leaf (I only wrote this ttree2hdf5 as an afterthought, my main goal was to write the part that writes HDF5 from memory).

That said, internally we call GetListOfLeaves, so it's possible that we could just change this to GetListOfBranches. Again, I don't know what the differences are: it might be that the way we're doing it has only worked by chance so far, because most people create TTrees using simpler syntax that gives leaves and branches the same name. Maybe you have a better understanding of this: the lines that are causing the problem are right here.
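
A minimal sketch of the mismatch described above, written in plain Python rather than ROOT so it runs anywhere. `Branch`, `Leaf`, `FakeTree` and `make_leaf` are hypothetical stand-ins, not ROOT or ttree2hdf5 code. The only thing borrowed from ROOT is the relationship they model: each leaf knows its owning branch (as `TLeaf::GetBranch()` does), while branches are looked up by branch name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    name: str  # e.g. "lep_pT1"


@dataclass(frozen=True)
class Leaf:
    name: str       # e.g. "b_lep_pT1"
    type_name: str  # e.g. "Float_t"
    branch: Branch  # owning branch, analogous to TLeaf::GetBranch() in ROOT


class FakeTree:
    """Hypothetical stand-in for a TTree: lookups go by branch name only."""

    def __init__(self, leaves):
        self.leaves = list(leaves)
        self._branch_names = {leaf.branch.name for leaf in self.leaves}

    def has_branch(self, name):
        return name in self._branch_names


def make_leaf(branch_name, leaf_name, type_name="Float_t"):
    return Leaf(leaf_name, type_name, Branch(branch_name))


tree = FakeTree([
    make_leaf("lep_pT1", "b_lep_pT1"),
    make_leaf("lep_eta1", "b_lep_eta1"),
])

# Looking up by leaf name fails, mirroring the "unknown branch" errors.
by_leaf_name = [tree.has_branch(leaf.name) for leaf in tree.leaves]

# Looking up by the owning branch's name succeeds.
by_branch_name = [tree.has_branch(leaf.branch.name) for leaf in tree.leaves]
```

If this picture is right, one option is to keep iterating over leaves but pass the owning branch's name to the branch lookup, rather than switching the whole loop to GetListOfBranches.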

It's also worth noting that the version of the code here has diverged slightly from the ATLAS internal version (I didn't want to add this as an external dependency because it seemed like more work for our SW team, and then the ATLAS reviewers made me change the naming conventions). So while the issue you're having is probably going to be a problem for both versions, the code is slightly different between the two.

@jcatmore
Author

Thanks @dguest - indeed, I think GetListOfBranches will do the trick. I'll try it locally myself and let you know. Thanks for pointing me to the relevant lines!

James.

@dguest
Owner

dguest commented Apr 30, 2018

Hi @jcatmore,

OK, let me know if you have other issues. I don't quite remember why I used GetListOfLeaves, but it may have had to do with the code needing to figure out the variable type. For leaves ROOT has a GetTypeName() function, but this doesn't seem to exist in TBranch. I guess we could probably add a few lines to dig down to the primitive type in branches too, but it might start to get ugly if we try to accommodate every possible way people are saving trees. My goal was never to cover every use case, just the "common" ones.
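
One way to picture keeping the leaf-based type lookup while naming outputs after branches, as a hedged sketch in plain Python. The `ROOT_TO_DTYPE` table and `output_columns` function are hypothetical and are not part of ttree2hdf5. The type strings on the left are the names ROOT reports for primitive leaves, and the dtype strings on the right are an assumed NumPy-style naming. Anything that is not a listed primitive is skipped rather than guessed at, in line with covering only the "common" cases.

```python
# Assumed mapping from ROOT primitive type names to NumPy-style dtype strings.
ROOT_TO_DTYPE = {
    "Float_t": "float32",
    "Double_t": "float64",
    "Int_t": "int32",
    "UInt_t": "uint32",
    "Long64_t": "int64",
    "ULong64_t": "uint64",
    "Short_t": "int16",
    "UShort_t": "uint16",
    "Char_t": "int8",
    "UChar_t": "uint8",
    "Bool_t": "bool",
}


def output_columns(leaves):
    """Map branch names to output dtypes.

    `leaves` is an iterable of (branch_name, leaf_name, root_type_name)
    tuples. The type comes from the leaf, but the column is keyed by the
    branch name, so a differing leaf name does not matter.
    """
    columns = {}
    skipped = []
    for branch_name, _leaf_name, type_name in leaves:
        dtype = ROOT_TO_DTYPE.get(type_name)
        if dtype is None:
            # Not a supported primitive: record it and move on.
            skipped.append(branch_name)
            continue
        columns[branch_name] = dtype
    return columns, skipped


columns, skipped = output_columns([
    ("lep_pT1", "b_lep_pT1", "Float_t"),
    ("lep_eta1", "b_lep_eta1", "Float_t"),
    ("jets", "jets", "vector<float>"),
])
```

Here `columns` is keyed by `lep_pT1` and `lep_eta1`, not by the `b_`-prefixed leaf names, and the non-primitive `jets` entry lands in `skipped`.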
