JITted code changes the execution order of computation graph nodes #15686

Closed
vepadulano opened this issue May 30, 2024 · 1 comment · Fixed by #15713

@vepadulano
Member

Check duplicate issues.

  • Checked for duplicates

Description

The RDataFrame execution order for branches of the computation graph is bottom-up: actions request values from their upstream nodes, traversing them one by one in reverse order of insertion.
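To make the described order concrete, here is a toy model in plain C++. It is not ROOT's implementation, and the names `ToyDefine` and `PullBottomUp` are hypothetical. Each Define is reduced to a named expression, and the action pulls the values in reverse order of insertion while recording which expression ran when.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy model (not ROOT code): a Define is a named expression.
struct ToyDefine {
    std::string name;
    std::function<int()> expr;
};

// Evaluates the Defines in reverse order of insertion (bottom-up).
// Values are returned indexed by insertion position; `trace` records
// the names in the order in which the expressions actually ran.
std::vector<int> PullBottomUp(const std::vector<ToyDefine> &defines, std::vector<std::string> &trace)
{
    std::vector<int> values(defines.size());
    for (std::size_t k = defines.size(); k-- > 0;) {
        trace.push_back(defines[k].name);
        values[k] = defines[k].expr();
    }
    return values;
}
```

For two Defines inserted as `x` and then `y`, the trace is `y, x`: the last inserted Define runs first, matching the compiled run of the reproducer, where the second Define executes before the first.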

This order seems to be flipped when the action is JITted. In a simple example with two Defined columns and one Graph action, the two Defines execute in a different order depending on whether the action has to infer the column types (`Graph("x", "y")`) or receives them as template arguments (`Graph<int, int>("x", "y")`).

Correct order (bottom-up)

./repro_graph_compiled.out 
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:852 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Starting event loop number 0.
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:811 in void ROOT::Detail::RDF::RLoopManager::Jit()>: Nothing to jit and execute.
Defining 'b2': address: 0x7ffe1477198c, value: 42
Defining 'b1': address: 0x7ffe1477198c, value: 10
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:889 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Finished event loop number 0 (0s CPU, 6.10352e-05s elapsed).
graph: X: 10,Y:42
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:811 in void ROOT::Detail::RDF::RLoopManager::Jit()>: Nothing to jit and execute.

Wrong order

./repro_graph_jitted.out 
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:852 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Starting event loop number 0.
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:825 in void ROOT::Detail::RDF::RLoopManager::Jit()>: Just-in-time compilation phase completed in 1.388524 seconds.
Defining 'b1': address: 0x7ffd017cb8dc, value: 42
Defining 'b2': address: 0x7ffd017cb8dc, value: 42
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:889 in void ROOT::Detail::RDF::RLoopManager::Run(bool)>: Finished event loop number 0 (0s CPU, 0.000119925s elapsed).
graph: X: 42,Y:42
Info in <[ROOT.RDF] Info /home/vpadulan/Programs/rootproject/rootsrc/tree/dataframe/src/RLoopManager.cxx:811 in void ROOT::Detail::RDF::RLoopManager::Jit()>: Nothing to jit and execute.
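Both outputs follow from the side effect in the second Define, which overwrites the shared variable `i` with 10 after reading it. The sketch below is a hypothetical plain C++ model, not RDataFrame code (`EvaluateInOrder` and `RunReproducerModel` are made-up names). It replays the reproducer's two lambdas in both orders.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: evaluates side-effecting expressions either in
// reverse insertion order (bottom-up) or in insertion order (top-down).
// Results are always returned indexed by insertion position.
std::vector<int> EvaluateInOrder(const std::vector<std::function<int()>> &exprs, bool bottomUp)
{
    std::vector<int> values(exprs.size());
    if (bottomUp) {
        for (std::size_t k = exprs.size(); k-- > 0;)
            values[k] = exprs[k]();
    } else {
        for (std::size_t k = 0; k < exprs.size(); ++k)
            values[k] = exprs[k]();
    }
    return values;
}

// The two lambdas from the reproducer, sharing the captured variable `i`.
// Returns {x, y}, i.e. the values the Graph would receive.
std::vector<int> RunReproducerModel(bool bottomUp)
{
    int i = 42;
    std::vector<std::function<int()>> exprs{
        [&i]() { return i; },                         // "x"
        [&i]() { auto j = i; i = 10; return j; }};    // "y": reads i, then overwrites it
    return EvaluateInOrder(exprs, bottomUp);
}
```

Bottom-up evaluation yields `{10, 42}` (X: 10, Y: 42) and top-down evaluation yields `{42, 42}` (X: 42, Y: 42), the same values as the compiled and JITted runs respectively.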

Reproducer

Compiled version (column types passed explicitly)

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RLogger.hxx>

#include <cassert>
#include <iostream>

// Print RDataFrame info-level messages (event loop start/end, jitting)
auto verbosity =
    ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::Experimental::ELogLevel::kInfo);

void run()
{
    ROOT::RDataFrame d{1}; // one entry, no input data
    int i{42};             // state shared by both Defines
    auto graph = d.Define("x",
                          [&i]() {
                             std::cout << "Defining 'x': address: " << &i << ", value: " << i << "\n";
                             return i;
                          })
                    .Define("y",
                            [&i]() {
                               std::cout << "Defining 'y': address: " << &i << ", value: " << i << "\n";
                               // Side effect: the Define that runs after this one sees i == 10
                               auto j = i;
                               i = 10;
                               return j;
                            })
                    .Graph<int, int>("x", "y"); // column types given explicitly: nothing to jit

    assert(graph->GetN() == 1);
    std::cout << "graph: X: " << graph->GetPointX(0) << ",Y:" << graph->GetPointY(0) << "\n";
}

int main()
{
    run();
}

JITted version (column types inferred)

#include <ROOT/RDataFrame.hxx>
#include <ROOT/RLogger.hxx>

#include <cassert>
#include <iostream>

// Print RDataFrame info-level messages (event loop start/end, jitting)
auto verbosity =
    ROOT::Experimental::RLogScopedVerbosity(ROOT::Detail::RDF::RDFLogChannel(), ROOT::Experimental::ELogLevel::kInfo);

void run()
{
    ROOT::RDataFrame d{1}; // one entry, no input data
    int i{42};             // state shared by both Defines
    auto graph = d.Define("x",
                          [&i]() {
                             std::cout << "Defining 'x': address: " << &i << ", value: " << i << "\n";
                             return i;
                          })
                    .Define("y",
                            [&i]() {
                               std::cout << "Defining 'y': address: " << &i << ", value: " << i << "\n";
                               // Side effect: the Define that runs after this one sees i == 10
                               auto j = i;
                               i = 10;
                               return j;
                            })
                    .Graph("x", "y"); // column types inferred at runtime: the action is jitted

    assert(graph->GetN() == 1);
    std::cout << "graph: X: " << graph->GetPointX(0) << ",Y:" << graph->GetPointY(0) << "\n";
}

int main()
{
    run();
}

ROOT version

Any

Installation method

Built from source

Operating system

Fedora 39


@vepadulano
Member Author

The linked PR (#15713) fixes this. Note that the execution order of the functions passed to Define (within the same branch of the computation graph) was never actually specified, neither in the RDataFrame docs nor in the implementation.
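Since that order is unspecified, one way to keep results independent of it is to avoid passing information between Defines through shared mutable state and to declare the dependency as an input column instead, as `Define("y", f, {"x"})` does in RDataFrame. The sketch below models the idea in plain C++. `ToyColumn` and `ToyEvent` are hypothetical names, and the on-demand, cached evaluation is an assumption made for illustration, not a description of ROOT's internals.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a column is an expression over the values of
// its declared input columns.
struct ToyColumn {
    std::vector<std::string> inputs;
    std::function<int(const std::vector<int> &)> expr;
};

// Computes column values on demand for a single event and caches them,
// so each column is evaluated at most once and always after its inputs,
// whatever order the caller requests the columns in.
class ToyEvent {
public:
    explicit ToyEvent(std::map<std::string, ToyColumn> columns) : fColumns(std::move(columns)) {}

    int Get(const std::string &name)
    {
        if (auto it = fCache.find(name); it != fCache.end())
            return it->second;
        const ToyColumn &col = fColumns.at(name);
        std::vector<int> args;
        for (const auto &input : col.inputs)
            args.push_back(Get(input)); // dependencies are evaluated first
        int value = col.expr(args);
        fCache[name] = value;
        return value;
    }

private:
    std::map<std::string, ToyColumn> fColumns;
    std::map<std::string, int> fCache;
};
```

With `x` returning 42 and `y` declared as `x + 1` on input `{"x"}`, requesting `x` then `y`, or `y` then `x`, gives the same values (42 and 43), because `y` receives `x` as an argument instead of reading a variable another Define may have modified.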
