-
Notifications
You must be signed in to change notification settings - Fork 631
Executor 2.0: Stream assignment #5602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4b35a70
to
290128c
Compare
290128c
to
a28d72d
Compare
Signed-off-by: Michał Zientkiewicz <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
a28d72d
to
6aebb84
Compare
CI MESSAGE: [18059205]: BUILD STARTED |
Signed-off-by: Michal Zientkiewicz <[email protected]>
CI MESSAGE: [18060037]: BUILD STARTED |
Signed-off-by: Michal Zientkiewicz <[email protected]>
|
||
queue_.pop(); | ||
auto *node = sorted_nodes_[idx]; | ||
// This will be true for nodes which has no outputs or which doesn't contribute to any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This will be true for nodes which has no outputs or which doesn't contribute to any | |
// This will be true for a node which has no outputs or which doesn't contribute to any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went the other way.
void Assign(ExecGraph &graph) { | ||
// pre-fill the id pool with sequential numbers | ||
for (int i = 0, n = graph.Nodes().size(); i < n; i++) { | ||
free_stream_ids_.insert(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the minimality of the assignment, what happens for the bipartite graphs? Take 2x2 example. The left A, B ops contribute to both right C, D ops. Assuming that the output edges are visited in the topological order (i.e. C comes before D in A and B lists), can we end up with A0, B1, C0, D2 assignment?
If so, for larger bipartite graphs, it seems the need for free_stream_ids can exceed the number of nodes.
A -- C
\ /
/ \
B -- D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. The algorithm has been reworked and now the assignment is made as soon as a node is pushed to the queue. It's never re-pushed with a worse index.
Signed-off-by: Michal Zientkiewicz <[email protected]>
Signed-off-by: Michal Zientkiewicz <[email protected]>
CI MESSAGE: [18063682]: BUILD STARTED |
CI MESSAGE: [18063682]: BUILD PASSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases when the stream_idx freed by the node is not the least avialable, which leads to non-minial assignment and changing the stream on a stright path.
Here are two examples:
TEST(Exec2Test, StreamAssignment_PerOperator_42) {
ExecGraph eg;
graph::OpGraph::Builder b;
b.Add("a",
SpecGPU()
.AddOutput("a->d", "gpu")
.AddOutput("a->e", "gpu")
.AddOutput("a->f", "gpu")
.AddOutput("a->c", "gpu"));
b.Add("b",
SpecGPU()
.AddOutput("b->d", "gpu")
.AddOutput("b->e", "gpu")
.AddOutput("b->f", "gpu")
.AddOutput("b->c", "gpu"));
b.Add("c",
SpecGPU()
.AddInput("a->c", "gpu")
.AddInput("b->c", "gpu")
.AddOutput("c->g", "gpu"));
b.Add("d",
SpecGPU()
.AddInput("a->d", "gpu")
.AddInput("b->d", "gpu")
.AddOutput("d->od", "gpu"));
b.Add("e",
SpecGPU()
.AddInput("a->e", "gpu")
.AddInput("b->e", "gpu")
.AddOutput("e->oe", "gpu"));
b.Add("f",
SpecGPU()
.AddInput("a->f", "gpu")
.AddInput("b->f", "gpu")
.AddOutput("f->of", "gpu"));
b.Add("g",
SpecGPU()
.AddInput("c->g", "gpu")
.AddOutput("g->og", "gpu"));
b.AddOutput("d->od_gpu");
b.AddOutput("e->oe_gpu");
b.AddOutput("f->of_gpu");
b.AddOutput("g->og_gpu");
auto g = std::move(b).GetGraph(true);
eg.Lower(g);
StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
auto map = MakeNodeMap(eg);
EXPECT_EQ(assignment[map["a"]], 0);
EXPECT_EQ(assignment[map["b"]], 1);
EXPECT_EQ(assignment[map["c"]], 4);
EXPECT_EQ(assignment[map["d"]], 0);
EXPECT_EQ(assignment[map["e"]], 1);
EXPECT_EQ(assignment[map["f"]], 3);
// it's the only child of c with stream idx 4, but gets 2
EXPECT_EQ(assignment[map["g"]], 4);
}
TEST(Exec2Test, StreamAssignment_PerOperator_43) {
ExecGraph eg;
graph::OpGraph::Builder b;
b.Add("a",
SpecGPU()
.AddOutput("a->c", "gpu")
.AddOutput("a->d", "gpu")
.AddOutput("a->e", "gpu")
.AddOutput("a->f", "gpu"));
b.Add("b",
SpecGPU()
.AddOutput("b->c", "gpu")
.AddOutput("b->d", "gpu")
.AddOutput("b->e", "gpu")
.AddOutput("b->f", "gpu"));
b.Add("c",
SpecGPU()
.AddInput("a->c", "gpu")
.AddInput("b->c", "gpu")
.AddOutput("c->oc", "gpu"));
b.Add("d",
SpecGPU()
.AddInput("a->d", "gpu")
.AddInput("b->d", "gpu")
.AddOutput("d->od", "gpu"));
b.Add("e",
SpecGPU()
.AddInput("a->e", "gpu")
.AddInput("b->e", "gpu")
.AddOutput("e->g", "gpu"));
b.Add("f",
SpecGPU()
.AddInput("a->f", "gpu")
.AddInput("b->f", "gpu")
.AddOutput("f->h", "gpu"));
b.Add("g",
SpecGPU()
.AddInput("e->g", "gpu")
.AddOutput("g->og", "gpu"));
b.Add("h",
SpecGPU()
.AddInput("f->h", "gpu")
.AddOutput("h->oh", "gpu"));
b.AddOutput("c->oc_gpu");
b.AddOutput("d->od_gpu");
b.AddOutput("g->og_gpu");
b.AddOutput("h->oh_gpu");
auto g = std::move(b).GetGraph(true);
eg.Lower(g);
StreamAssignment<StreamPolicy::PerOperator> assignment(eg);
auto map = MakeNodeMap(eg);
EXPECT_EQ(assignment[map["a"]], 0);
EXPECT_EQ(assignment[map["b"]], 1);
EXPECT_EQ(assignment[map["c"]], 0);
EXPECT_EQ(assignment[map["d"]], 1);
EXPECT_EQ(assignment[map["e"]], 3);
EXPECT_EQ(assignment[map["f"]], 4);
// e gets 2 instead of 3
EXPECT_EQ(assignment[map["g"]], 3); // it's the only child of e
EXPECT_EQ(assignment[map["h"]], 4);
}
Signed-off-by: Michal Zientkiewicz <[email protected]>
CI MESSAGE: [18115353]: BUILD STARTED |
There are still some problems with per-operator assignment. I have some promising ideas, but I've removed per-operator assignment from this PR and will do it as a follow-up. |
CI MESSAGE: [18115353]: BUILD PASSED |
Category:
New feature (non-breaking change which adds functionality)
Description:
Executor 2.0 doesn't have "stages" and streams are assigned according to stream assignment policy.
Stream assignment policies
Executor2
assigns streams to operator nodes based on theStreamPolicy
configuration parameter. There are three stream policies:Single
- there's just one stream used by all operators which need one.PerBackend
- there's a distinct stream per backend (GPU, Mixed)PerOperator
- operators which can execute independently are assigned distinct streams - but operators with sequential dependency use the same stream - NOT IMPLEMENTED IN THIS PRAdditional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4030