Perf improvement to subgraph selection (#4155)
Perf improvement to get_subset_graph
Co-authored-by: Ian Knox <[email protected]>
leahwicz authored Oct 29, 2021
1 parent 178f74b commit dd7af47
Showing 2 changed files with 15 additions and 4 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -2,13 +2,13 @@
 
 ### Features
 - Allow nullable `error_after` in source freshness ([#3874](https://github.com/dbt-labs/dbt-core/issues/3874), [#3955](https://github.com/dbt-labs/dbt-core/pull/3955))
-
+- Increase performance of graph subset selection ([#4135](https://github.com/dbt-labs/dbt-core/issues/4135),[#4155](https://github.com/dbt-labs/dbt-core/pull/4155))
 ### Fixes
 - Changes unit tests using `assertRaisesRegexp` to `assertRaisesRegex`
 
 Contributors:
 - [@kadero](https://github.com/kadero) ([3955](https://github.com/dbt-labs/dbt-core/pull/3955))
-- [@frankcash](https://github.com/frankcash) ([4136](https://github.com/dbt-labs/dbt-core/pull/4136)
+- [@frankcash](https://github.com/frankcash) ([4136](https://github.com/dbt-labs/dbt-core/pull/4136))
 
 ## dbt-core 1.0.0b2 (October 25, 2021)
 
15 changes: 13 additions & 2 deletions core/dbt/graph/graph.py
@@ -1,6 +1,7 @@
 from typing import (
     Set, Iterable, Iterator, Optional, NewType
 )
+from itertools import product
 import networkx as nx  # type: ignore
 
 from dbt.exceptions import InternalException
@@ -77,17 +78,26 @@ def select_successors(self, selected: Set[UniqueId]) -> Set[UniqueId]:
             successors.update(self.graph.successors(node))
         return successors
 
-    def get_subset_graph(self, selected: Iterable[UniqueId]) -> 'Graph':
+    def get_subset_graph(self, selected: Iterable[UniqueId]) -> "Graph":
         """Create and return a new graph that is a shallow copy of the graph,
         but with only the nodes in include_nodes. Transitive edges across
         removed nodes are preserved as explicit new edges.
         """
-        new_graph = nx.algorithms.transitive_closure(self.graph)
 
+        new_graph = self.graph.copy()
         include_nodes = set(selected)
 
         for node in self:
             if node not in include_nodes:
+                source_nodes = [x for x, _ in new_graph.in_edges(node)]
+                target_nodes = [x for _, x in new_graph.out_edges(node)]
+
+                new_edges = product(source_nodes, target_nodes)
+                non_cyclic_new_edges = [
+                    (source, target) for source, target in new_edges if source != target
+                ]  # removes cyclic refs
+
+                new_graph.add_edges_from(non_cyclic_new_edges)
                 new_graph.remove_node(node)
 
         for node in include_nodes:
@@ -96,6 +106,7 @@ def get_subset_graph(self, selected: Iterable[UniqueId]) -> 'Graph':
                     "Couldn't find model '{}' -- does it exist or is "
                     "it disabled?".format(node)
                 )
+
         return Graph(new_graph)
 
     def subgraph(self, nodes: Iterable[UniqueId]) -> 'Graph':
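
The node-removal loop in get_subset_graph can be illustrated without networkx. The following is a minimal sketch, not the dbt-core implementation: it assumes the graph is given as a plain set of (source, target) tuples, and the names subset_edges, edges, nodes, and selected are hypothetical. Each unselected node is dropped, and every one of its parents is connected directly to every one of its children, so reachability between the remaining nodes is preserved without first computing a full transitive closure.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def subset_edges(
    edges: Iterable[Edge], nodes: Iterable[str], selected: Iterable[str]
) -> Set[Edge]:
    """Drop every node not in `selected`, bridging its parents to its children.

    Hypothetical stand-in for the diff's loop, using a set of edge tuples
    instead of a networkx DiGraph.
    """
    keep = set(selected)
    current: Set[Edge] = set(edges)

    for node in nodes:
        if node in keep:
            continue
        # Parents and children of the node being removed (self-loops ignored).
        sources = [s for s, t in current if t == node and s != node]
        targets = [t for s, t in current if s == node and t != node]

        # Connect each parent to each child, skipping self-loops
        # (the "source != target" filter in the diff).
        bridged = {(s, t) for s, t in product(sources, targets) if s != t}

        # Remove every edge touching the node, then add the bridging edges.
        current = {(s, t) for s, t in current if node not in (s, t)} | bridged

    return current


# A chain a -> b -> c -> d where only the endpoints are selected.
chain = {("a", "b"), ("b", "c"), ("c", "d")}
result = subset_edges(chain, ["a", "b", "c", "d"], ["a", "d"])
print(sorted(result))
```

In this sketch, removing "b" adds the edge ("a", "c"), and removing "c" then adds ("a", "d"), so the dependency of "d" on "a" survives even though both intermediate nodes are gone.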