Clarify unexpected namespace filtering behavior #3448

Open
Tracked by #4343
marrrcin opened this issue Dec 20, 2023 · 6 comments

Comments

@marrrcin
Contributor

Description

I have observed an unexpected behavior related to namespace filtering in modular pipelines.

Context

Specifically, when I have two modular pipelines under the same namespace, such as:

  • yolo.marketing
  • yolo.marketing_b2b

and I attempt to run a single pipeline using kedro run --namespace yolo.marketing, both pipelines get executed.

This happens because of the filtering based on startswith:

if n.namespace and n.namespace.startswith(node_namespace)
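To make the effect concrete, here is a minimal sketch that applies the same condition to bare strings. The variable names `namespaces`, `requested`, and `selected` are made up for illustration, and plain strings stand in for the node objects, so this is not Kedro's actual code path.

```python
# Plain strings stand in for node namespaces; no Kedro import is needed.
namespaces = ["yolo.marketing", "yolo.marketing_b2b"]
requested = "yolo.marketing"

# Same condition as the filter quoted above, applied to bare strings.
selected = [ns for ns in namespaces if ns and ns.startswith(requested)]

print(selected)  # ['yolo.marketing', 'yolo.marketing_b2b']
```

Because "yolo.marketing_b2b" begins with the characters "yolo.marketing", it passes the prefix check even though it is a sibling namespace, not a child.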

Steps to Reproduce

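from kedro.pipeline import node, pipeline


def namespaced_pipeline():
    # Factory: builds a one-node pipeline whose node prints `ns` and returns a dummy value.
    p = lambda ns: pipeline(
        [
            node(
                func=lambda: print(ns) or 666,
                inputs=None,
                outputs="not_important1",
                name="node1",
            )
        ]
    )
    namespaces = ["marketing", "marketing_b2b"]
    pipes = []
    for ns in namespaces:
        # Wrap each copy in its own namespace under the shared "yolo" parent.
        pipes.append(pipeline(p(ns), namespace=f"yolo.{ns}"))
    return sum(pipes)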

Then in CLI:
kedro run --namespace yolo.marketing

Expected Result

Only a single pipeline is executed (the yolo.marketing one).

Actual Result

Both pipelines are executed.

Logs ⬇️
INFO     Running node: node1: <lambda>(None) -> [yolo.marketing.not_important1]    node.py:331
marketing
INFO     Saving data to 'yolo.marketing.not_important1' (MemoryDataset)...    data_catalog.py:541
INFO     Completed 1 out of 2 tasks    sequential_runner.py:85
INFO     Running node: node1: <lambda>(None) -> [yolo.marketing_b2b.not_important1]    node.py:331
marketing_b2b
INFO     Saving data to 'yolo.marketing_b2b.not_important1' (MemoryDataset)...    data_catalog.py:541
INFO     Completed 2 out of 2 tasks    sequential_runner.py:85
INFO     Pipeline execution completed successfully.    runner.py:105
INFO     Loading data from 'yolo.marketing.not_important1' (MemoryDataset)...    data_catalog.py:502
INFO     Loading data from 'yolo.marketing_b2b.not_important1' (MemoryDataset)...

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.14
  • Python version used (python -V): 3.11.5
  • Operating system and version: macOS 13.0.1
@astrojuanlu
Member

Thanks @marrrcin for reporting! The Framework team is at low capacity at the moment but we'll have a look at this soon.

@noklam
Contributor

noklam commented Dec 20, 2023

Capturing some relevant discussion from Slack. PRs are welcome, since I think this should be a straightforward fix that doesn't require introducing a more complicated tree structure. On a side note, kedro-viz probably builds this internal tree structure for the visualisation, so if we end up needing it we should check how it's done there.

def only_nodes_with_namespace(self, node_namespace: str) -> Pipeline:
    ...
    nodes = [
        n
        for n in self.nodes
        if n.namespace and n.namespace.startswith(node_namespace)
    ]

This is the related code in the Pipeline API. The reason for using startswith is to handle nested namespaces such as x.y.

Nok
19 hours ago
Can we fix this with an extra condition? Basically, we expect the namespace to be a prefix, so the following character should be a . (dot); otherwise it's not really the targeted namespace.
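A rough sketch of what that extra condition could look like, written against plain strings. `matches_namespace` is a hypothetical helper name, not an existing Kedro function, and this is only one possible reading of the suggestion.

```python
def matches_namespace(node_namespace, target):
    """Hypothetical helper: True if node_namespace is target or nested under it."""
    if not node_namespace:
        # Nodes without a namespace never match, as in the current filter.
        return False
    # Either an exact match, or the prefix must be followed by a dot.
    return node_namespace == target or node_namespace.startswith(target + ".")


print(matches_namespace("yolo.marketing", "yolo.marketing"))      # True
print(matches_namespace("yolo.marketing.sub", "yolo.marketing"))  # True
print(matches_namespace("yolo.marketing_b2b", "yolo.marketing"))  # False
```

The exact-match branch is needed because a node whose namespace equals the target has no following character to check.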

@amitpoorab

I can try to fix this. Are you accepting PRs from new folks?

@noklam
Contributor

noklam commented Mar 9, 2024

@amitpoorab feel free to make a draft PR

@noklam noklam added this to the Something about namespace milestone Apr 4, 2024
@merelcht merelcht changed the title from "Unexpected Namespace Filtering Behavior in Modular Pipelines" to "Clarify Unexpected Namespace Filtering Behavior in Modular Pipelines" Apr 8, 2024
@merelcht
Member

merelcht commented Apr 8, 2024

Spike to clarify how namespace filtering behaves across Kedro and Kedro-Viz. Are they currently consistent, and what would the implications be of changing the kedro run --namespace ... filter to an exact match? The outcome of the spike is a summary of the current behaviour and a proposal for how it can be improved.

@noklam
Contributor

noklam commented Apr 8, 2024

We need to clarify this case:

  • a
  • a.b
  • a.c

If one chooses namespace = a, should a.b and a.c also be selected? Changing from .startswith to exact string matching would no longer pick up the sub-pipelines properly. Namespace is a tree structure (at least according to kedro-viz).

The case reported in the description is definitely a bug regardless.
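To illustrate the trade-off, here is a small sketch that treats a namespace as a path of dot-separated segments, which is one way to read "tree structure". `is_in_subtree` is a hypothetical helper, and the extra "ab" entry is added only to show the sibling-prefix case; none of this reflects how Kedro or kedro-viz implement it.

```python
def is_in_subtree(node_namespace, target):
    """Hypothetical helper: compare dot-separated segments, not raw characters."""
    node_parts = node_namespace.split(".")
    target_parts = target.split(".")
    # The node is in the subtree if its leading segments equal the target's segments.
    return node_parts[: len(target_parts)] == target_parts


namespaces = ["a", "a.b", "a.c", "ab"]

exact = [ns for ns in namespaces if ns == "a"]
subtree = [ns for ns in namespaces if is_in_subtree(ns, "a")]

print(exact)    # ['a']
print(subtree)  # ['a', 'a.b', 'a.c']
```

Exact matching drops a.b and a.c, while the segment-wise comparison keeps them and still excludes the unrelated "ab".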

@astrojuanlu astrojuanlu changed the title from "Clarify Unexpected Namespace Filtering Behavior in Modular Pipelines" to "Clarify unexpected namespace filtering behavior" May 27, 2024