Iteratively open and close different Kedro sessions from code #4087

jround1 · 2024-08-13T21:27:15Z

Description

I would like to iterate over multiple Kedro sessions and their contexts from code in order to test them. I am able to iterate through the different projects from the CLI, but under pytest the second call to create a session just returns the first session. I understand there may be some magic for notebooks with %reload_kedro.

We have different Kedro sessions/contexts because we only load the parts of the data catalog and parameters that we need for the pipeline we are running (patterns in settings.py), which eases integration with Kubeflow. Then we want to test that there are no undefined node inputs/outputs and that all catalog entries are being used, project by project.

Steps to Reproduce

Below is an example of the fixture and test. I'll save space and not copy the second fixture and test (session2 & test_project_catalog2) that have example_project2 and example_pipeline2:

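import os
from pathlib import Path

import pytest
from kedro.framework.cli.utils import KedroCliError
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


@pytest.fixture
def session1():
    # settings.py reads these to pick the catalog and parameter files
    os.environ["PROJECT"] = "example_project1"
    os.environ["PIPELINE"] = "example_pipeline1"

    bootstrap_project(Path.cwd())
    try:
        return KedroSession.create(
            package_name="example_package",
            project_path=Path.cwd(),
            save_on_close=False,
        )
    except Exception as exc:
        raise KedroCliError(
            f"Unable to instantiate Kedro session.\nError: {exc}"
        ) from exc


def test_project_catalog(session1):
    context = session1.load_context()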

From settings.py (project and pipeline come from the PROJECT and PIPELINE env vars):

    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "catalog": [f"catalogs/resolved/kf_catalog_{project}_{pipeline}.yml"],
            "parameters": [
                f"parameters/set_parameters/parameters_{project}_{pipeline}.yml",
                f"parameters/{project}/*.yml",
            ],
        },
    }

Expected Result

session1 and session2 should be different. session2 is correct if I remove session1 or call it first.

Actual Result

session2 is just session1 again.

Your Environment

(Kedro version: 0.18.10)


lrcouto · 2024-08-14T19:42:24Z

Hey @jround1, would it be possible to give some more context or more concrete examples so we can understand what you are trying to do with your testing?

You could also try to bring the issue up on our questions channel on the Kedro slack: https://slack.kedro.org/

jround1 · 2024-08-14T20:25:51Z

@lrcouto sure thing! Let me try again...
The Context:
We have a monorepo with many projects, and many more to come. The project-specific code/pipelines, catalogs and parameters are all segregated using naming conventions. The names are retrieved via env vars at session/context creation time, so that the code/pipelines, catalogs and parameters of other projects are excluded. There is also a lot of shared code/pipelines, catalogs and parameters, and these are included.
So settings.py pattern-matches by project and pipeline, something like this (modified from above to better demonstrate):

    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "catalog": [f"catalogs/catalog_{project}_{pipeline}.yml", "catalog_generic.yml"],
            "parameters": [
                f"parameters/{project}/*.yml", "parameters_generic.yml"
            ],
        },
    }

And pipeline_registry.py only registers the pipelines defined for the specific project:

project_module = __import__(
    "monorepo." + project, fromlist=["pipeline_registry"]
).pipeline_registry

This creates a leaner session/context that is easier to manage between projects and in Kubeflow integration.

What I would like to do:
Test the catalogs and pipelines of each project in the monorepo programmatically using pytest along with all of the other unit tests. For example test that the catalogs don't have entries that are not used in pipelines, or test that all pipeline inputs and outputs are defined, etc. etc.
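The two checks described above come down to set comparisons. The following is only a minimal sketch with plain Python sets: the function name `check_catalog_against_pipeline` and the example dataset names are hypothetical, and how the sets are obtained from a Kedro context is deliberately left out. It assumes that names starting with `params:` (and the name `parameters`) are parameters rather than catalog entries.

```python
def check_catalog_against_pipeline(catalog_entries, pipeline_datasets, free_inputs):
    """Compare catalog entry names with the dataset names a pipeline uses.

    Hypothetical helper: all three arguments are plain iterables of names.
    Returns (unused, undefined):
      unused    - catalog entries that no node reads or writes
      undefined - free pipeline inputs with no catalog entry
    """
    catalog_entries = set(catalog_entries)
    unused = catalog_entries - set(pipeline_datasets)
    undefined = {
        name
        for name in free_inputs
        # Assumption: parameters are not expected to appear in the catalog.
        if name not in catalog_entries
        and name != "parameters"
        and not name.startswith("params:")
    }
    return unused, undefined


# Made-up names, for illustration only.
unused, undefined = check_catalog_against_pipeline(
    catalog_entries={"raw_sales", "clean_sales", "legacy_table"},
    pipeline_datasets={"raw_sales", "clean_sales", "raw_stores", "params:threshold"},
    free_inputs={"raw_sales", "raw_stores", "params:threshold"},
)
```

In a per-project test, each of the two returned sets would be asserted to be empty.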

The problem:
After creating the first session, either in a fixture or a plain function, all subsequent sessions contain the same context as the first one; they do not actually create a new context. Whichever project's session I create first determines the context, and therefore the pipelines and data catalog, for all of the subsequent sessions. This prevents me from testing all of the projects.
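One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is ordinary Python module caching: a module that builds f-strings from env vars at import time is executed only once per process, so later changes to the env vars are not picked up unless the module is reloaded. The sketch below shows that behaviour in isolation, using a throwaway module named `demo_settings` (hypothetical, standing in for a settings.py); it does not involve Kedro at all.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Write a throwaway module that reads an env var at import time.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_settings.py").write_text(
    "import os\n"
    "project = os.environ['PROJECT']\n"
    "CATALOG_PATTERN = f'catalogs/catalog_{project}.yml'\n",
    encoding="utf-8",
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

os.environ["PROJECT"] = "example_project1"
demo_settings = importlib.import_module("demo_settings")
first = demo_settings.CATALOG_PATTERN

# Changing the env var and importing again returns the cached module,
# so the pattern still refers to example_project1.
os.environ["PROJECT"] = "example_project2"
second = importlib.import_module("demo_settings").CATALOG_PATTERN

# Reloading re-executes the module body and picks up the new value.
reloaded = importlib.reload(demo_settings).CATALOG_PATTERN
```

Here `first` and `second` are identical, and only `reloaded` reflects `example_project2`. Whether reloading the settings module is enough to get a fresh Kedro session is not established by this sketch.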

Let me know if you have more questions or if it would be more appropriate in another forum. Thanks!

merelcht · 2024-09-19T15:16:13Z

Hi @jround1, your setup is quite complex, so I'm not entirely sure I understand it all 100%. I think the problem here is that a session is created on a project level, not per pipeline. It's session.run() that can be filtered per pipeline, not session.create(). It's maybe a bit misleading, because create() takes the package_name argument, but that doesn't have a real use.

jround1 · 2024-10-03T19:30:47Z

Thanks @merelcht
It doesn't really matter what my setup is, because in my testing I have removed the complexity. What I want is to set up unit tests involving Kedro session configurations, and to test using different Kedro session configurations. But after the first session is created there is no way to create any more distinct sessions: they all just have the first session's evaluated parameters, data catalog, etc.
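If the cause is that configuration is evaluated once per Python process (an assumption, not something confirmed in this thread), one workaround to consider is running each configuration in its own fresh interpreter. The sketch below shows only the isolation mechanism: the child code and the helper name `resolve_in_fresh_process` are hypothetical, and the child merely formats a file name from the env vars where a real test would create and inspect a Kedro session.

```python
import os
import subprocess
import sys

# Hypothetical child program: stands in for code that creates a session
# and reports something about it. Here it only echoes the resolved name.
CHILD_CODE = """
import os
print("catalog_{}_{}.yml".format(os.environ["PROJECT"], os.environ["PIPELINE"]))
"""


def resolve_in_fresh_process(project, pipeline):
    """Run CHILD_CODE in a new interpreter with its own PROJECT/PIPELINE."""
    env = {**os.environ, "PROJECT": project, "PIPELINE": pipeline}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return result.stdout.strip()


patterns = [
    resolve_in_fresh_process(project, pipeline)
    for project, pipeline in [
        ("example_project1", "example_pipeline1"),
        ("example_project2", "example_pipeline2"),
    ]
]
```

Each child starts with nothing cached from the previous one, so every project sees its own env vars. The cost is one interpreter start-up per configuration.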

astrojuanlu · 2025-02-18T11:59:31Z

Moving this to discussions.

jround1 changed the title ~~<Title>~~ Iteratively open and close different Kedro sessions from code Aug 14, 2024

jround1 mentioned this issue Aug 14, 2024

Lazy Loading of Catalog Items #2829

Closed

github-actions bot mentioned this issue Sep 1, 2024

Monthly issue metrics report #4135

Closed

merelcht added the Community Issue/PR opened by the open-source community label Sep 19, 2024

merelcht added this to Kedro Wizard 🪄 Oct 31, 2024

merelcht added this to the Make the `Session` more lightweight and re-entrant milestone Nov 1, 2024

merelcht added Issue: Feature Request New feature or improvement to existing feature and removed Community Issue/PR opened by the open-source community labels Nov 1, 2024

merelcht removed this from Kedro Wizard 🪄 Nov 1, 2024

kedro-org locked and limited conversation to collaborators Feb 18, 2025

astrojuanlu converted this issue into discussion #4492 Feb 18, 2025
