
This issue was moved to a discussion.

Iteratively open and close different Kedro sessions from code #4087

Closed
jround1 opened this issue Aug 13, 2024 · 5 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments


jround1 commented Aug 13, 2024

Description

I would like to iterate over multiple Kedro sessions and their contexts from code in order to test them. I can iterate through the different projects from the CLI, but under pytest the second call to KedroSession.create just returns the first session. I understand there may be a notebook magic for this, %reload_kedro.

We have different Kedro sessions/contexts because we only load the parts of the data catalog and parameters that we need for the pipeline we are running (patterns in settings.py), which eases integration with Kubeflow. Then we want to test that there are no undefined node inputs/outputs and that all catalog entries are being used, project by project.

Steps to Reproduce

Below is an example of the fixture and test. I'll save space and not copy the second fixture and test (session2 & test_project_catalog2) that have example_project2 and example_pipeline2:

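import os
from pathlib import Path

import pytest
from kedro.framework.cli.utils import KedroCliError
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project


@pytest.fixture
def session1():
    # Select the project and pipeline that settings.py pattern-matches on.
    os.environ["PROJECT"] = "example_project1"
    os.environ["PIPELINE"] = "example_pipeline1"

    bootstrap_project(Path.cwd())
    try:
        return KedroSession.create(
            package_name="example_package",
            project_path=Path.cwd(),
            save_on_close=False,
        )
    except Exception as exc:
        raise KedroCliError(
            f"Unable to instantiate Kedro session.\nError: {exc}"
        ) from exc


def test_project_catalog(session1):
    context = session1.load_context()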

from settings.py:

    # `project` and `pipeline` come from the PROJECT and PIPELINE env vars.
    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "catalog": [f"catalogs/resolved/kf_catalog_{project}_{pipeline}.yml"],
            "parameters": [
                f"parameters/set_parameters/parameters_{project}_{pipeline}.yml",
                f"parameters/{project}/*.yml",
            ],
        },
    }

Expected Result

session1 and session2 should be different. session2 is correct if I remove session1 or call it first.

Actual Result

session2 is just session1 again.

Your Environment

(Kedro version: 0.18.10)

@jround1 jround1 changed the title <Title> Iteratively open and close different Kedro sessions from code Aug 14, 2024
Contributor

lrcouto commented Aug 14, 2024

Hey @jround1, would it be possible to give some more context or more concrete examples so we can understand what you are trying to do with your testing?

You could also try bringing the issue up on our questions channel on the Kedro Slack: https://slack.kedro.org/

Author

jround1 commented Aug 14, 2024

@lrcouto sure thing! Let me try again...
The Context:
We have a monorepo with many projects, and many more to come. The project-specific code/pipelines, catalogs and parameters are all segregated by naming conventions. The project and pipeline names are retrieved from env vars at session/context creation time, so that the code/pipelines, catalogs and parameters of other projects are excluded. There is also a lot of shared code/pipelines, catalogs and parameters, and these are included as well.
So settings.py pattern-matches by project and pipeline, something like this (modified from above to better demonstrate):

    CONFIG_LOADER_ARGS = {
        "config_patterns": {
            "catalog": [f"catalogs/catalog_{project}_{pipeline}.yml", "catalog_generic.yml"],
            "parameters": [
                f"parameters/{project}/*.yml", "parameters_generic.yml"
            ],
        },
    }
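As a rough illustration of how these patterns expand for a given project and pipeline, here is a minimal sketch. The helper names `build_config_patterns` and `patterns_from_env` are hypothetical and not part of Kedro or of the project described in this thread; the `PROJECT` and `PIPELINE` variable names are the ones set in the fixture from the original report.

```python
import os
from typing import Mapping


def build_config_patterns(project: str, pipeline: str) -> dict:
    """Return config patterns scoped to one project/pipeline pair (hypothetical helper)."""
    return {
        "catalog": [
            f"catalogs/catalog_{project}_{pipeline}.yml",
            "catalog_generic.yml",
        ],
        "parameters": [
            f"parameters/{project}/*.yml",
            "parameters_generic.yml",
        ],
    }


def patterns_from_env(environ: Mapping[str, str] = os.environ) -> dict:
    """Build the patterns from PROJECT and PIPELINE, read at call time."""
    return build_config_patterns(environ["PROJECT"], environ["PIPELINE"])


# A plain dict stands in for the environment so the example has no side effects.
patterns = patterns_from_env(
    {"PROJECT": "example_project1", "PIPELINE": "example_pipeline1"}
)
print(patterns["catalog"])
```

Because the names are resolved inside a function, each call sees the current values; module-level f-strings like the ones in settings.py are resolved only when the module body runs.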

And pipeline_registry.py only registers the pipelines defined for the specific project:

import importlib

# Import monorepo.<project>.pipeline_registry for the selected project.
project_module = importlib.import_module(
    f"monorepo.{project}.pipeline_registry"
)

This creates a leaner session/context that is easier to manage between projects and in Kubeflow integration.

What I would like to do:
Test the catalogs and pipelines of each project in the monorepo programmatically using pytest, along with all of the other unit tests. For example, test that the catalogs don't have entries that are not used in any pipeline, or that all pipeline inputs and outputs are defined, and so on.
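The two checks mentioned here come down to set arithmetic over dataset names. The sketch below is deliberately independent of Kedro: `find_catalog_problems` is a hypothetical helper, and it assumes the catalog entry names and each node's input and output names have already been extracted into plain lists. Parameter inputs (`parameters` and names starting with `params:`) are skipped because they are not catalog datasets.

```python
def find_catalog_problems(catalog_entries, node_io):
    """Return (unused catalog entries, undefined free inputs), both sorted.

    catalog_entries: iterable of dataset names defined in the catalog.
    node_io: iterable of (inputs, outputs) pairs, one pair per node.
    """
    def is_dataset(name):
        # Parameter references are resolved from parameters, not the catalog.
        return name != "parameters" and not name.startswith("params:")

    catalog = set(catalog_entries)
    inputs, outputs = set(), set()
    for node_inputs, node_outputs in node_io:
        inputs.update(name for name in node_inputs if is_dataset(name))
        outputs.update(node_outputs)

    # Catalog entries that no node reads or writes.
    unused = catalog - (inputs | outputs)
    # Inputs that no node produces and the catalog does not define.
    undefined = (inputs - outputs) - catalog
    return sorted(unused), sorted(undefined)


unused, undefined = find_catalog_problems(
    catalog_entries=["raw", "model", "orphan"],
    node_io=[
        (["raw", "params:alpha"], ["features"]),
        (["features", "labels"], ["model"]),
    ],
)
print(unused, undefined)
```

In this toy example `orphan` is never used by a node and `labels` is consumed without being defined anywhere, so a test could assert that both lists are empty for every project.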

The problem:
After the first session is created, either in a fixture or in a plain function, every subsequent session contains the same context as the first one; no new context is actually created. Whichever project's session I create first determines the context, and therefore the pipelines and data catalog, for all subsequent sessions, which prevents me from testing all of the projects.
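The thread does not identify the cause. One possible contributor, stated here as an assumption rather than as confirmed Kedro behaviour, is ordinary Python import caching: module-level f-strings are evaluated once, when the module is first imported, and later imports are served from `sys.modules` without re-running the module body. The sketch below demonstrates only that general Python behaviour, using a throwaway module named `demo_settings` (a hypothetical name, not a Kedro file).

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''\
import os
PATTERN = f"catalog_{os.environ['PROJECT']}.yml"
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module whose top-level code reads an env var.
    Path(tmp, "demo_settings.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        os.environ["PROJECT"] = "example_project1"
        first = importlib.import_module("demo_settings").PATTERN

        os.environ["PROJECT"] = "example_project2"
        # Served from sys.modules: the module body does not run again.
        second = importlib.import_module("demo_settings").PATTERN

        # Reloading re-executes the module body with the new env var.
        third = importlib.reload(sys.modules["demo_settings"]).PATTERN
    finally:
        # Clean up so the demo leaves no trace in the interpreter.
        sys.path.remove(tmp)
        sys.modules.pop("demo_settings", None)
        os.environ.pop("PROJECT", None)

print(first, second, third)
```

Here `first` and `second` are identical even though the environment variable changed in between, and only the explicit reload picks up the new value. Whether reloading a project's settings module is enough to get a fresh Kedro session is not established in this thread.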

Let me know if you have more questions or if it would be more appropriate in another forum. Thanks!

@merelcht
Member

Hi @jround1, your setup is quite complex, so I'm not sure I understand all of it. I think the problem here is that a session is created at the project level, not per pipeline. It's session.run() that can be filtered per pipeline, not KedroSession.create(). It's perhaps a bit misleading, because create() takes the package_name argument, but that doesn't have a real use.

@merelcht merelcht added the Community Issue/PR opened by the open-source community label Sep 19, 2024
Author

jround1 commented Oct 3, 2024

Thanks @merelcht
It doesn't really matter what my setup is, because in my testing I have removed the complexity. What I want is to set up unit tests that involve Kedro session configurations, and to test with several different configurations. But after the first session is created there is no way to create any more distinct sessions; later ones just have the first session's evaluated parameters, data catalog, etc.
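One general way to guarantee that no state leaks between configurations is to evaluate each one in a fresh Python process. This is a sketch of that isolation technique only, not a workaround suggested by the maintainers in this thread. `run_isolated` is a hypothetical helper, and the child code is a stand-in that just prints a name derived from `PROJECT` and `PIPELINE`; in a real setup the child command might instead be something like `sys.executable -m pytest` on a per-project test file.

```python
import os
import subprocess
import sys

# Stand-in for work that reads its configuration at import/start-up time.
CHILD_CODE = (
    "import os\n"
    "print('catalog_' + os.environ['PROJECT'] + '_' "
    "+ os.environ['PIPELINE'] + '.yml')\n"
)


def run_isolated(project: str, pipeline: str) -> str:
    """Run the child code in a new interpreter with its own environment."""
    env = dict(os.environ, PROJECT=project, PIPELINE=pipeline)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()


results = [
    run_isolated(project, pipeline)
    for project, pipeline in [
        ("example_project1", "example_pipeline1"),
        ("example_project2", "example_pipeline2"),
    ]
]
print(results)
```

Each child starts with an empty module cache, so the second configuration cannot inherit anything evaluated for the first. The cost is process start-up time per configuration.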

@merelcht merelcht added Issue: Feature Request New feature or improvement to existing feature and removed Community Issue/PR opened by the open-source community labels Nov 1, 2024
@astrojuanlu
Member

Moving this to discussions.

@kedro-org kedro-org locked and limited conversation to collaborators Feb 18, 2025
@astrojuanlu astrojuanlu converted this issue into discussion #4492 Feb 18, 2025
