Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]Issues with Handling problem Parameter in generate_dataset Function #106

Closed
HerloConnell opened this issue Dec 5, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@HerloConnell
Copy link

Description:

I encountered issues with the generate_dataset function in the rl4co/data/generate_data.py file, specifically related to handling the problem parameter when generating datasets for specific problems like TSP.

AssertionError with Single Problem String

Passing a single string for the problem parameter (e.g., problem="tsp") triggers an assertion error:

assert filename is None or (
    len(problem) == 1 and len(graph_sizes) == 1
), "Can only specify filename when generating a single dataset"

This assertion fails because len("tsp") is 3, not 1. This seems contrary to the intended use.

Unhashable Type Error with Problem as List

When passing the problem as a list (e.g., problem=["tsp"]) to avoid the assertion error, another issue arises:

problems = {
    problem: distributions_per_problem[problem]
    if data_distribution == "all"
    else [data_distribution]
}

This code block throws an error: unhashable type: 'list', as the list cannot be used as a key in the dictionary comprehension.

I would greatly appreciate your attention to this matter. Thank you for your work on this project!

@HerloConnell HerloConnell added the bug Something isn't working label Dec 5, 2023
@fedebotu
Copy link
Member

fedebotu commented Dec 5, 2023

Hi there! Thanks for reporting the bug. Could you tell us more about how you run that function? In other words, could you share some minimal code to reproduce it (bash script or python function call?

@HerloConnell
Copy link
Author

Hi, and thank you for your prompt response! Below is a minimal Python script that I used to reproduce the issue:

from rl4co.data.generate_data import generate_dataset

datadir = "data"
problem = "tsp"
name = ["val", "test"]
seed = [4321, 1234]
graph_sizes = [200, 500, 1000]

for name_, seed_ in zip(name, seed):
    for graph_size in graph_sizes:
        fname = os.path.join(
            datadir,
            "{}{}_{}_seed{}.npz".format(problem, graph_size, name_, seed_)
        )
        generate_dataset(
            data_dir=datadir, name=name_, problem=[problem], seed=seed_,
            dataset_size=128, graph_sizes=[graph_size],
            filename=fname
        )

@fedebotu
Copy link
Member

fedebotu commented Dec 5, 2023

Now it should be fixed!

import os
from rl4co.data.generate_data import generate_dataset

datadir = "data"
problem = "tsp"
name = ["val", "test"]
seed = [4321, 1234]
graph_sizes = [200, 500, 1000]

for name_, seed_ in zip(name, seed):
    for graph_size in graph_sizes:
        fname = os.path.join(
            datadir,
            "{}{}_{}_seed{}.npz".format(problem, graph_size, name_, seed_)
        )

        generate_dataset(
            data_dir=datadir, name=name_, problem=problem, seed=seed_,
            dataset_size=128, graph_sizes=graph_size,
            filename=fname
        )

We used the above function mostly with Hydra, so that is why we never noticed the bug. Now it can accept lists and strings and should do everything by itself, so no need to wrap e.g. [problem] but problem can be passed directly. Let us know if you find any other issue!

@HerloConnell
Copy link
Author

Thank you very much for the swift fix and the detailed clarification! It's great to hear that the function now supports both lists and strings directly. If I come across any other issues, I'll be sure to let you know. Thanks again! 🤗🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants