Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate BugsInPy #184

Open
wants to merge 43 commits into
base: master
Choose a base branch
from
Open

Integrate BugsInPy #184

wants to merge 43 commits into from

Conversation

t-sorger
Copy link
Collaborator

WIP: Issue #178

@andre15silva
Copy link
Member

Hi @t-sorger,

Any update here?

FYI, it would probably be a good idea to rebase with master, I made some updates recently.

@t-sorger
Copy link
Collaborator Author

Hi @andre15silva,

I am still encountering issues with BugsInPy and how it works.
I will send you an email so we can arrange a meeting to discuss the problems I’m facing.

Thanks for the hint, I will rebase with master as soon as possible!

@andre15silva
Copy link
Member

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates the BugsInPy benchmark into the elleelleaime framework. The changes include adding the BugsInPy submodule, creating classes to represent the BugsInPy benchmark and bugs, and adding a new prompting strategy for Python bugs. Overall, the integration seems well-structured, but there are a few areas that could be improved for clarity and maintainability.

Summary of Findings

  • Error Handling in Bug ID Parsing: The code includes error handling for invalid bug IDs, which is good. However, consider logging the specific error message for better debugging.
  • Duplicated Code: The code to checkout both the buggy and fixed versions of the code is duplicated in TestBugsInPy.checkout_bug. Consider refactoring this into a single function.
  • Hardcoded Paths: There are several hardcoded paths in the code, such as in extract_single_function and TestBugsInPy. Consider using os.path.join or pathlib to make the code more portable.

Assessment

The pull request introduces the BugsInPy benchmark, which is a valuable addition to the project. The code is generally well-structured, but there are some areas that could be improved for clarity, maintainability, and robustness. I recommend addressing the review comments before merging. Please ensure that others review and approve this code before merging.

Comment on lines +65 to +68
except ValueError:
logging.warning(
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}"
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider logging the specific ValueError exception message for better debugging. This will help identify the cause of the invalid bug ID.

For example, you can log str(e) to capture the error message.

Suggested change
except ValueError:
logging.warning(
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}"
)
except ValueError as e:
logging.warning(
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}. Error: {str(e)}"
)

Optional[Tuple[str, str]]: None if the bug is not single-function, otherwise a tuple of the form (buggy_code, fixed_code)
"""
project_name, _ = bug.get_identifier().rsplit("-", 1)
path = f"./benchmarks/BugsInPy/projects/{project_name}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This path is hardcoded. Consider using os.path.join or pathlib to construct the path, making the code more portable and robust to directory structure changes.

Suggested change
path = f"./benchmarks/BugsInPy/projects/{project_name}"
path = Path("./benchmarks/BugsInPy/projects", project_name)

Comment on lines +30 to +55
try:
# Checkout buggy version
bug.checkout(bug_identifier, fixed=False)

project_name, _ = bug_identifier.rsplit("-", 1)
path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}"

# Assert that there are files in the directories
if len(list(Path(path).glob("**/*"))) == 0:
return False
# Assert that we can reach some Python files
buggy_python_files = list(Path(path).glob("**/*.py"))
if len(buggy_python_files) == 0:
return False

# Checkout fixed version
bug.checkout(bug_identifier, fixed=True)
# Assert that there are files in the directories
if len(list(Path(path).glob("**/*"))) == 0:
return False
# Assert that we can reach some Python files
buggy_python_files = list(Path(path).glob("**/*.py"))
if len(buggy_python_files) == 0:
return False

return True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The code to checkout both the buggy and fixed versions is duplicated. Consider refactoring this into a single function to reduce code duplication and improve maintainability.

You could create a helper function that takes the bug identifier and a boolean indicating whether to checkout the fixed version. This function would handle the checkout process and the assertions.

    def checkout_bug_version(self, bug: Bug, fixed: bool) -> bool:
        bug_identifier = bug.get_identifier()
        project_name, _ = bug_identifier.rsplit("-", 1)
        path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}"

        try:
            # Checkout specified version
            bug.checkout(bug_identifier, fixed=fixed)

            # Assert that there are files in the directories
            if len(list(Path(path).glob("**/*"))) == 0:
                return False
            # Assert that we can reach some Python files
            buggy_python_files = list(Path(path).glob("**/*.py"))
            if len(buggy_python_files) == 0:
                return False

            return True
        finally:
            shutil.rmtree(path, ignore_errors=True)

    def test_checkout_bugs(self):
        bugs_in_py = get_benchmark("BugsInPy")
        assert bugs_in_py is not None
        bugs_in_py.initialize()

        # Run only the first 3 bugs to not take too long
        bugs = list(bugs_in_py.get_bugs())[:3]
        assert bugs is not None

        for bug in bugs:
            assert self.checkout_bug_version(bug, fixed=False), f"Failed checkout for {bug.get_identifier()} (buggy version)"
            assert self.checkout_bug_version(bug, fixed=True), f"Failed checkout for {bug.get_identifier()} (fixed version)"

bug.checkout(bug_identifier, fixed=False)

project_name, _ = bug_identifier.rsplit("-", 1)
path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This path is hardcoded. Consider using os.path.join or pathlib to construct the path, making the code more portable and robust to directory structure changes.

            path = Path("./benchmarks/BugsInPy/framework/bin/temp", project_name)

)

# Assert we are dealing with the correct bug and strategy
assert sample["identifier"] == "Closure-115"
assert sample["identifier"] == "youtube-dl-1"
assert sample["prompt_strategy"] == "instruct"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The prompt strategy is hardcoded as "instruct". However, the class variable PROMPT_STRATEGY is set to instruct_python. Please ensure that these are consistent, and that the correct prompt strategy is being used.

Suggested change
assert sample["prompt_strategy"] == "instruct"
assert sample["prompt_strategy"] == "instruct_python"

@andre15silva
Copy link
Member

@t-sorger just testing the gemini code assistant, don't mind the comments

Copy link

Thanks for testing the Gemini code assistant, @andre15silva! I've reviewed the changes and have provided some feedback in the review comments. Let me know if you have any questions.

@t-sorger
Copy link
Collaborator Author

@andre15silva still interesting to see what feedback Gemini gives!

Copy link

Hi @t-sorger,

It's great to see the BugsInPy benchmark being integrated! It's definitely interesting to see the feedback from the Gemini code assistant. I hope the review was helpful!

@monperrus
Copy link
Contributor

hi all, how to complete this important task?

@andre15silva
Copy link
Member

hi all, how to complete this important task?

The major blocker right now is the execution of samples since executing them locally requires installing a lot of dependencies.

@t-sorger is working on dockerizing BugsInPy so the samples execute in an isolated environment with their own set of dependencies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants