-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate BugsInPy #184
base: master
Are you sure you want to change the base?
Integrate BugsInPy #184
Conversation
Hi @t-sorger, Any update here? FYI, it would probably be a good idea to rebase with master, I made some updates recently. |
Hi @andre15silva, I am still encountering issues with BugsInPy and how it works. Thanks for the hint, I will rebase with master as soon as possible! |
Update submodules when rebasing with master
Update submodules when rebasing with master
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request integrates the BugsInPy benchmark into the elleelleaime framework. The changes include adding the BugsInPy submodule, creating classes to represent the BugsInPy benchmark and bugs, and adding a new prompting strategy for Python bugs. Overall, the integration seems well-structured, but there are a few areas that could be improved for clarity and maintainability.
Summary of Findings
- Error Handling in Bug ID Parsing: The code includes error handling for invalid bug IDs, which is good. However, consider logging the specific error message for better debugging.
- Duplicated Code: The code to checkout both the buggy and fixed versions of the code is duplicated in
TestBugsInPy.checkout_bug
. Consider refactoring this into a single function. - Hardcoded Paths: There are several hardcoded paths in the code, such as in
extract_single_function
andTestBugsInPy
. Consider usingos.path.join
orpathlib
to make the code more portable.
Assessment
The pull request introduces the BugsInPy benchmark, which is a valuable addition to the project. The code is generally well-structured, but there are some areas that could be improved for clarity, maintainability, and robustness. I recommend addressing the review comments before merging. Please ensure that others review and approve this code before merging.
except ValueError: | ||
logging.warning( | ||
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider logging the specific ValueError
exception message for better debugging. This will help identify the cause of the invalid bug ID.
For example, you can log str(e)
to capture the error message.
except ValueError: | |
logging.warning( | |
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}" | |
) | |
except ValueError as e: | |
logging.warning( | |
f"Skipping invalid bug ID: {bug_id.decode('utf-8')}. Error: {str(e)}" | |
) |
Optional[Tuple[str, str]]: None if the bug is not single-function, otherwise a tuple of the form (buggy_code, fixed_code) | ||
""" | ||
project_name, _ = bug.get_identifier().rsplit("-", 1) | ||
path = f"./benchmarks/BugsInPy/projects/{project_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | ||
# Checkout buggy version | ||
bug.checkout(bug_identifier, fixed=False) | ||
|
||
project_name, _ = bug_identifier.rsplit("-", 1) | ||
path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}" | ||
|
||
# Assert that there are files in the directories | ||
if len(list(Path(path).glob("**/*"))) == 0: | ||
return False | ||
# Assert that we can reach some Python files | ||
buggy_python_files = list(Path(path).glob("**/*.py")) | ||
if len(buggy_python_files) == 0: | ||
return False | ||
|
||
# Checkout fixed version | ||
bug.checkout(bug_identifier, fixed=True) | ||
# Assert that there are files in the directories | ||
if len(list(Path(path).glob("**/*"))) == 0: | ||
return False | ||
# Assert that we can reach some Python files | ||
buggy_python_files = list(Path(path).glob("**/*.py")) | ||
if len(buggy_python_files) == 0: | ||
return False | ||
|
||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code to checkout both the buggy and fixed versions is duplicated. Consider refactoring this into a single function to reduce code duplication and improve maintainability.
You could create a helper function that takes the bug identifier and a boolean indicating whether to checkout the fixed version. This function would handle the checkout process and the assertions.
def checkout_bug_version(self, bug: Bug, fixed: bool) -> bool:
bug_identifier = bug.get_identifier()
project_name, _ = bug_identifier.rsplit("-", 1)
path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}"
try:
# Checkout specified version
bug.checkout(bug_identifier, fixed=fixed)
# Assert that there are files in the directories
if len(list(Path(path).glob("**/*"))) == 0:
return False
# Assert that we can reach some Python files
buggy_python_files = list(Path(path).glob("**/*.py"))
if len(buggy_python_files) == 0:
return False
return True
finally:
shutil.rmtree(path, ignore_errors=True)
def test_checkout_bugs(self):
bugs_in_py = get_benchmark("BugsInPy")
assert bugs_in_py is not None
bugs_in_py.initialize()
# Run only the first 3 bugs to not take too long
bugs = list(bugs_in_py.get_bugs())[:3]
assert bugs is not None
for bug in bugs:
assert self.checkout_bug_version(bug, fixed=False), f"Failed checkout for {bug.get_identifier()} (buggy version)"
assert self.checkout_bug_version(bug, fixed=True), f"Failed checkout for {bug.get_identifier()} (fixed version)"
bug.checkout(bug_identifier, fixed=False) | ||
|
||
project_name, _ = bug_identifier.rsplit("-", 1) | ||
path = f"./benchmarks/BugsInPy/framework/bin/temp/{project_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | ||
|
||
# Assert we are dealing with the correct bug and strategy | ||
assert sample["identifier"] == "Closure-115" | ||
assert sample["identifier"] == "youtube-dl-1" | ||
assert sample["prompt_strategy"] == "instruct" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prompt strategy is hardcoded as "instruct". However, the class variable PROMPT_STRATEGY
is set to instruct_python
. Please ensure that these are consistent, and that the correct prompt strategy is being used.
assert sample["prompt_strategy"] == "instruct" | |
assert sample["prompt_strategy"] == "instruct_python" |
@t-sorger just testing the gemini code assistant, don't mind the comments |
Thanks for testing the Gemini code assistant, @andre15silva! I've reviewed the changes and have provided some feedback in the review comments. Let me know if you have any questions. |
@andre15silva still interesting to see what feedback Gemini gives! |
Hi @t-sorger, It's great to see the BugsInPy benchmark being integrated! It's definitely interesting to see the feedback from the Gemini code assistant. I hope the review was helpful! |
hi all, how to complete this important task? |
The major blocker right now is the execution of samples since executing them locally requires installing a lot of dependencies. @t-sorger is working on dockerizing BugsInPy so the samples execute in an isolated environment with their own set of dependencies |
WIP: Issue #178