add support for run-bug-run runbugrun #39 WIP #166
Conversation
Hi @cadddr ! There is currently a failure in the RunBugRun tests. See https://github.com/ASSERT-KTH/repairbench-framework/actions/runs/12126690735/job/33810455158?pr=166#step:13:337 Seems to be an error in loading the dataframe. |
Since this isn't file-not-found, could it be a version/deprecation issue with pandas? What version is being installed, so I can reproduce? Thanks
|
When you run the setup, the resolved pandas version is installed. Right now that is pandas==2.2.3. |
I also have pandas==2.2.3. In the log there is a deprecation warning about passing a path string as the argument to read_json, so I wrapped it into a file stream. Hopefully this passes; otherwise, I'm not sure how to debug this. |
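For reference, a minimal sketch of that kind of change (the function name and file handling here are assumptions for illustration, not the PR's actual code):

```python
import pandas as pd

def load_bugs(path: str) -> pd.DataFrame:
    # Passing an open file handle instead of a path string avoids the
    # deprecation warning about string arguments to read_json.
    with open(path, "r", encoding="utf-8") as f:
        return pd.read_json(f)
```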
Now the problem seems to be related to a FileNotFoundError. |
My path: Double-checked that the commands in setup.sh download and unpack the file correctly:
Why is the working dir in the log different? |
Trying to fix the path, and rebased onto the latest master. Let's see if we can fix this. |
Fixed the file-not-found problem by changing the benchmark directory to a submodule. We now get another error, during the execution of a RunBugRun bug. |
Thanks for fixing the paths. Bug-related errors do not reproduce consistently since we're taking 3 bugs from an unordered dict. After fixing the order (and running 20 bugs instead of 3), I'm getting the first failure on comparing print(result) with print(test_output). |
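For illustration, a minimal sketch of the ordering fix being described (the function shape and names are assumptions, not the framework's actual code):

```python
def take_first_bugs(bugs: dict, n: int) -> list:
    # Taking the "first" n entries of a dict depends on insertion order,
    # which is not meaningful here; sorting the bug ids first makes the
    # selection reproducible across runs.
    return [bugs[bug_id] for bug_id in sorted(bugs)[:n]]
```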
Thanks for noticing the randomness bug! I fixed the issue for all benchmarks and rebased this PR with the latest commits from master. For the problem of comparing outputs, the straightforward solution would be to eval the strings and compare the values. |
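As a sketch of that idea (names are assumptions; ast.literal_eval is used here instead of bare eval so an untrusted test output cannot execute code):

```python
import ast

def outputs_match(result: str, expected: str) -> bool:
    # Compare parsed values so e.g. "1.0" and "1.00" or differently
    # spaced lists are treated as equal.
    try:
        return ast.literal_eval(result.strip()) == ast.literal_eval(expected.strip())
    except (ValueError, SyntaxError):
        # Not a Python literal: fall back to plain string comparison.
        return result.strip() == expected.strip()
```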
@andre15silva PR back from the dead. I imported the output post-processing logic from runbugrun; the tests now pass. Please check if this is mergeable. |
/gemini review |
Code Review
This pull request introduces the RunBugRun benchmark to the elle-elle-aime project, along with necessary components like an output matcher and a bug representation. The changes also include the integration of this benchmark into the existing framework and a new prompting strategy. Overall, the implementation seems well-structured and addresses issue #39. However, there are a few areas that could benefit from further attention.
Summary of Findings
- Potential caching issue in `get_failing_tests`: The `get_failing_tests` function in `runbugrun.py` has a section that appears to handle caching of test results. However, the logic for determining whether to execute tests or use cached results could be improved to ensure consistent behavior. Specifically, the condition `if not already_cached` only applies when there isn't a runtime exception, potentially leading to missed caching opportunities or unnecessary test executions.
- Inconsistent file handling in `RunBugRunBug.checkout`: The `RunBugRunBug.checkout` method in `runbugrunbug.py` uses a hardcoded path `'buggy'` when copying files, which might not be flexible enough if the benchmark structure changes in the future. Consider making this more configurable or dynamically determined.
- Missing error handling in `RunBugRunBug.execute_test_case`: The `RunBugRunBug.execute_test_case` function in `runbugrunbug.py` catches `OSError` and `subprocess.TimeoutExpired`, but it might be beneficial to log these exceptions or provide more context in the returned error message to aid in debugging (see the sketch after this list).
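A minimal sketch of the kind of error reporting the review suggests (the function shape and return convention here are assumptions, not the PR's actual code):

```python
import subprocess

def execute_test_case(cmd: list, stdin_data: str, timeout: float):
    # Return (passed, detail); on failure the detail string carries
    # enough context (command, timeout, OS error) to debug the run.
    try:
        proc = subprocess.run(
            cmd, input=stdin_data, capture_output=True, text=True, timeout=timeout
        )
        return proc.returncode == 0, proc.stdout
    except subprocess.TimeoutExpired:
        return False, f"TimeoutExpired: {cmd!r} exceeded {timeout}s"
    except OSError as e:
        return False, f"OSError while running {cmd!r}: {e}"
```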
Merge Readiness
The pull request introduces significant new functionality and requires careful review. While the code appears well-structured, the potential caching issue and file handling inconsistencies should be addressed before merging. I am unable to approve this pull request, and recommend that others review and approve this code before merging. At a minimum, the high severity issues should be addressed before merging.
Thanks @cadddr ! I went over the code again and also ran the Gemini review. Left some comments and am waiting for the tests to run. |
Thanks @cadddr! I'm just responding here to note that I don't have the ability to run the linter or take any actions on the pull request. I can only respond to comments and provide code suggestions. It looks like you're asking the author to run the linter locally, which is the correct approach. |
@andre15silva sure, I can run linting and remove the FIXME comments. Tests are failing with the RunBugRun JSON file not being found. I noticed the download commands for it have been removed from setup.sh -- why is that? Should I simply roll back the file to the version that worked? |
Hi @andre15silva, Regarding the missing RunBugRun JSON file, it appears the download commands were removed from setup.sh. I noticed that the benchmark directory was changed to a submodule, but then commented out. Here's a suggestion to add the extraction of the test results in setup.sh:

```diff
--- a/setup.sh
+++ b/setup.sh
@@ -22,3 +22,7 @@
 if [ -z "$CI" ]; then
     poetry run ./gitbug-java setup;
 fi
+
+cd benchmarks/run_bug_run;
+tar -xvzf buggy_test_results.tgz;
+cd ../..;
```
|
@andre15silva Data download commands for runbugrun were removed in this commit: cadddr@b83f855 Initially setting up benchmarks/run_bug_run as a submodule was a misunderstanding, as it linked to the original rbugr Ruby repo, whereas the intention for this directory is simply to house the data files. Let me know how best to resolve this. |
@andre15silva also the submodule entry was already taken out here: cadddr@1622d83 |
Could have been problems with the merge, not sure what happened. We can fix setup.sh back to the version that worked. WDYT? |
@andre15silva made all changes, let's see those tests pass. |
@andre15silva When running setup, this file gets downloaded but not to the right directory, as the preceding cd command fails: This directory exists in the repo, e.g. https://github.com/cadddr/elle-elle-aime/blob/master/benchmarks/run_bug_run/buggy_test_results.tgz Is that because the setup for GitBug-Java forgets to exit from its directory? cd benchmarks/gitbug-java; For me pwd shows: /workspaces/elle-elle-aime. I recall we had this issue before, e.g. #166 (comment) |
@andre15silva looks like all tests pass! We're all set to merge? |
LGTM! Let's wait for @t-sorger to take a look and give his opinion and then we can merge. He's also been working on integrating another Python benchmark. |
The code generally looks good to me. I do have some concerns about the Python utilities, as I ran this code previously with slight modifications while developing my own version. It might not be fully compatible with the BugsInPy benchmark, so it’s something to keep in mind when integrating BugsInPy to ensure the utilities don’t break for run-bug-run (and vice versa). That said, it should be fine for now. |
Admittedly, I've put in the bare minimum needed just to make RunBugRun work. Agreed, the utilities will have to be generalized as we onboard more Python datasets, perhaps in the next PR. With BugsInPy as the second dataset we should get a better idea of which parts are general vs. dataset-specific. |
Sounds good, merging it. Good work @cadddr, thanks for the PR :)) |