This repository has been archived by the owner on Jun 9, 2024. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Safety challenges, adaptability challenges, suite same_task (#177)
1 parent c4aebda, commit d9b3d7d

Showing 165 changed files with 2,289 additions and 486 deletions.
@@ -1,3 +1,3 @@
AGENT_NAME=mini-agi
HOME_ENV=
REPORT_LOCATION="../../reports/mini-agi"
MOCK_TEST=False
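The hunk above holds dotenv-style `KEY=value` settings for the mini-agi agent. How agbenchmark actually loads them is not shown in this commit, so the following is only a minimal standard-library sketch, assuming plain `KEY=value` lines, optional surrounding quotes, and a hypothetical `env_flag` rule for reading `MOCK_TEST` as a boolean.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines (an assumed dotenv subset, not agbenchmark's loader)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def env_flag(value: str) -> bool:
    # Hypothetical rule: only "true", "1" or "yes" (any case) enable a flag.
    return value.strip().lower() in {"true", "1", "yes"}


example = '''AGENT_NAME=mini-agi
HOME_ENV=
REPORT_LOCATION="../../reports/mini-agi"
MOCK_TEST=False
'''
config = parse_env(example)
mock_test = env_flag(config["MOCK_TEST"])
```

An explicit flag parser matters here because `bool("False")` is `True` in Python: any non-empty string is truthy.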
@@ -1,15 +1,17 @@
### Background

<!-- Provide a concise overview of the rationale behind this change. Include relevant context, prior discussions, or links to related issues. Ensure that the change aligns with the project's overall direction. -->

### Changes
<!-- Describe the specific, focused change made in this pull request. Detail the modifications clearly and avoid any unrelated or "extra" changes. -->

<!-- Describe the specific, focused change made in this pull request. Detail the modifications clearly and avoid any unrelated or "extra" changes. -->

### PR Quality Checklist

- [ ] I have run the following commands against my code to ensure it passes our linters:
```shell
black .
isort .
mypy .
autoflake --remove-all-unused-imports --recursive --ignore-init-module-imports --ignore-pass-after-docstring --in-place agbenchmark
```
```shell
black . --exclude test.py
isort .
mypy .
autoflake --remove-all-unused-imports --recursive --ignore-init-module-imports --ignore-pass-after-docstring --in-place agbenchmark
```
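Both this template and the CI lint step run black with `--exclude test.py`. Black documents `--exclude` as a regular expression matched against file paths, so the `.` is a wildcard and the pattern is not anchored. The sketch below uses Python's `re.search` to show which paths such a pattern would catch; it approximates black's matching (assumed here to test project-relative paths with a leading `/`) rather than calling black, and the sample paths are hypothetical.

```python
import re

# Pattern as written in the PR template and the CI step.
LOOSE = re.compile(r"test.py")
# A stricter alternative: only files named exactly test.py, at any depth.
STRICT = re.compile(r"/test\.py$")

# Hypothetical paths, written with a leading "/" as black normalises them.
paths = [
    "/agbenchmark/challenges/code/test.py",
    "/agbenchmark/contest.py",
    "/agbenchmark/test_py_utils.py",
    "/agbenchmark/utils.py",
]


def excluded(pattern: re.Pattern[str], candidates: list[str]) -> list[str]:
    """Return the paths that a black-style exclude regex would match."""
    return [p for p in candidates if pattern.search(p)]


print(excluded(LOOSE, paths))   # the first three paths
print(excluded(STRICT, paths))  # only .../code/test.py
```

Also note that passing `--exclude` replaces black's default exclude pattern (which skips directories such as `.git` and `.venv`); `--extend-exclude` adds to the defaults instead.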
@@ -4,7 +4,7 @@ on:
workflow_dispatch:
branches: [master]
schedule:
- cron: "0 8 * * *"
- cron: '0 8 * * *'
push:
branches: [master, ci-test*]
paths-ignore:
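The `schedule` trigger uses the cron expression `0 8 * * *`: minute 0 of hour 8, every day. GitHub Actions evaluates cron schedules in UTC. As a small sketch hard-coded to this daily pattern (not a general cron parser), the next scheduled time after a given moment can be computed like this:

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 8, minute: int = 0) -> datetime:
    """Next time matching cron 'minute hour * * *' strictly after `now`, in UTC.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate


print(next_daily_run(datetime.now(timezone.utc)))
```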
@@ -16,7 +16,7 @@ jobs:
lint:
runs-on: ubuntu-latest
env:
min-python-version: "3.10"
min-python-version: '3.10'

steps:
- name: Checkout repository

@@ -45,10 +45,10 @@ jobs:
poetry install
- name: Lint with flake8
run: poetry run flake8
run: poetry run flake8 --exclude=code,agent

- name: Check black formatting
run: poetry run black . --check
run: poetry run black . --exclude test.py --check
if: success() || failure()

- name: Check isort formatting
@@ -68,20 +68,20 @@ jobs:
tests:
env:
GH_TOKEN: ${{ github.event_name == 'pull_request' && github.token || secrets.PAT }}
min-python-version: "3.10"
name: "${{ matrix.agent-name }}"
min-python-version: '3.10'
name: '${{ matrix.agent-name }}'
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
agent-name:
- "gpt-engineer"
- "smol-developer"
- "Auto-GPT"
- "mini-agi"
- "beebot"
- "BabyAGI"
- 'gpt-engineer'
- 'smol-developer'
- 'Auto-GPT'
- 'mini-agi'
- 'beebot'
- 'BabyAGI'

steps:
- name: Checkout repository
@@ -151,10 +151,37 @@ jobs:
fi
pip install ../../dist/*.whl
if [ "${GITHUB_EVENT_NAME}" == "pull_request" ]; then
set +e # Ignore non-zero exit codes and continue execution
${prefix}agbenchmark start --maintain --mock
${prefix}agbenchmark start --improve --mock
EXIT_CODE=$?
set -e # Stop ignoring non-zero exit codes
# Check if the exit code was 5, and if so, exit with 0 instead
if [ $EXIT_CODE -eq 5 ]
then
echo "regression_tests.json is empty."
exit 0
else
exit $EXIT_CODE
fi
set +e # Ignore non-zero exit codes and continue execution
improve_cmd = ${prefix}agbenchmark start --improve --mock
EXIT_CODE=$?
set -e # Stop ignoring non-zero exit codes
# Check if the exit code was 5, and if so, exit with 0 instead
if [ $EXIT_CODE -eq 5 ]
then
echo "regression_tests.json is empty."
exit 0
else
exit $EXIT_CODE
fi
${prefix}agbenchmark start --mock
${prefix}agbenchmark start --mock --category=retrieval
${prefix}agbenchmark start --mock --category=interface
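The pull-request branch of this script treats exit code 5 as success. Assuming `agbenchmark start` propagates pytest's exit status (pytest reserves 5 for "no tests were collected", which here corresponds to an empty `regression_tests.json`), the same policy can be sketched in Python with `subprocess`. Separately, in bash the line `improve_cmd = ${prefix}agbenchmark start --improve --mock` is not an assignment, because of the spaces around `=`: bash tries to run a command named `improve_cmd`, so the following `$?` would be 127 (command not found) rather than agbenchmark's status. In the sketch, the child process that exits with 5 is a stand-in, not agbenchmark.

```python
import subprocess
import sys

# pytest's exit code for "no tests were collected".
NO_TESTS_COLLECTED = 5


def run_allowing_empty(cmd: list[str]) -> int:
    """Run `cmd`; map exit code 5 to 0 and pass every other code through."""
    result = subprocess.run(cmd, check=False)
    if result.returncode == NO_TESTS_COLLECTED:
        print("regression_tests.json is empty.")
        return 0
    return result.returncode


if __name__ == "__main__":
    # Stand-in child process that exits with 5, used instead of agbenchmark.
    code = run_allowing_empty([sys.executable, "-c", "import sys; sys.exit(5)"])
    print(code)  # 0
```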
@@ -165,7 +192,7 @@ jobs:
bash -c "$(curl -fsSL https://raw.githubusercontent.com/Helicone/helicone/0ed90e3203f172ed05d5754bc0b95a584689233c/mitmproxy.sh)" -s start
${prefix}agbenchmark start || echo "This command will always return a non zero exit code unless all the challenges are solved."
fi
cd ../..
env:
@@ -179,7 +206,6 @@ jobs:
HELICONE_PROPERTY_AGENT: ${{ matrix.agent-name }}
REPORT_LOCATION: ${{ format('../../reports/{0}', matrix.agent-name) }}

- name: Upload reports
if: always()
uses: actions/upload-artifact@v3
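The `tests` job fans out over `matrix.agent-name`, and each resulting job sets `REPORT_LOCATION` with `format('../../reports/{0}', matrix.agent-name)`. The Actions `format()` function substitutes `{0}`, `{1}`, ... positionally, much like Python's `str.format`, so the per-agent values can be previewed locally. This is an illustration of the expansion, not how the Actions runner evaluates expressions.

```python
# Agent names exactly as listed in the workflow's matrix.
AGENTS = ["gpt-engineer", "smol-developer", "Auto-GPT", "mini-agi", "beebot", "BabyAGI"]


def job_env(agent: str) -> dict[str, str]:
    """Per-job values derived from one matrix entry, mirroring the workflow expressions."""
    return {
        "name": agent,  # name: '${{ matrix.agent-name }}'
        "HELICONE_PROPERTY_AGENT": agent,
        "REPORT_LOCATION": "../../reports/{0}".format(agent),
    }


# One job per matrix entry; with fail-fast: false, a failing agent does not cancel the others.
jobs = [job_env(agent) for agent in AGENTS]
```

For `mini-agi` this yields `../../reports/mini-agi`, the same value as `REPORT_LOCATION` in that agent's environment file.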
@@ -192,7 +218,7 @@ jobs:
run: |
git config --global user.email "[email protected]"
git config --global user.name "Auto-GPT-Bot"
git add reports/* || echo "nothing to commit"
commit_message="${{ matrix.agent-name }}-$(date +'%Y%m%d%H%M%S')"
git commit -m "${commit_message}"
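The report-commit step names each commit `<agent>-<timestamp>` using `date +'%Y%m%d%H%M%S'`. Python's `strftime` directives for these fields are the same, so the message format can be reproduced as below. The fixed timestamp is illustrative; `date` itself formats in the runner's local time zone.

```python
from datetime import datetime


def report_commit_message(agent: str, when: datetime) -> str:
    """Build '<agent>-YYYYmmddHHMMSS', matching date +'%Y%m%d%H%M%S'."""
    return f"{agent}-{when:%Y%m%d%H%M%S}"


print(report_commit_message("mini-agi", datetime(2023, 7, 20, 8, 5, 9)))  # mini-agi-20230720080509
```

Because every field is zero-padded to a fixed width, sorting these messages as strings for a given agent also sorts them chronologically.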
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,19 @@
{
"name": "TestAdaptSimpleTypoWithGuidance",
"category": ["adaptability"],
"task": "1- Run code.py.\n2- Read test.py.\n3- Modify test.py.\nRepeat step 1, 2 and 3 until code.py runs without errors.\n",
"dependencies": ["TestDebugSimpleTypoWithGuidance"],
"cutoff": 75,
"ground": {
"answer": "Same as the original DebugWithGuidance test - [0, 1] [2, 5] [0, 3]",
"should_contain": ["[0, 1]", "[2, 5]", "[0, 3]"],
"should_not_contain": [],
"files": ["test.py"],
"type": "execute_python_code"
},
"info": {
"difficulty": "intermediate",
"description": "A version of the debug challenge where the wrong files to test and change are given. The llm must figure the challenges out",
"side_effects": []
}
}
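The new challenge's `ground` block states what a passing run must print: every string in `should_contain` and none of the strings in `should_not_contain`. A minimal sketch of such a check follows, assuming the output of executing the listed file is compared as plain text. agbenchmark's real scoring code is not part of this diff and may differ, for example in how it runs `test.py`.

```python
import json

# The ground block of the challenge above, trimmed to the fields the check uses.
CHALLENGE = json.loads("""
{
  "ground": {
    "should_contain": ["[0, 1]", "[2, 5]", "[0, 3]"],
    "should_not_contain": [],
    "files": ["test.py"],
    "type": "execute_python_code"
  }
}
""")


def passes(output: str, ground: dict) -> bool:
    """True if `output` contains every required string and no forbidden one."""
    has_required = all(s in output for s in ground["should_contain"])
    has_forbidden = any(s in output for s in ground["should_not_contain"])
    return has_required and not has_forbidden
```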
File renamed without changes.