This repository has been archived by the owner on Jun 9, 2024. It is now read-only.

PolyGPT Benchmarks and Submodule Update #273

Merged 24 commits on Aug 9, 2023
4192856 chore: add polygpt installation and api startup script (rihp, Aug 8, 2023)
beda266 update submodule (rihp, Aug 8, 2023)
21cb15a remove yarn start:api (rihp, Aug 8, 2023)
8a02aa7 PolyGPT-20230808203955 (Auto-GPT-Bot, Aug 8, 2023)
c127c1d Add combined charts - 20230808204141 (Auto-GPT-Bot, Aug 8, 2023)
46053c2 chore: update polygpt submodule (rihp, Aug 8, 2023)
4363d8a PolyGPT-20230808213945 (Auto-GPT-Bot, Aug 8, 2023)
b4ce23b chore: update polygpt submodule (rihp, Aug 9, 2023)
2696b41 PolyGPT-20230809092357 (Auto-GPT-Bot, Aug 9, 2023)
6c1187a solved UriPackageOrWrapper circular import (rihp, Aug 9, 2023)
bccf557 PolyGPT-20230809143532 (Auto-GPT-Bot, Aug 9, 2023)
8943f06 Add combined charts - 20230809143711 (Auto-GPT-Bot, Aug 9, 2023)
cda7e2c chore: update branch for PolyGPT agent (rihp, Aug 9, 2023)
46b4605 PolyGPT-20230809151146 (Auto-GPT-Bot, Aug 9, 2023)
5954c75 Add combined charts - 20230809151353 (Auto-GPT-Bot, Aug 9, 2023)
711e244 chore: add poetry run prefix to polygpt ci (rihp, Aug 9, 2023)
bfea91e removed unecessary poetry (nerfZael, Aug 9, 2023)
c076e26 updated submodule (nerfZael, Aug 9, 2023)
9c16a2a PolyGPT-20230809165810 (Auto-GPT-Bot, Aug 9, 2023)
6f9c076 not rejecting unauthorized ssl in node to circumvent cert error (nerfZael, Aug 9, 2023)
bc35640 Merge branch 'rihp/polygpt-benchmarks' of https://github.com/Signific… (nerfZael, Aug 9, 2023)
ba00e10 PolyGPT-20230809174152 (Auto-GPT-Bot, Aug 9, 2023)
1b67589 Add combined charts - 20230809174322 (Auto-GPT-Bot, Aug 9, 2023)
503ce6f Merge remote-tracking branch 'origin/master' into rihp/polygpt-benchm… (nerfZael, Aug 9, 2023)
8 changes: 8 additions & 0 deletions .github/workflows/ci.yml
@@ -187,6 +187,14 @@ jobs:
poetry run playwright install
uvicorn beebot.initiator.api:create_app --reload &
prefix="poetry run "
+elif [ "$AGENT_NAME" == "PolyGPT" ]; then
+cp .env.template .env
+curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
+export NVM_DIR=$HOME/.nvm
+source $NVM_DIR/nvm.sh
+nvm install && nvm use
+yarn install
+export NODE_TLS_REJECT_UNAUTHORIZED=0
else
echo "Unknown agent name: $AGENT_NAME"
exit 1
8 changes: 4 additions & 4 deletions .gitmodules
@@ -30,7 +30,7 @@
path = agbenchmark/challenges
url = https://github.com/SilenNaihin/agbenchmark_challenges.git
branch = main
-[submodule "agent/PolyGPT"]
-path = agent/PolyGPT
-url = https://github.com/polywrap/PolyGPT.git
-branch = nerfzael-agent-protocol
+[submodule "agent/PolyGPT"]
+path = agent/PolyGPT
+url = https://github.com/polywrap/PolyGPT.git
+branch = nerfzael-use-local-wrap-library
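Note: a minimal sketch for checking which branch a submodule entry tracks, assuming the .gitmodules text is simple enough for Python's standard configparser to read. The helper name submodule_branches and the inlined sample text are illustrative and not part of this PR; the sample copies the entry that appears last in the hunk above.

```python
import configparser

# Sample entry copied from the hunk above (the one listed last).
GITMODULES_TEXT = """
[submodule "agent/PolyGPT"]
path = agent/PolyGPT
url = https://github.com/polywrap/PolyGPT.git
branch = nerfzael-use-local-wrap-library
"""


def submodule_branches(text):
    """Map each submodule name to its configured branch (None if unset)."""
    # Interpolation is disabled so a literal '%' in a URL cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prefix = 'submodule "'
    branches = {}
    for section in parser.sections():
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            branches[name] = parser[section].get("branch")
    return branches


print(submodule_branches(GITMODULES_TEXT))
```

In a checkout, the same function could be fed the contents of the real .gitmodules file instead of the inlined sample.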
880 changes: 880 additions & 0 deletions reports/PolyGPT/folder3_08-09-09-23/report.json

Large diffs are not rendered by default.

880 changes: 880 additions & 0 deletions reports/PolyGPT/folder4_08-09-14-34/report.json

Large diffs are not rendered by default.

880 changes: 880 additions & 0 deletions reports/PolyGPT/folder6_08-09-16-57/report.json

Large diffs are not rendered by default.

876 changes: 876 additions & 0 deletions reports/PolyGPT/folder7_08-09-17-36/report.json

Large diffs are not rendered by default.

22 changes: 22 additions & 0 deletions reports/PolyGPT/regression_tests.json
@@ -0,0 +1,22 @@
{
"TestAgentProtocol_CreateAgentTask": {
"difficulty": "interface",
"data_path": "agbenchmark/challenges/interface/agent_protocol_suite/1_create_agent_task/data.json"
},
"TestAgentProtocol_ExecuteAgentTaskStep": {
"difficulty": "interface",
"data_path": "agbenchmark/challenges/interface/agent_protocol_suite/5_execute_agent_task_step/data.json"
},
"TestAgentProtocol_GetAgentTask": {
"difficulty": "interface",
"data_path": "agbenchmark/challenges/interface/agent_protocol_suite/3_get_agent_task/data.json"
},
"TestAgentProtocol_ListAgentTaskSteps": {
"difficulty": "interface",
"data_path": "agbenchmark/challenges/interface/agent_protocol_suite/4_list_agent_tasks_steps/data.json"
},
"TestAgentProtocol_ListAgentTasksIds": {
"difficulty": "interface",
"data_path": "agbenchmark/challenges/interface/agent_protocol_suite/2_list_agent_tasks_ids/data.json"
}
}
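Note: each data_path above sits in a folder with a numeric prefix (1_create_agent_task through 5_execute_agent_task_step), while the JSON keys are in alphabetical order. The sketch below lists the tests by that prefix. Treating the prefix as the intended suite order is an assumption, and the helper names step_number and in_suite_order are illustrative.

```python
import re
from pathlib import PurePosixPath

# Contents of reports/PolyGPT/regression_tests.json as added in this PR.
REGRESSION_TESTS = {
    "TestAgentProtocol_CreateAgentTask": {
        "difficulty": "interface",
        "data_path": "agbenchmark/challenges/interface/agent_protocol_suite/1_create_agent_task/data.json",
    },
    "TestAgentProtocol_ExecuteAgentTaskStep": {
        "difficulty": "interface",
        "data_path": "agbenchmark/challenges/interface/agent_protocol_suite/5_execute_agent_task_step/data.json",
    },
    "TestAgentProtocol_GetAgentTask": {
        "difficulty": "interface",
        "data_path": "agbenchmark/challenges/interface/agent_protocol_suite/3_get_agent_task/data.json",
    },
    "TestAgentProtocol_ListAgentTaskSteps": {
        "difficulty": "interface",
        "data_path": "agbenchmark/challenges/interface/agent_protocol_suite/4_list_agent_tasks_steps/data.json",
    },
    "TestAgentProtocol_ListAgentTasksIds": {
        "difficulty": "interface",
        "data_path": "agbenchmark/challenges/interface/agent_protocol_suite/2_list_agent_tasks_ids/data.json",
    },
}


def step_number(data_path):
    """Return the numeric prefix of the folder holding data.json, or None."""
    folder = PurePosixPath(data_path).parent.name
    match = re.match(r"(\d+)_", folder)
    return int(match.group(1)) if match else None


def in_suite_order(tests):
    """Sort test names by folder prefix; entries without a prefix go last."""
    def key(name):
        number = step_number(tests[name]["data_path"])
        return (number is None, number or 0, name)
    return sorted(tests, key=key)


for name in in_suite_order(REGRESSION_TESTS):
    print(step_number(REGRESSION_TESTS[name]["data_path"]), name)
```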
242 changes: 242 additions & 0 deletions reports/PolyGPT/success_rate.json
@@ -0,0 +1,242 @@
{
"TestAdaptLink": [
false,
false,
false,
false
],
"TestAdaptSimpleTypoWithGuidance": [
false,
false,
false,
false
],
"TestAdaptTeslaRevenue": [
false,
false,
false,
false
],
"TestAgentProtocol_CreateAgentTask": [
true,
true,
true,
true
],
"TestAgentProtocol_ExecuteAgentTaskStep": [
true,
true,
true,
true
],
"TestAgentProtocol_GetAgentTask": [
true,
true,
true,
true
],
"TestAgentProtocol_ListAgentTaskSteps": [
true,
true,
true,
true
],
"TestAgentProtocol_ListAgentTasksIds": [
true,
true,
true,
true
],
"TestBasicContentGen": [
false,
false,
false,
false
],
"TestBasicMemory": [
false,
false,
false,
false
],
"TestBasicRetrieval": [
false,
false,
false,
false
],
"TestDebugMultipleTypo": [
false,
false,
false,
false
],
"TestDebugSimpleTypoWithGuidance": [
false,
false,
false,
false
],
"TestDebugSimpleTypoWithoutGuidance": [
false,
false,
false,
false
],
"TestFunctionCodeGeneration": [
false,
false,
false,
false
],
"TestGoalDivergence": [
false,
false,
false,
false
],
"TestGoalLoss_Advanced": [
false,
false,
false,
false
],
"TestGoalLoss_Hard": [
false,
false,
false,
false
],
"TestGoalLoss_Medium": [
false,
false,
false,
false
],
"TestGoalLoss_Simple": [
false,
false,
false,
false
],
"TestInstructionFollowing": [
false,
false,
false,
false
],
"TestPasswordGenerator_Easy": [
false,
false,
false,
true
],
"TestPlanCreation": [
false,
false,
false,
true
],
"TestProductAdvisor_GamingMonitor": [
false,
false,
false,
false
],
"TestReadFile": [
false,
false,
false,
true
],
"TestRememberMultipleIds": [
false,
false,
false,
false
],
"TestRememberMultiplePhrasesWithNoise": [
false,
false,
false,
false
],
"TestRememberMultipleWithNoise": [
false,
false,
false,
false
],
"TestRetrieval3": [
false,
false,
false,
false
],
"TestReturnCode_Modify": [
false,
false,
false,
false
],
"TestReturnCode_Simple": [
false,
false,
false,
false
],
"TestReturnCode_Tests": [
false,
false,
false,
false
],
"TestReturnCode_Write": [
false,
false,
false,
false
],
"TestRevenueRetrieval_1.0": [
false,
false,
false,
false
],
"TestRevenueRetrieval_1.1": [
false,
false,
false,
false
],
"TestRevenueRetrieval_1.2": [
false,
false,
false,
false
],
"TestSearch": [
false,
false,
false,
false
],
"TestThreeSum": [
false,
false,
false,
false
],
"TestWriteFile": [
false,
false,
false,
true
],
"TestWritingCLI_FileOrganizer": [
false,
false,
false,
false
]
}
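Note: a minimal sketch of how the per-test histories above could be summarized. It assumes each list holds one boolean per benchmark run with the most recent run last; the file itself does not state the ordering. The data is a four-entry excerpt of success_rate.json, and the helper names pass_rate and newly_passing are illustrative.

```python
# Excerpt of reports/PolyGPT/success_rate.json; one boolean per run,
# assumed to be ordered oldest first.
SUCCESS_RATE = {
    "TestAgentProtocol_CreateAgentTask": [True, True, True, True],
    "TestReadFile": [False, False, False, True],
    "TestThreeSum": [False, False, False, False],
    "TestWriteFile": [False, False, False, True],
}


def pass_rate(history):
    """Fraction of runs in which the test passed (0.0 for an empty history)."""
    return sum(history) / len(history) if history else 0.0


def newly_passing(results):
    """Names of tests that passed in the last run and in no earlier run."""
    return sorted(
        name
        for name, history in results.items()
        if history and history[-1] and not any(history[:-1])
    )


for name, history in SUCCESS_RATE.items():
    print(f"{name}: {pass_rate(history):.0%}")
print("Newly passing:", newly_passing(SUCCESS_RATE))
```

Under the ordering assumption, the excerpt separates tests that passed in every recorded run from those that passed only in the latest one.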
Binary file modified reports/combined_charts/run31/bar_chart.png
Binary file modified reports/combined_charts/run31/radar_chart.png
2 changes: 1 addition & 1 deletion reports/combined_charts/run31/run_info.json
@@ -1 +1 @@
-{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:06", "smol-developer": "2023-07-31-16:11", "gpt-engineer": "2023-07-31-19:38"}
+{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:41", "Auto-GPT": "2023-07-31-19:39", "smol-developer": "2023-07-31-16:11", "gpt-engineer": "2023-07-31-19:38"}
2 changes: 1 addition & 1 deletion reports/combined_charts/run32/run_info.json
@@ -1 +1 @@
-{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:39", "smol-developer": "2023-07-31-19:38", "gpt-engineer": "2023-07-31-19:38"}
+{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:06", "PolyGPT": "2023-08-09-09:23", "smol-developer": "2023-07-31-19:38", "gpt-engineer": "2023-07-31-19:38"}
2 changes: 1 addition & 1 deletion reports/combined_charts/run33/run_info.json
@@ -1 +1 @@
-{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:39", "smol-developer": "2023-07-31-16:11", "gpt-engineer": "2023-07-31-19:38"}
+{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:06", "PolyGPT": "2023-08-09-14:34", "smol-developer": "2023-07-31-19:05", "gpt-engineer": "2023-07-31-19:38"}
1 change: 1 addition & 0 deletions reports/combined_charts/run34/run_info.json
@@ -0,0 +1 @@
{"mini-agi": "2023-07-31-19:38", "BabyAGI": "2023-07-31-19:07", "Auto-GPT": "2023-07-31-19:06", "PolyGPT": "2023-08-09-16:57", "smol-developer": "2023-07-31-19:05", "gpt-engineer": "2023-07-31-19:38"}
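Note: the run_info.json files map each agent name to the timestamp of the report used for that combined chart. The sketch below compares two such mappings and reports agents whose timestamp was added or changed. The helper name changed_agents is illustrative. RUN33 is the second of the two run33 lines shown above and RUN34 is the run34 content.

```python
# run_info.json contents for run33 (second of the two lines shown) and run34.
RUN33 = {
    "mini-agi": "2023-07-31-19:38",
    "BabyAGI": "2023-07-31-19:07",
    "Auto-GPT": "2023-07-31-19:06",
    "PolyGPT": "2023-08-09-14:34",
    "smol-developer": "2023-07-31-19:05",
    "gpt-engineer": "2023-07-31-19:38",
}
RUN34 = {
    "mini-agi": "2023-07-31-19:38",
    "BabyAGI": "2023-07-31-19:07",
    "Auto-GPT": "2023-07-31-19:06",
    "PolyGPT": "2023-08-09-16:57",
    "smol-developer": "2023-07-31-19:05",
    "gpt-engineer": "2023-07-31-19:38",
}


def changed_agents(old, new):
    """Return {agent: (old_timestamp, new_timestamp)} for added or changed agents.

    An agent missing from `old` is reported with None as its old timestamp.
    """
    return {
        agent: (old.get(agent), stamp)
        for agent, stamp in new.items()
        if old.get(agent) != stamp
    }


print(changed_agents(RUN33, RUN34))
```

For these two inputs only the PolyGPT entry differs, so it is the only agent reported.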