feat(3/n): agent resume_turn #1194
Conversation
    request.session_id, request.turn_id
)
tool_execution_step = ToolExecutionStep(
    step_id=(in_progress_tool_call_step.step_id if in_progress_tool_call_step else str(uuid.uuid4())),
shouldn't you error out if there is no step found?
Agree it's a bit confusing (I had to play around with the react_agent app). Let me add a comment here.
We do not error out here because, in the case of ReActAgent (with a custom tool parser), the server does not output a tool_execution step_start event, so there is no in-progress step to find. However, we should still allow the turn to be resumed with the ToolCallResponse in this case, because the flow is: server outputs a message (no ToolCall) --> parser parses it into a ToolCall --> client executes the ToolCall --> client resumes the turn.
@yanxi0830 hm yeah this is confusing and broke my mental model.
in that case, the server wouldn't have sent a step_start event either
yeah, in this case (of custom tool parsers) the server wouldn't have sent a step_start event, but we would still like to log the ToolExecutionStep in resume, as a client tool call did indeed happen.
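For concreteness, here is a minimal sketch of the client-side flow being described. The parser interface, the `execute_tool_call` helper, and the resume call shape are assumptions for illustration, not the actual llama-stack-client API:

```python
# Sketch of the ReActAgent resume flow with a custom tool parser.
# The server streams a plain assistant message (no ToolCall, hence no
# tool_execution step_start); the client-side parser recovers the tool
# call, executes it locally, and resumes the same turn.

def handle_turn(agent, session_id, turn):
    # Custom parser interface (assumed): extract tool calls from the
    # assistant message the server returned.
    tool_calls = agent.tool_parser.get_tool_calls(turn.output_message)
    if not tool_calls:
        return turn  # no client tool call needed; the turn is complete

    # Execute each tool call on the client side (assumed helper).
    tool_responses = [agent.execute_tool_call(tc) for tc in tool_calls]

    # Resume the same turn_id with the responses; the server logs a
    # ToolExecutionStep even though it never emitted a step_start for it.
    return agent.client.agents.turn.resume(
        agent_id=agent.agent_id,
        session_id=session_id,
        turn_id=turn.turn_id,
        tool_responses=tool_responses,
    )
```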
llama_stack/providers/inline/agents/meta_reference/agent_instance.py (outdated; resolved)
# get the steps from the turn id
steps = []
if len(turns) > 0:
shouldn't this be true always?
yes, but just in case the kvstore gets destroyed.
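For illustration, the defensive read looks roughly like the method below; the `get_session_turns` helper name and return shape are assumptions:

```python
async def _get_turn_steps(self, session_id: str, turn_id: str) -> list:
    # Load prior turns from the kvstore; the emptiness guard covers the
    # unlikely case where the kvstore was wiped between the original
    # turn and the resume request.
    turns = await self.storage.get_session_turns(session_id)
    steps = []
    if len(turns) > 0:
        current_turn = next((t for t in turns if t.turn_id == turn_id), None)
        if current_turn is not None:
            steps = current_turn.steps
    return steps
```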
    tool_responses=[],
    started_at=datetime.now(),
    ),
)
I think we also need to save the n_iter of inference so that we respect self.agent_config.max_infer_iters, which, by the way, we don't currently respect when a custom tool is used.
meta-llama/llama-stack-client-python#158: max_infer_iters is still used here to track the number of times we call resume.
We should make it so that the total number of inference calls doesn't exceed max_infer_iters. Currently we could have max_infer_iters^2 inference calls in the worst case (each resume could itself run up to max_infer_iters inference calls).
Ah yeah, this will be max_infer_iters^2. I think that's also the current behaviour if you create a second turn instead of resuming.
Will need to think a bit about how we can keep track of the same max_infer_iters across the client & server boundary in a follow-up. E.g., we'd need to introduce extra params to communicate the number of iters between the two, and whether this max_infer_iters should be enforced on the client side.
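To make the worst case concrete: with max_infer_iters = 5, the initial turn may run up to 5 inference calls, and each of up to 5 resumes may run 5 more, i.e. up to 25 in total. One hypothetical way to share a single budget across the boundary is sketched below; the `n_iters_consumed` request field is invented for illustration and is not part of the current API:

```python
# Hypothetical: the client echoes back how many inference iterations the
# turn has already consumed, so the server enforces one shared budget
# instead of resetting it on every resume.
async def resume_turn(self, request):
    n_used = getattr(request, "n_iters_consumed", 0)  # invented field
    remaining = self.agent_config.max_infer_iters - n_used
    if remaining <= 0:
        raise ValueError("max_infer_iters exhausted for this turn")
    # Run with the reduced budget rather than the full max_infer_iters.
    async for chunk in self.run(request, max_iters=remaining):
        yield chunk
```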
Sounds good, thanks for following up on this!
Looks good, but I left a couple of comments (they could be follow-ups).
llama_stack/providers/inline/agents/meta_reference/agent_instance.py (outdated; resolved)
)

output_message = None
async for chunk in self.run(
This logic seems to overlap significantly with the impl of create_and_execute_turn, right?
Maybe a follow-up, but it would be good to consolidate the code so that the logic of running a turn lives in one place.
Yes, doing that for 0.1.5 (so that the existing create_and_execute_turn is fully verified and we can deprecate the allow_resume_turn flag): #1212
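For reference, a rough shape of that consolidation could look like the sketch below; the helper names `_run_turn` and `_load_turn_context` are illustrative, not the actual implementation:

```python
# Both entry points delegate to a single helper that owns the
# turn-running logic.
async def create_and_execute_turn(self, request):
    async for chunk in self._run_turn(request):
        yield chunk

async def resume_turn(self, request):
    # Reconstruct the in-progress turn from persisted steps, then hand
    # off to the same shared logic.
    await self._load_turn_context(request.session_id, request.turn_id)
    async for chunk in self._run_turn(request):
        yield chunk
```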
# What does this PR do?
Client changes:
- #157
- Server change: meta-llama/llama-stack#1194

## Test Plan
- See meta-llama/llama-stack#1194
- llama-stack-apps
- Output Before: we have 2 turn_id with 2 turns.
- Output After: we have 1 turn_id; the final turn has all 3 steps.



Telemetry:

