SN1-419: MSRv2: Zero Sum Scoring Experiment #634

Draft: wants to merge 25 commits into base: staging
25 commits
665bfc1
Seperate Prompting, Remove TTI Endpoint, Add Json Flag
bkb2135 Feb 28, 2025
4563830
Initial draft
Feb 28, 2025
6ae46ce
Precommit Changes
richwardle Feb 28, 2025
15c951d
Precommit Fix
bkb2135 Mar 3, 2025
cb29ad8
Improving TTI Final Prompt, Add Unittests for Prompts
richwardle Mar 3, 2025
2464424
Merge branch 'SN1-423-restructure-prompting' of github.com:macrocosm-…
richwardle Mar 3, 2025
8d78f54
Finalising draft for new MSR task
Mar 3, 2025
2f96a50
Await Reward Models
richwardle Mar 3, 2025
a6132c0
Add Next Action For Final Prompt
richwardle Mar 3, 2025
fc470bf
Add Detailed Log For Scoring Response Failed
richwardle Mar 3, 2025
629bf1e
Generating follow-up task in generator reward config
Mar 3, 2025
5948dbb
Precommit Fixes
bkb2135 Mar 3, 2025
1fad896
Fixing various import errors
Mar 4, 2025
9a5ebb3
Simplify Prompt Structure
richwardle Mar 4, 2025
0245a2a
Fix Unittest and Precommit
bkb2135 Mar 4, 2025
46fe23a
Merge branch 'SN1-423-restructure-prompting' into 'SN1-419-r-d-resear…
richwardle Mar 4, 2025
9d8abfc
Add Get Entry for DiscriminatorDataset Entry
richwardle Mar 4, 2025
a243288
Fix Merge Overwrite
richwardle Mar 4, 2025
dfaa788
Use Random In Discriminator Dataset
richwardle Mar 4, 2025
9c5b90b
Fixing bugs with task appending
Mar 5, 2025
4d9d58b
Bug Fixes
bkb2135 Mar 8, 2025
dc9401b
Remove Miner Logs
bkb2135 Mar 8, 2025
583b02e
Restructuring and Formatting
richard-wardle Mar 10, 2025
8f2cee9
Remove Redundant Reward Model
richard-wardle Mar 12, 2025
66c584f
Extract Out Weighted Average Calculation
richard-wardle Mar 12, 2025
Fix Unittest and Precommit
bkb2135 committed Mar 4, 2025
commit 0245a2ac981b034c9e99d3489a22263462693ea0
7 changes: 5 additions & 2 deletions shared/prompts/test_time_inference.py
@@ -1,12 +1,13 @@
import textwrap

+
def intro_prompt() -> str:
"""
Returns the intro prompt.
"""

intro = textwrap.dedent(
-"""\
+"""\
You are a world-class expert in analytical reasoning and problem-solving. Your task is to break down complex problems through rigorous step-by-step analysis, carefully examining each aspect before moving forward. For each reasoning step:

OUTPUT FORMAT:
@@ -79,19 +80,21 @@ def intro_prompt() -> str:

return intro

+
def system_acceptance_prompt() -> str:
"""
Returns the system acceptance prompt.
"""

system_acceptance = textwrap.dedent(
-"""\
+"""\
I understand. I will now analyze the problem systematically, following the structured reasoning process while maintaining high standards of analytical rigor and self-criticism.
"""
).strip()

return system_acceptance

+
def final_answer_prompt() -> str:
"""
Returns the final answer prompt.
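The prompt builders in this file share one pattern: a triple-quoted string opened with `"""\`, passed through `textwrap.dedent`, and (in `system_acceptance_prompt`) finished with `.strip()`. A minimal sketch of how that pattern behaves, assuming a hypothetical `example_prompt` function and placeholder text that are not part of this PR:

```python
import textwrap


def example_prompt() -> str:
    """Hypothetical prompt builder mirroring the pattern used in the diff."""
    prompt = textwrap.dedent(
        """\
        You are a careful analyst.

        OUTPUT FORMAT:
            - one step per line
        """
    ).strip()
    return prompt


if __name__ == "__main__":
    # Print the dedented, stripped prompt to inspect the resulting layout.
    print(example_prompt())
```

The trailing backslash suppresses the newline that would otherwise follow the opening quotes, `dedent` removes only the indentation common to every line (so the nested bullet keeps its relative indent), and `.strip()` drops the trailing newline before the closing quotes.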
6 changes: 4 additions & 2 deletions tests/prompting/shared/test_get_prompt.py
@@ -1,5 +1,5 @@
import pytest
-from shared.prompts.test_time_inference import intro_prompt, system_acceptance_prompt, final_answer_prompt
+from shared.prompts.test_time_inference import final_answer_prompt, intro_prompt, system_acceptance_prompt


def test_intro_prompt():
"""Test that intro_prompt returns the correct prompt."""
@@ -11,12 +11,14 @@ def test_intro_prompt():
assert "REQUIREMENTS:" in prompt
assert "CRITICAL THINKING CHECKLIST:" in prompt

+
def test_system_acceptance_prompt():
"""Test that system_acceptance_prompt returns the correct prompt."""
prompt = system_acceptance_prompt()
assert isinstance(prompt, str)
assert "I understand. I will now analyze the problem systematically" in prompt

+
def test_final_answer_prompt():
"""Test that final_answer_prompt returns the correct prompt."""
prompt = final_answer_prompt()
6 changes: 2 additions & 4 deletions validator_api/test_time_inference.py
@@ -8,7 +8,7 @@

from prompting.llms.apis.llm_messages import LLMMessage, LLMMessages
from prompting.llms.apis.llm_wrapper import LLMWrapper
-from shared.prompts.test_time_inference import intro_prompt, system_acceptance_prompt, final_answer_prompt
+from shared.prompts.test_time_inference import final_answer_prompt, intro_prompt, system_acceptance_prompt
from shared.timer import Timer
from validator_api.chat_completion import chat_completion

@@ -199,12 +199,10 @@ async def generate_response(
step_count += 1
yield steps, None

-final_answer_prompt = final_answer_prompt()
-
messages.append(
{
"role": "user",
-"content": final_answer_prompt,
+"content": final_answer_prompt(),
}
)
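
This hunk drops a local assignment that reused the name of the imported `final_answer_prompt` function and calls the function inline instead. A minimal sketch of why the earlier form would be expected to fail in Python, assuming the function is bound at module level as the import in this file suggests; the stub body and the `build_message_*` and `shadowing_fails` helpers are hypothetical and not taken from the PR:

```python
def final_answer_prompt() -> str:
    """Hypothetical stand-in for shared.prompts.test_time_inference.final_answer_prompt."""
    return "Provide the final answer."


def build_message_shadowed() -> dict:
    # Assigning to the name anywhere in the function makes it a local variable,
    # so the call on the right-hand side reads an unassigned local and raises
    # UnboundLocalError before the module-level function is ever reached.
    final_answer_prompt = final_answer_prompt()
    return {"role": "user", "content": final_answer_prompt}


def build_message_inline() -> dict:
    # Calling the function inline never rebinds the name, so the lookup
    # resolves to the module-level function as intended.
    return {"role": "user", "content": final_answer_prompt()}


def shadowing_fails() -> bool:
    """Return True if the shadowed variant raises UnboundLocalError."""
    try:
        build_message_shadowed()
    except UnboundLocalError:
        return True
    return False


if __name__ == "__main__":
    print(shadowing_fails())
    print(build_message_inline())
```

The same scoping rule applies inside an `async def` generator such as `generate_response`, which is why calling `final_answer_prompt()` directly where the message is built is the simpler fix; renaming the local variable would also work.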
