
Features/multiturn conversation #217

Merged: 21 commits into pre-staging on Apr 30, 2024

Conversation

steffencruz (Collaborator) commented Apr 23, 2024

Adds multi-turn conversation capabilities to the validator. This is important for the following reasons:

  • Miners are assessed on their ability to continue a conversation over multiple turns
  • Miners are required to parse the entire conversation history (essential for a chat app)

Overview:

  • The approach reuses the QuestionAnsweringTask to create followup questions, with a dedicated followup prompt.
  • We generate a reference answer using a dedicated followup reference prompt, and use the normal QA reward stack.
  • After the initial task, all subsequent tasks are QA.
  • The initial context is used throughout the entire conversation.
  • The number of turns in the conversation is randomly determined: at each turn, the conversation continues with 50% probability, capped at 10 turns, so most conversations have fewer than 3 turns (see the sketch after this list). This seems to work fine.
  • We use the best completion in the conversation history (alternatively we could use the reference, or mix both with some probability).
  • We don't create a new challenge for subsequent turns. Instead we prompt the LLM to continue the conversation in a style consistent with the user's challenge, which seems to work well.
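
A minimal sketch of the turn-count sampling described above (constant and function names are illustrative, not the validator's actual API):

import random
from collections import Counter

P_CONTINUE = 0.5   # probability of continuing the conversation at each turn
MAX_TURNS = 10     # hard cap on conversation length

def sample_num_turns() -> int:
    """Coin-flip continuation: turn count follows a capped geometric distribution."""
    turns = 1
    while turns < MAX_TURNS and random.random() < P_CONTINUE:
        turns += 1
    return turns

# Most conversations are short: P(turns <= 3) = 1 - 0.5**3 = 87.5%.
print(Counter(sample_num_turns() for _ in range(10_000)))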

Followup prompt

A total of 6 iterations of prompt engineering were carried out. Each iteration was manually inspected, checked for obvious artefacts (QG+QA leakage, length) and run through a battery of GPT-4 evals. After refinement, the followup prompt now appears to produce good, continuous conversations.
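
For illustration, a followup prompt in this spirit might look like the template below. This is a hypothetical sketch, not the exact prompt merged in this PR; the "ask, don't answer" instruction mirrors the mt-qg+qa fix noted in the run list further down.

FOLLOWUP_PROMPT_TEMPLATE = """\
You are the user in a conversation about the context below.

Context:
{context}

Conversation so far:
{history}

Ask one natural followup question that continues the conversation in the
same style as the user. Ask the question only; do not answer it.
"""

def make_followup_prompt(context: str, history: list[str]) -> str:
    # Hypothetical helper: renders the context and conversation history into the template.
    return FOLLOWUP_PROMPT_TEMPLATE.format(context=context, history="\n".join(history))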

Reference answer quality was judged by reward distributions on tracked experiments. The plots below show no clear deterioration in rewards as the conversation turn increases, which indicates that the prompts are stable.
[plots: reward distributions by conversation turn]

run_paths = {
    'mt-base': 'opentensor-dev/alpha-validators/h4ywaxzb',     # my first attempt
    'mt-qg+qa': 'opentensor-dev/alpha-validators/v9ggnzha',    # adds extra instruction to not answer the followup
    'mt-gpt1': 'opentensor-dev/alpha-validators/9w4oroso',     # first gpt variation
    'mt-gpt2': 'opentensor-dev/alpha-validators/kc3lxgd0',     # second gpt variation
    'mt-gpt3': 'opentensor-dev/alpha-validators/0swpcny1',     # third gpt variation
    'mt-gpt2-v2': 'opentensor-dev/alpha-validators/d0iwnlc2',  # refinement on second gpt variation
}
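
These are Weights & Biases run paths (entity/project/run_id). As a sketch, the logged reward history for any of them can be pulled with the public wandb API, assuming you have access to the project:

import wandb

api = wandb.Api()
run = api.run(run_paths['mt-gpt2-v2'])   # entity/project/run_id
history = run.history()                  # pandas DataFrame of logged metrics
print(history.columns.tolist())          # inspect which reward keys were logged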

GPT-4 evals

The plot below shows the quality of the followup questions, using GPT-4 as a judge.

[plot: GPT-4 judge scores for followup question quality]
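
A minimal sketch of an LLM-as-judge eval in this spirit, using the OpenAI chat API (the rubric, scale, and prompt wording are assumptions; the actual eval prompts aren't shown in this PR):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """\
Rate how naturally the followup question below continues the conversation,
on a scale of 1 (poor) to 10 (excellent). Reply with the number only.

Conversation:
{history}

Followup question:
{question}
"""

def judge_followup(history: str, question: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(history=history, question=question)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())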

p-ferreira merged commit 046e40c into pre-staging on Apr 30, 2024
2 checks passed
p-ferreira mentioned this pull request on May 1, 2024
Hollyqui deleted the features/multiturn-conversation branch on August 2, 2024