Add support for OpenAI API parallel sampling #640

Merged

yichuan520030910320
Contributor

@yichuan520030910320 yichuan520030910320 commented Jul 17, 2024

Add support for OpenAI API parallel sampling:

  1. Support request.n > 1 in the OpenAI API. The server first sends one prefill request to increase the cache hit rate, then asynchronously sends n decoding requests in parallel.
  2. Not supported: m prompts passed as a list in the OpenAI API.
  3. Not supported: request.n > 1 with streaming.
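
The prefill-then-decode flow in item 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `fake_generate` is a hypothetical stand-in for the backend generation call, and `parallel_sample` is a made-up name.

```python
import asyncio


async def fake_generate(prompt: str, max_new_tokens: int, seed: int = 0) -> str:
    """Hypothetical stand-in for a backend generation call (not SGLang's API)."""
    await asyncio.sleep(0)  # yield control, as a real engine or network call would
    return f"{prompt} <sample {seed}, {max_new_tokens} tokens>"


async def parallel_sample(prompt: str, n: int, max_new_tokens: int) -> list:
    # Step 1: a single prefill-only request (zero new tokens), so the prompt
    # prefix is already cached when the decoding requests arrive.
    await fake_generate(prompt, max_new_tokens=0)
    # Step 2: n decoding requests issued concurrently. asyncio.gather returns
    # results in the same order as the awaitables it was given.
    tasks = [fake_generate(prompt, max_new_tokens, seed=i) for i in range(n)]
    return await asyncio.gather(*tasks)


samples = asyncio.run(parallel_sample("Once upon a time", n=3, max_new_tokens=32))
```

Issuing the prefill first means the n decoding requests can share the cached prompt prefix instead of each recomputing it.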

Example code:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Text completion
response = client.completions.create(
    model="default",
    prompt="I am a robot and I want to study like humans. Now let's tell a story. Once upon a time, there was a little",
    n=3,
    temperature=0.8,
    max_tokens=32,
)
print(response)

# Chat completion
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0.8,
    max_tokens=64,
    logprobs=True,
    n=3,
)
print(response)

Running python sglang/examples/usage/openai_parallel_sample.py produces:

Completion(id='6782cf18dd97421bbb72462eb404d93a', choices=[CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=0, logprobs=None, text=' robot named Bob. Bob was designed to learn and grow, but he was stuck in a rut. He kept repeating the same tasks over and over again,'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=1, logprobs=None, text=' robot named Robby who lived in a big factory with lots of other robots. Robby was very curious and wanted to learn new things, but the other'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=2, logprobs=None, text=' robot named Robby. Robby was different from the other robots, for he had a big dream: to learn how to think and learn like a human')], created=1721238118, model='default', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=96, prompt_tokens=30, total_tokens=126))
ChatCompletion(id='4e589636d6bb4cae9ce307f4e0be4203', choices=[Choice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=0, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: France\nCapital: Paris\n2. Country: Japan\nCapital: Tokyo\n3. Country: Brazil\nCapital: Brasília', role='assistant', function_call=None, tool_calls=None)), Choice(finish_reason='FINISH_LENGTH: 64', index=1, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: France\nCapital: Paris\n2. Country: Japan\nCapital: Tokyo\n3. Country: Brazil\nCapital: Brasília\n\nI hope this helps! Let me know if you need anything else.', role='assistant', function_call=None, tool_calls=None)), Choice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=2, logprobs=None, message=ChatCompletionMessage(content="  Of course, I'd be happy to help! Here are three countries and their capitals:\n\n1. Italy - Rome\n2. Japan - Tokyo\n3. Mexico - Mexico City", role='assistant', function_call=None, tool_calls=None))], created=1721238120, model='default', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=155, prompt_tokens=38, total_tokens=193))
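
The usage fields in the two outputs above can be sanity-checked with a little arithmetic. The sketch below uses only numbers copied from those outputs; `usage_is_consistent` is a hypothetical helper name. It appears that prompt tokens are counted once rather than once per sample, while completion tokens are summed over all n choices.

```python
def usage_is_consistent(prompt_tokens, completion_tokens, total_tokens, n, max_tokens):
    """Check that the total is the sum of its parts and that the completion
    tokens summed over n choices do not exceed the n * max_tokens budget."""
    return (
        total_tokens == prompt_tokens + completion_tokens
        and completion_tokens <= n * max_tokens
    )


# Text completion: three choices, each hit FINISH_LENGTH: 32, so 3 * 32 = 96.
text_ok = usage_is_consistent(30, 96, 126, n=3, max_tokens=32)

# Chat completion: two choices stopped early, so 155 is below 3 * 64 = 192.
chat_ok = usage_is_consistent(38, 155, 193, n=3, max_tokens=64)
```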

cc @Ying1123 @merrymercy @hnyls2002, and thanks to @hnyls2002 for the guidance.

@Ying1123 Ying1123 mentioned this pull request Jul 17, 2024
29 tasks
@yichuan520030910320
Contributor Author

It now supports m prompts passed as a list in the OpenAI API, and parallel sampling works across all m prompts.

When we run this code:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Text completion: single prompt, one sample (n=1)
response = client.completions.create(
    model="default",
    prompt="I am a robot and I want to study like humans. Now let's tell a story. Once upon a time, there was a little",
    n=1,
    temperature=0.8,
    max_tokens=32,
)
print(response)


# Text completion: single prompt, three parallel samples (n=3)
response = client.completions.create(
    model="default",
    prompt="I am a robot and I want to study like humans. Now let's tell a story. Once upon a time, there was a little",
    n=3,
    temperature=0.8,
    max_tokens=32,
)
print(response)


# Text completion: list of two prompts, one sample each (n=1)
response = client.completions.create(
    model="default",
    prompt=["The name of the famous soccer player is ", "The capital of US is"],
    n=1,
    temperature=0.8,
    max_tokens=32,
)
print(response)


# Text completion: list of two prompts, three samples each (n=3)
response = client.completions.create(
    model="default",
    prompt=["The name of the famous soccer player is ", "The capital of US is"],
    n=3,
    temperature=0.8,
    max_tokens=32,
)
print(response)


# Chat completion: four parallel samples (n=4)
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0.8,
    max_tokens=64,
    logprobs=True,
    n=4,
)
print(response)

The output is:

Completion(id='0e8758bc5e964ffc893c78ec4a805484', choices=[CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=0, logprobs=None, text=' robot named Bob. Bob was different from the other robots because he was curious about the world beyond his factory floor. He wanted to learn how to think like')], created=1721380457, model='default', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=32, prompt_tokens=30, total_tokens=62))
Completion(id='1ca8cb8cd4b74613858935311ecdbefc', choices=[CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=0, logprobs=None, text=' robot named Zeta. Zeta lived in a big factory where she helped make other robots. But Zeta had big dreams. She wanted to learn'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=1, logprobs=None, text=' robot named Zeta. Zeta loved to learn and play with his robot friends, but he wanted to learn more. He wanted to learn like humans! So'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=2, logprobs=None, text=' robot named Robby. Robby lived in a big factory where he did lots of work, but he was not happy. He wanted to learn more and be')], created=1721380458, model='default', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=96, prompt_tokens=30, total_tokens=126))
Completion(id='b6f62139cb8148d6ac1868a932923790', choices=[CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=0, logprobs=None, text=' Medi éllo Gór Mahrez. It is a good name for a soccer player because it is unique and easy to pronounce. It is also'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=1, logprobs=None, text=' Washington DC, which stands for District of Columbia. It is located on the East Coast of the United States and is home to many national landmarks and institutions,')], created=1721380459, model='default', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=64, prompt_tokens=11, total_tokens=75))
Completion(id='a4b18b086b1b4c9f8a6a59317444a53c', choices=[CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=0, logprobs=None, text=' \n\nAnswer:\nThe famous soccer player is Lionel Messi.\nThe capital of the United States of America is Washington, D.C'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=1, logprobs=None, text=' \nDirections: Read the text and answer the questions that follow.\n\nThe name of the famous soccer player is David Beckham. He'), CompletionChoice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=2, logprobs=None, text='\n\nAnswer:\nThe famous soccer player is Lionel Messi, and the capital of the United States is Washington D.C.'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=3, logprobs=None, text=' __________ \nAnswer:\nThe famous soccer player is Lionel Messi.\nThe capital of the United States is Washington D.C.'), CompletionChoice(finish_reason='FINISH_LENGTH: 32', index=4, logprobs=None, text='  Washington DC\nSoccer players from around the world travel to Brazil to compete in the FIFA World Cup, the most prestigious international soccer tournament'), CompletionChoice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=5, logprobs=None, text=' \n\nAnswer:\nThe famous soccer player is Lionel Messi.\nThe capital of the United States is Washington D.C.')], created=1721380460, model='default', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=189, prompt_tokens=17, total_tokens=206))
ChatCompletion(id='06ca8f81157c4f1887be08129621f84e', choices=[Choice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=0, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: Brazil\nCapital: Brasília\n2. Country: China\nCapital: Beijing\n3. Country: Germany\nCapital: Berlin', role='assistant', function_call=None, tool_calls=None)), Choice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=1, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: France\nCapital: Paris\n2. Country: Japan\nCapital: Tokyo\n3. Country: Brazil\nCapital: Brasília', role='assistant', function_call=None, tool_calls=None)), Choice(finish_reason='FINISH_MATCHED_TOKEN: 2', index=2, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: Japan - Capital: Tokyo\n2. Country: France - Capital: Paris\n3. Country: Brazil - Capital: Brasília', role='assistant', function_call=None, tool_calls=None)), Choice(finish_reason='FINISH_LENGTH: 64', index=3, logprobs=None, message=ChatCompletionMessage(content='  Of course! Here are three countries and their capitals:\n\n1. Country: France\nCapital: Paris\n2. Country: China\nCapital: Beijing\n3. Country: Brazil\nCapital: Brasília\n\nI hope that helps! Let me know if you need more', role='assistant', function_call=None, tool_calls=None))], created=1721380462, model='default', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=210, prompt_tokens=38, total_tokens=248))

I will also add support for the batch and model APIs in another PR.

@yichuan520030910320
Contributor Author

I have fixed these lines of code.
I also rewrote some code so that the output ordering matches that of gpt-3.5-turbo-instruct.
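
For readers consuming a response with m prompts and n samples each, a small client-side helper can split the flat choices list back into per-prompt groups. This is a sketch under one explicit assumption that the thread does not confirm: that the aligned ordering is prompt-major, so choice index i belongs to prompt i // n. `group_choices_by_prompt` is a hypothetical name; verify the ordering against your own server before relying on it.

```python
def group_choices_by_prompt(texts, n):
    """Split a flat list of m * n choice texts into m lists of n samples.

    Assumes prompt-major ordering (choice i belongs to prompt i // n);
    this ordering is an assumption, not something stated in the PR.
    """
    if n <= 0 or len(texts) % n != 0:
        raise ValueError("number of choices must be a positive multiple of n")
    return [texts[i:i + n] for i in range(0, len(texts), n)]


# Placeholder texts standing in for [c.text for c in response.choices].
grouped = group_choices_by_prompt(["a0", "a1", "a2", "b0", "b1", "b2"], n=3)
```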

@merrymercy merrymercy merged commit 49c5e0e into sgl-project:main Jul 20, 2024
1 check passed