.Net: Extended ChatPromptParser.cs to cover additional scenarios #10278

Open
markwallace-microsoft opened this issue Jan 23, 2025 · 0 comments
Labels: Build, chat history, .NET, sk team issue

Discussed in #10252

Originally posted by ThDuquennoy January 21, 2025
Hello,

When parsing a chat prompt, "invalid" messages are discarded.
According to the code, a message node is invalid if any of the following conditions is met:

  • role attribute is missing
  • More than 1 text child node
  • No text child node AND Content is null
    private static bool IsValidChatMessage(PromptNode node)
    {
        return
            node.TagName.Equals(MessageTagName, StringComparison.OrdinalIgnoreCase) &&
            node.Attributes.ContainsKey(RoleAttributeName) &&
            IsValidChildNodes(node);
    }


    private static bool IsValidChildNodes(PromptNode node)
    {
        var textTagsCount = node.ChildNodes.Count(n => n.TagName.Equals(TextTagName, StringComparison.OrdinalIgnoreCase));
        return textTagsCount == 1 || (textTagsCount == 0 && node.Content is not null);
    }

(Link to source)
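
To make the three conditions concrete, here is a minimal, self-contained sketch of the same check. It is not the Semantic Kernel code: `Node` is a hypothetical stand-in for `PromptNode`, and the literal values `"message"`, `"role"`, and `"text"` are assumed values for the constants `MessageTagName`, `RoleAttributeName`, and `TextTagName`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PromptNode: only the members the check reads.
public sealed record Node(
    string TagName,
    string? Content = null,
    Dictionary<string, string>? Attributes = null,
    List<Node>? ChildNodes = null);

public static class MessageCheck
{
    // Mirrors IsValidChatMessage / IsValidChildNodes quoted above.
    // The tag and attribute names are assumptions, not taken from the source.
    public static bool IsValid(Node node)
    {
        bool hasRole = node.Attributes?.ContainsKey("role") ?? false;

        int textCount = (node.ChildNodes ?? new List<Node>())
            .Count(n => n.TagName.Equals("text", StringComparison.OrdinalIgnoreCase));

        return node.TagName.Equals("message", StringComparison.OrdinalIgnoreCase)
            && hasRole
            && (textCount == 1 || (textCount == 0 && node.Content is not null));
    }
}
```

Under this sketch, a message whose only child is an image (and whose own `Content` is null) fails the check, as does a message with two text children, while a message with exactly one text child passes regardless of any other children.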

I understand the first condition, but not the other two.
The OpenAI API allows messages with:

  • Multiple content parts of type text
  • No content part of type text but one or more content parts of type image_url
  • An empty content array (not very useful, I agree)

For instance, the following payload is valid:

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This is a sample payload to demonstrate that having multiple text part is ok"
                },
                {
                    "type": "text",
                    "text": "Just say \"Hi github\""
                }
            ]
        },
        {
            "role": "user",
            "content": []
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

When using KernelFunctionFromPrompt, I cannot generate a payload with that structure because of this parsing, and I don't understand why. I'm sure there is a reason behind it; can somebody explain it to me?

I'm asking because I noticed that GPT-4o answers these two payloads differently, and the one that works cannot be produced with this parser:

  • Non-working payload:
{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "The name of this image is image1.png"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in the image ?"
                }
            ]
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

GPT-4o's response:

I'm unable to view images or any visual content. If you describe the image to me, I can help you interpret or analyze it!

  • Working payload:
{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "The name of this image is image1.png"
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": ""
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in the image ?"
                }
            ]
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

GPT-4o's response:

The image is an emoji with a yellow face, heart-shaped eyes, and a broad smile. This emoji is commonly used to express love, adoration, or strong approval.

The parser would consider the second payload invalid because the message containing the image has no text content.

Thanks in advance

@markwallace-microsoft markwallace-microsoft added the .NET, Build, and chat history labels Jan 23, 2025
@github-actions github-actions bot changed the title from "Extended ChatPromptParser.cs to cover additional scenarios" to ".Net: Extended ChatPromptParser.cs to cover additional scenarios" Jan 23, 2025
@markwallace-microsoft markwallace-microsoft moved this to Backlog: Planned in Semantic Kernel Jan 23, 2025
github-merge-queue bot pushed a commit that referenced this issue Jan 28, 2025
…nstead of single value (#10304)

### Motivation and Context

See [Issue #10278](#10278)

### Description

**`ChatPromptParser.cs`:**
Removed the method `IsValidChildNodes` and its call in
`IsValidChatMessage`. A chat message is now valid as long as it has a
role attribute; messages with no text child or with multiple text
children are accepted.


**`ChatPromptParserTests`:**
- Changed the third invalid example, since the former one is now valid
- Added tests for a message with multiple text nodes, mixed XML content,
and an empty XML node

Remark: the expected behavior for mixed XML content is unclear, so I
kept it as it was: the content of the message node ends up in a
`TextContent` if and only if the message has no valid text or image
child node.
For instance, if the prompt has a message with mixed XML, that is, both
content and a child `image` node, the content is ignored and the
`ChatMessageContent` object has only an `ImageContent` item.
**Other remark:**
The `ChatMessageContent.Content` property only returns/sets the first
`TextContent` item.
I thought about changing it to:
- get: return a concatenation of the `TextContent` items separated by
`\n`
- set: set the first `TextContent` element (or add one if there is
none) and remove the other `TextContent` items

But its current behavior seems intended (it is even covered by some
unit tests), and I felt that such a change would have too much impact
across the code, so I left it as it is. I still think such a change
could be beneficial.
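
For discussion, the get/set semantics floated above could look roughly like this. It is a sketch on hypothetical, simplified types (`Message`, `Item`, `TextItem`, `ImageItem`), not a patch against `ChatMessageContent`; returning `null` when there is no text item is an assumption of the sketch.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins; not the Semantic Kernel types.
public abstract class Item { }
public sealed class TextItem : Item { public string? Text { get; set; } }
public sealed class ImageItem : Item { public string? Uri { get; set; } }

public sealed class Message
{
    public List<Item> Items { get; } = new();

    public string? Content
    {
        // Proposed getter: join every text item with "\n".
        // Assumption: null when the message has no text item at all.
        get
        {
            var texts = Items.OfType<TextItem>().Select(t => t.Text).ToList();
            return texts.Count == 0 ? null : string.Join("\n", texts);
        }

        // Proposed setter: write into the first text item (adding one if
        // there is none) and remove every other text item.
        set
        {
            var first = Items.OfType<TextItem>().FirstOrDefault();
            if (first is null)
            {
                first = new TextItem();
                Items.Add(first);
            }

            first.Text = value;
            Items.RemoveAll(i => i is TextItem && !ReferenceEquals(i, first));
        }
    }
}
```

Note that the setter in this sketch is lossy: assigning `Content` on a message with several text items collapses them into one, which is part of why the change could affect existing callers.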

### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [ ] All unit tests pass, and I have added new tests where possible:
the same tests fail on the unmodified `main` branch. For
instance, the test `GettingStarted/Step1_Create_Kernel` fails with
`ConfigurationNotFoundException : Configuration section 'OpenAI' not
found`. I think some config files are missing
- [x] I didn't break anyone 😄

---------

Co-authored-by: Thomas DUQUENNOY <[email protected]>
Co-authored-by: Mark Wallace <127216156+markwallace-microsoft@users.noreply.github.com>
Co-authored-by: Dmytro Struk <[email protected]>
@evchaki evchaki added the sk team issue label Jan 30, 2025