.Net: Extended ChatPromptParser.cs to cover additional scenarios #10278

markwallace-microsoft · 2025-01-23T16:51:31Z

Discussed in #10252

Originally posted by ThDuquennoy January 21, 2025
Hello,

When parsing a chat prompt, "invalid" messages are discarded.
According to the code, a message node is invalid if any of the following conditions is met:

- The role attribute is missing
- There is more than one text child node
- There is no text child node AND Content is null

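    // A node is a valid chat message only if it is a message tag,
    // has a role attribute, and passes the child-node check below.
    private static bool IsValidChatMessage(PromptNode node)
    {
        return
            node.TagName.Equals(MessageTagName, StringComparison.OrdinalIgnoreCase) &&
            node.Attributes.ContainsKey(RoleAttributeName) &&
            IsValidChildNodes(node);
    }

    // Accepts exactly one text child, or no text child when the node itself has content.
    // Rejects multiple text children, and image-only or empty messages.
    private static bool IsValidChildNodes(PromptNode node)
    {
        var textTagsCount = node.ChildNodes.Count(n => n.TagName.Equals(TextTagName, StringComparison.OrdinalIgnoreCase));
        return textTagsCount == 1 || (textTagsCount == 0 && node.Content is not null);
    }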

(Link to source)
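To make the effect of this rule concrete, here is a minimal, self-contained sketch that re-states the quoted predicate against a stand-in node type. `SketchNode`, `OldRuleSketch`, and the `Message` helper are hypothetical names introduced for illustration, and the literal tag and attribute names (`"message"`, `"text"`, `"role"`) are assumptions standing in for the parser's `MessageTagName`, `TextTagName`, and `RoleAttributeName` constants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PromptNode: only the members the quoted rule reads.
public sealed class SketchNode
{
    public SketchNode(string tagName) => TagName = tagName;
    public string TagName { get; }
    public string? Content { get; set; }
    public Dictionary<string, string> Attributes { get; } = new();
    public List<SketchNode> ChildNodes { get; } = new();
}

public static class OldRuleSketch
{
    // Same logic as the quoted IsValidChatMessage + IsValidChildNodes,
    // with the tag and attribute names written as assumed literals.
    public static bool IsValid(SketchNode node)
    {
        int textTags = node.ChildNodes.Count(
            n => n.TagName.Equals("text", StringComparison.OrdinalIgnoreCase));

        return node.TagName.Equals("message", StringComparison.OrdinalIgnoreCase)
            && node.Attributes.ContainsKey("role")
            && (textTags == 1 || (textTags == 0 && node.Content is not null));
    }

    // Builds a user message with optional inline content and the given child tags.
    public static SketchNode Message(string? content, params string[] childTags)
    {
        var node = new SketchNode("message") { Content = content };
        node.Attributes["role"] = "user";
        foreach (var tag in childTags)
        {
            node.ChildNodes.Add(new SketchNode(tag));
        }
        return node;
    }
}
```

Under these assumptions, `OldRuleSketch.IsValid(OldRuleSketch.Message(null, "image"))` and `OldRuleSketch.IsValid(OldRuleSketch.Message(null, "text", "text"))` both return `false`, while a message with one text child and one image child, or with inline content only, returns `true`.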

I understand the first condition, but not the other two.
The OpenAI API allows messages with:

- Multiple content parts of type text
- No content part of type text, but one or more content parts of type image_url
- An empty content array (not very useful, I agree)

For instance, the following payload is valid:

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAApgAAAKYB3X3/OAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAANCSURBVEiJtZZPbBtFFMZ/M7ubXdtdb1xSFyeilBapySVU8h8OoFaooFSqiihIVIpQBKci6KEg9Q6H9kovIHoCIVQJJCKE1ENFjnAgcaSGC6rEnxBwA04Tx43t2FnvDAfjkNibxgHxnWb2e/u992bee7tCa00YFsffekFY+nUzFtjW0LrvjRXrCDIAaPLlW0nHL0SsZtVoaF98mLrx3pdhOqLtYPHChahZcYYO7KvPFxvRl5XPp1sN3adWiD1ZAqD6XYK1b/dvE5IWryTt2udLFedwc1+9kLp+vbbpoDh+6TklxBeAi9TL0taeWpdmZzQDry0AcO+jQ12RyohqqoYoo8RDwJrU+qXkjWtfi8Xxt58BdQuwQs9qC/afLwCw8tnQbqYAPsgxE1S6F3EAIXux2oQFKm0ihMsOF71dHYx+f3NND68ghCu1YIoePPQN1pGRABkJ6Bus96CutRZMydTl+TvuiRW1m3n0eDl0vRPcEysqdXn+jsQPsrHMquGeXEaY4Yk4wxWcY5V/9scqOMOVUFthatyTy8QyqwZ+kDURKoMWxNKr2EeqVKcTNOajqKoBgOE28U4tdQl5p5bwCw7BWquaZSzAPlwjlithJtp3pTImSqQRrb2Z8PHGigD4RZuNX6JYj6wj7O4TFLbCO/Mn/m8R+h6rYSUb3ekokRY6f/YukArN979jcW+V/S8g0eT/N3VN3kTqWbQ428m9/8k0P/1aIhF36PccEl6EhOcAUCrXKZXXWS3XKd2vc/TRBG9O5ELC17MmWubD2nKhUKZa26Ba2+D3P+4/MNCFwg59oWVeYhkzgN/JDR8deKBoD7Y+ljEjGZ0sosXVTvbc6RHirr2reNy1OXd6pJsQ+gqjk8VWFYmHrwBzW/n+uMPFiRwHB2I7ih8ciHFxIkd/3Omk5tCDV1t+2nNu5sxxpDFNx+huNhVT3/zMDz8usXC3ddaHBj1GHj/As08fwTS7Kt1HBTmyN29vdwAw+/wbwLVOJ3uAD1wi/dUH7Qei66PfyuRj4Ik9is+hglfbkbfR3cnZm7chlUWLdwmprtCohX4HUtlOcQjLYCu+fzGJH2QRKvP3UNz8bWk1qMxjGTOMThZ3kvgLI5AzFfo379UAAAAASUVORK5CYII="
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This is a sample payload to demonstrate that having multiple text part is ok"
                },
                {
                    "type": "text",
                    "text": "Just say \"Hi github\""
                }
            ]
        },
        {
            "role": "user",
            "content": []
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

When using KernelFunctionFromPrompt, I cannot generate a payload with that structure because of this parsing, and I don't understand why. I'm sure there is a reason behind this; can somebody explain it to me?

I'm asking because I noticed that GPT-4o answers the following two payloads differently, and the one that works cannot be produced with this parser.

Non-working payload:

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "The name of this image is image1.png"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAApgAAAKYB3X3/OAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAANCSURBVEiJtZZPbBtFFMZ/M7ubXdtdb1xSFyeilBapySVU8h8OoFaooFSqiihIVIpQBKci6KEg9Q6H9kovIHoCIVQJJCKE1ENFjnAgcaSGC6rEnxBwA04Tx43t2FnvDAfjkNibxgHxnWb2e/u992bee7tCa00YFsffekFY+nUzFtjW0LrvjRXrCDIAaPLlW0nHL0SsZtVoaF98mLrx3pdhOqLtYPHChahZcYYO7KvPFxvRl5XPp1sN3adWiD1ZAqD6XYK1b/dvE5IWryTt2udLFedwc1+9kLp+vbbpoDh+6TklxBeAi9TL0taeWpdmZzQDry0AcO+jQ12RyohqqoYoo8RDwJrU+qXkjWtfi8Xxt58BdQuwQs9qC/afLwCw8tnQbqYAPsgxE1S6F3EAIXux2oQFKm0ihMsOF71dHYx+f3NND68ghCu1YIoePPQN1pGRABkJ6Bus96CutRZMydTl+TvuiRW1m3n0eDl0vRPcEysqdXn+jsQPsrHMquGeXEaY4Yk4wxWcY5V/9scqOMOVUFthatyTy8QyqwZ+kDURKoMWxNKr2EeqVKcTNOajqKoBgOE28U4tdQl5p5bwCw7BWquaZSzAPlwjlithJtp3pTImSqQRrb2Z8PHGigD4RZuNX6JYj6wj7O4TFLbCO/Mn/m8R+h6rYSUb3ekokRY6f/YukArN979jcW+V/S8g0eT/N3VN3kTqWbQ428m9/8k0P/1aIhF36PccEl6EhOcAUCrXKZXXWS3XKd2vc/TRBG9O5ELC17MmWubD2nKhUKZa26Ba2+D3P+4/MNCFwg59oWVeYhkzgN/JDR8deKBoD7Y+ljEjGZ0sosXVTvbc6RHirr2reNy1OXd6pJsQ+gqjk8VWFYmHrwBzW/n+uMPFiRwHB2I7ih8ciHFxIkd/3Omk5tCDV1t+2nNu5sxxpDFNx+huNhVT3/zMDz8usXC3ddaHBj1GHj/As08fwTS7Kt1HBTmyN29vdwAw+/wbwLVOJ3uAD1wi/dUH7Qei66PfyuRj4Ik9is+hglfbkbfR3cnZm7chlUWLdwmprtCohX4HUtlOcQjLYCu+fzGJH2QRKvP3UNz8bWk1qMxjGTOMThZ3kvgLI5AzFfo379UAAAAASUVORK5CYII="
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in the image ?"
                }
            ]
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

GPT-4o's response:

I'm unable to view images or any visual content. If you describe the image to me, I can help you interpret or analyze it!

Working payload:

{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "The name of this image is image1.png"
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAApgAAAKYB3X3/OAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAANCSURBVEiJtZZPbBtFFMZ/M7ubXdtdb1xSFyeilBapySVU8h8OoFaooFSqiihIVIpQBKci6KEg9Q6H9kovIHoCIVQJJCKE1ENFjnAgcaSGC6rEnxBwA04Tx43t2FnvDAfjkNibxgHxnWb2e/u992bee7tCa00YFsffekFY+nUzFtjW0LrvjRXrCDIAaPLlW0nHL0SsZtVoaF98mLrx3pdhOqLtYPHChahZcYYO7KvPFxvRl5XPp1sN3adWiD1ZAqD6XYK1b/dvE5IWryTt2udLFedwc1+9kLp+vbbpoDh+6TklxBeAi9TL0taeWpdmZzQDry0AcO+jQ12RyohqqoYoo8RDwJrU+qXkjWtfi8Xxt58BdQuwQs9qC/afLwCw8tnQbqYAPsgxE1S6F3EAIXux2oQFKm0ihMsOF71dHYx+f3NND68ghCu1YIoePPQN1pGRABkJ6Bus96CutRZMydTl+TvuiRW1m3n0eDl0vRPcEysqdXn+jsQPsrHMquGeXEaY4Yk4wxWcY5V/9scqOMOVUFthatyTy8QyqwZ+kDURKoMWxNKr2EeqVKcTNOajqKoBgOE28U4tdQl5p5bwCw7BWquaZSzAPlwjlithJtp3pTImSqQRrb2Z8PHGigD4RZuNX6JYj6wj7O4TFLbCO/Mn/m8R+h6rYSUb3ekokRY6f/YukArN979jcW+V/S8g0eT/N3VN3kTqWbQ428m9/8k0P/1aIhF36PccEl6EhOcAUCrXKZXXWS3XKd2vc/TRBG9O5ELC17MmWubD2nKhUKZa26Ba2+D3P+4/MNCFwg59oWVeYhkzgN/JDR8deKBoD7Y+ljEjGZ0sosXVTvbc6RHirr2reNy1OXd6pJsQ+gqjk8VWFYmHrwBzW/n+uMPFiRwHB2I7ih8ciHFxIkd/3Omk5tCDV1t+2nNu5sxxpDFNx+huNhVT3/zMDz8usXC3ddaHBj1GHj/As08fwTS7Kt1HBTmyN29vdwAw+/wbwLVOJ3uAD1wi/dUH7Qei66PfyuRj4Ik9is+hglfbkbfR3cnZm7chlUWLdwmprtCohX4HUtlOcQjLYCu+fzGJH2QRKvP3UNz8bWk1qMxjGTOMThZ3kvgLI5AzFfo379UAAAAASUVORK5CYII="
                    }
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What do you see in the image ?"
                }
            ]
        }
    ],
    "model": "gpt-4o",
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stream": false,
    "temperature": 0,
    "top_p": 0,
    "max_tokens": 1000
}

Response:

The image is an emoji with a yellow face, heart-shaped eyes, and a broad smile. This emoji is commonly used to express love, adoration, or strong approval.

The parser would consider the second (working) payload invalid, because the message containing the image has no text content.

Thanks in advance


…nstead of single value (#10304)

### Motivation and Context

See [Issue #10278](#10278 )

### Description

**`ChatPromptParser.cs` :** Remove method `IsValidChildNodes` and its call in `IsValidChatMessage`: a chat message is valid as long as it has a role attribute. Messages with no text child or multiple text children are now valid.

**`ChatPromptParserTests` :**
- Changed the 3rd invalid example, since the former one is now valid
- Added tests for: message with multiple text nodes, mixed XML content, and empty XML node

Remark: The expected behavior for mixed XML content is unclear, so I kept it as it was: the content of the message node ends up in a `TextContent` if and only if the message has no valid text or image child node. So, for instance, if the prompt has a message that is a mixed XML with content and a child `image` node, the content would be ignored and the `ChatMessageContent` object will have only an `ImageContent` item.

**Other remark :** `ChatMessageContent.Content` property only returns/sets the first `TextContent` item. I thought about changing it to:
- get: return a concatenation of the `TextContent` items separated by `\n`
- set: set the first `TextContent` element (or add one if there is none) and remove other `TextContent` items

But its current behavior seems intended (it is even included in some unit tests), and I felt that such a change would have too much impact across the code, so I left it as it is. I still think that such a change could be beneficial.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [ ] All unit tests pass, and I have added new tests where possible: the same tests fail on the unmodified `main` branch. For instance, the test `GettingStarted/Step1_Create_Kernel` fails with `ConfigurationNotFoundException : Configuration section 'OpenAI' not found`. I think there are some missing config files
- [x] I didn't break anyone 😄

---------

Co-authored-by: Thomas DUQUENNOY <[email protected]>
Co-authored-by: Mark Wallace <127216156+markwallace-microsoft@users.noreply.github.com>
Co-authored-by: Dmytro Struk <[email protected]>
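The `Content` getter/setter idea from the "Other remark" above was not implemented by this PR, but it can be sketched roughly as follows. This is only an illustration under stated assumptions: `ContentProposalSketch` is a hypothetical name, and the `(Kind, Value)` tuples are a simplified stand-in for `TextContent` and `ImageContent` items rather than the real Semantic Kernel types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContentProposalSketch
{
    // Proposed getter: concatenate all text items with "\n"; null when there is no text item.
    public static string? GetContent(IReadOnlyList<(string Kind, string Value)> items)
    {
        var texts = items
            .Where(item => item.Kind == "text")
            .Select(item => item.Value)
            .ToList();

        return texts.Count == 0 ? null : string.Join("\n", texts);
    }

    // Proposed setter: overwrite the first text item (or append one if none exists)
    // and remove every other text item; non-text items are left untouched.
    public static void SetContent(List<(string Kind, string Value)> items, string value)
    {
        int first = items.FindIndex(item => item.Kind == "text");
        if (first < 0)
        {
            items.Add(("text", value));
            return;
        }

        items[first] = ("text", value);

        // Walk backwards so removals do not shift the indices still to be visited.
        for (int idx = items.Count - 1; idx > first; idx--)
        {
            if (items[idx].Kind == "text")
            {
                items.RemoveAt(idx);
            }
        }
    }
}
```

With items `text "a"`, `image "u"`, `text "b"`, the getter in this sketch yields `"a\nb"`; after `SetContent(items, "c")` the list holds `text "c"` followed by `image "u"`. As the remark notes, existing unit tests rely on the current first-item behavior, so a change along these lines would need to be weighed against that.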

markwallace-microsoft added the .NET (Issue or Pull requests regarding .NET code), Build (Features planned for next Build conference), and chat history labels Jan 23, 2025

markwallace-microsoft assigned dmytrostruk Jan 23, 2025

markwallace-microsoft added this to Semantic Kernel Jan 23, 2025

github-actions bot changed the title ~~Extended ChatPromptParser.cs to cover additional scenarios~~ .Net: Extended ChatPromptParser.cs to cover additional scenarios Jan 23, 2025

markwallace-microsoft added the triage label Jan 23, 2025

markwallace-microsoft moved this to Backlog: Planned in Semantic Kernel Jan 23, 2025

markwallace-microsoft removed the triage label Jan 23, 2025

ThDuquennoy mentioned this issue Jan 27, 2025

.Net: issue-10278 : Change ChatPromptParser to enable 0-n text part instead of single value #10304

Merged

4 tasks

evchaki added the sk team issue label (a tag to denote issues that were created by the Semantic Kernel team, i.e., not the community) Jan 30, 2025
