Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

chrisdukeLlama · 2025-01-17T15:55:23Z

Describe the bug
“XPath-based predictions are added to the task data but not visually displayed in the UI when using dynamic XPaths like .//p[contains(., 'Nutzenbewertung')].”

To Reproduce
Steps to reproduce the behavior:
Use the following HTML in a Label Studio project:
html

Nutzenbewertung gemäß § 35a SGB V

Another paragraph containing Nutzenbewertung.

Add predictions using the XPath:

//*[contains(text(), 'Nutzenbewertung,')]
Observe that the prediction is added to the task data but not displayed in the UI.

Expected behavior
“The prediction should be visually displayed in the UI, as the XPath matches the correct text nodes.”

Environment (please complete the following information):

OS: macOS Sonoma
Label Studio Version v1.15.0

Additional context
The XPath works in a flask backend and in a browser (showing the html or the html within Label studio) but not in Label Studio.
Predictions using exact XPaths derived from annotations made in the label Studio interface like /p[34]/text()[1] work when put statically in the flask backend, but dynamic XPaths fail.

Task source with prediction and annotation (annotation done after prediction, so I am sure I could not see the prediction even before the annotation):

{
"id": 5084,
"data": {
"html": "

Nutzenbewertung gemäß § 35a SGB V

\n

Another paragraph containing Nutzenbewertung.

"
},
"annotations": [
{
"id": 1306,
"result": [
{
"id": "735b0e8f-2c80-4168-b358-6662097a7739",
"type": "labels",
"value": {
"end": ".//p[contains(., 'Nutzenbewertung')]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": ".//p[contains(., 'Nutzenbewertung')]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
},
{
"id": "Wtcf3gI9g7",
"type": "labels",
"value": {
"end": "/p[1]/text()[1]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": "/p[1]/text()[1]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0,
"globalOffsets": {
"end": 33,
"start": 0
}
},
"origin": "manual",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"created_username": " XXXXX, 1",
"created_ago": "0 minutes",
"completed_by": {
"id": 1,
"first_name": "",
"last_name": "",
"avatar": null,
"email": "[email protected]",
"initials": "ch"
},
"was_cancelled": false,
"ground_truth": false,
"created_at": "2025-01-17T15:32:22.888219Z",
"updated_at": "2025-01-17T15:32:22.888240Z",
"draft_created_at": "2025-01-17T15:32:09.385624Z",
"lead_time": 27.26,
"import_id": null,
"last_action": null,
"task": 5084,
"project": 1,
"updated_by": 1,
"parent_prediction": 175,
"parent_annotation": null,
"last_created_by": null
}
],
"predictions": [
{
"id": 175,
"result": [
{
"id": "735b0e8f-2c80-4168-b358-6662097a7739",
"type": "labels",
"value": {
"end": ".//p[contains(., 'Nutzenbewertung')]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": ".//p[contains(., 'Nutzenbewertung')]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"model_version": "v1.0",
"created_ago": "10 minutes",
"score": 1,
"cluster": null,
"neighbors": null,
"mislabeling": 0,
"created_at": "2025-01-17T15:22:24.182589Z",
"updated_at": "2025-01-17T15:22:24.182602Z",
"model": null,
"model_run": null,
"task": 5084,
"project": 1
}
]
}

"Prediction" passing the xpath from label studio also via the backend (this shows up in the gui and is highlighted):

{
"id": 5083,
"data": {
"html": "

Nutzenbewertung gemäß § 35a SGB V

\n

Another paragraph containing Nutzenbewertung.

"
},
"annotations": [],
"predictions": [
{
"id": 174,
"result": [
{
"id": "123KBhsJdU-5w",
"type": "labels",
"value": {
"end": "/p[1]/text()[1]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": "/p[1]/text()[1]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"model_version": "v1.0",
"created_ago": "14 minutes",
"score": 1,
"cluster": null,
"neighbors": null,
"mislabeling": 0,
"created_at": "2025-01-17T15:21:34.936985Z",
"updated_at": "2025-01-17T15:21:34.936995Z",
"model": null,
"model_run": null,
"task": 5083,
"project": 1
}
]
}

chrisdukeLlama · 2025-01-19T20:05:47Z

As a further explanation: I have really complicated html including complex tables. The only way to get static xpaths that match the label studio DOM would be if label studio could provide the DOM to the ML backend. I have now modified the label studio frontend to get the DOM and I can generate static xpaths based on it and have the HTML predictions rendered. But it isn't a very elegant solution. I really appreciate the great job you are doing with label studio, it would be great if you could consider providing the DOM to the ML backend to have matching predictions even with complex HTML.

heidi-humansignal · 2025-01-24T00:13:27Z

Hello,

We'll create a feature request! We greatly appreciate your feedback and the opportunity to consider your suggestion. Your request will be evaluated and ranked alongside other roadmap items. If our product team opts to proceed with your idea, we will keep you updated throughout the process. Please understand that while we take all requests seriously, we cannot promise implementation or a specific timeframe.

Thank you,
Abu

Comment by Abubakar Saad
Workflow Run

chrisdukeLlama · 2025-01-24T05:43:08Z

Hello Abu,
I appreciate that you will consider it! My other (newly posted) issue (feature request for export of the DOM) might actually be the broader solution for the overall problem and much more useful to implement than the dynamic xpaths!
In my opinion the DOM export could have a major impact on label studio's support for HTML data.

Anyway, thanks for the great work you guys are doing, it's a fantastic software!

heidi-humansignal · 2025-01-29T06:46:08Z

Hello,

Let the product team know about this feature request and will follow up as soon as I've update from them

Thank you,
Abu

Comment by Abubakar Saad
Workflow Run

chrisdukeLlama · 2025-01-29T09:13:31Z

@heidi-humansignal

Hello Abu,

Thanks for keeping me updated! I appreciate the consideration and look forward to any future updates.

Best,
Chris

heidi-humansignal added the community_investigate label Jan 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

chrisdukeLlama commented Jan 17, 2025

chrisdukeLlama commented Jan 19, 2025

heidi-humansignal commented Jan 24, 2025

chrisdukeLlama commented Jan 24, 2025

heidi-humansignal commented Jan 29, 2025

chrisdukeLlama commented Jan 29, 2025

Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

Comments

chrisdukeLlama commented Jan 17, 2025

chrisdukeLlama commented Jan 19, 2025

heidi-humansignal commented Jan 24, 2025

chrisdukeLlama commented Jan 24, 2025

heidi-humansignal commented Jan 29, 2025

chrisdukeLlama commented Jan 29, 2025