
Dynamic xpaths not working with HTML task preannotation via flask ML backend #6929

Open
chrisdukeLlama opened this issue Jan 17, 2025 · 5 comments

Comments

@chrisdukeLlama

Describe the bug
XPath-based predictions are added to the task data but are not displayed in the UI when they use dynamic XPaths such as .//p[contains(., 'Nutzenbewertung')].

To Reproduce
Steps to reproduce the behavior:
Use the following HTML in a Label Studio project:

<p>Nutzenbewertung gemäß § 35a SGB V</p>
<p>Another paragraph containing Nutzenbewertung.</p>

Add predictions using the XPath:

//*[contains(text(), 'Nutzenbewertung')]
Observe that the prediction is added to the task data but not displayed in the UI.
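As written, the dynamic expression is not unique for this sample: .//p[contains(., 'Nutzenbewertung')] selects both paragraphs, and it selects <p> elements rather than a text node, whereas the prediction that does render points at exactly one text node (/p[1]/text()[1]). The sketch below checks this with Python's standard library only. Because xml.etree.ElementTree does not implement contains(), the predicate is emulated by hand, and the <div> wrapper is added purely so the snippet parses as XML; neither reflects how Label Studio itself resolves XPaths.

```python
import xml.etree.ElementTree as ET

# Sample HTML from this report. The <div> wrapper is an assumption made only so
# that both paragraphs parse as one XML tree; it is not how Label Studio works.
SAMPLE_HTML = (
    "<div><p>Nutzenbewertung gemäß § 35a SGB V</p>\n"
    "<p>Another paragraph containing Nutzenbewertung.</p></div>"
)


def paragraphs_containing(root: ET.Element, needle: str) -> list:
    """Emulate .//p[contains(., needle)].

    ElementTree has no contains(), so each <p> is tested against its XPath
    string value: the concatenation of all of its descendant text.
    """
    return [p for p in root.iterfind(".//p") if needle in "".join(p.itertext())]


root = ET.fromstring(SAMPLE_HTML)

dynamic_matches = paragraphs_containing(root, "Nutzenbewertung")
print(len(dynamic_matches))  # 2: the dynamic expression selects both paragraphs

# Positional lookup analogous to /p[1]/text()[1]. ElementTree's .text holds the
# text before an element's first child; here that is the paragraph's only text node.
first_paragraph = root.find("./p[1]")
print(first_paragraph.text, len(first_paragraph.text))  # length 33, as in endOffset
```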

Expected behavior
The prediction should be displayed in the UI, since the XPath does match the paragraph elements that contain the text.

Environment:

  • OS: macOS Sonoma
  • Label Studio version: v1.15.0

Additional context
The XPath works in the Flask backend and in a browser (both on the raw HTML and on the HTML rendered within Label Studio), but the prediction is not displayed in Label Studio.
Predictions that use exact XPaths derived from annotations made in the Label Studio interface, like /p[34]/text()[1], are displayed when hard-coded in the Flask backend, but dynamic XPaths fail.
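Since positional paths are the form that renders, one backend-side workaround is to resolve a text match to a positional path and offsets before returning the prediction. The sketch below is a minimal, hypothetical helper (TextNodeLocator and locate are illustrative names, not Label Studio or label-studio-ml APIs) built on Python's html.parser. It mirrors the /p[1]/text()[1] form, with paths starting at the top level of the task HTML, and assumes the offsets are character offsets within that text node. It works on the raw HTML string, so its paths only match what Label Studio renders when the browser builds the same tree; browser fix-ups such as implied elements or implicitly closed tags are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html.parser import HTMLParser

# Void elements never get an end tag, so they are counted but never opened.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class _Frame:
    tag: str
    path: str
    child_counts: dict[str, int] = field(default_factory=dict)
    text_count: int = 0


class TextNodeLocator(HTMLParser):
    """Hypothetical helper: give every text node a positional XPath such as
    /p[1]/text()[1], computed from the raw HTML string."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack = [_Frame(tag="", path="")]  # virtual root above the task HTML
        self.text_nodes: list[tuple[str, str]] = []
        self._in_text = False  # True while consecutive chunks form one text node

    def handle_starttag(self, tag, attrs):
        self._in_text = False
        parent = self.stack[-1]
        parent.child_counts[tag] = parent.child_counts.get(tag, 0) + 1
        path = f"{parent.path}/{tag}[{parent.child_counts[tag]}]"
        if tag not in VOID_TAGS:
            self.stack.append(_Frame(tag=tag, path=path))

    def handle_endtag(self, tag):
        self._in_text = False
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_comment(self, data):
        self._in_text = False  # a comment node separates text nodes in the DOM

    def handle_data(self, data):
        if self._in_text:
            # HTMLParser may deliver one text node in several chunks; merge them.
            path, text = self.text_nodes[-1]
            self.text_nodes[-1] = (path, text + data)
            return
        frame = self.stack[-1]
        frame.text_count += 1
        self.text_nodes.append((f"{frame.path}/text()[{frame.text_count}]", data))
        self._in_text = True


def locate(html: str, phrase: str) -> dict | None:
    """Return start/end XPaths and offsets for the first text node containing phrase."""
    locator = TextNodeLocator()
    locator.feed(html)
    locator.close()
    for path, text in locator.text_nodes:
        offset = text.find(phrase)
        if offset != -1:
            # Python counts code points; DOM offsets count UTF-16 code units, so
            # the two agree only for text without characters outside the BMP.
            return {"start": path, "end": path, "startOffset": offset,
                    "endOffset": offset + len(phrase), "text": phrase}
    return None


SAMPLE_HTML = ("<p>Nutzenbewertung gemäß § 35a SGB V</p>\n"
               "<p>Another paragraph containing Nutzenbewertung.</p>")
print(locate(SAMPLE_HTML, "Nutzenbewertung gemäß § 35a SGB V"))
print(locate(SAMPLE_HTML, "Another paragraph"))
```

Only a single text node is searched at a time, so a phrase that spans several nodes (for example across a <b> tag) is not found by this sketch.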

Task source with the prediction and an annotation (the annotation was made after the prediction was added, and I am sure the prediction was not visible before I annotated):

{
"id": 5084,
"data": {
"html": "<p>Nutzenbewertung gemäß § 35a SGB V</p>\n<p>Another paragraph containing Nutzenbewertung.</p>"
},
"annotations": [
{
"id": 1306,
"result": [
{
"id": "735b0e8f-2c80-4168-b358-6662097a7739",
"type": "labels",
"value": {
"end": ".//p[contains(., 'Nutzenbewertung')]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": ".//p[contains(., 'Nutzenbewertung')]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
},
{
"id": "Wtcf3gI9g7",
"type": "labels",
"value": {
"end": "/p[1]/text()[1]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": "/p[1]/text()[1]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0,
"globalOffsets": {
"end": 33,
"start": 0
}
},
"origin": "manual",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"created_username": " XXXXX, 1",
"created_ago": "0 minutes",
"completed_by": {
"id": 1,
"first_name": "",
"last_name": "",
"avatar": null,
"email": "[email protected]",
"initials": "ch"
},
"was_cancelled": false,
"ground_truth": false,
"created_at": "2025-01-17T15:32:22.888219Z",
"updated_at": "2025-01-17T15:32:22.888240Z",
"draft_created_at": "2025-01-17T15:32:09.385624Z",
"lead_time": 27.26,
"import_id": null,
"last_action": null,
"task": 5084,
"project": 1,
"updated_by": 1,
"parent_prediction": 175,
"parent_annotation": null,
"last_created_by": null
}
],
"predictions": [
{
"id": 175,
"result": [
{
"id": "735b0e8f-2c80-4168-b358-6662097a7739",
"type": "labels",
"value": {
"end": ".//p[contains(., 'Nutzenbewertung')]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": ".//p[contains(., 'Nutzenbewertung')]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"model_version": "v1.0",
"created_ago": "10 minutes",
"score": 1,
"cluster": null,
"neighbors": null,
"mislabeling": 0,
"created_at": "2025-01-17T15:22:24.182589Z",
"updated_at": "2025-01-17T15:22:24.182602Z",
"model": null,
"model_run": null,
"task": 5084,
"project": 1
}
]
}

For comparison, a "prediction" that passes the XPath taken from Label Studio, also sent via the backend (this one shows up in the GUI and is highlighted):

{
"id": 5083,
"data": {
"html": "<p>Nutzenbewertung gemäß § 35a SGB V</p>\n<p>Another paragraph containing Nutzenbewertung.</p>"
},
"annotations": [],
"predictions": [
{
"id": 174,
"result": [
{
"id": "123KBhsJdU-5w",
"type": "labels",
"value": {
"end": "/p[1]/text()[1]",
"text": "Nutzenbewertung gemäß § 35a SGB V",
"start": "/p[1]/text()[1]",
"labels": [
"Ursprungsprojekt"
],
"endOffset": 33,
"startOffset": 0
},
"origin": "prediction",
"to_name": "text",
"from_name": "Was war der ursprüngliche Auftrag?"
}
],
"model_version": "v1.0",
"created_ago": "14 minutes",
"score": 1,
"cluster": null,
"neighbors": null,
"mislabeling": 0,
"created_at": "2025-01-17T15:21:34.936985Z",
"updated_at": "2025-01-17T15:21:34.936995Z",
"model": null,
"model_run": null,
"task": 5083,
"project": 1
}
]
}
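The only difference between the prediction that renders and the one that does not is the form of start and end: a purely positional path ending in a text node versus an XPath with a predicate. Assuming, based only on these two examples and not on any documented Label Studio rule, that the positional form is what the frontend can resolve, a backend could flag anything else before returning it. The regex and function names below are hypothetical; the field names come from the task source above.

```python
import re

# Positional steps such as /p[34] (zero or more), ending in a text-node step.
_POSITIONAL_TEXT_XPATH = re.compile(r"(?:/[A-Za-z][\w-]*\[\d+\])*/text\(\)\[\d+\]")


def is_positional_text_xpath(xpath: str) -> bool:
    """True for paths like /p[1]/text()[1]; False for predicates such as contains()."""
    return _POSITIONAL_TEXT_XPATH.fullmatch(xpath) is not None


def non_positional_xpaths(prediction: dict) -> list:
    """List the start/end XPaths in a prediction's result that are not positional."""
    found = []
    for item in prediction.get("result", []):
        value = item.get("value", {})
        for key in ("start", "end"):
            xpath = value.get(key)
            if xpath is not None and not is_positional_text_xpath(xpath):
                found.append(xpath)
    return found


failing = {"result": [{"value": {
    "start": ".//p[contains(., 'Nutzenbewertung')]",
    "end": ".//p[contains(., 'Nutzenbewertung')]"}}]}
working = {"result": [{"value": {"start": "/p[1]/text()[1]", "end": "/p[1]/text()[1]"}}]}
print(non_positional_xpaths(failing))  # both start and end are reported
print(non_positional_xpaths(working))  # []
```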

@chrisdukeLlama
Author

As a further explanation: my HTML is really complicated and includes complex tables. The only way to get static XPaths that match the Label Studio DOM would be for Label Studio to provide the DOM to the ML backend. For now, I have modified the Label Studio frontend to retrieve the DOM, which lets me generate static XPaths from it and get the HTML predictions rendered, but it isn't a very elegant solution. I really appreciate the great job you are doing with Label Studio; it would be great if you could consider providing the DOM to the ML backend so that predictions match even with complex HTML.
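One concrete reason a path computed from the raw HTML can miss the DOM that Label Studio renders, relevant to the complex tables mentioned here: when a <tr> appears directly inside <table>, the browser's HTML parser wraps it in an implied <tbody>. A source path such as /table[1]/tr[2]/td[1]/text()[1] therefore becomes /table[1]/tbody[1]/tr[2]/td[1]/text()[1] in the DOM. The hypothetical helper below patches only that one case, assuming the table has no explicit <thead>, <tbody> or <tfoot> (so all rows land in a single implied tbody[1]); other parser fix-ups, such as implicitly closed <p> elements or misnested tags, are not handled, which is why reading the rendered DOM, as suggested above, is the more reliable route.

```python
import re

# A table step immediately followed by a row step, e.g. "/table[1]/tr[3]".
_TABLE_THEN_ROW = re.compile(r"(/table\[\d+\])(/tr\[\d+\])")


def add_implied_tbody(xpath: str) -> str:
    """Rewrite a raw-HTML positional XPath for the browser DOM's implied <tbody>.

    Assumes each affected table has no explicit <thead>, <tbody> or <tfoot>,
    so all of its rows end up in a single implied tbody[1]. Paths that already
    contain a tbody step are left unchanged.
    """
    return _TABLE_THEN_ROW.sub(r"\1/tbody[1]\2", xpath)


print(add_implied_tbody("/table[1]/tr[2]/td[1]/text()[1]"))
# /table[1]/tbody[1]/tr[2]/td[1]/text()[1]
```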

@heidi-humansignal
Collaborator

Hello,

We'll create a feature request! We greatly appreciate your feedback and the opportunity to consider your suggestion. Your request will be evaluated and ranked alongside other roadmap items. If our product team opts to proceed with your idea, we will keep you updated throughout the process. Please understand that while we take all requests seriously, we cannot promise implementation or a specific timeframe.

Thank you,
Abu

Comment by Abubakar Saad

@chrisdukeLlama
Author

Hello Abu,
Thank you for considering it! My other, newly posted issue (a feature request for exporting the DOM) might actually be the broader solution to the overall problem, and much more useful to implement than dynamic XPaths.
In my opinion, a DOM export could have a major impact on Label Studio's support for HTML data.

Anyway, thanks for the great work you are doing; it's fantastic software!

@heidi-humansignal
Collaborator

Hello,

I've let the product team know about this feature request and will follow up as soon as I have an update from them.

Thank you,
Abu

Comment by Abubakar Saad

@chrisdukeLlama
Author

@heidi-humansignal

Hello Abu,

Thanks for keeping me updated! I appreciate the consideration and look forward to any future updates.

Best,
Chris
