
feat: [Experimental] New VLM Pipeline leveraging vision models #708

Closed · wants to merge 34 commits from mly/smol-docling-integration

Conversation

@maxmnemonic (Contributor) commented Jan 8, 2025

Preliminary integration with the SmolDocling model and a new VLM Pipeline:

  • SmolDocling inference model
  • New VLM Pipeline that uses the SmolDocling model
  • Assembly code that builds a DoclingDocument from the DocTags format predicted by SmolDocling
  • Usage example (see the sketch below)
  • Rudimentary speed-measurement logging
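
A minimal usage sketch, assuming the pipeline is exposed as VlmPipeline and selected through PdfFormatOption; the names follow the later public Docling API and may differ from the exact code in this PR, and report.pdf is a placeholder input:

```python
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

# Route PDF inputs through the VLM pipeline instead of the default PDF pipeline.
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_cls=VlmPipeline)}
)

result = converter.convert("report.pdf")  # placeholder input file
doc = result.document                     # a DoclingDocument
print(doc.export_to_markdown())
```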

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


mergify bot commented Jan 8, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
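
For illustration, the rule can be checked locally with Python's re module, assuming mergify's ~= operator behaves like a standard regex match:

```python
import re

# Conventional-commit rule enforced by the mergify bot above.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

print(bool(TITLE_RE.match("feat: [Experimental] New VLM Pipeline leveraging vision models")))  # True
print(bool(TITLE_RE.match("WIP: Integration of SmolDocling")))  # False, hence the later renames
```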

@maxmnemonic changed the title from "WIP: Integration of SmolDocling pipeline" to "WIP: Integration of SmolDocling" on Jan 8, 2025
@maxmnemonic force-pushed the mly/smol-docling-integration branch from e4a60ae to 48faf18 on January 8, 2025 at 15:12
@cau-git changed the title from "WIP: Integration of SmolDocling" to "feat: [WIP] Integration of SmolDocling" on Jan 10, 2025
@maxmnemonic force-pushed the mly/smol-docling-integration branch from 64e854e to 354c90a on January 16, 2025 at 15:52
@maxmnemonic (Contributor, Author) commented:

Merged all the proposed changes.

@maxmnemonic marked this pull request as ready for review on February 12, 2025 at 17:55
@maxmnemonic changed the title from "feat: [WIP] Integration of SmolDocling" to "feat: Integration of SmolDocling" on Feb 12, 2025
@dolfim-ibm (Contributor) commented:

Initial comments submitted

@dolfim-ibm (Contributor) commented:

I'm summarizing the target of this PR here; I will submit code proposals later.

VlmPipeline

Specs of the new pipeline

  • Input: (PDF) Document
  • Processing: using a vision language model
  • Output: DoclingDocument

Implementations

SmolDocling

Here the model will produce accurate DocTags which are converted (in the assemble step) to a DoclingDocument.
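
A hedged sketch of that assemble step, assuming the DocTags helpers that later landed in docling-core (DocTagsDocument and DoclingDocument.load_from_doctags); the PR may implement the conversion with its own assembly code, and the DocTags string below is illustrative, not verbatim model output:

```python
from PIL import Image

from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

# Illustrative DocTags prediction for a single page.
doctags = "<doctag><text><loc_10><loc_10><loc_90><loc_20>Hello world</text></doctag>"
page_image = Image.new("RGB", (100, 100), "white")  # stand-in for the real page image

# Pair the predicted tags with their page image, then assemble the document.
tags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [page_image])
doc = DoclingDocument.load_from_doctags(tags_doc, document_name="sample")
print(doc.export_to_markdown())
```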

Other DocTags models

In the future we expect more models producing DocTags, which would go through the same assembling step as SmolDocling.

Other intermediate outputs

The pipeline will also support VLMs that produce a different intermediate representation. For example, for models producing Markdown output, we internally reuse the Markdown backend to create the DoclingDocument.
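
A sketch of that fallback path, assuming the existing MarkdownDocumentBackend can be fed the raw VLM output as an in-memory stream; the actual wiring inside the pipeline may differ:

```python
from io import BytesIO

from docling.backend.md_backend import MarkdownDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.document import InputDocument

md_from_vlm = "# Title\n\nA paragraph predicted by the VLM."  # illustrative output
stream = BytesIO(md_from_vlm.encode("utf-8"))

# Wrap the raw Markdown as an in-memory input document and reuse the MD backend.
in_doc = InputDocument(
    path_or_stream=stream,
    format=InputFormat.MD,
    backend=MarkdownDocumentBackend,
    filename="page.md",
)
backend = MarkdownDocumentBackend(in_doc=in_doc, path_or_stream=stream)
doc = backend.convert()  # a DoclingDocument
```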

Wrap up

We definitely don't have to implement more than what is already nicely done in the PR, but some of the naming (especially in the options) could be tuned to be ready for the next steps.

My suggestion is to use vlm_options as the discriminator which, in the future, will decide things like 1) which model to call, and 2) which type of internal assembly to run.

I would at least introduce the kind field in the options from the beginning (sketched below).
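
A sketch of what that could look like, assuming pydantic models with a literal kind field acting as the discriminator; all class and field names here are illustrative, not the final API:

```python
from typing import Literal, Union

from pydantic import BaseModel, Field

class SmolDoclingVlmOptions(BaseModel):
    kind: Literal["smoldocling"] = "smoldocling"
    repo_id: str = "ds4sd/SmolDocling-256M-preview"  # assumed default checkpoint
    question: str = "Convert this page to docling."

class MarkdownVlmOptions(BaseModel):
    kind: Literal["markdown_vlm"] = "markdown_vlm"  # hypothetical future kind
    repo_id: str
    prompt: str

class VlmPipelineOptions(BaseModel):
    # `kind` decides which model to call and which assemble step to run.
    vlm_options: Union[SmolDoclingVlmOptions, MarkdownVlmOptions] = Field(
        default_factory=SmolDoclingVlmOptions, discriminator="kind"
    )
```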

```
@@ -229,6 +229,13 @@ def repo_cache_folder(self) -> str:
     )


 class SmolDoclingOptions(BaseModel):
     question: str = "Convert this page to docling."  # "Perform Layout Analysis."
```
Contributor commented:
I'm reading this as a wish for experimenting. What I'm really asking is whether experimenting wouldn't also imply using a different model?

Meaning, wouldn't it be better to expose

  • either both question and repo_id,
  • or neither of those?

@maxmnemonic (Contributor, Author) replied:

The same model can be instructed differently to perform different tasks; the default option is conversion to Docling DocTags.

Contributor replied:

This currently allows a free-form question, but only a few very specific prompts will produce output that the pipeline can process.

If we ever consider making a nice wrapper for running interesting prompts with SmolDocling, I think it would deserve its own place in docling-ibm-models, not in the model class for the processing pipeline.

Contributor replied:

We should either allow both question and repo_id, or neither of the two.

@dolfim-ibm changed the title from "feat: Integration of SmolDocling" to "feat: New VLM Pipeline leveraging vision models" on Feb 14, 2025
cau-git and others added 11 commits February 24, 2025 11:46
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
…e assembly code, example included.

Signed-off-by: Maksym Lysak <[email protected]>
…s in VLM pipeline. This enables correct figure extraction and page numbers in provenances

Signed-off-by: Maksym Lysak <[email protected]>
…easurement in smol_docling models

Signed-off-by: Maksym Lysak <[email protected]>
Maksym Lysak and others added 19 commits February 24, 2025 13:12
…query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models.

Signed-off-by: Maksym Lysak <[email protected]>
…ng of doctags, updated logging

Signed-off-by: Maksym Lysak <[email protected]>
… provenance definition for all elements

Signed-off-by: Maksym Lysak <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
…lated VLM pipeline option, few other minor things

Signed-off-by: Maksym Lysak <[email protected]>
…recated in the pipelines)

Signed-off-by: Maksym Lysak <[email protected]>
@maxmnemonic force-pushed the mly/smol-docling-integration branch from 61fce90 to a7a1f32 on February 24, 2025 at 13:45
Maksym Lysak added 4 commits February 24, 2025 15:13
Signed-off-by: Maksym Lysak <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
@dolfim-ibm changed the title from "feat: New VLM Pipeline leveraging vision models" to "feat: [Experimental] New VLM Pipeline leveraging vision models" on Feb 25, 2025
@cau-git (Contributor) commented Feb 26, 2025

This was merged in a derived PR.

@cau-git closed this on Feb 26, 2025