feat: [Experimental] New VLM Pipeline leveraging vision models #708
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
e4a60ae to 48faf18
64e854e to 354c90a
merged all the proposed changes
Initial comments submitted
I'm summarizing the target of this PR here; I will submit code proposals later.
@@ -229,6 +229,13 @@ def repo_cache_folder(self) -> str:
    )


class SmolDoclingOptions(BaseModel):
    question: str = "Convert this page to docling."  # "Perform Layout Analysis."
I read this as a wish to experiment. What I'm really asking is whether experimenting would not also imply using a different model.
Meaning, wouldn't it be better to allow
- either both `question` and `repo_id`
- or none of those?
The same model can be instructed differently to perform different tasks; the default option is conversion to docling doc tags.
This currently allows a free-form question. Only a few very specific prompts will produce output that the pipeline can process.
If we ever consider building a nice wrapper for running interesting prompts with SmolDocling, I think it would deserve its own place in docling-ibm-models, not in the model class for the processing pipeline.
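To illustrate the concern about free-form prompts, here is a minimal sketch of guarding the question with an allow-list. `SUPPORTED_PROMPTS` and `validate_prompt` are hypothetical names, not part of this PR, and the only prompt listed is the default taken from the diff above; which other prompts yield processable output is not stated in this thread.

```python
# Assumption: only the default prompt from the diff is treated as supported.
# The real set of prompts the pipeline can process is not given in this thread.
SUPPORTED_PROMPTS = frozenset({"Convert this page to docling."})


def validate_prompt(question: str) -> str:
    """Return the prompt unchanged if it is on the allow-list, else raise."""
    if question not in SUPPORTED_PROMPTS:
        raise ValueError(f"Prompt not supported by the pipeline: {question!r}")
    return question
```

A check like this would reject arbitrary questions early instead of letting the pipeline fail later on output it cannot parse.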
We should either allow both `question` and `repo_id` or neither of the two.
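The two alternatives proposed in this thread could look roughly like the sketch below. It uses standard-library dataclasses as a stand-in for the pydantic `BaseModel` in the diff; the class names and `DEFAULT_REPO_ID` are hypothetical placeholders, since the actual repository id is not shown here.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder; the real model repository id is not shown in this thread.
DEFAULT_REPO_ID = "example-org/example-vlm"
DEFAULT_QUESTION = "Convert this page to docling."


@dataclass(frozen=True)
class ConfigurableVlmOptions:
    """Variant A: both the model and the prompt can be set by the caller."""

    repo_id: str = DEFAULT_REPO_ID
    question: str = DEFAULT_QUESTION


@dataclass(frozen=True)
class FixedVlmOptions:
    """Variant B: neither field is accepted by the constructor."""

    repo_id: str = field(default=DEFAULT_REPO_ID, init=False)
    question: str = field(default=DEFAULT_QUESTION, init=False)
```

With variant A, a caller experimenting with a different prompt can also swap the model; with variant B, passing either field to the constructor raises a `TypeError`.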
Signed-off-by: Christoph Auer <[email protected]> Signed-off-by: Maksym Lysak <[email protected]>
Signed-off-by: Maksym Lysak <[email protected]>
…e assembly code, example included. Signed-off-by: Maksym Lysak <[email protected]>
…s in VLM pipeline. This enables correct figure extraction and page numbers in provenances Signed-off-by: Maksym Lysak <[email protected]>
…easurement in smol_docling models Signed-off-by: Maksym Lysak <[email protected]>
….py enum Signed-off-by: Maksym Lysak <[email protected]>
…kend Signed-off-by: Maksym Lysak <[email protected]>
…updated doctags Signed-off-by: Maksym Lysak <[email protected]>
… assembly Signed-off-by: Maksym Lysak <[email protected]>
…query and artifacts path) via client code, see example in minimal_smol_docling. Provisioning for other potential vlm all-in-one models. Signed-off-by: Maksym Lysak <[email protected]>
…al pipeline option Signed-off-by: Maksym Lysak <[email protected]>
…ng of doctags, updated logging Signed-off-by: Maksym Lysak <[email protected]>
… provenance definition for all elements Signed-off-by: Maksym Lysak <[email protected]>
Signed-off-by: Christoph Auer <[email protected]>
…, cleaned up comments Signed-off-by: Maksym Lysak <[email protected]>
…lated VLM pipeline option, few other minor things Signed-off-by: Maksym Lysak <[email protected]>
…recated in the pipelines) Signed-off-by: Maksym Lysak <[email protected]>
…l_docling Signed-off-by: Maksym Lysak <[email protected]>
61fce90 to a7a1f32
This was merged on a derived PR.
Preliminary integration with SmolDocling model and VLM Pipeline:
Checklist: