This is a vision-to-text example with a very similar structure to the DistilViT example.
Different TrOCR model sizes use different tokenizers. This example works with the base model, which uses a BPE tokenizer, but not with the small model, which uses a unigram tokenizer.
Compared to Ocrs, the TrOCR models are much larger and thus slower to execute. However, being bigger, they also have more capacity.
TODO:
- `If` operator. This will allow using the "merged" output model from Optimum (`decoder_model_merged.onnx`), which is faster than using the cache-less model (`decoder_model.onnx`) alone and more size-efficient than using separate models for the initial run and subsequent runs. See "Implement `If` operator" (#306).
- `past_key_values.{layer}.encoder.{key,value}` inputs that Optimum uses. Unlike self-attention KV-caches, these are generated once when the encoder is run for the first time and skipped in subsequent runs (see "Support cross-attention key-value caches in rten-generate", #318).
- `LayerNormalization` op is not fused in the decoder. `fuse_layer_norm` doesn't match because it expects arguments to the `Add` and `Mul` operators to be constants. However, they are actually value nodes which capture values from constants defined in the parent graph.
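
To illustrate the last item, here is a toy sketch in Python of why a fusion pattern that insists on local constants fails to match inside a subgraph. It is not rten's graph API: `Graph`, `constant_strict`, `constant_resolving_captures`, and the node names `ln_scale` and `ln_bias` are all hypothetical, and following captures up to the parent graph is only one possible fix, not something this PR commits to.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Graph:
    """Toy graph (hypothetical, not rten's real data structure)."""

    # Constants defined directly in this graph.
    constants: Dict[str, float] = field(default_factory=dict)
    # Value nodes that capture a same-named node from the parent graph.
    captures: Set[str] = field(default_factory=set)
    parent: Optional["Graph"] = None


def constant_strict(graph: Graph, name: str) -> Optional[float]:
    """Accept only constants defined directly in `graph`."""
    return graph.constants.get(name)


def constant_resolving_captures(graph: Graph, name: str) -> Optional[float]:
    """Also follow captured values up through the enclosing graphs."""
    current: Optional[Graph] = graph
    while current is not None:
        if name in current.constants:
            return current.constants[name]
        if name not in current.captures:
            # Neither a constant nor a capture here: it is a runtime value.
            return None
        current = current.parent
    return None


# The parent graph owns the constants used by the Mul and Add operators.
parent = Graph(constants={"ln_scale": 1.5, "ln_bias": 0.25})
# The subgraph only sees them as value nodes captured from the parent.
subgraph = Graph(captures={"ln_scale", "ln_bias"}, parent=parent)

names = ("ln_scale", "ln_bias")
strict = [constant_strict(subgraph, n) for n in names]
resolved = [constant_resolving_captures(subgraph, n) for n in names]
```

With the strict lookup, both operands look like ordinary runtime values, so a pattern requiring constants cannot match. The resolving lookup finds the underlying constants in the parent graph.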