-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added ctranslate2 translation example script #83
Conversation
This script is breaking because it is unable to serialize ct2 model while loading on worker. I have run this after initializing NDC gpu dask cluster on slurm but this is not working here. |
@uahmed93 , Can you post the error you saw here please. |
Here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @uahmed93, for implementing this feature. The overall approach looks solid, but I've left some suggestions to enhance the code's readability and design.
The foundation is strong, but with a few refinements, it can be polished for better performance and maintainability.
Also , CC: @sarahyurick for suggestions . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @uahmed93 for the updates! I ran black
and there are several lines which need to be reformatted, so I added them here.
Also @uahmed93, I was wondering if I could test this PR myself by downloading the model files at https://huggingface.co/ai4bharat/indictrans2-en-indic-1B? Or should I be using something else? |
Thanks @VibhuJawa and @sarahyurick for detailed analysis. I have now added get_model_output function in Model class to handle postprocessing of outputs as suggested by @VibhuJawa . |
@sarahyurick that is hf model. CT2 model can be downloaded from here Base En-indic model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and it worked for me. Thanks @uahmed93 !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add pytests, mostly other things look good to me.
Hi, @VibhuJawa @sarahyurick |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last final bits of review which i missed in the initial review, we should be good to merge after these are fixed.
Signed-off-by: Ahmed Umair <[email protected]>
to allow string output types. Signed-off-by: Ahmed Umair <[email protected]>
handle postprocessing of outputs there only. Signed-off-by: Ahmed Umair <[email protected]>
Handle some review comments. Signed-off-by: Ahmed Umair <[email protected]>
changed to type checking instead of string matching in Model. Signed-off-by: Ahmed Umair <[email protected]>
…t function. Signed-off-by: Ahmed Umair <[email protected]>
9929be2
to
476b335
Compare
Signed-off-by: Ahmed Umair <[email protected]>
Resolved it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last nits around testing
Signed-off-by: Ahmed Umair <[email protected]>
/okay to test |
@uahmed93 , Please fix linting issue. |
Signed-off-by: Ahmed Umair <[email protected]>
/okay to test |
Added a ctransalte2 example which works on string tokens instead of integer tokens.
To run the example :