Add PowerPoint translation functionality using python-pptx and OpenAI #240

Open
wants to merge 11 commits into
base: main

Conversation

github-actions[bot]
Contributor

This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I've created a Jupyter notebook that implements PowerPoint translation functionality using python-pptx and OpenAI's API. The notebook provides functions to extract text from PowerPoint slides, translate the content using OpenAI's GPT model, and update the presentation with translated text. It includes proper error handling, sanity checks for translation output, and saves intermediate results as JSON files for inspection. The implementation follows the suggested approach using the prompt_chatgpt helper function for OpenAI API integration, making it easy to translate presentations between languages while preserving the presentation structure.

  • translate_pptx.ipynb Created a new notebook implementing PowerPoint translation using python-pptx and OpenAI API
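The sanity check on the translation output mentioned above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the function name `check_translation` and the nested-list data layout (slides, then paragraphs, then strings) are assumptions made for illustration.

```python
def check_translation(original, translated):
    """Return True if `translated` has the same nested list shape as `original`.

    Hypothetical helper: assumes the extracted texts are nested lists whose
    leaves are strings, e.g. [[slide 1 texts], [slide 2 texts], ...].
    """
    if isinstance(original, list):
        return (
            isinstance(translated, list)
            and len(original) == len(translated)
            and all(check_translation(o, t) for o, t in zip(original, translated))
        )
    # Leaves must be strings on both sides.
    return isinstance(original, str) and isinstance(translated, str)
```

A check like this only compares structure, so it can catch a model response that dropped or merged entries, but it says nothing about translation quality.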

closes #239

@haesleinhuepf
Owner

Ok, that's already quite good. Please change 2 things:

  • In case no API key is set, return the JSON in original language and print out a warning
  • When reading and writing texts from the PowerPoint file, make sure the formatting is kept as in the original. I presume this can be done by not stripping text from shapes, but by going through their elements individually.

git-bob implement this

Contributor Author

This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I've updated the PowerPoint translation notebook to address two key requirements. First, I added a safety check for the OpenAI API key - if it's not set, the system now returns the original text and displays a warning instead of failing. Second, I improved the text formatting preservation by modifying how we handle text extraction and updating in PowerPoint shapes - instead of simply stripping text, we now process each text element individually to maintain the original formatting. This ensures that the translated presentations maintain the same visual style and formatting as the originals.

  • translate_pptx.ipynb Updated PowerPoint translation notebook to handle missing API keys and preserve text formatting in shapes.
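The missing-key fallback described above might be sketched as follows. Everything here is an assumption for illustration: the function name `translate_texts`, the `translate_fn` callable standing in for the notebook's OpenAI-backed helper, and the use of the `OPENAI_API_KEY` environment variable as the place where the key is looked up.

```python
import os
import warnings


def translate_texts(texts, target_language, translate_fn):
    """Translate a list of strings, or return them unchanged if no API key is set.

    `translate_fn(text, target_language)` is a hypothetical callable standing in
    for whatever helper actually talks to the OpenAI API.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        # Warn instead of failing, and hand back the original-language texts.
        warnings.warn(
            "OPENAI_API_KEY is not set; returning the texts in their original language."
        )
        return list(texts)
    return [translate_fn(text, target_language) for text in texts]
```

Using `warnings.warn` rather than `print` is a design choice of this sketch; a plain `print` would satisfy the request equally well.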

@haesleinhuepf
Owner

One more thing: Adapt the prompt so that institute names and names of things (projects, libraries) are kept as they are.

git-bob implement this

…es and updated warning message for missing API key.
Contributor Author

This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I've updated the PowerPoint translation notebook to address the latest requirement regarding proper name preservation. The main change was in the translation function where I modified the prompt to ensure that institute names, project names, and library names remain unchanged during translation. This complements the previous changes where we added proper handling of missing API keys (returning original text with warnings) and maintained text formatting by processing individual shape elements. The overall functionality remains the same, but now it's more robust in preserving important named entities in the presentations.

  • translate_pptx.ipynb Updated translation prompt to preserve institute, project, and library names while translating PowerPoint content.
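A prompt along these lines could be assembled as shown below. The wording of the prompt and the function name `build_translation_prompt` are hypothetical; the notebook's real prompt is not shown in this thread.

```python
import json


def build_translation_prompt(texts, target_language):
    """Build a translation prompt that asks the model to leave proper names alone.

    Hypothetical sketch: the exact instructions are illustrative only.
    """
    payload = json.dumps(texts, ensure_ascii=False, indent=2)
    return (
        f"Translate the text values in the following JSON into {target_language}.\n"
        "Keep names of institutes, projects and software libraries exactly as they are.\n"
        "Return only valid JSON with the same structure.\n\n"
        f"{payload}"
    )
```

Asking for JSON with the same structure makes it easier to verify the response before writing it back into the presentation.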

@haesleinhuepf
Owner

Ok, please also make sure that font colours are kept and that the links remain the same.
Last but not least, I see empty lines disappearing during translation. Please prevent this.

git-bob implement this

… in PowerPoint presentations during translation.
Contributor Author

This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I've made several improvements to the PowerPoint translation functionality to address specific formatting requirements. The main updates focus on preserving the visual integrity of presentations during translation. I modified the code to maintain font colors, ensure hyperlinks remain functional, and prevent the loss of empty lines in the translated content. This was achieved by enhancing the text extraction and updating functions to capture and retain these additional formatting attributes. These changes complement the previous improvements where we added API key handling, preserved text formatting, and maintained named entities like institute and project names intact during translation.

  • translate_pptx.ipynb Updated code to preserve font colors, hyperlinks, and empty lines during PowerPoint translation while maintaining existing functionality.
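One way to keep hyperlinks and empty lines visible during extraction is to record every paragraph, including those without runs, and to store each run's link target next to its text. The sketch below is an assumption about how this could be done, not the notebook's code. It relies only on attributes that python-pptx objects expose (`prs.slides`, `slide.shapes`, `shape.has_text_frame`, `text_frame.paragraphs`, `paragraph.runs`, `run.text`, `run.hyperlink.address`), so the caller would pass in an object such as `pptx.Presentation(path)`.

```python
def extract_text_from_slides(prs):
    """Collect run-level text and hyperlink targets from a presentation-like object.

    Returns a list of slides; each slide is a list of paragraphs; each
    paragraph is a list of {"text", "hyperlink"} dicts, one per run.
    """
    slides = []
    for slide in prs.slides:
        paragraphs = []
        for shape in slide.shapes:
            if not getattr(shape, "has_text_frame", False):
                continue  # pictures, charts, etc. carry no text frame
            for paragraph in shape.text_frame.paragraphs:
                # Every paragraph is kept, even one with no runs, so that
                # empty lines still have a slot after translation.
                runs = [
                    {"text": run.text, "hyperlink": run.hyperlink.address}
                    for run in paragraph.runs
                ]
                paragraphs.append(runs)
        slides.append(paragraphs)
    return slides
```

Because the text is neither stripped nor filtered, an empty paragraph shows up as an empty list and can be written back unchanged.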

@haesleinhuepf
Owner

There was an issue with this line:

'color': run.font.color.rgb if run.font.color else None,

please replace it with this one:

'color': run.font.color.rgb if run.font.color and hasattr(run.font.color, "rgb") else None,

Also make sure that the resulting PPTX file has as many paragraphs on each slide as the original file. You can fix this by making sure that the for-loops going through the slides work the same way in both places, as in the extract_text_from_slides function.

git-bob implement this
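The guarded lookup requested above can be tried in isolation. The sketch below assumes that the failure came from a color object whose `rgb` property raises `AttributeError` when no RGB value is available; `hasattr` returns `False` in that situation, which is why the extra check helps. `NoRgbColor` is a stand-in class invented for this example, not part of python-pptx.

```python
def safe_rgb(color):
    """Return color.rgb, or None when the color offers no RGB value."""
    return color.rgb if color and hasattr(color, "rgb") else None


class NoRgbColor:
    """Stand-in for a color object whose `rgb` property is unavailable."""

    @property
    def rgb(self):
        raise AttributeError("no .rgb property on this color type")


class RedColor:
    """Stand-in for a color object with an explicit RGB value."""

    rgb = "FF0000"
```

With these stand-ins, `safe_rgb(NoRgbColor())` and `safe_rgb(None)` both yield `None`, while `safe_rgb(RedColor())` yields the stored value.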

…e and added paragraph preservation in update_presentation function.
Contributor Author

This message was generated by git-bob (version: 0.19.1, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

I updated the PowerPoint translation notebook to fix two important issues that were affecting the formatting preservation. First, I modified the font color extraction code to handle cases where the color attribute might not have an rgb property, preventing potential errors. Second, I improved the slide update mechanism to ensure that the number of paragraphs in the translated presentation matches exactly with the original presentation, maintaining the correct structure. These changes make the translation process more robust while preserving all formatting aspects, including font colors, hyperlinks, and empty lines.

  • translate_pptx.ipynb Fixed font color attribute handling and improved paragraph preservation in PowerPoint translation.
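A simple way to keep the paragraph count identical is to let extraction and update share one traversal, and to only overwrite the text of existing runs. The following is a hedged sketch under assumptions: the helper `iter_paragraphs`, the body of `update_presentation`, and the data layout (slides, then paragraphs, then one string per run) are illustrative and not taken from the notebook. It uses only attributes that python-pptx objects expose, so a real `pptx.Presentation` could be passed in.

```python
def iter_paragraphs(slide):
    """Yield the paragraphs of all text-carrying shapes on a slide, in order."""
    for shape in slide.shapes:
        if getattr(shape, "has_text_frame", False):
            yield from shape.text_frame.paragraphs


def update_presentation(prs, translated):
    """Write translated run texts back without adding or removing paragraphs.

    `translated` is assumed to be a list of slides, each a list of
    paragraphs, each a list of strings (one per existing run).
    """
    for slide, new_paragraphs in zip(prs.slides, translated):
        paragraphs = list(iter_paragraphs(slide))
        if len(paragraphs) != len(new_paragraphs):
            raise ValueError("paragraph count differs from the original slide")
        for paragraph, new_runs in zip(paragraphs, new_paragraphs):
            # Only the text changes; each run keeps its own formatting.
            for run, new_text in zip(paragraph.runs, new_runs):
                run.text = new_text
```

Since no paragraph or run is created or deleted, empty paragraphs and per-run formatting stay where they were in the original file.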

Successfully merging this pull request may close these issues.

[Programming question]: Translating PPTx files