refactor: class-based design #15

Merged
merged 20 commits into from
Dec 1, 2024

Conversation

NULL204 (Contributor) commented Nov 28, 2024

Summary by Sourcery

Refactor the subtitle translation functionality into a class-based design using the new SubtitleTranslator class. Update the README to reflect these changes and add tests for the new class. Adjust the CI workflow to reorder OS versions in the testing matrix.

Enhancements:

  • Refactor the code to use a class-based design by introducing the SubtitleTranslator class, encapsulating subtitle translation logic.

CI:

  • Update CI workflow to adjust the order of OS versions in the matrix.

Documentation:

  • Update README to reflect the new class-based design and usage of the SubtitleTranslator class.

Tests:

  • Add new tests for the SubtitleTranslator class to verify subtitle translation from both subtitle files and audio inputs.

	modified:   yuisub/__main__.py
	new file:   yuisub/sub_translator.py

sourcery-ai bot commented Nov 28, 2024

Reviewer's Guide by Sourcery

This PR refactors the codebase to implement a class-based design by introducing the SubtitleTranslator class. The class encapsulates all subtitle translation functionality, including audio transcription and subtitle file handling. The changes simplify the API surface and improve code organization by moving the core functionality into a single cohesive class.

Class diagram for WhisperModel

classDiagram
    class WhisperModel {
        - name: str
        - device: Optional[Union[str, torch.device]]
        - download_root: Optional[str]
        - in_memory: bool
        + WhisperModel(name, device, download_root, in_memory)
        + transcribe(audio)
    }

File-Level Changes

Change: Introduce new SubtitleTranslator class to encapsulate subtitle translation functionality
Details:
  • Create class constructor with configuration parameters for LLM, Bangumi, and Whisper settings
  • Implement get_subtitles method to handle both audio and subtitle file inputs
  • Add automatic device selection logic for Whisper model initialization
  • Consolidate translation and bilingual subtitle generation into a single workflow
Files:
  • yuisub/translator.py

Change: Refactor main script to use the new class-based approach
Details:
  • Replace direct function calls with SubtitleTranslator class usage
  • Simplify command-line argument handling
  • Add input validation to ensure either audio or subtitle file is provided
  • Update error messages and argument descriptions
Files:
  • yuisub/__main__.py

Change: Update documentation and examples
Details:
  • Add class-based usage examples to README
  • Simplify code examples by showing the new unified API
  • Update documentation to reflect new class-based architecture
Files:
  • README.md

Change: Update test suite for new class-based implementation
Details:
  • Add new test module for SubtitleTranslator class
  • Update existing tests to use new Bangumi token parameter
  • Add CI skip conditions for specific tests
  • Fix import statements and test utilities
Files:
  • tests/test_translator.py
  • tests/test_bangumi.py
  • tests/test_sub.py
  • tests/test_llm.py
  • tests/util.py
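
The input validation listed for the main script ("either audio or subtitle file is provided") could look roughly like the sketch below. This is not the code from yuisub/__main__.py: the flag names `--audio` and `--sub` and the helper `parse_args` are assumptions made for illustration, and the sketch only enforces that at least one input is given.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Flag names (--audio, --sub) are hypothetical, chosen for this sketch only.
    parser = argparse.ArgumentParser(
        description="Translate subtitles from an audio file or a subtitle file"
    )
    parser.add_argument("-a", "--audio", help="path to an audio file to transcribe")
    parser.add_argument("-s", "--sub", help="path to an existing subtitle file")
    args = parser.parse_args(argv)

    # Reject the call when neither input was supplied.
    # parser.error() prints the usage message and exits with status 2.
    if args.audio is None and args.sub is None:
        parser.error("provide an audio file (--audio) or a subtitle file (--sub)")
    return args
```

Failing inside the parser keeps the error message next to the usage text, so the translator class itself never has to handle a call with no input at all.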

@sourcery-ai sourcery-ai bot left a comment

Hey @NULL204 - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

yuisub/sub_translator.py (outdated, resolved)
README.md (outdated, resolved)
README.md (outdated, resolved)
NULL204 (Contributor, Author) commented Nov 28, 2024

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @NULL204 - I've reviewed your changes - here's some feedback:

Overall Comments:

  • There appears to be a bug in get_subtitles() where it uses self.sub_zh instead of the local sub_zh variable in the bilingual() call. This will cause issues since self.sub_zh is never set.
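
The failure mode described in that comment can be reproduced with a small stand-in class. Everything here is hypothetical (`TranslatorSketch`, `_translate`, `_bilingual`, and the string-based "subtitles"); it only illustrates why reading an attribute that was never assigned fails, not how yuisub implements get_subtitles.

```python
from typing import List, Tuple


class TranslatorSketch:
    """Hypothetical stand-in for SubtitleTranslator; not the project's code."""

    def _translate(self, lines: List[str]) -> List[str]:
        # Placeholder for the LLM translation step.
        return [f"zh:{line}" for line in lines]

    def _bilingual(self, original: List[str], translated: List[str]) -> List[str]:
        # Pair each original line with its translation.
        return [f"{src} / {dst}" for src, dst in zip(original, translated)]

    def get_subtitles_buggy(self, lines: List[str]) -> Tuple[List[str], List[str]]:
        sub_zh = self._translate(lines)
        # Bug: self.sub_zh was never assigned, so this raises AttributeError.
        return sub_zh, self._bilingual(lines, self.sub_zh)

    def get_subtitles(self, lines: List[str]) -> Tuple[List[str], List[str]]:
        sub_zh = self._translate(lines)
        # Fix: pass the local variable that holds the translation.
        return sub_zh, self._bilingual(lines, sub_zh)
```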
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

yuisub/translator.py (outdated, resolved)
yuisub/translator.py (outdated, resolved)

codecov bot commented Nov 30, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

NULL204 and others added 10 commits November 30, 2024 16:12
	modified:   tests/test_bangumi.py
	modified:   tests/test_llm.py
	modified:   tests/test_sub.py
	new file:   tests/test_translator.py
	modified:   tests/util.py
	modified:   yuisub/translator.py
Tohrusky (Member) commented Dec 1, 2024

@sourcery-ai review

@sourcery-ai sourcery-ai bot left a comment

Hey @NULL204 - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟡 Testing: 3 issues found
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

sub: Optional[Union[str, Path, pysubs2.SSAFile]] = None,
audio: Optional[Union[str, Any]] = None,
styles: Optional[Dict[str, pysubs2.SSAStyle]] = None,
ad: Optional[pysubs2.SSAEvent] = advertisement(), # noqa: B008

issue (bug_risk): Avoid mutable default arguments in method parameters

Mutable default arguments can cause unexpected behavior. Consider using None as the default and creating the advertisement in the method body if needed.
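
A minimal sketch of the difference, using stand-ins rather than the real types: `Event` plays the role of pysubs2.SSAEvent and `make_ad` the role of advertisement(); both names are hypothetical. A default written as a call in the signature is evaluated once, when the function is defined, so every call that omits the argument receives the same object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in for pysubs2.SSAEvent."""

    text: str = "ad"


def make_ad() -> Event:
    """Hypothetical stand-in for advertisement()."""
    return Event()


def pick_ad_shared(ad: Event = make_ad()) -> Event:  # noqa: B008
    # The default is built once at definition time: all calls share one Event.
    return ad


def pick_ad(ad: Optional[Event] = None) -> Event:
    # The default is built per call, so mutations cannot leak between calls.
    if ad is None:
        ad = make_ad()
    return ad
```

One caveat: if callers already pass `None` to mean "no advertisement", reusing `None` as the default would change that behavior, and a dedicated sentinel object would be the safer choice.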

@@ -112,8 +112,8 @@ async def translate(
base_url=base_url,
bangumi_info=bangumi_info,
)
print(summarizer.system_prompt)

suggestion: Remove or replace debug print statement

Consider using a proper logging system instead of print statements if this information is important for debugging.

import logging

logging.debug(summarizer.system_prompt)

Comment on lines +11 to +20
async def test_translator_sub() -> None:
    translator = SubtitleTranslator(
        model=util.OPENAI_MODEL,
        api_key=util.OPENAI_API_KEY,
        base_url=util.OPENAI_BASE_URL,
        bangumi_url=util.BANGUMI_URL,
        bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
    )

    sub_zh, sub_bilingual = await translator.get_subtitles(sub=str(util.TEST_ENG_SRT))

suggestion (testing): Test should verify the content of the translated subtitles

The test only checks if the files are saved but doesn't verify the actual content of the translations. Consider adding assertions to check the translated text, timing, and format of both sub_zh and sub_bilingual.

async def test_translator_sub() -> None:
    translator = SubtitleTranslator(
        model=util.OPENAI_MODEL,
        api_key=util.OPENAI_API_KEY,
        base_url=util.OPENAI_BASE_URL,
        bangumi_url=util.BANGUMI_URL,
        bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
    )
    sub_zh, sub_bilingual = await translator.get_subtitles(sub=str(util.TEST_ENG_SRT))
    assert "你好" in str(sub_zh)
    assert "Hello" in str(sub_bilingual) and "你好" in str(sub_bilingual)

Comment on lines +10 to +19
@pytest.mark.skipif(os.environ.get("GITHUB_ACTIONS") == "true", reason="Skipping test when running on CI")
async def test_translator_sub() -> None:
    translator = SubtitleTranslator(
        model=util.OPENAI_MODEL,
        api_key=util.OPENAI_API_KEY,
        base_url=util.OPENAI_BASE_URL,
        bangumi_url=util.BANGUMI_URL,
        bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
    )

suggestion (testing): Consider using mocks for CI environment instead of skipping tests

Rather than skipping these tests in CI, consider mocking the external dependencies (Whisper model, OpenAI API) to allow these tests to run in all environments. This would provide better test coverage and catch potential issues earlier.

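from unittest import mock

import pytest

# 'your_module' is a placeholder for the module that defines SubtitleTranslator.
@pytest.mark.asyncio
@mock.patch('your_module.SubtitleTranslator.get_subtitles')
async def test_translator_sub(mock_get_subtitles) -> None:
    # On Python 3.8+, patching an async method yields an AsyncMock,
    # so awaiting the call returns this tuple.
    mock_get_subtitles.return_value = (mock.Mock(), mock.Mock())
    translator = SubtitleTranslator(
        model=util.OPENAI_MODEL,
        api_key="mock_key",
        base_url="mock_url",
        bangumi_url="mock_url",
        bangumi_access_token="mock_token",
    )
    await translator.get_subtitles(sub=str(util.TEST_ENG_SRT))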
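
Patching get_subtitles itself means the test exercises only the mock. A variant that may give more coverage is to mock the network-bound dependency and let the method under test run. The sketch below shows the idea with stand-ins: `LLMClient`, `TranslatorSketch`, and `run_mocked_translation` are hypothetical names, and the real SubtitleTranslator may wire its dependencies differently.

```python
import asyncio
from typing import List, Tuple
from unittest.mock import AsyncMock


class LLMClient:
    """Hypothetical stand-in for the network-bound translation client."""

    async def translate(self, text: str) -> str:
        raise RuntimeError("network access is not available in CI")


class TranslatorSketch:
    """Hypothetical stand-in for SubtitleTranslator."""

    def __init__(self, client: LLMClient) -> None:
        self.client = client

    async def get_subtitles(self, lines: List[str]) -> Tuple[List[str], List[str]]:
        # Translate each line, then pair source and translation.
        sub_zh = [await self.client.translate(line) for line in lines]
        bilingual = [f"{src}\n{dst}" for src, dst in zip(lines, sub_zh)]
        return sub_zh, bilingual


def run_mocked_translation() -> Tuple[List[str], List[str]]:
    client = LLMClient()
    # Replace only the external call; get_subtitles still runs for real.
    client.translate = AsyncMock(return_value="你好")
    translator = TranslatorSketch(client)
    result = asyncio.run(translator.get_subtitles(["Hello"]))
    client.translate.assert_awaited_once_with("Hello")
    return result
```

Because the pairing logic still executes, a regression in how translated and original lines are combined would fail this test even without network access.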

Comment on lines +26 to +35
async def test_translator_audio() -> None:
    translator = SubtitleTranslator(
        torch_device=util.DEVICE,
        whisper_model=util.MODEL_NAME,
        model=util.OPENAI_MODEL,
        api_key=util.OPENAI_API_KEY,
        base_url=util.OPENAI_BASE_URL,
        bangumi_url=util.BANGUMI_URL,
        bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
    )

suggestion (testing): Add error case tests for the SubtitleTranslator

The tests only cover the happy path. Consider adding tests for error cases such as invalid audio files, network errors, invalid API keys, and other edge cases that could occur during translation.

async def test_translator_audio() -> None:
    translator = SubtitleTranslator(
        torch_device=util.DEVICE,
        whisper_model=util.MODEL_NAME,
        model=util.OPENAI_MODEL,
        api_key=util.OPENAI_API_KEY,
        base_url=util.OPENAI_BASE_URL,
        bangumi_url=util.BANGUMI_URL,
        bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
    )

    sub_zh, sub_bilingual = await translator.get_subtitles(audio=str(util.TEST_AUDIO))
    sub_zh.save(util.projectPATH / "assets" / "test.zh.translator.audio.ass")
    sub_bilingual.save(util.projectPATH / "assets" / "test.bilingual.translator.audio.ass")

    with pytest.raises(FileNotFoundError):
        await translator.get_subtitles(audio="nonexistent_file.mp3")

    with pytest.raises(Exception):
        invalid_translator = SubtitleTranslator(
            torch_device=util.DEVICE,
            whisper_model=util.MODEL_NAME,
            model=util.OPENAI_MODEL,
            api_key="invalid_key",
            base_url=util.OPENAI_BASE_URL,
            bangumi_url=util.BANGUMI_URL,
            bangumi_access_token=util.BANGUMI_ACCESS_TOKEN,
        )
        await invalid_translator.get_subtitles(audio=str(util.TEST_AUDIO))

@Tohrusky changed the title from "Refactor this code into a class-based design using classes and object-oriented principles." to "refactor: class-based design" on Dec 1, 2024
@Tohrusky Tohrusky merged commit f991615 into TensoRaws:main Dec 1, 2024
3 checks passed