Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[youtube] Use localized title & channel name #28538

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

mjsir911
Copy link

@mjsir911 mjsir911 commented Mar 25, 2021

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

  • Searched the bugtracker for similar pull requests
  • Read youtube-dl coding conventions and adjusted the code to meet them
  • Covered the code with tests (note that PRs without tests will be REJECTED)
  • Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Get the localized version of title, channel name

  • Prioritize microformat title & description over basic title & description
  • Fix bug causing channel to never be set, expecting wrong type

Related:

Fixes:

This codebase was a pleasure to work with, kudos

@mjsir911 mjsir911 force-pushed the msirabella/youtube_translated branch from 4174607 to 16bf9a8 Compare March 25, 2021 08:03
@mjsir911 mjsir911 changed the title [WIP] use youtube localized title & owner name use youtube localized title & owner name Mar 25, 2021
@mjsir911 mjsir911 force-pushed the msirabella/youtube_translated branch from 16bf9a8 to c244a2d Compare March 25, 2021 08:13
@mjsir911 mjsir911 changed the title use youtube localized title & owner name Ese youtube localized title & channel name Mar 25, 2021
@mjsir911 mjsir911 changed the title Ese youtube localized title & channel name Use youtube localized title & channel name Mar 25, 2021
@mjsir911 mjsir911 force-pushed the msirabella/youtube_translated branch from c244a2d to 3a9bf60 Compare March 25, 2021 08:17
@mjsir911 mjsir911 changed the title Use youtube localized title & channel name [youtube] Use localized title & channel name Mar 25, 2021
@mjsir911 mjsir911 force-pushed the msirabella/youtube_translated branch from 3a9bf60 to d6a4913 Compare March 26, 2021 00:32
@mjsir911
Copy link
Author

mjsir911 commented Mar 27, 2021

Also closes #28562 but the same fix done in this PR was implemented in 49fc0a5 :)

This PR adds tests for it too though

Previously was never getting set because was expecting a string but
getting a dict, as evidenced by get_text

Surprisingly, can't find an issue to close with this one
@theultramage
Copy link

If you don't mind, could you explain what your edit does, what the logic behind it is, and in what way does it successfully resolve the issues you are intending to close? At first glance it is hard to tell from just the code change, because the diff is tiny and just changes the order of operations. I am worried that your 'localized version' actually means 'the hardcoded en-US version'. At least two of the issues you referenced are requesting control over the selected language. One is asking for additional commandline switches for retrieving the title/desc in one, multiple or all available languages, like how it's currently done for subtitles. I don't believe this PR addresses these, and may even be working against the goals of those issue reports.

@mjsir911
Copy link
Author

I appreciate the feedback @theultramage!

could you explain what your edit does, what the logic behind it is, and in what way does it successfully resolve the issues you are intending to close?

Currently, youtube's video_details.get('title') returns the canonical title of a video, while microformat.get('title') returns the version specified by the uploader localized to the request's language, if available.

Example:

$ youtube-dl --get-title https://www.youtube.com/watch?v=UnIhRpIT7nc --print-traffic # returns english translated
... Accept-Language: en-us,en ...
inabakumori - Lagtrain (Vo. Kaai Yuki) / 稲葉曇『ラグトレイン』Vo. 歌愛ユキ
$ youtube-dl --get-title https://www.youtube.com/watch?v=UnIhRpIT7nc --add-header 'Accept-Language: es' --print-traffic # no translated version in spanish, return the default title
... Accept-Language:  es ...
稲葉曇『ラグトレイン』Vo. 歌愛ユキ

I am worried that your 'localized version' actually means 'the hardcoded en-US version'

It does, without the --add-header 'Accept-Language: es' flag. Perhaps the solution is not hardcoding the Accept-Language header in youtube-dl? But --add-header 'Accept-Language: es' as a flag is not so complicated, but feedback for making this nicer would be appreciated

One is asking for additional commandline switches for retrieving the title/desc in one, multiple or all available languages, like how it's currently done for subtitles.

I guess I missed this. This PR does not cover this request, as requesting multiple titles/desc requires multiple requests & would require a more complicated change. I believe all subtitles are shipped in a single request?

and may even be working against the goals of those issue reports.

Not sure what you mean by this. The fact that it is defaulting to the hardcoded en-us?.

All in all, I think you are right, and this PR needs a bit more to not just default to en-us.

Does de-hardcoding the default Accept-Language sound like a way forwards? I'm not sure how much garbage downloaded that makes way for. Something derived from locale.getlocale()?

@theultramage
Copy link

Thank you for the explanation. I've seen multiple reports point out that adding the accept language header had no effect whatsoever, and there was no workaround. You managed to find the cause and patch it in a way that makes the workaround possible, and perhaps opens up a way for future improvements. I was initially worried that just flipping it to default En, closing all the tickets and declaring the problem solved would make the devs complacent and the problem would get even less attention than before.

I think this PR is good enough for what it intends to do (assuming there are no other issues), but I would narrow down the list of closed issues to those that are directly addressed by the PR and do not call for any additional work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment