[Youtube] Fetch all videos with search request rather than only approx. 500 video results. #30795

Closed
3 tasks done
keshawnhsieh opened this issue Mar 30, 2022 · 11 comments

@keshawnhsieh

keshawnhsieh commented Mar 30, 2022

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question

As stated in the title, how can I fetch all the video search results rather than only a subset of approximately 500? I'm not sure whether this limit comes from YouTube or from youtube-dl.

Currently, I'm using the search command with youtube-dl like this: youtube-dl ytsearchall:"ditempat"

@dirkf
Contributor

dirkf commented Mar 30, 2022

With the git master I have 501 results. The limit in the extractor itself is infinite, so perhaps there's a server-side limit, maybe depending on the API key in use.

@dirkf
Contributor

dirkf commented Mar 30, 2022

30 pages, and then that's it.

@pukkandan
Contributor

The website also only loads 30 pages

@dirkf
Contributor

dirkf commented Mar 31, 2022

So someone who wanted an exhaustive search could either

  • guess a small enough timeslice to have fewer than 500 results at most and iterate from the birth of YT to now, or
  • binary chop the period from the birth of YT to now to get periods that each have fewer than 500 results and iterate over those,

or a combination of the two.
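
A minimal sketch of the binary chop, assuming some way of counting the results for a date range. The `count_results` callable, the `chop` helper and the `fake_count` stand-in are hypothetical names used only for illustration; none of them is part of youtube-dl, and the cap of 500 is just the approximate figure reported in this thread.

```python
from datetime import date, timedelta

RESULT_CAP = 500  # approximate per-search cap reported in this thread


def chop(start, end, count_results, cap=RESULT_CAP):
    """Split [start, end) into date ranges that each report fewer than `cap` results.

    `count_results(start, end)` is a hypothetical callable supplied by the caller.
    """
    days = (end - start).days
    # Stop when the slice cannot be halved any further (a single day) or when
    # it already fits under the cap; a single-day slice is returned as-is.
    if days <= 1 or count_results(start, end) < cap:
        return [(start, end)]
    mid = start + timedelta(days=days // 2)
    return chop(start, mid, count_results, cap) + chop(mid, end, count_results, cap)


# Stand-in counter for illustration only: pretends 100 videos are posted per day.
def fake_count(start, end):
    return 100 * (end - start).days


ranges = chop(date(2013, 1, 1), date(2013, 1, 17), fake_count)
for range_start, range_end in ranges:
    print(range_start, range_end)
```

In practice the counter would have to run a real search for each range, so the number of requests grows with the number of splits; starting from a guessed timeslice and chopping only the ranges that hit the cap would be the "combination of the two".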

@pukkandan

This comment was marked as resolved.

@dirkf
Contributor

dirkf commented Mar 31, 2022

IIRC at least one extractor implements a scheme similar to those I described, but making use of a proper date-range query and a total results field.

@keshawnhsieh
Author

Tested but failed with error like this:
ERROR: Unable to recognize tab page; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

By the way, the command I used was youtube-dl "https://www.youtube.com/results?search_query=cats+after:2013-01-01+before:2014-01-01" --skip-download --get-id; I hope I didn't misunderstand you.

It looks like youtube-dl no longer supports this type of request format.

@keshawnhsieh
Author

keshawnhsieh commented Mar 31, 2022

So someone who wanted an exhaustive search could either

  • guess a small enough timeslice to have fewer than 500 results at most and iterate from the birth of YT to now, or
  • binary chop the period from the birth of YT to now to get periods that each have fewer than 500 results and iterate over those,

or a combination of the two.

In my experience, sadly, this idea may not work as expected. I tried the --match-filter and --date parameters, and both failed because youtube-dl interprets the date range as a filter that decides whether to download each video in the search results. In other words, it doesn't change the search results themselves; it only affects the download behavior.

Execute youtube-dl ytsearchall:"ditempat" --date 20220330 --skip-download --write-info-json got
[image]

--match-filter works in the same way.

@keshawnhsieh
Author

keshawnhsieh commented Mar 31, 2022

Yes, I tried it and it does work. Thank you so much. You saved me a lot of time.

By the way, could you take a look at another issue I opened? Do you think it could be implemented in a similar way, by passing some filter condition into the ytsearchall query?

@dirkf
Contributor

dirkf commented Mar 31, 2022

Execute youtube-dl ytsearchall:"ditempat" --date 20220330 --skip-download ...

That wouldn't help because the filter is applied after the playlist has already been extracted.

Tested but failed with error like this:
ERROR: Unable to recognize tab page;

As advised in the related thread, you need the git master, or PR #27749.

The cats query is very productive, so testing it on a scale of months produces strange results (e.g., more cats in the last six months of 2013 than in the whole year).

But I do believe the result that 8 "cats" were posted in 2013-01-01..02 and 20 in 2013-01-01..05.

$ python -m youtube_dl -v --flat-playlist 'https://www.youtube.com/results?search_query=cats+after:2013-01-01+before:2013-01-02'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'--flat-playlist', u'https://www.youtube.com/results?search_query=cats+after:2013-01-01+before:2013-01-02']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 60f014a47
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[download] Downloading playlist: cats after:2013-01-01 before:2013-01-02
[youtube:search_url] query "cats after:2013-01-01 before:2013-01-02": Downloading page 1
[youtube:search_url] query "cats after:2013-01-01 before:2013-01-02": Downloading page 2
[youtube:search_url] playlist cats after:2013-01-01 before:2013-01-02: Downloading 8 videos
[download] Downloading video 1 of 8
[download] Downloading video 2 of 8
[download] Downloading video 3 of 8
[download] Downloading video 4 of 8
[download] Downloading video 5 of 8
[download] Downloading video 6 of 8
[download] Downloading video 7 of 8
[download] Downloading video 8 of 8
[download] Finished downloading playlist: cats after:2013-01-01 before:2013-01-02
$
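
As a rough sketch of how the date-sliced search URLs used above could be generated in bulk: the helper names `search_url` and `daily_urls` are made up for this example, and whether the `after:`/`before:` boundaries are inclusive is not established in this thread, so treat the one-day slices as an assumption.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def search_url(query, after, before):
    """Build a results URL with after:/before: operators, as in the command above."""
    full_query = "{} after:{} before:{}".format(
        query, after.isoformat(), before.isoformat()
    )
    # safe=":" keeps the colons literal; spaces are encoded as "+".
    return "https://www.youtube.com/results?" + urlencode(
        {"search_query": full_query}, safe=":"
    )


def daily_urls(query, start, end):
    """Yield one search URL per day from `start` up to, but not including, `end`."""
    day = start
    while day < end:
        yield search_url(query, day, day + timedelta(days=1))
        day += timedelta(days=1)


urls = list(daily_urls("cats", date(2013, 1, 1), date(2013, 1, 5)))
for url in urls:
    print(url)
```

Each printed URL could then be passed to youtube-dl with --flat-playlist as in the command above, which per this thread needs the git master or PR #27749.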

@coletdjnz
Contributor

coletdjnz commented Mar 31, 2022

afaik, this is not possible with INNERTUBE APIs since the website only provides a limited list of date filters. It may be possible to come up with more generic filters by analyzing the protobuf? (cc @coletdjnz)

even if we could, manipulating protobuf in youtube-dl/p is not something we can easily do (unless we can make them fixed)

@dirkf dirkf changed the title Fetch all videos with search request rather than only approx. 500 video results. [Youtube] Fetch all videos with search request rather than only approx. 500 video results. Apr 3, 2022
@dirkf dirkf closed this as completed Apr 10, 2022