Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix npo support #31976

Open
wants to merge 52 commits into
base: master
Choose a base branch
from
Open

Fix npo support #31976

wants to merge 52 commits into from

Conversation

bartbroere
Copy link

@bartbroere bartbroere commented Mar 31, 2023

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Bug fix

Fix support for NPO sites

This fixes two things that have been changed on the NPO websites:

  • The current video player no longer returns the token in the JSON body, but instead provides us with an XSRF token in the cookie.
  • A second call changed from GET to POST and should include this XSRF token.

This branch started out as a small fix, but in the ~11 months the PR was open the NPO site was updated heavily, so now it changes a lot more than the things above. Many of the broadcaster (NL: omroep) sites' extractors (vpro etc.) no longer worked, so these have been removed entirely. I'm always willing to look into re-implementing support for some of these, if that's still possible. However, typically the broadcaster's website just embeds the npo.nl player, so grabbing the same media from the npo.nl URL is recommended.

Copy link
Contributor

@dirkf dirkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work!

I've made a few suggestions, really just conventions, so I'll let the CI test run now.

youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
@bartbroere bartbroere requested a review from dirkf April 3, 2023 07:53
@bartbroere
Copy link
Author

I have accepted all suggestions here. Sorry I missed some conventions.

* simplify comment
* force CI
@bartbroere
Copy link
Author

@dirkf Is it ready to merge now?

@RickTimmer
Copy link

I have had great success using this fork to download NPO Start videos. Greatly appreciated @bartbroere. It would be nice to see these changes merged.

Copy link
Contributor

@dirkf dirkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to merge this but it needs valid tests.

All the current test URLs redirect to the home page, but maybe that doesn't happen in NL. Please ensure that there is at least one valid test (with playable content, non-DRM) and mark any invalid URLs with

        'skip': 'Content expired',

(or ...: 'only available in NL',, or whatever is appropriate) after the info_dict. If an invalid test URL doesn't match the extractor's _VALID_URL, it should be deleted (but I don't think that applies here).

If URLs are only valid in NL, or with an account, please post a console log of successful tests.

I've made some additional suggestions that I might improve with the aid of a usable test URL.

youtube_dl/extractor/npo.py Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
youtube_dl/extractor/npo.py Outdated Show resolved Hide resolved
@bartbroere
Copy link
Author

I'd like to merge this but it needs valid tests.

All the current test URLs redirect to the home page, but maybe that doesn't happen in NL. Please ensure that there is at least one valid test (with playable content, non-DRM) and mark any invalid URLs with

        'skip': 'Content expired',

(or ...: 'only available in NL',, or whatever is appropriate) after the info_dict. If an invalid test URL doesn't match the extractor's _VALID_URL, it should be deleted (but I don't think that applies here).

If URLs are only valid in NL, or with an account, please post a console log of successful tests.

I've made some additional suggestions that I might improve with the aid of a usable test URL.

Thanks for the suggested improvements. I'll address all the feedback, and re-request review once I feel it's all done.

@bartbroere
Copy link
Author

@dirkf Looking into it I realised this is probably caused by the recently released new NPO app and website. I'll check all tests and maybe add some new ones

@bartbroere
Copy link
Author

yt-dlp/yt-dlp#9319 addresses the same issues. I think (speaking for the Dutch users a bit) that support for npo.nl is the main feature, and all the other sites often just embed the NPO player in some way.

Since almost all of the other sites no longer work in the same way as before, I would propose throwing away extractors for many other domains, since rebuilding them is probably a lot quicker.

Re-implementing these is quicker for the cases where that's even still possible
@dirkf
Copy link
Contributor

dirkf commented Mar 1, 2024

It's a bit vexing that the yt-dlp PR has done the same job, but by all means plunder any useful stuff from there. I would generally merge/back-port an updated yt-dlp extractor but generally also only when ours doesn't work.

@bartbroere
Copy link
Author

bartbroere commented Mar 1, 2024

It's a bit vexing that the yt-dlp PR has done the same job, but by all means plunder any useful stuff from there. I would generally merge/back-port an updated yt-dlp extractor but generally also only when ours doesn't work.

I'm not convinced that the pull request on yt_dlp actually works for the new npo.nl player yet:
https://github.com/rvsit/yt-dlp/blob/c2d2b589588c7dbe44a97b4d29bad0d68922cd07/yt_dlp/extractor/npo.py#L122-L123

On the other hand, my branch probably doesn't work for some of the yt_dlp test cases yet.

So I think we can do a nice bit of "cross-plundering" between my youtube-dl pull request and @rvsit 's yt-dlp pull request.

@bartbroere bartbroere mentioned this pull request Mar 1, 2024
9 tasks
@dirkf
Copy link
Contributor

dirkf commented Mar 1, 2024

Great.

Also, if there are standard patterns for the discarded sites where NPO videos are embedded, we can add a class variable _EMBED_REGEX that is a list of those patterns with the NPO URL found in regex group url. Add this to provide an entry-point that could be called from the generic extractor:

    @classmethod
    def _extract_urls(cls, webpage):
        def yield_urls():
            for p in cls._EMBED_REGEX:
                for m in re.finditer(p, webpage):
                    yield m.group('url')

        return list(yield_urls())

Eventually the webpage extraction system from yt-dlp should be pulled in, with the method standardised in InfoExtractor, and the generic extractor refactored to blazes.

@bartbroere
Copy link
Author

I'll be mostly offline during April, so this is just a heads up that this PR will not see much progress for a few weeks. I didn't forget about it though, and plan to continue working on it in May.

@dirkf
Copy link
Contributor

dirkf commented Apr 3, 2024

I may be able to save you much of that work. I've run through the d/l tests locally and adjusted the code somewhat to match. If anyone has an answer to #31976 (comment), I'll be able to settle that extractor and push an update that should be close to mergeable.

@bartbroere
Copy link
Author

It successfully downloads all test URLs right now, but I'm not sure what it should say in the ext field in the info_dict of each test. I tried debugging it, but that gets tricky with the format_selector method being created at runtime.

The example URLs all download successfully with this command

python3.11 -m youtube_dl https://npo.nl/start/serie/vpro-tegenlicht/seizoen-11/zwart-geld-de-toekomst-komt-uit-afrika https://npo.nl/start/serie/zembla/seizoen-2015/wie-is-de-mol-2/ https://www.bnnvara.nl/videos/27455 https://ongehoordnederland.tv/2024/03/01/korte-clips/heeft-preppen-zin-betwijfel-dat-je-daar-echt-iets-aan-zult-hebben-bij-oorlog-lydia-daniel/ https://www.zapp.nl/programmas/zappsport/gemist/POMS_AT_811523 https://schooltv.nl/item/zapp-music-challenge-2015-zapp-music-challenge-2015 https://hetklokhuis.nl/dossier/142/zoek-het-uit/tv-uitzending/2987/aliens https://www.vpro.nl/programmas/tegenlicht/kijk/afleveringen/2015-2016/offline-als-luxe.html https://anderetijden.nl/programma/1/Andere-Tijden/aflevering/676/Duitse-soldaten-over-de-Slag-bij-Arnhem

but I can't get the test framework to pass. That gives me the error:

youtube_dl.utils.DownloadError: ERROR: requested format not available

So even with the failing tests I think this is an improvement over the current master, which no longer successfully downloads any NPO video. If anyone knows what it should say in ext, I can change it to the correct value.

@bartbroere bartbroere requested a review from dirkf October 20, 2024 12:46
@dirkf dirkf mentioned this pull request Dec 23, 2024
5 tasks
@victorhartman
Copy link

Hey, this is amazing work! I wanted to test it out, but it no longer seems to work unfortunately. Issue with getting the token.

py.exe -m youtube_dl https://www.bnnvara.nl/videos/27455
[bnnvara] 27455: Downloading JSON metadata
[bnnvara] VARA_101369808: Downloading token
ERROR: An extractor error has occurred. (caused by KeyError('token'));`

@bartbroere
Copy link
Author

Hey, this is amazing work! I wanted to test it out, but it no longer seems to work unfortunately. Issue with getting the token.

py.exe -m youtube_dl https://www.bnnvara.nl/videos/27455
[bnnvara] 27455: Downloading JSON metadata
[bnnvara] VARA_101369808: Downloading token
ERROR: An extractor error has occurred. (caused by KeyError('token'));`

Thanks for reporting this! I'll look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants