Fix scraping multiple URLs #5677

Merged Feb 25, 2025 (4 commits)

WithoutPants
Collaborator

The mapped scraper functionality is in sore need of a rework. #5294 has highlighted this with the new URLs field. For now, I've hacked in a solution that should correctly convert URLs to a list.

Notably, this does not work where URLs is used within a sub-object. For example, a scene scrape will populate only a single URL in the URLs field of each performer, because there is currently no way to determine which performer a given URL value belongs to. The fix therefore follows the existing convention and assigns one URL to each performer result, so a performer within a scene scrape cannot have multiple URLs.
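
To make the limitation concrete, here is a minimal Python sketch of index-based assignment. It is not the code from this PR (stash is written in Go); the function names and data shapes are hypothetical and only mirror the behaviour described above: a top-level scrape keeps every matched URL, while a sub-object scrape pairs the i-th URL with the i-th performer.

```python
def top_level_urls(url_matches):
    """Top-level scrape: every matched URL is kept as a list."""
    return list(url_matches)


def assign_urls_by_index(performer_names, url_matches):
    """Sub-object scrape (hypothetical): the i-th URL goes to the i-th performer.

    There is no information linking a URL to a specific performer, so each
    performer ends up with at most one URL and any extra matches are dropped.
    """
    performers = []
    for i, name in enumerate(performer_names):
        urls = [url_matches[i]] if i < len(url_matches) else []
        performers.append({"name": name, "urls": urls})
    return performers
```

With two performers and three matched URLs, the second performer receives the second match regardless of who it actually belongs to, and the third match is lost. That is why multiple URLs per performer are not possible in this scenario.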

Resolves #5294

For testing, I used the following performer scraper for LinkTree:

name: LinkTree
performerByURL:
  - action: scrapeXPath
    url:
      - linktr.ee
    scraper: performerScraper

xPathScrapers:
  performerScraper:
    performer:
      Name: //div[@id='profile-title']/h1/text()
      URLs:
        selector: //div[@id='links-container']//a/@href
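
As a rough illustration of why the URLs selector needs list support, the Python sketch below runs equivalent queries against a made-up page. The sample markup is an assumption (not the real LinkTree page), and it uses Python's `xml.etree.ElementTree` rather than stash's XPath engine, so the `/text()` and `/@href` steps are replaced by reading `.text` and the `href` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page structure used only for this example.
SAMPLE_PAGE = """
<html><body>
  <div id="profile-title"><h1>Example Name</h1></div>
  <div id="links-container">
    <div><a href="https://example.com/one">One</a></div>
    <div><a href="https://example.com/two">Two</a></div>
  </div>
</body></html>
"""


def scrape_performer(page):
    """Return the name (single match) and URLs (all matches) from the page."""
    root = ET.fromstring(page.strip())
    title = root.find(".//div[@id='profile-title']/h1")
    links = root.findall(".//div[@id='links-container']//a")
    return {
        "name": title.text if title is not None else None,
        # Every matching anchor contributes one entry, in document order.
        "urls": [a.get("href") for a in links if a.get("href")],
    }
```

The Name query yields one value, while the URLs query yields one value per link, which is the case the fix is meant to handle.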

WithoutPants added the bug (Something isn't working) label on Feb 24, 2025
WithoutPants added this to the Version 0.28.0 milestone on Feb 24, 2025
@DogmaDragon
Collaborator

Tested with several existing community scrapers (IAFD, The Nude, Babepedia, MFC) and some custom scrapers.
Tested across different actions (script, scrapeJson, scrapeXPath).

Everything appears to work in my testing so far.

If URLs is used, URL, Twitter and Instagram are ignored and URLs is processed correctly, which maintains compatibility with older scrapers.
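
The precedence observed in testing can be sketched in Python as follows. This is an illustration only, not stash's Go implementation: the function name and dictionary keys are hypothetical, and the fallback branch (collecting the legacy fields when URLs is absent) is an assumption about how older scrapers keep working.

```python
LEGACY_FIELDS = ("url", "twitter", "instagram")


def resolve_performer_urls(scraped):
    """Prefer the URLs list; otherwise fall back to the legacy single fields."""
    urls = scraped.get("urls")
    if urls is not None:
        # URLs is present: the legacy fields are ignored entirely.
        return list(urls)
    # Assumed fallback: older scrapers only provide the single-value fields.
    return [scraped[key] for key in LEGACY_FIELDS if scraped.get(key)]
```

A scraper that sets both URLs and URL would therefore yield only the URLs values, while an older scraper that sets only URL or Twitter is unaffected.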

WithoutPants merged commit 1e05766 into stashapp:develop on Feb 25, 2025
2 checks passed
feederbox826 added a commit to stashapp/CommunityScrapers that referenced this pull request Feb 26, 2025
Development

Successfully merging this pull request may close these issues.

[Bug Report] XPath scraper is missing string array support