How to debug feeds that throw an error? #80

Closed
kylealwyn opened this issue Jan 12, 2023 · 8 comments

Comments

@kylealwyn

kylealwyn commented Jan 12, 2023

I'm trying to pull something like https://www.nature.com/nature.rss, but I get an error both locally and in the demo. I ran the address through the W3C validator and it came up valid.

Somewhat related: I'm also trying to use a proxy, but to no avail, as http://[email protected]:8887 throws an Invalid URL error.

@kylealwyn
Author

Curious whether something similar to extractus/article-extractor#326 is viable for this library. It'd be great to fetch the XML on my own and provide that to this parser.

@kylealwyn
Author

Sorry, one last thing: I think the type for headers in FetchOptions is incorrect. I believe it should be something like Record<string, string>:

export interface FetchOptions {
  /**
   * list of request headers
   * default: null
   */
  headers?: string[];
  /**
   * the values to configure proxy
   * default: null
   */
  proxy?: ProxyConfig;
}
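
To illustrate the difference between the two types, here is a minimal sketch of a runtime check for the suggested shape. The helper name `isHeaderRecord` is hypothetical and not part of the library; it only shows that a `string[]` has no place for header names, while a string-keyed object does.

```javascript
// Hypothetical helper (not from the library): checks that a value looks like
// Record<string, string>, i.e. a plain object whose values are all strings.
function isHeaderRecord(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every((v) => typeof v === 'string');
}

// A header map pairs each header name with its value.
const asRecord = { 'user-agent': 'my-reader/1.0', accept: 'application/rss+xml' };

// An array of strings cannot say which value belongs to which header name.
const asArray = ['my-reader/1.0', 'application/rss+xml'];
```

With this check, `isHeaderRecord(asRecord)` passes and `isHeaderRecord(asArray)` does not, which is the mismatch the declared `string[]` type hides.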

@ndaidong
Collaborator

@kylealwyn same idea, this lib should have that method too.

@ndaidong
Collaborator

@kylealwyn https://www.nature.com/nature.rss uses RDF. It's been a long time since I've seen this format!
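
When a feed validates but the parser rejects it, one quick debugging step is to look at the root element of the raw XML. The sketch below is a rough regex-based sniffer, not the library's detection logic; the function names are hypothetical, and it assumes the conventional `rdf` prefix for RSS 1.0 documents, which a feed is free to rename.

```javascript
// Hypothetical helper: return the name of the first element in an XML string,
// skipping the XML declaration, processing instructions, comments and a
// simple DOCTYPE (one without an internal subset).
function sniffXmlRoot(xml) {
  const withoutPreamble = xml
    .replace(/<\?[\s\S]*?\?>/g, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<!DOCTYPE[^>]*>/gi, '');
  const match = withoutPreamble.match(/<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)/);
  return match ? match[1] : null;
}

// Map the root element to a feed family. Assumes the usual element names:
// <rss> for RSS 2.0, <feed> for Atom, <rdf:RDF> for RSS 1.0 (RDF).
function describeFeedFormat(xml) {
  const root = sniffXmlRoot(xml);
  if (root === 'rss') return 'rss';
  if (root === 'feed') return 'atom';
  if (root === 'rdf:RDF') return 'rdf';
  return 'unknown';
}
```

Running `describeFeedFormat()` on the downloaded body tells you whether the feed falls outside the formats the parser handles before you dig into the parser itself.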

ndaidong added a commit that referenced this issue Jan 12, 2023
- Replace `read()` by `extract()`
- Add new methods: `extractFromJson()` & `extractFromXml()`
- Change coding convention (remove standardjs)
- Update dependencies

Related issues: #80
@ndaidong ndaidong mentioned this issue Jan 12, 2023
ndaidong added a commit that referenced this issue Jan 12, 2023
- Fix fetch interface (#80)
@ndaidong
Collaborator

@kylealwyn v6.2.1 has just been released with 2 new methods for extracting feed data from an XML or JSON string. That may resolve your case.

Regarding https://www.nature.com/nature.rss, we have no plans to support the RDF format for now, because it is rarely used.

> Somewhat related, I'm also trying to use a proxy but to no avail as http://[email protected]:8887 is throwing Invalid URL

Could you share more info about your code here? This lib does not modify or verify the proxy URL. It simply uses the URL from the proxy config if one is present.
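
As a rough sketch of the "fetch it yourself, then parse" workflow these methods enable: the helpers below download the feed body, surface HTTP failures with the status code, and hand the text to a parser passed in by the caller. The names `fetchFeedText` and `loadFeed` are hypothetical; in real use the `parseXml` argument could be the library's `extractFromXml()`, and the default `fetch` assumes a runtime with a global `fetch` (for example Node 18+).

```javascript
// Hypothetical helper: download the raw feed body and fail loudly on HTTP errors,
// so a bad status is not mistaken for a parser problem.
async function fetchFeedText(url, fetchImpl = fetch) {
  const res = await fetchImpl(url, {
    headers: { accept: 'application/rss+xml, application/xml, text/xml' },
  });
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }
  return res.text();
}

// Hypothetical helper: fetch first, then delegate parsing to the injected function.
async function loadFeed(url, parseXml, fetchImpl = fetch) {
  const xml = await fetchFeedText(url, fetchImpl);
  return parseXml(xml);
}
```

Keeping the download separate from the parsing makes it easier to tell a network, proxy, or status problem apart from a format the parser does not understand.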

@kylealwyn
Author

Awesome! Will check it out. It would be great to expose the utils that check whether a feed is XML or JSON, or to have a unified entry point that runs the validation and normalization, but I will copy those over for now!

> Regarding https://www.nature.com/nature.rss, we have no plans to support the RDF format for now, because it is rarely used.

Makes sense.

> Could you share more info about your code here? This lib does not modify or verify the proxy URL. It simply uses the URL from the proxy config if one is present.

I'm doing something like:
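
A unified entry point of the kind described here could be sketched as follows. This is not the library's internal validation; `detectFeedKind` and `parseAnyFeed` are hypothetical names, the detection is a deliberately simple first-character check, and the `parsers` object is supplied by the caller (its `json` handler decides whether it wants the raw string or a parsed object).

```javascript
// Hypothetical helper: classify a feed body as 'xml', 'json' or 'unknown'.
function detectFeedKind(text) {
  const trimmed = text.trimStart();
  if (trimmed.startsWith('<')) {
    return 'xml';
  }
  if (trimmed.startsWith('{')) {
    try {
      JSON.parse(trimmed);
      return 'json';
    } catch {
      return 'unknown';
    }
  }
  return 'unknown';
}

// Hypothetical unified entry point: dispatch the raw text to the matching parser.
function parseAnyFeed(text, parsers) {
  const kind = detectFeedKind(text);
  if (kind === 'xml') return parsers.xml(text);
  if (kind === 'json') return parsers.json(text);
  throw new Error('Input looks like neither an XML nor a JSON feed');
}
```

The check only looks at the start of the body, so it cannot tell a feed from any other XML or JSON document; the real parsers still have to validate the content.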

const res = await read(
  feed.xmlUrl,
  {},
  {
    proxy: {
      target: 'http://127.0.0.1:3001',
    },
  },
);

where the target is the URL I initially shared, or any IP/port combination, and I get back an Invalid URL error.
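
One way to narrow down an Invalid URL error like this is to test each candidate string against the runtime's own URL parser. The sketch below rests on an assumption this thread does not confirm: that the final request URL is built by appending the encoded feed URL directly to `target`. Under that assumption, a bare `host:port` target produces a string the parser rejects, while a target ending in a query prefix does not. The `/?url=` suffix and the helper name `tryParseUrl` are hypothetical.

```javascript
// Hypothetical helper: return the normalized URL, or null if the string
// is rejected by the WHATWG URL parser (which is what raises "Invalid URL").
function tryParseUrl(candidate) {
  try {
    return new URL(candidate).href;
  } catch {
    return null;
  }
}

const feedUrl = 'https://www.nature.com/nature.rss';

// The target from the snippet above: valid on its own.
const bareTarget = 'http://127.0.0.1:3001';

// Hypothetical prefix-style target that expects the feed URL as a query value.
const prefixTarget = 'http://127.0.0.1:3001/?url=';

// ASSUMPTION: the request URL is target + encodeURIComponent(feedUrl).
const joinedBare = bareTarget + encodeURIComponent(feedUrl);
const joinedPrefix = prefixTarget + encodeURIComponent(feedUrl);

console.log('bare target parses:', tryParseUrl(bareTarget) !== null);
console.log('bare target + feed URL parses:', tryParseUrl(joinedBare) !== null);
console.log('prefix target + feed URL parses:', tryParseUrl(joinedPrefix) !== null);
```

In the bare case the encoded feed URL lands right after the port digits, so the parser fails on the port. If the library behaves differently, the same helper still helps locate which string in the chain is the one being rejected.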

@kylealwyn
Author

Also, what would be the lift on supporting RDF feeds? https://rss.slashdot.org/Slashdot/slashdotMain is another big one I'm interested in. I'm seeing the format quite a bit in my explorations.

@ndaidong
Collaborator

@kylealwyn Thank you. RDF can reuse almost all of the logic from the RSS parser. I will try to implement a draft.

ndaidong added a commit that referenced this issue Aug 24, 2023
To resolve issue #80
@ndaidong ndaidong mentioned this issue Aug 24, 2023