feed.xml link field parsing error #111

Closed
jc955 opened this issue Sep 12, 2023 · 4 comments

Comments

jc955 commented Sep 12, 2023

https://stephango.com/feed.xml

link field parsing error

image
image

ndaidong added a commit that referenced this issue Sep 12, 2023
- Update dependencies
- Fix issue #111
@ndaidong ndaidong mentioned this issue Sep 12, 2023

jc955 commented Sep 12, 2023

@neizod
Hi!
First and foremost, thanks for your work!
I have another question.

https://www.historyinmemes.com/feed

This feed's entries actually have body content, but the feed extractor only returns a short excerpt of the text. How can the full content be returned? Can you help me?

image

image

@ndaidong
Collaborator

@0x1017 you can use the parserOptions parameter to customize the output.

For example, if you turn off normalization, you can get the raw result, with full description:

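  import { extract } from '@extractus/feed-extractor'

  // Skip normalization to get the raw parsed feed, including full descriptions
  const feed = await extract('https://www.historyinmemes.com/feed', {
    normalization: false,
  })
  console.log(feed)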

If you still want to normalize the feed data, you can use getExtraEntryFields to modify only the description, as below:

  const feed = await extract(url, {
    getExtraEntryFields: (feedEntry) => {
      const { description } = feedEntry
      // you can do anything with the description here
      return {
        description,
      }
    },
  })
  console.log(feed)
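
As a rough sketch of what such a callback might do: the function below prefers a full body over the short description when the raw entry has one. It assumes the raw RSS entry exposes the full body under a `content:encoded` key, which is common in RSS feeds but not confirmed for this particular feed. `pickFullDescription` is a hypothetical helper name, not part of feed-extractor.

```javascript
// Hypothetical helper intended for use as getExtraEntryFields.
// Assumption: the raw entry may carry the full body as 'content:encoded'.
function pickFullDescription (feedEntry) {
  const full = feedEntry['content:encoded']
  const short = feedEntry.description
  // Use the full body only when it is a non-empty string; otherwise fall back
  const hasFull = typeof full === 'string' && full.trim() !== ''
  return { description: hasFull ? full : short }
}

// Usage (not executed here):
// const feed = await extract(url, { getExtraEntryFields: pickFullDescription })
```

Keeping the logic in a named function makes it possible to check it against sample entries without fetching a feed.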

@jc955 jc955 closed this as completed Sep 14, 2023
LavaCxx commented Apr 7, 2024

Even though I turned off normalization, the parsing of links still changed.
image

LavaCxx commented Apr 7, 2024

After some research, I found that the problem is in fast-xml-parser. Maybe the previous XML didn't support the & character, so this problem wasn't considered?
Anyway, if anyone else encounters this problem in the future, they can refer to this code.

await extract(url, {
    normalization: false,
    xmlParserOptions: {
        // Decode &amp; back to a literal & in link values only
        tagValueProcessor: (tagName, tagValue) => {
            if (tagName === 'link') return tagValue.replace(/&amp;/g, '&')
            return tagValue
        }
    }
})
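
The same idea can be written as a standalone function that is easier to check in isolation. This is only a sketch: it assumes the intent is to turn the XML entity `&amp;` in `link` values back into a literal `&`, and `decodeLinkAmpersands` is a hypothetical name. The `(tagName, tagValue)` signature follows the snippet above.

```javascript
// Hypothetical processor: decode &amp; to & in <link> values only.
// Other tags and non-string values pass through unchanged.
function decodeLinkAmpersands (tagName, tagValue) {
  if (tagName === 'link' && typeof tagValue === 'string') {
    return tagValue.replace(/&amp;/g, '&')
  }
  return tagValue
}

// Usage (not executed here):
// await extract(url, {
//   normalization: false,
//   xmlParserOptions: { tagValueProcessor: decodeLinkAmpersands }
// })
```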
