Issue with Discard/Keep_Only Filter Not Matching Inside Non-Standard Fields #92

Closed
tillcash opened this issue Mar 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@tillcash

Currently, the discard/keep_only filter does not match text inside non-standard fields, specifically the media:description tag in the media namespace.

@shouya shouya added the bug Something isn't working label Mar 16, 2024
shouya added a commit that referenced this issue Mar 27, 2024
This PR clarifies the concept for "body" used in code and config.

Fixes #95 and #96.

## Motivation

Previously, I named a generic field in the code "description" to
distinguish it from the title. For the RSS format it refers to the
[`description`
field](https://github.com/shouya/rss-funnel/blob/dc1efac19a96e06143b75e9495adb3f6b013a75f/src/feed.rs#L348)
and for Atom it refers to the [`content`
field](https://github.com/shouya/rss-funnel/blob/dc1efac19a96e06143b75e9495adb3f6b013a75f/src/feed.rs#L368).
The choice of the name and of the selected fields was arbitrary,
based on the few example feeds I had at hand. Overall, it is supposed to be
the field that ultimately gets displayed in an RSS reader beneath the title.

In this PR I renamed the general term to "body". Unlike the old notion,
a post can have multiple `body` fields. We need this if we want to
handle all the different fields that an RSS reader treats as the body.
For example, if we consider all the body fields, then the `keep_only`
and `discard` filters can correctly match posts containing a given
keyword (#95).

In addition, some feeds do not use the typical body fields. One example
is YouTube, which puts the video description in the `media:description`
field under the `media:group` tag (#92). We want to support
filtering on this field as well.

## Implementation

First, I removed the single-field accessor for the `Post.description` field.

Then I provided various APIs for accessing the bodies:

  + `Post.bodies_mut`
  + `Post.bodies`
  + `Post.modify_bodies`
  + `Post.first_body`
  + `Post.first_body_mut`
  + `Post.create_body`
  + `Post.ensure_body`
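
To make the shape of these accessors concrete, here is a minimal sketch. It is not the code from this PR: the `Post` struct below is a hypothetical, simplified stand-in that stores its bodies in a plain `Vec<String>`, and every signature is an assumption chosen for illustration. Only the method names come from the list above.

```rust
/// Hypothetical, simplified stand-in for the real `Post` type.
/// The real type wraps RSS and Atom items; this one just holds strings.
#[derive(Debug, Default)]
pub struct Post {
    pub title: String,
    // Every field treated as a body, in priority order (assumed layout).
    bodies: Vec<String>,
}

impl Post {
    pub fn new(title: &str, bodies: &[&str]) -> Self {
        Post {
            title: title.to_string(),
            bodies: bodies.iter().map(|b| b.to_string()).collect(),
        }
    }

    /// Read-only view of all bodies.
    pub fn bodies(&self) -> Vec<&str> {
        self.bodies.iter().map(String::as_str).collect()
    }

    /// Mutable view of all bodies.
    pub fn bodies_mut(&mut self) -> Vec<&mut String> {
        self.bodies.iter_mut().collect()
    }

    /// The first body, if the post has any.
    pub fn first_body(&self) -> Option<&str> {
        self.bodies.first().map(String::as_str)
    }

    /// Returns the first body, creating an empty one when none exists.
    pub fn ensure_body(&mut self) -> &mut String {
        if self.bodies.is_empty() {
            self.bodies.push(String::new());
        }
        &mut self.bodies[0]
    }

    /// Applies `f` to every body in place.
    pub fn modify_bodies(&mut self, mut f: impl FnMut(&mut String)) {
        for body in self.bodies.iter_mut() {
            f(body);
        }
    }
}
```

With this model, a filter that rewrites content can call `modify_bodies` once instead of special-casing each field, and a filter that needs somewhere to write can call `ensure_body` first.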

The following fields are considered body fields:

- rss
  + `content`
  + `description`
  + `media:description`
  + `itunes:summary`
- atom
  + `content`
  + `summary`
  + `media:description`

## Config changes

- Rename the `content` variant of the `field` field to `body` for the
`keep_only`/`discard` filters.
- Rename the `description_selector` field to `body_selector` for the
`extract` filter.

Both changes are backward compatible. The old fields are currently
marked deprecated, and may be removed in a future breaking release.
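
One way such a backward-compatible rename could work is to accept the old name as an alias and flag it as deprecated. The sketch below is an assumption for illustration only, not the project's actual config parsing: `Field`, `ParsedField`, and `parse_field` are hypothetical names, and the `title` variant is included only to give the enum a second case.

```rust
/// Hypothetical subset of the values accepted by the `field` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Title,
    Body,
}

/// Parse result that also records whether a deprecated name was used.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedField {
    pub field: Field,
    pub deprecated: bool,
}

/// Accepts the new name `body` and the old name `content` as an alias.
pub fn parse_field(name: &str) -> Result<ParsedField, String> {
    match name {
        "title" => Ok(ParsedField { field: Field::Title, deprecated: false }),
        "body" => Ok(ParsedField { field: Field::Body, deprecated: false }),
        // Old spelling: still accepted, but flagged so the caller can warn.
        "content" => Ok(ParsedField { field: Field::Body, deprecated: true }),
        other => Err(format!("unknown field: {}", other)),
    }
}
```

A caller can log a warning whenever `deprecated` is true, so existing configs keep working until the alias is removed.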

## Checklist

- [ ] update filter docs
- [x] review all usage of the term "description" in code
@shouya
Owner

shouya commented Mar 31, 2024

Fixed in #100. Your original filter should now work without first assigning content's value to the description field.

You can try out the nightly image (https://github.com/shouya/rss-funnel/pkgs/container/rss-funnel).

@shouya shouya closed this as completed Mar 31, 2024