fix: handle different body fields (#100)
This PR clarifies the concept for "body" used in code and config.

Fixes #95 and #96.

## Motivation

Previously, I named a generic field in the code "description" to
distinguish it from the title. For RSS it refers to the
[`description`
field](https://github.com/shouya/rss-funnel/blob/dc1efac19a96e06143b75e9495adb3f6b013a75f/src/feed.rs#L348)
and for Atom it refers to the [`content`
field](https://github.com/shouya/rss-funnel/blob/dc1efac19a96e06143b75e9495adb3f6b013a75f/src/feed.rs#L368).
The choice of name and the selected fields was arbitrary, based on the
few example feeds I had on hand. Overall, it is supposed to be the field
that ultimately gets displayed in an RSS reader beneath the title.

In this PR I renamed the general term to "body". Unlike the old notion,
a post can have multiple `body` fields. We need this if we want to
handle all the different fields that an RSS reader treats as the body.
For example, if we consider all the body fields, then we can correctly
filter posts matching a certain keyword using the `keep_only` and
`discard` filters (#95).
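
As a rough illustration of why multiple bodies matter for filtering, here is a minimal sketch. It is not the code in this PR: `SimplePost`, `any_body_contains`, and this free-standing `keep_only` function are hypothetical stand-ins, and the real `Post` wraps `rss::Item` and `atom_syndication::Entry`.

```rust
/// Hypothetical, simplified stand-in for an RSS post (assumption: the
/// real `Post` wraps `rss::Item` and has more fields).
#[derive(Default)]
pub struct SimplePost {
    pub title: Option<String>,
    pub content: Option<String>,
    pub description: Option<String>,
}

impl SimplePost {
    /// Collect every populated body field, `content` first.
    pub fn bodies(&self) -> Vec<&str> {
        let mut bodies = Vec::new();
        if let Some(v) = self.content.as_deref() {
            bodies.push(v);
        }
        if let Some(v) = self.description.as_deref() {
            bodies.push(v);
        }
        bodies
    }

    /// True if at least one body field contains the keyword.
    pub fn any_body_contains(&self, keyword: &str) -> bool {
        self.bodies().iter().any(|body| body.contains(keyword))
    }
}

/// Keep only the posts that mention the keyword in any body field.
pub fn keep_only(posts: Vec<SimplePost>, keyword: &str) -> Vec<SimplePost> {
    posts
        .into_iter()
        .filter(|post| post.any_body_contains(keyword))
        .collect()
}
```

With a single-field accessor, a post whose keyword appears only in `description` would be missed whenever the accessor happened to read `content`; checking every body avoids that.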

In addition, some feeds do not use the typical body fields. One example
is YouTube, which puts the video description in the `media:description`
field under the `media:group` tag (#92). We hope to support filtering
on this field as well.
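
Finding a nested tag such as `media:description` requires walking the extension tree. The sketch below shows the idea on a simplified type; `Tag` and `values_named` are hypothetical names, with `Tag` modelled on the `name`/`value`/`children` shape of the extension structs used in `src/feed/extension.rs`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an extension element: a named tag with an
/// optional text value and child tags grouped by name.
#[derive(Default)]
pub struct Tag {
    pub name: String,
    pub value: Option<String>,
    pub children: BTreeMap<String, Vec<Tag>>,
}

impl Tag {
    /// Depth-first walk that appends the value of every tag called
    /// `name` (including `self`) to `out`.
    pub fn values_named<'a>(&'a self, name: &str, out: &mut Vec<&'a str>) {
        if self.name == name {
            if let Some(value) = self.value.as_deref() {
                out.push(value);
            }
        }
        for group in self.children.values() {
            for child in group {
                child.values_named(name, out);
            }
        }
    }
}
```

Because the walk is recursive, the tag is found whether it sits directly on the item or one level down inside `media:group`.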

## Implementation

First, I removed the single-field accessor for the `Post.description` field.

Then I provided various APIs for accessing the bodies:

  + `Post.bodies_mut`
  + `Post.bodies`
  + `Post.modify_bodies`
  + `Post.first_body`
  + `Post.first_body_mut`
  + `Post.create_body`
  + `Post.ensure_body`
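
To show how these accessors fit together, here is a reduced sketch of the RSS case only. `RssPost` is a hypothetical stand-in with two body fields; the method names and the "create `description` when nothing exists" behaviour follow the diff in `src/feed.rs`, but extension tags and `itunes:summary` are left out.

```rust
/// Hypothetical stand-in for the RSS variant of `Post`.
#[derive(Default)]
pub struct RssPost {
    pub content: Option<String>,
    pub description: Option<String>,
}

impl RssPost {
    /// All populated bodies, in the order a reader is likely to show them.
    pub fn bodies_mut(&mut self) -> Vec<&mut String> {
        let mut bodies = Vec::new();
        if let Some(v) = self.content.as_mut() {
            bodies.push(v);
        }
        if let Some(v) = self.description.as_mut() {
            bodies.push(v);
        }
        bodies
    }

    /// Apply `f` to every body field.
    pub fn modify_bodies(&mut self, mut f: impl FnMut(&mut String)) {
        for body in self.bodies_mut() {
            f(body);
        }
    }

    pub fn first_body_mut(&mut self) -> Option<&mut String> {
        self.bodies_mut().into_iter().next()
    }

    /// Overwrite `description` with an empty body and return it.
    pub fn create_body(&mut self) -> &mut String {
        self.description = Some(String::new());
        self.description.as_mut().unwrap()
    }

    /// Return the first existing body, creating one only if none exists.
    pub fn ensure_body(&mut self) -> &mut String {
        if self.first_body_mut().is_none() {
            self.create_body()
        } else {
            self.first_body_mut().unwrap()
        }
    }
}
```

`ensure_body` looks up the first body twice on purpose: the emptiness check ends its mutable borrow before either branch borrows `self` again, which keeps the borrow checker satisfied.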

The following fields are considered body fields:

- rss
  + `content`
  + `description`
  + `media:description`
  + `itunes:summary`
- atom
  + `content`
  + `summary`
  + `media:description`

## Config changes

- Rename the `content` variant of the `field` field to `body` for the
`keep_only`/`discard` filters.
- Rename the `description_selector` field to `body_selector` for the
`extract` filter.

Both changes are backward compatible. The old fields are currently
marked deprecated, and may be removed in a future breaking release.
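
One way to picture the backward compatibility is a parser that accepts both spellings and flags the old one. This is only a sketch under assumptions: `Field`, `parse_field`, and the `title` variant are hypothetical, and the actual config deserialization in this PR may work differently.

```rust
/// Hypothetical set of values for the `field` option.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Field {
    Title,
    Body,
}

/// Parse a `field` value. The boolean is true when a deprecated
/// spelling was used, so the caller can emit a warning.
pub fn parse_field(raw: &str) -> Option<(Field, bool)> {
    match raw {
        "title" => Some((Field::Title, false)),
        "body" => Some((Field::Body, false)),
        // Old name, still accepted for backward compatibility.
        "content" => Some((Field::Body, true)),
        _ => None,
    }
}
```

Existing configs that say `content` keep working, and the deprecation flag leaves room to drop the alias in a later breaking release.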

## Checklist

- [ ] update filter docs
- [x] review all usage of the term "description" in code
shouya authored Mar 27, 2024
1 parent dc1efac commit 2cf74d3
Showing 8 changed files with 317 additions and 133 deletions.
151 changes: 118 additions & 33 deletions src/feed.rs
@@ -1,4 +1,5 @@
mod conversion;
mod extension;

use chrono::DateTime;
use paste::paste;
@@ -13,6 +14,8 @@ use crate::source::FromScratch;
use crate::util::Error;
use crate::util::Result;

use extension::ExtensionExt;

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq)]
#[serde(untagged)]
pub enum Feed {
@@ -288,7 +291,6 @@ pub enum Post {
enum PostField {
Title,
Link,
Description,
Author,
Guid,
}
@@ -315,14 +317,128 @@ impl Post {
Post::Atom(item) => Some(item.updated),
}
}

// the order should match the actual display order in rss
// readers. This allows ensure_body to return the body field that is
// most likely to affect the actual appearance.
#[allow(clippy::option_map_unit_fn)]
pub fn bodies_mut(&mut self) -> Vec<&mut String> {
let mut bodies = Vec::new();
match self {
Post::Rss(item) => {
item.content.as_mut().map(|v| bodies.push(v));
item.description.as_mut().map(|v| bodies.push(v));
item
.extensions
.tags_mut_with_names(&["media:description"])
.into_iter()
.filter_map(|tag| tag.value.as_mut())
.for_each(|v| bodies.push(v));
item
.itunes_ext
.as_mut()
.and_then(|ext| ext.summary.as_mut())
.map(|v| bodies.push(v));
}
Post::Atom(item) => {
item
.content
.as_mut()
.and_then(|c| c.value.as_mut())
.map(|v| bodies.push(v));
item.summary.as_mut().map(|s| bodies.push(&mut s.value));
item
.extensions
.tags_mut_with_names(&["media:description"])
.into_iter()
.filter_map(|tag| tag.value.as_mut())
.for_each(|v| bodies.push(v));
}
}
bodies
}

// Please make sure this function matches the order of bodies_mut
#[allow(clippy::option_map_unit_fn)]
pub fn bodies(&self) -> Vec<&str> {
let mut bodies = Vec::new();
match self {
Post::Rss(item) => {
item.content.as_deref().map(|v| bodies.push(v));
item.description.as_deref().map(|v| bodies.push(v));
item
.extensions
.tags_with_names(&["media:description"])
.into_iter()
.filter_map(|tag| tag.value.as_deref())
.for_each(|v| bodies.push(v));
item
.itunes_ext
.as_ref()
.and_then(|ext| ext.summary.as_deref())
.map(|v| bodies.push(v));
}
Post::Atom(item) => {
item
.content
.as_ref()
.and_then(|c| c.value.as_deref())
.map(|v| bodies.push(v));
item.summary.as_ref().map(|s| bodies.push(&s.value));
item
.extensions
.tags_with_names(&["media:description"])
.into_iter()
.filter_map(|tag| tag.value.as_deref())
.for_each(|v| bodies.push(v));
}
}
bodies
}

pub fn modify_bodies(&mut self, mut f: impl FnMut(&mut String)) {
for body in self.bodies_mut() {
f(body);
}
}

pub fn first_body(&self) -> Option<&str> {
self.bodies().into_iter().next()
}

pub fn first_body_mut(&mut self) -> Option<&mut String> {
self.bodies_mut().into_iter().next()
}

pub fn create_body(&mut self) -> &mut String {
match self {
Post::Rss(item) => {
item.description = Some(String::new());
item.description.as_mut().unwrap()
}
Post::Atom(item) => {
item.summary = Some(atom_syndication::Text::html(String::new()));
&mut item.summary.as_mut().unwrap().value
}
}
}

pub fn ensure_body(&mut self) -> &mut String {
let needs_body = self.first_body_mut().is_none();

if needs_body {
self.create_body()
} else {
self.first_body_mut().unwrap()
}
}
}

impl Post {
fn get_field(&self, field: PostField) -> Option<&str> {
match (self, field) {
(Post::Rss(item), PostField::Title) => item.title.as_deref(),
(Post::Rss(item), PostField::Link) => item.link.as_deref(),
(Post::Rss(item), PostField::Description) => item.description.as_deref(),
(Post::Rss(item), PostField::Author) => item.author.as_deref(),
(Post::Rss(item), PostField::Guid) => {
item.guid.as_ref().map(|v| v.value.as_str())
@@ -331,9 +447,6 @@ impl Post {
(Post::Atom(item), PostField::Link) => {
item.links.first().map(|v| v.href.as_str())
}
(Post::Atom(item), PostField::Description) => {
item.content.as_ref().and_then(|c| c.value.as_deref())
}
(Post::Atom(item), PostField::Author) => {
item.authors.first().map(|v| v.name.as_str())
}
@@ -345,9 +458,6 @@ impl Post {
match (self, field) {
(Post::Rss(item), PostField::Title) => item.title = Some(value.into()),
(Post::Rss(item), PostField::Link) => item.link = Some(value.into()),
(Post::Rss(item), PostField::Description) => {
item.description = Some(value.into())
}
(Post::Rss(item), PostField::Author) => item.author = Some(value.into()),
(Post::Rss(item), PostField::Guid) => {
item.guid = Some(rss::Guid {
@@ -365,13 +475,6 @@ impl Post {
});
}
},
(Post::Atom(item), PostField::Description) => {
item.content = Some(atom_syndication::Content {
value: Some(value.into()),
content_type: Some("html".to_string()),
..Default::default()
})
}
(Post::Atom(item), PostField::Author) => match item.authors.get_mut(0) {
Some(author) => author.name = value.into(),
None => {
@@ -389,7 +492,6 @@ impl Post {
match (self, field) {
(Post::Rss(item), PostField::Title) => item.title.as_mut(),
(Post::Rss(item), PostField::Link) => item.link.as_mut(),
(Post::Rss(item), PostField::Description) => item.description.as_mut(),
(Post::Rss(item), PostField::Author) => item.author.as_mut(),
(Post::Rss(item), PostField::Guid) => {
item.guid.as_mut().map(|v| &mut v.value)
@@ -398,9 +500,6 @@ impl Post {
(Post::Atom(item), PostField::Link) => {
item.links.get_mut(0).map(|v| &mut v.href)
}
(Post::Atom(item), PostField::Description) => {
item.content.as_mut().and_then(|c| c.value.as_mut())
}
(Post::Atom(item), PostField::Author) => {
item.authors.get_mut(0).map(|v| &mut v.name)
}
@@ -416,9 +515,6 @@ impl Post {
(Post::Rss(item), PostField::Link) => {
item.link.get_or_insert_with(String::new)
}
(Post::Rss(item), PostField::Description) => {
item.description.get_or_insert_with(String::new)
}
(Post::Rss(item), PostField::Author) => {
item.author.get_or_insert_with(String::new)
}
@@ -442,16 +538,6 @@ impl Post {
)
.href
}
(Post::Atom(item), PostField::Description) => item
.content
.get_or_insert_with(|| atom_syndication::Content {
value: Some(String::new()),
content_type: Some("html".to_string()),
..Default::default()
})
.value
.as_mut()
.unwrap(),
(Post::Atom(item), PostField::Author) => {
&mut vec_first_or_insert(
&mut item.authors,
@@ -508,7 +594,6 @@ macro_rules! impl_post_accessors {
impl_post_accessors! {
title => Title;
link => Link;
description => Description;
author => Author;
guid => Guid
}
100 changes: 100 additions & 0 deletions src/feed/extension.rs
@@ -0,0 +1,100 @@
use std::collections::BTreeMap;

pub struct TagRef<'a> {
pub name: &'a String,
pub attrs: &'a BTreeMap<String, String>,
pub value: &'a Option<String>,
}

pub struct TagRefMut<'a> {
pub name: &'a mut String,
pub attrs: &'a mut BTreeMap<String, String>,
pub value: &'a mut Option<String>,
}

pub trait ExtensionExt {
fn tags(&self) -> Vec<TagRef>;
fn tags_mut(&mut self) -> Vec<TagRefMut>;

fn tags_mut_with_names(&mut self, names: &[&str]) -> Vec<TagRefMut> {
self
.tags_mut()
.into_iter()
.filter(|tag| names.contains(&tag.name.as_str()))
.collect()
}

fn tags_with_names(&self, names: &[&str]) -> Vec<TagRef> {
self
.tags()
.into_iter()
.filter(|tag| names.contains(&tag.name.as_str()))
.collect()
}
}

macro_rules! impl_extension_ext {
($ty:ty) => {
impl ExtensionExt for $ty {
fn tags(&self) -> Vec<TagRef> {
let tag = TagRef {
name: &self.name,
attrs: &self.attrs,
value: &self.value,
};

let mut tags = vec![tag];
for children in self.children.values() {
tags.extend(children.iter().flat_map(|ext| ext.tags()));
}
tags
}

fn tags_mut(&mut self) -> Vec<TagRefMut> {
let tag = TagRefMut {
name: &mut self.name,
attrs: &mut self.attrs,
value: &mut self.value,
};

let mut tags = vec![tag];
for children in self.children.values_mut() {
tags.extend(children.iter_mut().flat_map(|ext| ext.tags_mut()));
}
tags
}
}
};
}

// These two structs have exactly the same structure but are different
// types since they belong to different crates.
impl_extension_ext!(atom_syndication::extension::Extension);
impl_extension_ext!(rss::extension::Extension);

impl<T> ExtensionExt for BTreeMap<String, BTreeMap<String, Vec<T>>>
where
T: ExtensionExt,
{
fn tags(&self) -> Vec<TagRef> {
self
.values()
.flat_map(|children| {
children
.values()
.flat_map(|exts| exts.iter().flat_map(|ext| ext.tags()))
})
.collect()
}

fn tags_mut(&mut self) -> Vec<TagRefMut> {
self
.values_mut()
.flat_map(|children| {
children
.values_mut()
.flat_map(|exts| exts.iter_mut().flat_map(|ext| ext.tags_mut()))
})
.collect()
}
}