Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Community post tags (part 1) #4997

Merged
merged 17 commits into from
Dec 18, 2024
Merged

Community post tags (part 1) #4997

merged 17 commits into from
Dec 18, 2024

Conversation

phiresky
Copy link
Collaborator

@phiresky phiresky commented Sep 2, 2024

This is the first part of an implementation of post tags based on LemmyNet/rfcs#4.

It implements the following:

  • The necessary DB tables
  • Fetching the post tags while reading PostView (list and single)
  • Tests for the above

What is missing:

  • New API methods (for fetching tags of a community (public) and for updating post tags (creator+mods)
  • Updated post create method (for adding tags directly when posting
  • Filtering by post tag in post view (I would consider this out of scope for now)
  • Federation of tags and tag updates
  • The frontend (select tags on post creation+edit and show post tags in post listings)

I would like to propose that we work on and merge this (and potentially other larger changes) in multiple parts, as soon as each part is ready and ofc doesn't negatively affect the existing functionality. That way the branches don't live for months and diverge more and more and each chunk stays simple and can be reviewed quickly.

There's some somewhat open questions wrt this code:

  • Table naming: Currently I name the table that contains the data community_post_tag (id, apub_id, name, url) so it's clear this is a tag that belongs to one community and tags posts in that community. This way we can later more tag types that may have different meanings and especially different properties and access control. For example global_post_tag for tags that are the same / interpretable globally (like NSFW). The n:m association table is called post_community_post_tag (post_id, tag_id).
  • The way the tags of a post are fetched. See the comment in the post_view code

@phiresky
Copy link
Collaborator Author

phiresky commented Dec 2, 2024

I've updated the implementation based on the comments:

  1. split table community_post_tag into tag and community_post_tag (everything apart from community_id is now in the tag table)
  2. renamed structs and tables to match
  3. changed post_view tests to use test_context, because if one test fails the cleanup function was not being called which makes all the subsequent tests fail even if they are correct

the tag table should thus work for creating other types of tags later. i don't think it's perfectly optimal, because the tag table itself might become overloaded with optionals or hard to manage at some point. but it's probably a fine start.

@Nutomic @dessalines please let me know what you think

pub ap_id: DbUrl,
pub name: String,
// default now
pub published: Option<DateTime<Utc>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it has a default you can leave out the field here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at a few of our other forms, PostInsertForm, CommentInsertForm, and they all have it (probably for activitypub reasons, ie overriding the published date based on when it was created in the first database, not when it got federated)

}

#[test_context(Data)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

pub ap_id: DbUrl,
pub name: String,
// default now
pub published: Option<DateTime<Utc>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at a few of our other forms, PostInsertForm, CommentInsertForm, and they all have it (probably for activitypub reasons, ie overriding the published date based on when it was created in the first database, not when it got federated)

@phiresky
Copy link
Collaborator Author

phiresky commented Dec 8, 2024

Let me summarize my interpretation of everyone's position on the DB design:

  • @Nutomic thinks we should make it specific to community post tags. That is, a community_post_tag table that stores the info of each post tag. If we add more tag types in the future, we would either migrate the table to be generic or add additional tables depending on requirements. This is the same as the RFC describes and what my initial implementation looked like (diff from main branch up till then)
  • @dessalines thinks we should store the main info of each tag in a generic tags table, and store the community-post-tag specific info (really just the community_id right now) in a separate table. This is pretty much what the branch currently does.
  • @dullbananas thinks we should use a generic tags table, but store the community-post-tag specific info also in that table (as a optional community_id references(..) NULL). I'd expect a new column tag_type ENUM to be added in the future.

Personally, I see arguments for all options and don't have a strong opinion. I also don't think it's actually that important, because IF/when we add more tag types, migrating to what will only then be revealed as the best idea won't be too difficult.

So I really just need something that everyone will approve before I make further changes, because I don't want to keep ping-ponging around.

@dessalines
Copy link
Member

dessalines commented Dec 10, 2024

Thx for summarizing. I def think we should keep a generic tag table, and then we have two options as you outlined above.

  1. Create dependent tag tables that reference the tag table, and are filled with subsets of allowed tags for that item and type.
  2. Add optional columns like community_id to the tag table, and probably a tag_type ENUM.

I personally think option 1) is a little better, as we can use the DB to enforce what tags are allowed for that item (and for example, a community_post_tag could have a required community_id column. But it definitely is more complicated, because then you'll have to reference the dependent tag table ids, then join your way back to the tag table. I don't think its too difficult to do that, and you get the benefit of DB-enforced subsets.

Option 2 is definitely easier, because everything only needs to reference the tag table directly. You'll have to enforce the allowed tag in code, to make sure you used the correct tag type. IE, make sure you didn't tag a post with a tag that's supposed to only be for a person or community.

I'm good with either one. They both work and option 2 is probably simpler, so if you want to go with that, that works for me too.

@Nutomic
Copy link
Member

Nutomic commented Dec 12, 2024

I really fail to understand what would be the purpose of the generic tag table. Is it so that you can store metadata on the tag, like nsfw or display_color? That doesnt seem to work because the attributes might be different in each community. Or is it to allow filtering posts, comments and users with the same tag.id parameter? That might work within a single instance, but once you add federation then identically named tags will require separate db rows anyway, because they have different ap_ids (eg lemmy.ml/tag/nsfw and lemm.ee/tag/nsfw).

To me this is really similar to the db structure for votes. Sure we could have a single vote table with vote_type ('post', 'comment') and then tables post_vote(vote_id, post_id), comment_vote(vote_id, comment_id). Its not much more complicated than now, but it wouldnt actually improve anything.

@phiresky
Copy link
Collaborator Author

phiresky commented Dec 12, 2024

I really fail to understand what would be the purpose of the generic tag table

I think the difficulty is that we don't know yet, and we can't without already sketching out the design of a second tag type, which we don't want to do. So it can go either way later, either we realize we should have made it more generic, or that we should have made it more specific.

One thing I like that I saw after changing the code, is that I think it makes a lot of sense to have the field in PostView be called tags and not community_post_tags - since generally frontends probably want to and can handle tags mostly generically. Think of for example tags federated from other software, this way those can be displayed similarily without major code changes, even if lemmy does not have specific logic for them.

My suggestion would be to go for what dullbananas wrote / suggestion (2) from dess above:

A table called tags, with a community_id column. For now this table will be specific to community post tags (since that's the only tag type). When/if we add more tag types, we will decide then (depending on the newly discovered requirements) if we want to rename the table to be specific and add new ones, or keep the table generic and add columns.

Please let me know @Nutomic if you'll approve that because the other two have already said so.

but once you add federation then identically named tags will require separate db rows anyway

Just a note, the authority that manages the metadata of a Tag object is not necessarily the same as the authority that manages the object-tag relationship. An NSFW tag for example could be a lemmy-global tag with a global description/meaning, but associating it to posts can be managed by individual communities.

Is it so that you can store metadata on the tag

Yes, adding tag-specific fields to that table makes no sense. But metadata on a tag would be things like "creation date", "human readable description of the tag", "author of the tag", "category of the tag (see hierarchical tags discussion)", "federation audience", etc.

@phiresky
Copy link
Collaborator Author

phiresky commented Dec 12, 2024

Really though, ignore most of my message, ideally I want you to just say yes. We are spending a lot of time on this and it's not really being productive. Your concerns are not wrong, just no one has the real answers and we can not have them at this stage. At this point we are wasting more time on this than if we simply chose a random solution and then refactored it later in the 20% chance that a different option is significantly better. And my motivation to work on this is fading because of the circularity.

@Nutomic
Copy link
Member

Nutomic commented Dec 12, 2024

One thing I like that I saw after changing the code, is that I think it makes a lot of sense to have the field in PostView be called tags and not community_post_tags - since generally frontends probably want to and can handle tags mostly generically. Think of for example tags federated from other software, this way those can be displayed similarily without major code changes, even if lemmy does not have specific logic for them.

Thats fine, Im not concerned about the naming of specific api fields.

A table called tags, with a community_id column. For now this table will be specific to community post tags (since that's the only tag type). When/if we add more tag types, we will decide then (depending on the newly discovered requirements) if we want to rename the table to be specific and add new ones, or keep the table generic and add columns.

Thats close to what Im saying too. But as we dont have any other tag types for now, the community_id column should be not null. Later we can change it to be nullable if that makes sense, and/or add some tag_type enum. Discussing those things is really premature and irrelevant now.

@phiresky
Copy link
Collaborator Author

Sure, sounds fine. Then I'll do it as described above, and community_id will not be nullable (for now).

@phiresky
Copy link
Collaborator Author

I've made the changes and fixed the comments. Please re-review and we can merge.

Copy link
Member

@Nutomic Nutomic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. Im a bit sceptical about merging a partial PR like this which doesnt require all the needed changes (api/federation), but we can give it a try. Anyway the main branch is unstable now.

@phiresky phiresky merged commit a2a5cb0 into main Dec 18, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants