Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonical Stats #13

Closed
caius opened this issue Dec 1, 2022 · 5 comments · Fixed by #30
Closed

Canonical Stats #13

caius opened this issue Dec 1, 2022 · 5 comments · Fixed by #30

Comments

@caius
Copy link

caius commented Dec 1, 2022

I'm not actually sure if this is an issue, but it surprised me so I thought I'd query it.

I've defined go/cu-devops to take me to our DevOps work board (think Trello), and that works fine. Shows up in the stats as go/cu-devops.

I then remembered the logic strips - so I could visit go/cudevops and did so. This shows up as a separate entry in the stats now, although clicking the (i) on either go/cu-devops or go/cudevops shows the go/cu-devops Link to be edited.

Should we coalesce the stats entries to the Link value, so missing out hypens doesn't duplicate entries for the same Link?

@caius
Copy link
Author

caius commented Dec 1, 2022

From looking at the code the database lookup in db.linkID strips - but the in-memory stats just stores whatever has been cut from the URL Path without any processing. Interestingly when the stats are flushed to the database they appear to strip - correctly, both entries in the database have cudevops stored. Looks like db.SaveStats uses db.linkID internally.

Given that I restarted the service to see how it would handle loading back from the database, and now the stats show go/cudevops 2. Visiting go/cu-devops adds the go/cu-devops entry back into the stats table though.

I'd like to unify this behaviour so either the in-memory stats store strips - to match the database, or the Stats database table stores the short name including - so after a service reload the stats look identical.

My feeling is strip - given that's how we're storing the Link records, does anyone have any strong feeling on either of these as a solution?

(I'd also like to add this is a pretty useful tool. It's already replaced most of my bookmarks bar. Thanks for releasing it.)

@willnorris
Copy link
Member

The intent was certainly that we store stats based on the canonical link name. Sounds like that got screwed up. Thanks for pointing it out. I'll take a closer look today.

@willnorris
Copy link
Member

I can't seem to reproduce this, and I think there might be something missing... did you rename the link on the detail page?

The sqlite schema has two different fields for the link name: ID and Short. Short is the name exactly as the user entered (the "preferred rendering" of the name), while ID is the normalized version (all lowercase, hyphens removed, etc).

When resolving a link, we normalize whatever the user specified, then do a lookup by that normalized ID. However, we when store stats, we always use the current Short value (regardless of what the user entered). So you should be able to visit go/cu-devops, go/c-u-d-e-v-o-p-s, and go/cudevops, and all of those should increment the same entry in the stats table.

The one thing we don't handle is coalescing stats when the preferred name for a link is changed. Once you change the preferred name, all future ClickStats will use that new name, but we don't go back and change old stats with the old name. This was an intentional design choice to keep the data model simple, and because we didn't expect that the preferred name for a link was likely to change often.

Does that track with the behavior you're seeing?

@caius
Copy link
Author

caius commented Dec 13, 2022

I don't recall renaming the link, just initially created it as cu-devops. The entry in the links table currently has cu-devops for the Short and cudevops as ID. (The Created and LastEdit timestamps are identical too, suggesting it's not been updated.)

I've just tried reproducing this locally with the latest docker image and it's functioning as expected. The instance I saw the issue with is running 76c67c0 according to podman, but I don't see any relevant changes between there and current main.

The entries in the stats table are all coalesced to the same value cudevops, but the in-memory cache of stats is storing both cu-devops and cudevops. Interestingly visiting go/cu-dev-ops increments the cudevops entry though. Once the container is restarted (so stats are picked up from the Stats table) it shows just cu-devops with all the entries rolled up.

Just tried upgrading the container with the issue to latest main (5fefe25) and it still shows the same behaviour there.

@willnorris
Copy link
Member

ah, got it! That makes sense and now I see the problem...

When a golink is visited, we directly store the click in the in-memory stats map, and also queue it up to be save to the database later. When saving to the in-memory map, we store the canonical short name. When it eventually gets stored in the database, we store the normalized id. So on startup, when we load historical stats from the database, those all have the normalized IDs, and then subsequent clicks use the short name again.

My intent was always to use the preferred canonical short name in the stats, so I think we just need to map the IDs to short names on cold startup.

willnorris added a commit that referenced this issue Dec 13, 2022
When loading stats from the database, map IDs back to their canonical
short name, which is what we want to show in the frontend.  This is only
called once on cold start, so performance of loading all links isn't a
big concern.

Fixes #13
willnorris added a commit that referenced this issue Dec 13, 2022
When loading stats from the database, map IDs back to their canonical
short name, which is what we want to show in the frontend.  This is only
called once on cold start, so performance of loading all links isn't a
big concern.

Fixes #13

Signed-off-by: Will Norris <[email protected]>
willnorris added a commit that referenced this issue Dec 21, 2022
When loading stats from the database, map IDs back to their canonical
short name, which is what we want to show in the frontend.  This is only
called once on cold start, so performance of loading all links isn't a
big concern.

Fixes #13

Signed-off-by: Will Norris <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants