Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort events and properties by volume first, A-Z second #7426

Merged
merged 6 commits into from
Dec 1, 2021

Conversation

mariusandra
Copy link
Collaborator

Changes

How did you test this code?

  • Locally in the browser

Copy link
Contributor

@paolodamico paolodamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! fixed merge conflicts and moved to query volume as discussed on Slack. Only thing I'm not 100% sure is that creating this indices won't stall instances with a ton of events/properties (e.g. this customer)

GinIndex(name="index_event_definition_name", fields=["name"], opclasses=["gin_trgm_ops"]),
] # To speed up DB-based fuzzy searching
# Improve ordering performance
models.Index(fields=["team_id", "-volume_30_day"]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how indexing works with the -? Do you know any good literature on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just add desc in the index as well, and should actually make no difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres-level does that just mean better performance because we would order desc too? What's the impact of not having the desc in there?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot a link: https://www.postgresql.org/docs/14/indexes-ordering.html

I copied this from a different model in the system, and left the - in place since we mostly do order DESC here. If I read the doc correctly, there shouldn't be any real difference between the two. This will be truly useful if we'd sort by multiple columns. We don't here, and postgres can read indexes backwards, so it's probably better to skip the - here for clarity.

At least I should run a few tests on it to make sure it works well, but now it's gone, so :).

@mariusandra
Copy link
Collaborator Author

You are right there. I'll remove the migration.

These indices (indexes?) are just for performance. Users with a sensible number of event names probably won't notice much difference, and the user you linked will certainly experience problems. Currently this user can't even use the event stats page because of the sheer number of events, because we still need to load all of them into memory before displaying anything 🤯

2021-11-30 20 51 26

Perhaps this should be one of the first uses for the "Special Migrations API"? @yakkomajuri

@mariusandra
Copy link
Collaborator Author

To add, even without the indexes we can't merge this, as with a few dozen million event names, the taxonomic filter will get really slow. So we're blocked.

@mariusandra mariusandra marked this pull request as draft November 30, 2021 19:59
@paolodamico
Copy link
Contributor

Well we already didn't have an index for the event name yet we already ordered based on that, will performance be that bad with these changes?

@mariusandra
Copy link
Collaborator Author

mariusandra commented Nov 30, 2021

We have an index for the event name:

GinIndex(name="index_event_definition_name", fields=["name"], opclasses=["gin_trgm_ops"]),

image

... but seems like it's indeed not used.

@mariusandra
Copy link
Collaborator Author

Performance is really poor for this user:

2021-11-30 22 11 48

But you're right again. We can try just the ordering without the index.

@mariusandra mariusandra marked this pull request as ready for review November 30, 2021 21:17
Copy link
Contributor

@paolodamico paolodamico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! let's try this out, won't merge cause we'll all be asleep if something goes wrong, but feel free to merge once you see this

@mariusandra mariusandra merged commit 98570c5 into master Dec 1, 2021
@mariusandra mariusandra deleted the event-filter-sort branch December 1, 2021 08:15
@yakkomajuri
Copy link
Contributor

Perhaps this should be one of the first uses for the "Special Migrations API"? @yakkomajuri

"Get in line" lol

On a more serious note, I was thinking of running a change to Replicated dead letter queue tables, and ofc we have the events migration too.

However, I'd be happy to have you rather than me as a "Beta" tester for writing a special migration, if you're willing. Will probably be good for feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sort infinite list events & properties by popularity
3 participants