This repository has been archived by the owner on Nov 4, 2021. It is now read-only.

Action matching juice #436

Merged
Twixes merged 40 commits into master from 235-action-matching-plus
Jun 9, 2021

Conversation

Twixes
Member

@Twixes Twixes commented May 27, 2021

Changes

Resolves #235.

  • In-memory synced list of action steps for all teams. Data for a specific action is reloaded, added, or removed based on a pubsub message dispatched (using Django model signals) when the action is updated in the app. See In-memory action definitions synced with Django #403 and Reorient ActionManager to group by teamId for practicality #433
  • Action matching logic:
    • Action step type:
      1. Autocapture:
        • Link href equals
        • Text equals
        • HTML selector match
        • URL contains
        • URL matches regex
        • URL matches exactly
      2. Custom view
        • Event name matches exactly
      3. Page view
        • Event name matches exactly ($pageview)
        • URL contains
        • URL matches regex
        • URL matches exactly
    • General filtering:
      1. Property:
        • Types:
          • Event
          • Person
          • Element
        • Operators:
          • Equals (or not)
          • Contains (or not)
          • Matches (or not) regex
          • Greater than/lower than
          • Set (or not)
      2. Cohort:
        • Person belonging to dynamic cohort
        • Person belonging to static cohort (ClickHouse)
  • Insertion of new action occurrences into the posthog_action_events table (Postgres)
  • Webhooks firing
  • REST hooks (Zapier) firing
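
The property operators listed above could be evaluated roughly as follows. This is a minimal sketch, not the code from this PR: the operator names, the `PropertyFilter` shape, and the `matchesPropertyFilter` helper are all assumptions made for illustration, and it uses the built-in `RegExp` rather than RE2.

```typescript
// Hypothetical operator names and filter shape, for illustration only.
type PropertyOperator =
    | 'exact'
    | 'is_not'
    | 'icontains'
    | 'not_icontains'
    | 'regex'
    | 'not_regex'
    | 'gt'
    | 'lt'
    | 'is_set'
    | 'is_not_set'

interface PropertyFilter {
    key: string
    operator: PropertyOperator
    value?: string | number
}

// Returns false instead of throwing when the pattern is invalid.
function regexMatches(pattern: string, input: string): boolean {
    try {
        return new RegExp(pattern).test(input)
    } catch (error) {
        return false
    }
}

// Checks one filter against a bag of event or person properties.
// Assumption: negated operators count an unset property as a match.
function matchesPropertyFilter(properties: Record<string, unknown>, filter: PropertyFilter): boolean {
    const actual = properties[filter.key]
    const isSet = actual !== undefined && actual !== null
    const actualText = String(actual)
    const expectedText = String(filter.value)
    switch (filter.operator) {
        case 'is_set':
            return isSet
        case 'is_not_set':
            return !isSet
        case 'exact':
            return isSet && actualText === expectedText
        case 'is_not':
            return !isSet || actualText !== expectedText
        case 'icontains':
            return isSet && actualText.toLowerCase().indexOf(expectedText.toLowerCase()) !== -1
        case 'not_icontains':
            return !isSet || actualText.toLowerCase().indexOf(expectedText.toLowerCase()) === -1
        case 'regex':
            return isSet && regexMatches(expectedText, actualText)
        case 'not_regex':
            return !isSet || !regexMatches(expectedText, actualText)
        case 'gt':
            return isSet && Number(actual) > Number(filter.value)
        case 'lt':
            return isSet && Number(actual) < Number(filter.value)
        default:
            return false
    }
}
```

An action step with several filters would then match only if every filter returns true for the event.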

Checklist

  • Jest tests

@Twixes Twixes changed the base branch from master to actionmanager-reoriented May 27, 2021 09:01
Base automatically changed from actionmanager-reoriented to master May 27, 2021 10:55
@Twixes Twixes marked this pull request as ready for review June 3, 2021 10:18
@Twixes
Member Author

Twixes commented Jun 3, 2021

All matching options are functional (selector matching was definitely the trickiest). Any optimization tips appreciated.
There's a bunch of tests, but I'm still adding more to cover more combinations and edge cases (though it's impossible to cover them all).
Webhooks/Zapier will be a smaller followup PR.

Contributor

@yakkomajuri yakkomajuri left a comment

This is amazing work!

Initial first pass - didn't compare anything with the Python code yet, and didn't look over everything just yet.

This is great stuff though.

requirements: Partial<Element>

constructor(tag: string, directDescendant: boolean, escapeSlashes: boolean) {
const SELECTOR_ATTRIBUTE_REGEX = /([a-zA-Z]*)\[(.*)=[\'|\"](.*)[\'|\"]\]/
Contributor

A comment here would be useful

Member Author

What should this comment convey?

Contributor

I think with longer regexes an example of what it should match is always useful
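
As an illustration of the kind of example such a comment could carry, here is a small sketch that runs the regex quoted above against a few selector parts. The `parseAttributeSelector` helper and its return shape are hypothetical and not part of the PR; only the regex itself is taken from the diff.

```typescript
// Regex copied from the diff above. It targets selector parts such as
//   a[href="/docs"]   or   [data-id='123']
// capturing: (1) the optional tag, (2) the attribute name, (3) the attribute value.
const SELECTOR_ATTRIBUTE_REGEX = /([a-zA-Z]*)\[(.*)=[\'|\"](.*)[\'|\"]\]/

// Hypothetical result shape, for illustration only.
interface ParsedAttributeSelector {
    tag: string
    attribute: string
    value: string
}

// Hypothetical helper: returns null when the part has no [attr="value"] section.
function parseAttributeSelector(part: string): ParsedAttributeSelector | null {
    const match = SELECTOR_ATTRIBUTE_REGEX.exec(part)
    if (!match) {
        return null
    }
    const [, tag = '', attribute = '', value = ''] = match
    return { tag, attribute, value }
}

const withTag = parseAttributeSelector('a[href="/docs"]')
// withTag is { tag: 'a', attribute: 'href', value: '/docs' }

const withoutTag = parseAttributeSelector("[data-id='123']")
// withoutTag is { tag: '', attribute: 'data-id', value: '123' }

const noAttribute = parseAttributeSelector('div.button')
// noAttribute is null
```

Note that `|` inside the character classes is matched literally, so the pattern also accepts `|` where a quote is expected.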

Collaborator

@mariusandra mariusandra left a comment

This is some really nice code. Well done! :)

I found one TODO to fix, and have some thoughts regarding caching person. Otherwise LGTM.

Comment on lines 209 to 218
case 'event':
    return this.checkEventAgainstEventFilter(event, filter)
case 'person':
    person = await this.db.fetchPerson(event.team_id, event.distinct_id)
    return this.checkEventAgainstPersonFilter(person, filter)
case 'element':
    return this.checkEventAgainstElementFilter(elements, filter)
case 'cohort':
    person = await this.db.fetchPerson(event.team_id, event.distinct_id)
    return await this.checkEventAgainstCohortFilter(person, filter)
Collaborator

This db.fetchPerson could be fetching the same db rows multiple times per ActionMatcher.match in case we have a lot of steps that check user properties.

It would be great if we could either recycle the person that we got while ingesting, or then just cache it here somehow.

Member Author

True. I was initially thinking of just using the person from processEvent, but keeping track of that object while making sure it's updated along with property updates etc., all while other events are being processed, would very likely result in some inconsistencies. So the person for action matching is fetched once per event now. There could be some better optimizations too, but how about seeing how this performs when dry-running in production?
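
One way to get "fetched once per event" without tracking the person object across events is to memoize the fetch for the duration of a single match call. The sketch below only illustrates that idea and is not the code merged here: `memoizeOnce`, the `Person` shape, and `fakeFetchPerson` are hypothetical names.

```typescript
// Hypothetical person shape, for illustration only.
interface Person {
    id: number
    properties: Record<string, unknown>
}

// Wraps an async fetch so that it runs at most once.
// Every caller shares the same promise, including callers that arrive
// while the first fetch is still in flight.
function memoizeOnce<T>(fetch: () => Promise<T>): () => Promise<T> {
    let cached: Promise<T> | undefined
    return () => {
        if (cached === undefined) {
            cached = fetch()
        }
        return cached
    }
}

// Stand-in for a database call; counts how often it is invoked.
let fetchCount = 0
function fakeFetchPerson(): Promise<Person> {
    fetchCount += 1
    return Promise.resolve({ id: 1, properties: { plan: 'enterprise' } })
}

// Create one getter per event, then hand it to every person or cohort check.
const getPerson = memoizeOnce(fakeFetchPerson)
const firstLookup = getPerson()
const secondLookup = getPerson()
```

Because the getter is created per event and discarded afterwards, it never holds a person past the event being matched. The fetch is also lazy, so events whose steps have no person or cohort filters never hit the database.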

@Twixes Twixes added the bump minor Bump minor version when this PR gets merged label Jun 9, 2021
@Twixes Twixes added bump patch Bump patch version when this PR gets merged and removed bump minor Bump minor version when this PR gets merged labels Jun 9, 2021
@Twixes Twixes merged commit 84e471c into master Jun 9, 2021
@Twixes Twixes deleted the 235-action-matching-plus branch June 9, 2021 12:44
fuziontech pushed a commit to PostHog/posthog that referenced this pull request Oct 12, 2021
* Reorient `ActionManager` to group by teamId for practicality

* Make `getTeamActions()` return type more versatile

* Add ActionMatcher base

* Add `matchActions` worker task for optimization

* Add part of action matching checks

* Fix PubSub's lack of teamId

* Remove `eventsProcessor.prepare()` calls

* Add a legit ActionMatcher test

* Adjust test for matchActions task

* Improve task counting in test

* Add moar matching capabilities

* Improve tests

* Reorganize class dependencies

* Add cohort matching

* Add element/selector matching and polish other action matching parts

* Handle selector matching edge cases

* Save matched action occurrences to Postgres

* Fix action-matcher tests

* Don't expose ActionManager methods in ActionMatcher

* Use feedback + RE2

* Remove never satisfied branch

* Address sum feedback and clean code up

* Use action matching results in a smarter way

* Fix `createHub`

* Add action matching metric

* Enhance `PLUGIN_SERVER_ACTION_MATCHING`

* Update action-matcher.test.ts

* Remove `||=`

* Update process-event.ts

* Only fetch person if matching actions

* Fix non-string distinct ID handling

* Update process-event.ts
Labels
bump patch Bump patch version when this PR gets merged
Development

Successfully merging this pull request may close these issues.

Match actions to events on the fly
3 participants