Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(experiments): Init new experiment query runner #28347

Open
wants to merge 61 commits into
base: master
Choose a base branch
from

Conversation

andehen
Copy link
Contributor

@andehen andehen commented Feb 5, 2025

Problem

We are building a new query runner for experiments, see https://github.com/PostHog/product-internal/pull/712

Changes

Initial setup of a new query runner. Currently behind a feature flag. Will only be tested internally for now.

Contains:

  • new metric types to represent experiment metrics
  • new metric form components
  • logic to handle both metric types safely

How did you test this code?

  • lots of tests added + query snapshots
  • lots of manual testing to verify it does not break anything for existing users
  • storybook story added

Copy link
Contributor

@jurajmajerik jurajmajerik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. Especially love that we're now unifying the data warehouse integration with the standard use case. This makes so much sense in hindsight.

I know this is very much a WIP, so feel free to disregard some of my comments :)

case _:
# Else, we default to count
# We then just emit 1 so we can easily sum it up
metric_value = "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how the funnel/binomial comes into play here. Would we also simply count "1" for every recorded result returned from the funnel's last step?

*test_accounts_filter,
]
),
group_by=[ast.Field(chain=["variant"]), ast.Field(chain=["distinct_id"])],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a new problem, but I’m just noting that a single user can have two different variant values. This exposure query groups by [variant, distinct id], so if a user receives two different flag values, they'll appear in both groups. This is mainly an SDK/feature flag implementation issue, but the question is if we want to handle it on our end (e.g. counting each user only once).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think what we want to do here is to exclude users with multiple variant exposure and include the count in the "health checks" we display to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclude them entirely? Not simply use the most recent exposure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my position, as:

  1. Users with multiple variant exposure screws up results
  2. it's an error we should warn the user about
  3. it complicates our queries if we have to handle this

I think we should start with that at least, as that's the most sane default behavior. More advanced handling could be implemented in the future if requested.

expr=ast.Field(chain=[series_node.table_name, series_node.distinct_id_field]),
),
ast.Field(chain=["exposure", "variant"]),
parse_expr(f"{metric_value} as value"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor nit/more of a curiosity thing (I haven't written any AST myself), but is there a convention for when to use parse_expr versus assembling the query programmatically? i.e:

This:
parse_expr(f"{metric_value} as value"),

vs this:

ast.Alias(
    alias="value",
    expr=parse_expr(metric_value)
)

The latter seems more consistent and less error-prone.

Comment on lines 205 to 209
ast.Field(chain=["events", "timestamp"]),
ast.Field(chain=["events", "distinct_id"]),
ast.Field(chain=["exposure", "variant"]),
ast.Field(chain=["events", "event"]),
parse_expr(f"{metric_value} as value"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of manually written strings here and we'll be writing many more similar queries. It would be good to abstract the common ones into string constants to reduce the risk of typos.

next_join=ast.JoinExpr(
table=events_after_exposure_query,
join_type="LEFT JOIN",
alias="eae",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a bit more descriptive alias here :)

@@ -7196,6 +7196,21 @@ class ExperimentTrendsQuery(BaseModel):
response: Optional[ExperimentTrendsQueryResponse] = None


class ExperimentQuery(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this generated from the frontend's schema-general.ts (as it should be)? Let's commit that file too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this generated from the frontend's schema-general.ts (as it should be)? Let's commit that file too?

Nope, added it solely as a talking point for now.

@andehen
Copy link
Contributor Author

andehen commented Feb 6, 2025

Looking great. Especially love that we're now unifying the data warehouse integration with the standard use case. This makes so much sense in hindsight.

Good! 👍

I know this is very much a WIP, so feel free to disregard some of my comments :)

Yeah, I have consciously taken many shortcuts to identify and resolve the complex issues as quickly as possible. So I'll wait with any polishing until the core stuff feels right. It's beneficial to keep the code changes minimal until interfaces stabilizes, so iterating and refactoring is easier.

@danielbachhuber danielbachhuber force-pushed the init-new-experiment-query-runner branch from f89e988 to a2e2e27 Compare February 11, 2025 20:20
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@andehen andehen marked this pull request as ready for review February 14, 2025 08:56
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Based on the provided files and context, I'll provide a concise summary of the key changes in this PR:

Initial setup of a new experiment query runner with unified data warehouse integration. Key changes include:

  • Added new ExperimentMetric type to replace ExperimentQuery, simplifying the metrics structure
  • Introduced ExperimentActionMetricConfig for handling action-based metrics alongside event metrics
  • Created new ExperimentQueryRunner class in Python for handling experiment analysis through HogQL queries
  • Added feature flag EXPERIMENTS_NEW_QUERY_RUNNER to control rollout
  • Updated frontend components to support both legacy and new metric types with proper type safety

The changes appear well-structured and maintain backward compatibility while introducing the new query runner infrastructure. The implementation is currently WIP and not yet being called from anywhere.

Potential concerns:

  • Missing error handling for database queries in ExperimentQueryRunner
  • Type safety improvements needed in filterToMetricConfig utility
  • Validation needed for statistical assumptions in query runner

34 file(s) reviewed, 33 comment(s)
Edit PR Review Bot Settings | Greptile

frontend/src/queries/schema/schema-general.ts Show resolved Hide resolved
Comment on lines 221 to 223
{result &&
(result.kind === 'ExperimentTrendsQuery' || result.kind === 'ExperimentFunnelsQuery') && (
<ExploreButton result={result} />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: string literals used for type comparison instead of NodeKind enum that's already imported and used elsewhere in the file

Suggested change
{result &&
(result.kind === 'ExperimentTrendsQuery' || result.kind === 'ExperimentFunnelsQuery') && (
<ExploreButton result={result} />
{result &&
(result.kind === NodeKind.ExperimentTrendsQuery || result.kind === NodeKind.ExperimentFunnelsQuery) && (
<ExploreButton result={result} />

Comment on lines +20 to +22
if (!metricIdx && metricIdx !== 0) {
return <></>
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: returning an empty fragment when metricIdx is undefined could lead to confusing UI state - consider showing a message or error state instead

# If the metric type is continuous, we need to extract the value from the event property
metric_property = self.metric.metric_config.math_property
if is_data_warehouse_query:
metric_value = f"toFloat('{metric_property}')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: potential SQL injection vulnerability - metric_property value should be escaped/sanitized before interpolation

Comment on lines +60 to +61
self.experiment = Experiment.objects.get(id=self.query.experiment_id)
self.feature_flag = self.experiment.feature_flag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: missing error handling for Experiment.objects.get() which could raise DoesNotExist

Suggested change
self.experiment = Experiment.objects.get(id=self.query.experiment_id)
self.feature_flag = self.experiment.feature_flag
try:
self.experiment = Experiment.objects.get(id=self.query.experiment_id)
self.feature_flag = self.experiment.feature_flag
except Experiment.DoesNotExist:
raise ValidationError(f"Experiment with id {self.query.experiment_id} does not exist")

date_range=self.date_range,
team=self.team,
interval=IntervalType.DAY,
now=datetime.now(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: using datetime.now() without timezone info could cause inconsistencies - should use datetime.now(UTC)

Suggested change
now=datetime.now(),
now=datetime.now(UTC),

Comment on lines +107 to +108
elif end_date is not None:
end_date = timezone.make_aware(end_date) # Make naive datetime timezone-aware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Redundant condition - elif end_date is not None will always be true since it's the else branch of if end_date is None

Suggested change
elif end_date is not None:
end_date = timezone.make_aware(end_date) # Make naive datetime timezone-aware
else:
end_date = timezone.make_aware(end_date) # Make naive datetime timezone-aware

Comment on lines +396 to +405
if kind == "ExperimentQuery":
from .experiments.experiment_query_runner import ExperimentQueryRunner

return ExperimentQueryRunner(
query=query,
team=team,
timings=timings,
modifiers=modifiers,
limit_context=limit_context,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The ExperimentQuery case follows the same pattern as other runners but doesn't use cast() like most other cases do. Consider adding cast for consistency.

@PostHog PostHog deleted a comment from greptile-apps bot Feb 14, 2025
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

  • chromium: 0 added, 4 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

  • chromium: 0 added, 4 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Copy link
Contributor

@danielbachhuber danielbachhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 I did a final round of testing...

With the feature flag disabled:

  • An existing experiments displays results as expected.
  • I could add and remove metrics from the experiment as expected.
  • I could create and delete shared metrics as expected.

With the feature flag enabled:

  • I could edit an existing shared metric with a query.
  • Creating a new shared metric used the experiment metric form. The event and property both saved as expected.
  • Loading an existing experiment displayed results as expected.
  • I could add new query metrics and remove existing metrics from an existing experiment.
  • Creating a new experiment used the new experiment metric UX as expected.

@@ -1331,13 +1363,22 @@ export const experimentLogic = kea<experimentLogicType>([
getMetricType: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to rename this to getExperimentQueryType in a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, noted it down

raise ValueError("Control variant not found in experiment results")

# Statistical analysis
if self.stats_version == 2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we only support stats_version == 2 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, shouldn't be possible with stats_version == 1 here. Noted it down to clean this up.

Copy link
Contributor

@jurajmajerik jurajmajerik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go!

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants