test(plugin-server): Reduce concurrency of functional test suite. #18833
Conversation
…replicate errors.
That's not it -- things seem to be failing marginally less often, but they're still flapping. This reverts commit 076c63b.
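For context on what "reducing concurrency" means mechanically here: in Jest this is typically a matter of clamping `maxWorkers`/`maxConcurrency` in the config, or passing `--runInBand` on the command line. A minimal sketch -- hypothetical values, not the repo's actual jest.config.functional.js:

```typescript
// Hypothetical jest.config.functional.ts: clamp parallelism so functional
// tests that share external state (Kafka, ClickHouse, Postgres) run serially.
// These values are illustrative, not PostHog's actual configuration.
import type { Config } from 'jest'

const config: Config = {
    testEnvironment: 'node',
    maxWorkers: 1, // a single worker process: test files run one at a time
    maxConcurrency: 1, // cap concurrent test.concurrent() tests within a file
    testTimeout: 60000, // generous timeout, since serial runs take longer
}

export default config
```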
Finally, an explanation for the failures:

```
(env) ted@revuelto plugin-server % CLICKHOUSE_DATABASE='default' DATABASE_URL=postgres://posthog:posthog@localhost:5432/test_posthog pnpm functional_tests --verbose -t 'teardown is called on stateful plugin reload if they are updated'

> @posthog/plugin-server@<version> functional_tests /Users/ted/Repositories/PostHog/posthog/plugin-server
> jest --config ./jest.config.functional.js "--verbose" "-t" "teardown is called on stateful plugin reload if they are updated"
FAIL functional_tests/plugins.test.ts (14.636 s)
✕ plugin method tests: teardown is called on stateful plugin reload if they are updated (13109 ms)
○ skipped plugin method tests: event captured, processed, ingested
○ skipped plugin method tests: creates error on unhandled throw
○ skipped plugin method tests: creates error on unhandled rejection
○ skipped plugin method tests: creates error on unhandled promise errors
○ skipped plugin method tests: can update distinct_id via processEvent
○ skipped plugin method tests: can drop events via processEvent
○ skipped plugin method tests: correct $autocapture properties included in onEvent calls
○ skipped plugin jobs: can call runNow from onEvent
○ skipped plugin jobs: can call runNow from processEvent
○ skipped plugin jobs: runEveryMinute is executed
○ skipped plugins can use attachements
○ skipped plugins can use config variables
○ skipped plugin configs are still loaded if is_global = false and the team does not own the plugin
○ skipped liveness check endpoint works
● plugin method tests: teardown is called on stateful plugin reload if they are updated
expect(received).toHaveLength(expected)
Expected length: 1
Received length: 0
Received array: []
263 | await waitForExpect(async () => {
264 | const logs = await fetchPluginConsoleLogEntries(pluginConfig.id)
> 265 | expect(logs.filter((log) => log.message.method === 'teardownPlugin')).toHaveLength(1)
| ^
266 | })
267 | })
268 |
at toHaveLength (functional_tests/plugins.test.ts:265:79)
at waitForExpect (functional_tests/expectations.ts:9:20)
at functional_tests/plugins.test.ts:263:5
```
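Aside on the failure mode: `waitForExpect` (functional_tests/expectations.ts) is a polling assertion helper, so "Received length: 0" above means the `teardownPlugin` log entry never appeared in ClickHouse before the polling deadline. A minimal sketch of what such a helper typically looks like -- the repo's actual implementation may differ, and the timings are illustrative assumptions:

```typescript
// Sketch of a polling assertion helper in the spirit of waitForExpect.
// Not the repo's implementation; defaults are illustrative assumptions.
export async function waitForExpect<T>(
    fn: () => T | Promise<T>,
    timeoutMs = 10_000,
    intervalMs = 200
): Promise<T> {
    const deadline = Date.now() + timeoutMs
    for (;;) {
        try {
            // Resolve as soon as the wrapped assertions stop throwing.
            return await fn()
        } catch (error) {
            if (Date.now() >= deadline) {
                // Surface the last assertion failure, like the jest output above.
                throw error
            }
            await new Promise((resolve) => setTimeout(resolve, intervalMs))
        }
    }
}
```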
```
Test Suites: 1 failed, 9 skipped, 1 of 10 total
Tests: 1 failed, 68 skipped, 69 total
Snapshots: 0 total
Time: 14.818 s
Ran all test suites with tests matching "teardown is called on stateful plugin reload if they are updated".
Checking clickhouse-ingestion events_plugin_ingestion 0 38 38
Checking clickhouse-plugin-server-async-webhooks clickhouse_events_json 0 38 38
Checking clickhouse-plugin-server-async-onevent clickhouse_events_json 0 38 38
Checking partition_statistics events_plugin_ingestion 0 38 38
Checking group1 clickhouse_events_json 0 38 38
Checking group1 clickhouse_person 0 38 38
Checking group1 plugin_log_entries 0 1060 1060
Checking group1 clickhouse_app_metrics 0 37 37
Checking group1 clickhouse_person_distinct_id 0 38 38
ELIFECYCLE Command failed with exit code 1.
```

I added a bit more detail to the Kafka producer:

```diff
diff --git a/plugin-server/src/kafka/producer.ts b/plugin-server/src/kafka/producer.ts
index 7029a26c79..66416f450b 100644
--- a/plugin-server/src/kafka/producer.ts
+++ b/plugin-server/src/kafka/producer.ts
@@ -80,7 +80,7 @@ export const produce = async ({
headers?: MessageHeader[]
waitForAck?: boolean
}): Promise<number | null | undefined> => {
- status.debug('📤', 'Producing message', { topic: topic })
+ status.debug('📤', 'Producing message', { topic: topic, value: value.toString() })
const produceSpan = getSpan()?.startChild({ op: 'kafka_produce' })
return await new Promise((resolve, reject) => {
if (waitForAck) {
```

and noticed that even when the test was failing, it still was producing the plugin-defined teardown message.
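One hedged aside on that change (mine, not from the PR): `value.toString()` on every produced message can make debug logs very large for big payloads. Something like this illustrative helper keeps the log line bounded:

```typescript
// Illustrative only -- not a helper from the PostHog codebase.
const MAX_DEBUG_PAYLOAD_CHARS = 1024

export function describePayload(value: Buffer | string | null): string {
    if (value == null) {
        return '<empty payload>'
    }
    const text = typeof value === 'string' ? value : value.toString('utf8')
    return text.length <= MAX_DEBUG_PAYLOAD_CHARS
        ? text
        : `${text.slice(0, MAX_DEBUG_PAYLOAD_CHARS)}... (${text.length} chars total)`
}
```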
I thought this was strange, because I hadn't seen the log message at all in the ClickHouse table on prior test runs that failed (most of them), when running:

```
ted@revuelto plugin-server % docker-compose -f ../docker-compose.dev.yml exec -it clickhouse clickhouse-client -q 'select * from plugin_log_entries order by timestamp asc'
```

This seemed suspicious: there were no errors in the ClickHouse server logs to indicate any problems with Kafka. I started to suspect a row key conflict:

```
SHOW CREATE TABLE plugin_log_entries
FORMAT Vertical
Query id: aebb380d-9147-4c6b-8157-6cbe94304643
Row 1:
──────
statement: CREATE TABLE default.plugin_log_entries
(
`id` UUID,
`team_id` Int64,
`plugin_id` Int64,
`plugin_config_id` Int64,
`timestamp` DateTime64(6, 'UTC'),
`source` String,
`type` String,
`message` String,
`instance_id` UUID,
`_timestamp` DateTime,
`_offset` UInt64
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/noshard/posthog.plugin_log_entries', '{replica}-{shard}', _timestamp)
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (team_id, plugin_id, plugin_config_id, timestamp)
TTL toDate(timestamp) + toIntervalWeek(1)
SETTINGS index_granularity = 512
1 row in set. Elapsed: 0.003 sec. Sure enough, there was a gap in the Kafka offsets in the table while unloading: SELECT *
FROM plugin_log_entries
ORDER BY _offset ASC
FORMAT Markdown
```
And of course, here's our missing message at offset 1058 -- it was deduplicated away in favor of the message at offset 1059, which shares the same row key:

```
ted@revuelto plugin-server % docker-compose -f ../docker-compose.dev.yml exec -it kafka kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic plugin_log_entries --partition 0 --offset 1058 --max-messages 1 | jq
{
"source": "CONSOLE",
"type": "LOG",
"id": "018bf968-d281-0000-0171-5313d671629b",
"team_id": 39,
"plugin_id": 39,
"plugin_config_id": 39,
"timestamp": "2023-11-22 23:41:18.337",
"message": "{\"method\":\"teardownPlugin\"}",
"instance_id": "018bf967-b7a6-0000-37fd-d708628dd2c3"
}
Processed a total of 1 messages
```
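To make the mechanism concrete: ReplacingMergeTree keeps only one row per full sorting key once parts merge, so two log entries written with identical `(team_id, plugin_id, plugin_config_id, timestamp)` values are collapsed into one. A minimal repro sketch, assuming a local ClickHouse and the @clickhouse/client Node package -- table name and values are illustrative, not PostHog code:

```typescript
// Minimal repro (illustrative): a ReplacingMergeTree table whose sorting key
// has no unique column collapses rows that share the key.
// Assumes ClickHouse on localhost:8123 and `npm i @clickhouse/client`
// (older client versions take `host` instead of `url`).
import { createClient } from '@clickhouse/client'

async function main(): Promise<void> {
    const client = createClient({ url: 'http://localhost:8123' })

    await client.command({
        query: `
            CREATE TABLE IF NOT EXISTS dedup_demo (
                team_id Int64,
                plugin_config_id Int64,
                timestamp DateTime64(6, 'UTC'),
                message String
            )
            ENGINE = ReplacingMergeTree
            ORDER BY (team_id, plugin_config_id, timestamp)
        `,
    })

    // Two different messages, identical sorting key (same team, config, timestamp).
    await client.insert({
        table: 'dedup_demo',
        values: [
            { team_id: 39, plugin_config_id: 39, timestamp: '2023-11-22 23:41:18.337000', message: '{"method":"teardownPlugin"}' },
            { team_id: 39, plugin_config_id: 39, timestamp: '2023-11-22 23:41:18.337000', message: 'some unrelated console.log' },
        ],
        format: 'JSONEachRow',
    })

    // Force the merge that would otherwise happen in the background.
    await client.command({ query: 'OPTIMIZE TABLE dedup_demo FINAL' })

    const result = await client.query({
        query: 'SELECT count() AS n, any(message) AS survivor FROM dedup_demo',
        format: 'JSONEachRow',
    })
    console.log(await result.json()) // n = "1": one of the two rows is gone
    await client.close()
}

main().catch(console.error)
```

This also suggests the shape of a fix: include a unique column such as `id` in the sorting key, or guarantee per-entry-unique timestamps -- both hypotheses on my part, not decisions taken in this PR.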
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the `stale` label.
Still some issues with the export functional tests, but it seems like those can be addressed on their own; we don't really need this meta-PR for identifying the flakes any longer.
Ignore this and don't merge (at least for now) -- trying to replicate test failures seen locally under similar conditions when forcing serial execution, since they're the same tests that I've seen failing intermittently on #18764 and others.