
Plugin server gets stuck when ingesting via Kafka #154

Closed
mariusandra opened this issue Feb 15, 2021 · 2 comments
Comments

@mariusandra (Collaborator)

We have had cases where the Kafka-powered plugin server just stops working, such as last night on Cloud.

[Image: event ingestion graph]

Ingestion via the plugin server is itself turned off, yet I'm still running the "queue latency" plugin, which emits one event via posthog.capture every minute. Between noon and 9 am, the plugin server stopped ingesting events.
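
For context, a minimal sketch of what such a heartbeat plugin could look like, assuming the plugin server's scheduled-task API (an exported `runEveryMinute`) and the `posthog` global exposed to plugins; the event name and properties here are illustrative:

```typescript
// Hypothetical "queue latency" heartbeat plugin sketch.
// Assumes the plugin server calls an exported runEveryMinute() once a
// minute, and that the plugin sandbox exposes a global `posthog` object.
declare const posthog: {
    capture: (event: string, properties?: Record<string, unknown>) => void
}

export async function runEveryMinute(): Promise<void> {
    // One heartbeat event per minute: the gap between emit time and
    // ingestion time approximates how far behind the queue is running.
    posthog.capture('queue latency heartbeat', {
        emitted_at: new Date().toISOString(),
    })
}
```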

Here's what the logs show:

[Screenshot: plugin server logs, 2021-02-15 09:44]

The line `took too long to run plugins on...` is part of a system that detects if and when `eachBatch` gets stuck (without killing anything).
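
A minimal sketch of that kind of non-fatal watchdog, assuming a KafkaJS-style async `eachBatch` handler; the threshold, names, and message text are illustrative, not the plugin server's actual implementation:

```typescript
// Non-fatal watchdog around a batch handler: if a batch runs longer
// than the threshold, log a warning, but never kill the batch itself.
const WARNING_INTERVAL_MS = 30_000

export async function eachBatchWithWatchdog(eachBatch: () => Promise<void>): Promise<void> {
    const startedAt = Date.now()
    // Fires every 30 s for as long as the batch is still running.
    const timer = setInterval(() => {
        const seconds = Math.round((Date.now() - startedAt) / 1000)
        console.warn(`⚠️ Took too long to run plugins on batch! Still running after ${seconds} s`)
    }, WARNING_INTERVAL_MS)
    try {
        await eachBatch() // the batch is observed, not cancelled
    } finally {
        clearInterval(timer)
    }
}
```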

What exactly happened is still to be determined. Everything came back online when the task was redeployed this morning.

@mariusandra (Collaborator, Author)

This can be closed, I hope. We now have timeouts and a lot of metrics around processing the Kafka batch. The "30 sec timeout warning" messages will also be sent to Sentry, so we should keep an eye on them there.
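
A sketch of how such a warning could be forwarded, assuming `@sentry/node` has been initialised elsewhere with `Sentry.init()`; the helper name and message text are illustrative:

```typescript
import * as Sentry from '@sentry/node'

// Report a long-running batch to Sentry without interrupting it.
export function reportBatchTimeout(elapsedMs: number): void {
    Sentry.captureMessage(
        `Kafka batch still processing after ${elapsedMs} ms`,
        'warning' // surfaced in Sentry with warning severity
    )
}
```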

@mariusandra (Collaborator, Author)

The timeouts helped catch several bugs. The big problem, though, was unpooled concurrent Redis requests (more here).
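
A minimal sketch of the pooling idea, assuming `ioredis` and `generic-pool`; this is illustrative, not the actual fix in the linked change:

```typescript
import Redis from 'ioredis'
import { createPool } from 'generic-pool'

// Without a pool, N concurrent callers could each hit Redis unbounded;
// a fixed-size pool caps concurrency instead.
const redisPool = createPool<Redis>(
    {
        create: async () => new Redis(process.env.REDIS_URL ?? 'redis://127.0.0.1:6379'),
        destroy: async (client) => {
            await client.quit()
        },
    },
    { min: 1, max: 10 } // at most 10 concurrent Redis connections
)

// Borrow a client, run the operation, always return the client.
export async function withRedis<T>(fn: (client: Redis) => Promise<T>): Promise<T> {
    const client = await redisPool.acquire()
    try {
        return await fn(client)
    } finally {
        await redisPool.release(client)
    }
}
```

Capping `max` means a burst of events in one Kafka batch waits for a free client instead of opening connections without bound.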
