
Stabilize stream consumption before routing queries to servers #3451

Closed
mcvsubbu opened this issue Nov 9, 2018 · 2 comments
mcvsubbu commented Nov 9, 2018

We noticed that in some cases, when a pinot-server hosting segments in CONSUMING state is restarted, it shows high query latencies. The high latency correlates with a high consumption rate from streams (a restart causes a consuming segment to re-consume from its starting offset up to the latest event).

A few different options to address this problem:

  1. Save state before shutdown. In this case that means saving all the events consumed so far (and the last offset consumed) before shutdown. Completing the segment is one way of doing that, but it not only takes a while to complete the segment; the next replica to restart (in a rolling restart) will also go through the same motions, producing a very small segment in the process. Saving state locally is therefore better, but it is a complex matter. It could be easier with mmapped consuming segments, but we still keep inverted indices on heap.

  2. Re-construct state after startup, which is what we do now (in terms of consuming from the starting offset).

  3. Stop routing query requests while consuming segments are busy catching up.

The last option seems to be simple.

There are a few different ways of implementing that. We currently have a way to stop routing queries by not resetting the isShuttingdown flag until segments are loaded. We can extend the criteria so that "loading" a consuming segment effectively means the consuming segment has "caught up".
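The flag-based gating described above could be sketched roughly as follows. This is an illustrative sketch only; the class and interface names (ReadinessGate, ConsumingSegment) are hypothetical and not Pinot's actual API:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a readiness gate that keeps the server out of query
// routing until every consuming segment reports that it has caught up.
// Analogous in spirit to the isShuttingdown flag mentioned above.
public class ReadinessGate {
  private final AtomicBoolean shuttingDown = new AtomicBoolean(true);

  /** Illustrative stand-in for a consuming segment's catch-up check. */
  public interface ConsumingSegment {
    boolean hasCaughtUp();
  }

  // Called periodically during startup; clears the flag only once every
  // consuming segment has caught up, so queries are not routed before then.
  public boolean tryMarkReady(List<ConsumingSegment> consumingSegments) {
    boolean allCaughtUp = consumingSegments.stream().allMatch(ConsumingSegment::hasCaughtUp);
    if (allCaughtUp) {
      shuttingDown.set(false);
    }
    return !shuttingDown.get();
  }

  public boolean isServingQueries() {
    return !shuttingDown.get();
  }
}
```

With this shape, the broker-side routing check stays unchanged; only the condition for clearing the flag is extended.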

It seems best to include this as a part of ServiceStatus.

It is a little hard to determine whether the consuming segments have caught up. The best way is to query the stream for the current offset or timestamp and decide whether consumption has reached it. Another way is to add up the consumption rates of the different consuming segments and, as long as the total consumption rate is below a certain threshold, announce that we are ready. Note that some streams may also throttle consumption.
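The offset-comparison approach can be sketched as below. This is a minimal illustration assuming the stream exposes its latest offset; the names (CatchUpChecker, LatestOffsetSupplier) and the lag threshold are hypothetical:

```java
// Hypothetical sketch: a segment counts as "caught up" when its consumed
// offset is within an allowed lag of the stream's latest offset.
public class CatchUpChecker {
  /** Supplies the latest offset available in the upstream stream partition. */
  public interface LatestOffsetSupplier {
    long fetchLatestOffset();
  }

  private final long _allowedLagEvents;

  public CatchUpChecker(long allowedLagEvents) {
    _allowedLagEvents = allowedLagEvents;
  }

  public boolean hasCaughtUp(long consumedOffset, LatestOffsetSupplier stream) {
    long latestOffset = stream.fetchLatestOffset();
    return latestOffset - consumedOffset <= _allowedLagEvents;
  }
}
```

A timestamp-based variant would compare event timestamps instead of offsets, which is useful for streams whose offsets are not comparable across restarts.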

For now, we should start by integrating ServiceStatus so that it also queries consuming segments (perhaps through the TableDataManager). We can start off with a (configured) timeout, and then figure out better ways to determine whether consuming segments have caught up.
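The timeout fallback suggested above could take roughly this shape. The callback name and its wiring are illustrative assumptions, not Pinot's actual ServiceStatus API:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: report the server as healthy once consuming segments
// have caught up, or unconditionally after a configured timeout elapses.
public class ConsumingSegmentStatusCallback {
  private final BooleanSupplier _allSegmentsCaughtUp; // e.g. supplied by the table data manager
  private final long _deadlineMillis;

  public ConsumingSegmentStatusCallback(BooleanSupplier allSegmentsCaughtUp, long timeoutMillis) {
    _allSegmentsCaughtUp = allSegmentsCaughtUp;
    _deadlineMillis = System.currentTimeMillis() + timeoutMillis;
  }

  /** GOOD once caught up, or once the configured timeout has elapsed. */
  public boolean isGood() {
    return _allSegmentsCaughtUp.getAsBoolean() || System.currentTimeMillis() >= _deadlineMillis;
  }
}
```

The timeout guarantees the server eventually becomes GOOD even if the catch-up signal is unreliable, at the cost of possibly serving stale results after the deadline.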

mcvsubbu assigned mcvsubbu and unassigned mcvsubbu Nov 15, 2021
@sajjad-moradi (Contributor) commented
#7267 adds a mechanism to get the latest offset for each consuming segment and uses that as the basis for ending the consumption catch-up period.

@mcvsubbu (Contributor, Author) commented

@sajjad-moradi can this issue be closed now?
