processor/stream: server blocks on stream end when batch size < 10 #3265

Open
axw opened this issue Jan 31, 2020 · 1 comment

axw (Member) commented Jan 31, 2020

The processor/stream code reads in batches of 10 events at a time:

transformables, done := p.readBatch(ctx, ipRateLimiter, batchSize, jsonReader, res)

Once a batch has been read, its events are dispatched to the publisher, which transforms them and sends them through the libbeat pipeline to be recorded in Elasticsearch.

By default, agents close the stream after 10 seconds, or once it reaches a certain size (~750K). So if an agent sends fewer than 10 events before closing the stream, the processor/stream code will generally block until the stream ends before it dispatches those events to the publisher.

We should consider adding a timeout (or a context with a timeout) to the StreamReader.Read method to avoid this.

axw (Member, issue author) commented Sep 10, 2020

master...axw:processor-concurrent-read

In this branch I have modified processor/stream to:

  • decode into map[string]interface{}s concurrently with validation and translation into model types (this would partially address #1285, "Intake v2: investigate parallelizing decode and validate", but see below)
  • report events in batches when either: we have a minimum of 10 events, 1 second passes, or the stream ends (addresses this issue)

With the heavy.ndjson benchmark I get ~19MB/s on master and ~27MB/s with this branch. Once #3551 is done (as mentioned in #1285 (comment)), it will no longer be possible to parallelise decoding and validation, but I expect validation to be fast enough by then that it won't matter.
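For scale, the two approximate heavy.ndjson figures work out to roughly 40% more throughput with the branch:

$$\frac{27\ \text{MB/s}}{19\ \text{MB/s}} \approx 1.42$$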

We can come back to this once #3551 is done.
