Streaming mode #126
Is my understanding correct that this would effectively limit the maximum memory used by t2html, meaning the limit is determined by maxLines rather than by the size of the input? Have you tried putting a bigger input file through the benchmarks to see what that looks like? e.g. 5 MB compared to 0.5 MB
That is the goal, to limit the memory needed to process larger and larger input. In practice it will still allocate quite a lot, but should garbage collect more of it. Here's the same benchmark on a 17MiB file (several copies of fixtures/npm.sh.raw):
166 MB of memory allocations per run still sounds alarming, but that number is "total" - it doesn't decrease as the garbage collector frees memory. If I stick a little diagnostic code at the end of

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    log.Printf("TotalAlloc: %d\tHeapAlloc: %d\tHeapInuse: %d", m.TotalAlloc, m.HeapAlloc, m.HeapInuse)

and do a single run with the input streamed into the parser (
The same thing without using
i.e. heap usage grows, does not shrink, and in the end has used 166 MB without being able to free it.
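To make the TotalAlloc versus HeapAlloc distinction concrete, here is a minimal standalone sketch. It does not use terminal-to-html at all; the helper names `heapStats` and `measure` are made up for illustration. It allocates a number of short-lived 1 MiB buffers, forces a collection, and reports how much the cumulative counter grew compared with what is still live on the heap.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces each buffer to be heap-allocated.
var sink []byte

// heapStats returns the cumulative bytes allocated so far and the bytes
// currently held by heap objects.
func heapStats() (total, live uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.TotalAlloc, m.HeapAlloc
}

// measure allocates n short-lived 1 MiB buffers, runs the garbage
// collector, and reports how much TotalAlloc grew and what HeapAlloc is
// afterwards. (Hypothetical helper, for illustration only.)
func measure(n int) (grew, live uint64) {
	before, _ := heapStats()
	for i := 0; i < n; i++ {
		sink = make([]byte, 1<<20)
	}
	sink = nil
	runtime.GC()
	var after uint64
	after, live = heapStats()
	return after - before, live
}

func main() {
	grew, live := measure(64)
	fmt.Printf("TotalAlloc grew by %d MiB, HeapAlloc is now %d KiB\n", grew>>20, live>>10)
}
```

TotalAlloc only ever increases, so a large value on its own says nothing about peak memory; HeapAlloc and HeapInuse are the numbers that show whether memory is actually being released.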
Looks great! I'm excited to try it out.
* Reorganise terminal-to-html.go
* Reorganise Screen and parser a bit
* Change screen.Parse to screen.Write (works as destination for io.Copy)
* Screen owns the parser: handles incremental writes
* Support setting max lines in all of HTTP, stdin, and file modes
* Add comments to parser methods
* Change AsHTML return to string, avoiding cast to []byte
* Change bytes.Replace to strings.Replace
Also preallocate lines []string slice, saves time and memory in benchmark
Replace was used to put `&nbsp;` inside otherwise blank lines (`\n\n` -> `\n&nbsp;\n`). I went looking through history to understand. It used to be implemented as a regexp replace from `^$`. That originated in the Ruby version going back to ef20b4a, before that there was something funky going on with divs. Changing it to generate `&nbsp;` on the fly broke two tests. If the rationale for `&nbsp;` is to ensure empty lines aren't lost, then the two tests were arguably wrong:
* "clears everything after the \x1b[0K" clears the "hello" line, but because that blank line is at the start of the output it wasn't between two newlines, missing the Replace
* a similar problem occurs with Pikachu, but at both the start and the end (if you are unconvinced about the end, note there are two blank lines after Pikachu if you cat fixtures/pikachu.sh.raw).
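The difference between the two approaches can be sketched in isolation. The functions below are illustrative stand-ins, not the library's code: `joinThenReplace` patches `\n\n` after joining (the older behaviour described above), while `joinOnTheFly` substitutes `&nbsp;` for each empty line as it is written.

```go
package main

import (
	"fmt"
	"strings"
)

// joinThenReplace joins the lines and then fills blank lines by replacing
// "\n\n". A blank first or last line is not between two newlines, so it
// is missed.
func joinThenReplace(lines []string) string {
	return strings.Replace(strings.Join(lines, "\n"), "\n\n", "\n&nbsp;\n", -1)
}

// joinOnTheFly emits &nbsp; for every empty line while joining, so blank
// lines at the start and end are filled too.
func joinOnTheFly(lines []string) string {
	var sb strings.Builder
	for i, line := range lines {
		if i > 0 {
			sb.WriteByte('\n')
		}
		if line == "" {
			line = "&nbsp;"
		}
		sb.WriteString(line)
	}
	return sb.String()
}

func main() {
	lines := []string{"", "hello", "", "world", ""}
	fmt.Printf("%q\n", joinThenReplace(lines)) // "\nhello\n&nbsp;\nworld\n"
	fmt.Printf("%q\n", joinOnTheFly(lines))    // "&nbsp;\nhello\n&nbsp;\nworld\n&nbsp;"
}
```

With a blank line at each end, only the on-the-fly version fills all three empty lines, which is the discrepancy the two changed tests ran into.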
No need to garbage collect lots of slices if they can be reused right away
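As a rough illustration of the recycling idea (the type `lineRecycler` and its methods are hypothetical, and the real screen stores richer line data than a rune slice), a discarded line's backing array can be handed straight to the next line instead of being left for the garbage collector:

```go
package main

import "fmt"

// lineRecycler holds at most one spare line buffer for reuse.
// Hypothetical type, for illustration only.
type lineRecycler struct{ spare []rune }

// get returns an empty slice, reusing the spare buffer's capacity if one
// is available. With no spare it returns nil, and append allocates.
func (r *lineRecycler) get() []rune {
	buf := r.spare[:0]
	r.spare = nil
	return buf
}

// put hands a no-longer-needed line back for reuse.
func (r *lineRecycler) put(buf []rune) { r.spare = buf }

func main() {
	var r lineRecycler
	line := append(r.get(), []rune("hello, world")...)
	r.put(line) // the line has scrolled out and been flushed
	reused := r.get()
	fmt.Println(len(reused), cap(reused) >= len("hello, world")) // 0 true
}
```

Keeping `get` and the following `append` as separate steps also means that, in a memory profile, any allocation is attributed to `append` rather than to obtaining the new line.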
This checks that the streaming approach produces the same output as the classical buffer-everything approach, for all existing cases.
This avoids a copy when the existing buffer is empty. Also split the "recycle" line in two to make it clear in memory profiles that append is responsible for allocation, not creating the new line itself.
And mention that it can be turned off in the flag description
Co-authored-by: Tessa Bradbury <[email protected]>
- Update the `github.com/buildkite/terminal-to-html/v3` dependency from version v3.10.1 to v3.13.0.
- Version v3.12.0 introduced an incompatible change, the return type of `AsHTML` changed from `[]byte` to `string`. That same version also introduced streaming mode buildkite/terminal-to-html#126, which allows us to avoid reading the whole input into memory.
- Closes #4313
The goal
Allow terminal output to be processed in a streaming manner, at the expense of not buffering the full history of the input. Enable much larger input to be processed without exhausting memory.
Why
The memory improvements of the last couple of PRs (#121, #124) whittle the memory allocated per run for the NPM benchmark from ~24MiB down to ~14MiB, but on a ~0.5MiB input that's still a hefty blowup.
The overwhelming majority of typical terminal output writes to new lines. Sometimes a command on a PTY will erase the previous line (e.g. `git`). Some tools will rewrite the last 10 or so lines (`bazel`, `docker`). In all cases, sensible CLI tools should assume the terminal window is no larger than a few hundred lines.
So why do we need to buffer more lines than that?
By streaming out all the lines we no longer need to touch, they don't need to be kept in memory, so they can be freed. The total memory needed to process input can be limited, rather than having to scale with the size of the input. This will enable displaying much larger logs.
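A minimal sketch of the idea, assuming a much simpler model than the real screen (plain text lines, no escape codes, no cursor movement): the hypothetical `boundedLines` writer below keeps at most `maxLines` completed lines in memory and flushes older ones to the output as soon as they scroll out. It also carries an unterminated line over between `Write` calls, which is the kind of state that has to survive chunk boundaries. All names here are invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// boundedLines is a hypothetical io.Writer that retains at most maxLines
// completed lines; anything older is written to out and forgotten.
type boundedLines struct {
	out      io.Writer
	maxLines int
	lines    []string // completed lines still held in memory
	partial  []byte   // unterminated tail carried between Write calls
}

func (b *boundedLines) Write(p []byte) (int, error) {
	n := len(p)
	for {
		i := bytes.IndexByte(p, '\n')
		if i < 0 {
			break
		}
		b.partial = append(b.partial, p[:i]...)
		b.lines = append(b.lines, string(b.partial))
		b.partial = b.partial[:0]
		p = p[i+1:]
		for len(b.lines) > b.maxLines {
			if _, err := io.WriteString(b.out, b.lines[0]+"\n"); err != nil {
				return 0, err
			}
			k := copy(b.lines, b.lines[1:])
			b.lines[k] = "" // drop the stale reference
			b.lines = b.lines[:k]
		}
	}
	b.partial = append(b.partial, p...)
	return n, nil
}

// Close flushes the retained lines and any unterminated tail.
func (b *boundedLines) Close() error {
	for _, l := range b.lines {
		if _, err := io.WriteString(b.out, l+"\n"); err != nil {
			return err
		}
	}
	b.lines = b.lines[:0]
	_, err := b.out.Write(b.partial)
	b.partial = b.partial[:0]
	return err
}

// process feeds chunks through a boundedLines and returns what had been
// flushed before Close, and the complete output after Close.
func process(maxLines int, chunks ...string) (early, final string) {
	var out strings.Builder
	b := &boundedLines{out: &out, maxLines: maxLines}
	for _, c := range chunks {
		io.WriteString(b, c)
	}
	early = out.String()
	b.Close()
	return early, out.String()
}

func main() {
	early, final := process(2, "one\ntw", "o\nthree\nfo")
	fmt.Printf("early=%q final=%q\n", early, final)
	// early="one\n" final="one\ntwo\nthree\nfo"
}
```

Memory held by the writer is bounded by `maxLines` plus the current partial line, regardless of how much input passes through it.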
How
The key functionality was added in #118; the main change is to tidy up the main edge case (preserving parser state between chunks of input) and default to using it. Pass `--buffer-max-lines=0` to disable the line limit. It applies to all modes: stdin, file, and the web service.
But while I'm here...
- Use `strings.Builder` instead of `bytes.Buffer`, and avoid a few casts to `[]byte` - it's slightly faster
- Replace the `strings.Replace` that inserted `&nbsp;` by generating `&nbsp;` for every line that would have been empty.
Show me the benchmarks!
The streaming benchmark doesn't exist before this PR, so I copied it into main temporarily (and fixed it so it would build). The speedups are mostly to do with `strings.Builder` and avoiding casts to `[]byte`. The 46% memory saving before/after in the streaming benchmark is primarily due to the screen line recycling.