Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace: update tracing-subscriber to 0.2.0-alpha.5 #423

Merged
merged 5 commits into from
Feb 4, 2020

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Jan 31, 2020

Version 0.0.7 of sharded-slab contains a bug where, when the remove
method is called with the index of a slot that is not being accessed
concurrently, the slot is emptied but not placed on the free list.
This issue meant that, under tracing-subscriber's usage pattern, where
slab entries are almost always uncontended when reused, allocated slab
pages are almost never reused, resulting in unbounded slab growth over
time (i.e. a memory leak).

This commit updates tracing-subscriber' to version 0.2.0-alpha.6,
which in turn bumps the sharded-slab dependency to v0.0.8, which
includes commit hawkw/sharded-slab@dfdd7ae. That commit fixes this bug.

I've empirically verified that, after running linkerd2-proxy under
load with a global trace filter that enables a lot of spans, heap
usage remains stable, and the characteristic stair-step heap growth
pattern of doubling slab allocations doesn't occur. This indicates that
freed slots are actually being reused, and (once fully warmed up), the
slab will only grow when the number of active spans in the system
increases.

mem_plot

Closes linkerd/linkerd2#3998

Signed-off-by: Eliza Weisman [email protected]

This includes hawkw/sharded-slab@dfdd7ae, which fixes a memory leak in
the slab used for storing per-span data. I've done some testing, and
proxy RSS under load seems pretty stable after the update.

Signed-off-by: Eliza Weisman <[email protected]>
@hawkw hawkw added the bug Something isn't working label Jan 31, 2020
@hawkw hawkw requested review from olix0r and adleong January 31, 2020 23:24
@hawkw hawkw self-assigned this Jan 31, 2020
Signed-off-by: Eliza Weisman <[email protected]>
olix0r
olix0r previously approved these changes Feb 1, 2020
@olix0r
Copy link
Member

olix0r commented Feb 2, 2020

Hmm... does this break log formatting? I see log lines like

[ 43809.10997271s] TRACE outbound:accept{{peer.addr}}:source{{target.addr}}:logical{{addr}}:making:profile:balance{{addr}}: linkerd2_app_core::trace::layer: poll ready

after enabling trace logging

@olix0r olix0r dismissed their stale review February 2, 2020 21:31

need to check a few more things..

@olix0r
Copy link
Member

olix0r commented Feb 2, 2020

Hmm, I'm also seeing some weird behavior with the log env...

      annotations:
        config.linkerd.io/proxy-log-level: linkerd2_http_metrics=info,linkerd2_metrics=info,linkerd=trace,warn

logs trace events for metrics modules

@hawkw
Copy link
Contributor Author

hawkw commented Feb 2, 2020

@olix0r the trace events in linkerd2_metrics are enabled by the linkerd=trace directive. When specifying filters for different targets, if a more specific filter doesn't enable an event, it may still be enabled by a more general directive. tracing-subscriber has always had this behavior. This could definitely be changed (open an issue!) but it's always been there.

@hawkw
Copy link
Contributor Author

hawkw commented Feb 2, 2020

Hmm... does this break log formatting? I see log lines like

[ 43809.10997271s] TRACE outbound:accept{{peer.addr}}:source{{target.addr}}:logical{{addr}}:making:profile:balance{{addr}}: linkerd2_app_core::trace::layer: poll ready

after enabling trace logging

hmm...that looks wrong to me — I didn't see this while testing, but I'll have to take another look?

Edit: hmm, everything looks fine for me:

[     0.269954304s] TRACE proxy{test=http1::outbound_asks_controller_api:proxy}:outbound{}:accept{peer.addr=127.0.0.1:51769}:source{target.addr=127.0.0.1:51764}:addr{addr=disco.test.svc.cluster.local:80}:logical{dst.logical=disco.test.svc.cluster.local:80}: linkerd2_app_core::profiles: response received

@olix0r
Copy link
Member

olix0r commented Feb 3, 2020

Also... did the formatter change to no longer format levels as 4-char strings?

We now see DEBUG and TRACE -- this breaks vertical alignment in logs when mixed with INFO and WARN. I'm fine with using the longer strings but we should left-pad them with space so that they align?

@hawkw
Copy link
Contributor Author

hawkw commented Feb 3, 2020

@olix0r whoops, I thought that our old custom formatter was identical to the default except for the (no longer used) custom logging contexts, but I had forgotten we formatted levels differently. FWIW, the default formatter should left-pad 4-character levels like INFO and WARN already. I can re-add a custom formatter to make it identical to the old formatting, though!

@hawkw
Copy link
Contributor Author

hawkw commented Feb 3, 2020

FWIW, the default formatter should left-pad 4-character levels like INFO and WARN already.

ugh, apparently it only does that when ANSI colors are enabled? that looks like a bug upstream

@hawkw
Copy link
Contributor Author

hawkw commented Feb 3, 2020

tokio-rs/tracing#560 should fix any wrong formatting with ansi colors turned off

@hawkw
Copy link
Contributor Author

hawkw commented Feb 3, 2020

@olix0r I've published a new tracing-subscriber version with fixes for non-ANSI formatting and updated the proxy, should be all good now!

@olix0r olix0r self-requested a review February 3, 2020 23:58
@olix0r
Copy link
Member

olix0r commented Feb 3, 2020

I've tested this and don't see any of the issues I reported earlier

@hawkw hawkw requested a review from a team February 4, 2020 00:23
Copy link
Contributor

@kleimkuhler kleimkuhler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@hawkw hawkw merged commit d1f18a8 into master Feb 4, 2020
@olix0r olix0r deleted the eliza/patch-subscriber-2 branch February 4, 2020 01:45
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 4, 2020
This release fixes a bug in the proxy's logging subsystem that could
cause the proxy to consume memory until the process is OOMKilled,
especially when the proxy was configured to log diagnostic information.

The proxy also now properly emits `grpc-status` headers when signaling
proxy errors to gRPC clients.

This release upgrades the proxy's Rust version, the `http` crate
dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the
`prost` crate dependency has been patched to address RUSTSEC-2020-02.

---

* internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408)
* Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410)
* Update http to v0.1.21 (linkerd/linkerd2-proxy#412)
* internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409)
* Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413)
* patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414)
* metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415)
* Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416)
* trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418)
* discover: Warn on discovery error (linkerd/linkerd2-proxy#422)
* router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421)
* errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424)
* app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425)
* trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
adleong pushed a commit to linkerd/linkerd2 that referenced this pull request Feb 4, 2020
This release fixes a bug in the proxy's logging subsystem that could
cause the proxy to consume memory until the process is OOMKilled,
especially when the proxy was configured to log diagnostic information.

The proxy also now properly emits `grpc-status` headers when signaling
proxy errors to gRPC clients.

This release upgrades the proxy's Rust version, the `http` crate
dependency to address RUSTSEC-2019-0033 and RUSTSEC-2019-0034, and the
`prost` crate dependency has been patched to address RUSTSEC-2020-02.

---

* internal: Introduce a locking middleware (linkerd/linkerd2-proxy#408)
* Update to Rust 1.40 with new Cargo.lock format (linkerd/linkerd2-proxy#410)
* Update http to v0.1.21 (linkerd/linkerd2-proxy#412)
* internal: Split retry, http-classify, and http-metrics (linkerd/linkerd2-proxy#409)
* Actually update http to v0.1.21 (linkerd/linkerd2-proxy#413)
* patch `prost` 0.5 to pick up security fix (linkerd/linkerd2-proxy#414)
* metrics: Make Counter & Gauge atomic (linkerd/linkerd2-proxy#415)
* Set grpc-status headers on dispatch errors (linkerd/linkerd2-proxy#416)
* trace: update `tracing-subscriber` to 0.2.0-alpha.4 (linkerd/linkerd2-proxy#418)
* discover: Warn on discovery error (linkerd/linkerd2-proxy#422)
* router: Avoid large up-front allocations (linkerd/linkerd2-proxy#421)
* errors: Set correct HTTP version on responses (linkerd/linkerd2-proxy#424)
* app: initialize tracing prior to parsing env vars (linkerd/linkerd2-proxy#425)
* trace: update tracing-subscriber to 0.2.0-alpha.6 (linkerd/linkerd2-proxy#423)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

proxy RSS grows with traffic
3 participants