Add new JVM runtime environment metrics #3352

Closed
74 changes: 73 additions & 1 deletion semantic_conventions/metrics/process-runtime-jvm-metrics.yaml
@@ -1,7 +1,7 @@
groups:
- id: attributes.process.runtime.jvm.memory
type: attribute_group
brief: "Describes JVM memory metric attributes."
brief: "Describes JVM memory metric attributes. "
attributes:
- id: type
type:
@@ -25,6 +25,32 @@ groups:
Pool names are generally obtained via
[MemoryPoolMXBean#getName()](https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getName()).

- id: attributes.process.runtime.jvm.cpu.monitor
type: attribute_group
brief: "Describes JVM monitor metric attributes."
attributes:
- id: thread
type: int
requirement_level: opt_in
brief: Thread emitting the metric.
examples: [1, 2]
- id: class
Member: Check out the `code.namespace` field as an alternative to defining a new attribute.

type: string
requirement_level: opt_in
brief: Class of the monitor.
examples: ["java.lang.Object"]

- id: attributes.process.runtime.jvm.network
type: attribute_group
brief: "Describes JVM network IO metric attributes."
attributes:
- ref: thread
- id: mode
Member: Once #3431 lands, we should change this to `network.direction`.

Member: I think it's OK to change it proactively (that PR could take a while...)

Contributor Author: OK, I changed it to `network.direction`.

type: string
requirement_level: recommended
brief: Read or write.
examples: [ "read", "write" ]

- id: metric.process.runtime.jvm.memory.usage
type: metric
metric_name: process.runtime.jvm.memory.usage
@@ -183,3 +209,49 @@ groups:
brief: "Number of buffers in the pool."
instrument: updowncounter
unit: "{buffer}"

- id: metric.process.runtime.jvm.cpu.monitor.wait
type: metric
metric_name: process.runtime.jvm.cpu.monitor.wait
extends: attributes.process.runtime.jvm.cpu.monitor
brief: "Time thread was waiting at a monitor. Only available in JDK 17+."
instrument: histogram
unit: "ms"
Member: We'll want to use the `s` unit for all durations.

Member: Should we add a bucket recommendation at the same time?
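A minimal sketch of what these two comments ask for, assuming durations arrive as `java.time.Duration` (the type JFR's `RecordedEvent#getDuration()` returns): convert the value to seconds and place it in an explicit histogram bucket. The class name and the boundary values are hypothetical placeholders, not a bucket recommendation from this PR.

```java
import java.time.Duration;
import java.util.Arrays;

/** Hypothetical helper: records durations in seconds, as suggested in review. */
final class DurationSeconds {
    // Hypothetical explicit bucket upper bounds, in seconds (inclusive).
    static final double[] BOUNDARIES = {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0};

    private DurationSeconds() {}

    /** Converts a Duration to fractional seconds, keeping sub-millisecond precision. */
    static double toSeconds(Duration d) {
        return d.getSeconds() + d.getNano() / 1_000_000_000.0;
    }

    /**
     * Returns the index of the first bucket whose upper bound is >= seconds.
     * Values above the last boundary go to the overflow bucket at BOUNDARIES.length.
     */
    static int bucketIndex(double seconds) {
        int i = Arrays.binarySearch(BOUNDARIES, seconds);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return i >= 0 ? i : -i - 1;
    }
}
```

With `Arrays.binarySearch`, a value equal to a boundary lands in that boundary's bucket, and anything above the last boundary goes to an overflow bucket at index `BOUNDARIES.length`.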


- id: metric.process.runtime.jvm.cpu.monitor.blocked
Member: Is it ever useful to sum together the time a monitor was blocked and waiting? I'm trying to think about whether blocked vs. waiting makes sense as an attribute rather than as a separate metric.

Member: This seems similar to `process.cpu.time`, which has the attribute:

> `state`, if specified, SHOULD be one of: `system`, `user`, `wait`

So maybe `process.runtime.jvm.cpu.monitor.time` with a `state` attribute?

Contributor Author: Yup, I think that's a good idea.

Contributor Author: Updated with the suggestion applied.
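The suggested single metric could plausibly be fed from the two JFR events that cover monitor contention: `jdk.JavaMonitorEnter` (blocked entering a monitor) and `jdk.JavaMonitorWait` (waiting in `Object.wait()`). The sketch below assumes that mapping and uses `blocked`/`wait` as `state` values; the class, its methods, and the in-memory accumulator (a stand-in for a real histogram) are hypothetical. The streaming part requires JDK 14+ for `RecordingStream`.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical sketch: one monitor-time metric split by a "state" attribute. */
final class MonitorTimeByState {
    // Accumulated seconds per state; a stand-in for a real histogram instrument.
    private final Map<String, DoubleAdder> secondsByState = new ConcurrentHashMap<>();

    /** Maps a JFR event type name to the proposed state value, or null if unrelated. */
    static String stateFor(String jfrEventName) {
        switch (jfrEventName) {
            case "jdk.JavaMonitorEnter":
                return "blocked"; // thread blocked trying to enter a monitor
            case "jdk.JavaMonitorWait":
                return "wait";    // thread waiting in Object.wait()
            default:
                return null;
        }
    }

    /** Adds one event's duration, in seconds, to the total for its state. */
    void observe(String jfrEventName, Duration duration) {
        String state = stateFor(jfrEventName);
        if (state == null) {
            return;
        }
        double seconds = duration.getSeconds() + duration.getNano() / 1_000_000_000.0;
        secondsByState.computeIfAbsent(state, k -> new DoubleAdder()).add(seconds);
    }

    /** Total recorded seconds for a state (0.0 if nothing was recorded). */
    double totalSeconds(String state) {
        DoubleAdder adder = secondsByState.get(state);
        return adder == null ? 0.0 : adder.sum();
    }

    /** Feeds live JFR events into this accumulator; the caller must close the stream. */
    RecordingStream startStreaming(Duration threshold) {
        RecordingStream rs = new RecordingStream();
        for (String name : new String[] {"jdk.JavaMonitorEnter", "jdk.JavaMonitorWait"}) {
            rs.enable(name).withThreshold(threshold);
            rs.onEvent(name, event -> observe(name, event.getDuration()));
        }
        rs.startAsync();
        return rs;
    }
}
```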

type: metric
metric_name: process.runtime.jvm.cpu.monitor.blocked
extends: attributes.process.runtime.jvm.cpu.monitor
brief: "Time thread was blocked at a monitor. Only available in JDK 17+."
instrument: histogram
unit: "ms"

- id: metric.process.runtime.jvm.cpu.context_switch
type: metric
metric_name: process.runtime.jvm.cpu.context_switch
Member (@trask, Apr 28, 2023): Can you check if there's a difference between this and the `process.context_switches` metric?

Suggested change: replace `metric_name: process.runtime.jvm.cpu.context_switch` with `metric_name: process.runtime.jvm.context_switches`.

Contributor Author (@roberttoyonaga, Apr 28, 2023): Hi @trask, I checked the HotSpot code, and it seems to me that the JFR source of this metric does not account for virtual threads, only platform threads. However, `process.runtime.jvm.context_switches` does look a little different, because it reports a rate in Hz rather than a count like `process.context_switches` does.

Contributor Author (@roberttoyonaga, Apr 28, 2023): Also, the description for `process.context_switches` says: "Number of times the process has been context switched." Does this mean it refers to process context switches rather than thread context switches? The metrics derived from JFR refer to threads specifically.
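To make the count-vs-rate distinction concrete: a cumulative count such as `process.context_switches` can be turned into an average rate by differencing two samples, whereas the JFR-derived value is already a per-second rate. The helper below is a hypothetical sketch of that arithmetic only; it does not read any real counter.

```java
/** Hypothetical helper relating a cumulative context-switch count to a rate in Hz. */
final class ContextSwitchRate {
    private ContextSwitchRate() {}

    /**
     * Average switches per second between two samples of a monotonically increasing counter.
     *
     * @param countBefore  cumulative count at the first sample
     * @param countAfter   cumulative count at the second sample
     * @param elapsedNanos time between the two samples, in nanoseconds (must be > 0)
     */
    static double averageHz(long countBefore, long countAfter, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double elapsedSeconds = elapsedNanos / 1_000_000_000.0;
        return (countAfter - countBefore) / elapsedSeconds;
    }
}
```

For example, 100 additional switches observed over 2 seconds gives an average of 50 Hz.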

brief: "Number of context switches per second. Only available in JDK 17+."
instrument: updowncounter
unit: "Hz"

- id: metric.process.runtime.jvm.memory.allocation
type: metric
metric_name: process.runtime.jvm.memory.allocation
brief: "Size of object allocated by thread. Only available in JDK 17+."
Contributor Author: I think that's a little bit different. `ThreadMXBean` returns the cumulative allocation per thread, while the JFR event `ObjectAllocationSample` describes a single allocation instance (sampled to reduce overhead. Sampling only happens on the TLAB slow path). But now that I think about it, it might be more useful to know the total allocation per thread than to have statistical data on allocation sizes per thread. Additionally, the statistical data would be skewed, because sampling is only done on the slow path, when a new TLAB is required or an allocation won't fit into a TLAB (this is because the event's purpose is to show where the allocations are happening, not how big they are).

Member: I think(?) this could be implemented in Java 8 using https://docs.oracle.com/javase/8/docs/jre/api/management/extension/com/sun/management/ThreadMXBean.html#getThreadAllocatedBytes-long:A-

That would be cool.

> the JFR event ObjectAllocationSample describes a single allocation instance (sampled to reduce overhead. Sampling only happens on the TLAB slow path).

If we continue to report this in JFR, we'll want to somehow communicate to users that these allocations are sampled.

> this is because the event's purpose is to show where the allocations are happening, not how big they are

Presumably for building out a profile?

Contributor Author:

> Presumably for building out a profile?

Yup, you can generate flame graphs from the stack traces, and other useful things like that.

> If we continue to report this in JFR

I think we should not report allocations with JFR, because the purpose of those events is actually a little different from what we want to use them for. Also, the current implementation (`jdk.ObjectAllocationInNewTLAB` and `jdk.ObjectAllocationOutsideTLAB`) would result in too high an overhead for people to use in production. Those events are turned off by default in both the monitoring and profiling JFR configurations, because they aren't throttled like `jdk.ObjectAllocationSample` is.
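A sketch of the Java 8 approach mentioned in this thread, assuming a HotSpot-based JVM where `ManagementFactory.getThreadMXBean()` implements `com.sun.management.ThreadMXBean`. Unlike sampled `jdk.ObjectAllocationSample` events, `getThreadAllocatedBytes` returns a cumulative per-thread total. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical sketch: cumulative bytes allocated by a thread (Java 8+, HotSpot-specific API). */
final class ThreadAllocation {
    private ThreadAllocation() {}

    /** Returns cumulative allocated bytes for the thread, or -1 if unavailable. */
    static long allocatedBytes(long threadId) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1; // The extended HotSpot interface is not available on this JVM.
        }
        com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
        if (!hotspot.isThreadAllocatedMemorySupported()
                || !hotspot.isThreadAllocatedMemoryEnabled()) {
            return -1; // Measurement unsupported or switched off.
        }
        // Also returns -1 if the thread is not alive.
        return hotspot.getThreadAllocatedBytes(threadId);
    }
}
```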

instrument: histogram
unit: "By"

- id: metric.process.runtime.jvm.network.io
type: metric
metric_name: process.runtime.jvm.network.io
brief: "Bytes read/written by thread. Only available in JDK 17+."
extends: attributes.process.runtime.jvm.network
instrument: histogram
unit: "By"

- id: metric.process.runtime.jvm.network.time
type: metric
metric_name: process.runtime.jvm.network.time
brief: "Duration of network IO operation by thread. Only available in JDK 17+."
extends: attributes.process.runtime.jvm.network
instrument: histogram
unit: "ms"
@@ -36,6 +36,12 @@ semantic conventions when instrumenting runtime environments.
* [Metric: `process.runtime.jvm.buffer.usage`](#metric-processruntimejvmbufferusage)
* [Metric: `process.runtime.jvm.buffer.limit`](#metric-processruntimejvmbufferlimit)
* [Metric: `process.runtime.jvm.buffer.count`](#metric-processruntimejvmbuffercount)
* [Metric: `process.runtime.jvm.cpu.monitor.wait`](#metric-processruntimejvmcpumonitorwait)
* [Metric: `process.runtime.jvm.cpu.monitor.blocked`](#metric-processruntimejvmcpumonitorblocked)
* [Metric: `process.runtime.jvm.cpu.context_switch`](#metric-processruntimejvmcpucontext_switch)
* [Metric: `process.runtime.jvm.memory.allocation`](#metric-processruntimejvmmemoryallocation)
* [Metric: `process.runtime.jvm.network.io`](#metric-processruntimejvmnetworkio)
* [Metric: `process.runtime.jvm.network.time`](#metric-processruntimejvmnetworktime)

<!-- tocstop -->

@@ -373,3 +379,94 @@ This metric is [recommended](../metric-requirement-level.md#recommended).

**[1]:** Pool names are generally obtained via [BufferPoolMXBean#getName()](https://docs.oracle.com/en/java/javase/11/docs/api/java.management/java/lang/management/BufferPoolMXBean.html#getName()).
<!-- endsemconv -->

### Metric: `process.runtime.jvm.cpu.monitor.blocked`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.cpu.monitor.blocked(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.cpu.monitor.blocked` | Histogram | `ms` | Time thread was blocked at a monitor. Only available in JDK 17+. |
<!-- endsemconv -->

<!-- semconv metric.process.runtime.jvm.cpu.monitor.blocked(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `thread` | int | Thread emitting the metric. | `1`; `2` | Opt-In |
| `class` | string | Class of the monitor. | `java.lang.Object` | Opt-In |
<!-- endsemconv -->

### Metric: `process.runtime.jvm.cpu.monitor.wait`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.cpu.monitor.wait(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.cpu.monitor.wait` | Histogram | `ms` | Time thread was waiting at a monitor. Only available in JDK 17+. |
<!-- endsemconv -->

<!-- semconv metric.process.runtime.jvm.cpu.monitor.wait(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `thread` | int | Thread emitting the metric. | `1`; `2` | Opt-In |
| `class` | string | Class of the monitor. | `java.lang.Object` | Opt-In |
<!-- endsemconv -->

### Metric: `process.runtime.jvm.cpu.context_switch`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.cpu.context_switch(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.cpu.context_switch` | UpDownCounter | `Hz` | Number of context switches per second. Only available in JDK 17+. |
<!-- endsemconv -->

### Metric: `process.runtime.jvm.memory.allocation`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.memory.allocation(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.memory.allocation` | Histogram | `By` | Size of object allocated by thread. Only available in JDK 17+. |
<!-- endsemconv -->

### Metric: `process.runtime.jvm.network.io`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.network.io(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.network.io` | Histogram | `By` | Bytes read/written by thread. Only available in JDK 17+. |
<!-- endsemconv -->

<!-- semconv metric.process.runtime.jvm.network.io(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `mode` | string | Read or write. | `read`; `write` | Recommended |
| `thread` | int | Thread emitting the metric. | `1`; `2` | Opt-In |
<!-- endsemconv -->

### Metric: `process.runtime.jvm.network.time`

This metric is [recommended](../metric-requirement-level.md#recommended).

<!-- semconv metric.process.runtime.jvm.network.time(metric_table) -->
| Name | Instrument Type | Unit (UCUM) | Description |
| -------- | --------------- | ----------- | -------------- |
| `process.runtime.jvm.network.time` | Histogram | `ms` | Duration of network IO operation by thread. Only available in JDK 17+. |
<!-- endsemconv -->

<!-- semconv metric.process.runtime.jvm.network.time(full) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `mode` | string | Read or write. | `read`; `write` | Recommended |
| `thread` | int | Thread emitting the metric. | `1`; `2` | Opt-In |
<!-- endsemconv -->
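Both network metrics above could plausibly be derived from the JFR `jdk.SocketRead` and `jdk.SocketWrite` events, which carry `bytesRead` and `bytesWritten` fields plus a duration. The sketch below assumes that mapping; the `Observation` shape and the method names are hypothetical. It converts durations to seconds, following the review comment on duration units, although the tables above still list `ms`. Streaming requires JDK 14+.

```java
import java.time.Duration;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical sketch: turning JFR socket events into network.io / network.time observations. */
final class SocketIoSketch {
    private SocketIoSketch() {}

    /** One observation (hypothetical shape): a mode value, a byte count, and a duration. */
    static final class Observation {
        final String mode;    // "read" or "write", matching the `mode` attribute examples
        final long bytes;     // candidate value for process.runtime.jvm.network.io
        final double seconds; // candidate value for process.runtime.jvm.network.time

        Observation(String mode, long bytes, double seconds) {
            this.mode = mode;
            this.bytes = bytes;
            this.seconds = seconds;
        }
    }

    /** Maps an event name plus its byte count and duration to an observation. */
    static Observation fromFields(String eventName, long bytes, Duration duration) {
        final String mode;
        if ("jdk.SocketRead".equals(eventName)) {
            mode = "read";
        } else if ("jdk.SocketWrite".equals(eventName)) {
            mode = "write";
        } else {
            throw new IllegalArgumentException("not a socket event: " + eventName);
        }
        double seconds = duration.getSeconds() + duration.getNano() / 1_000_000_000.0;
        return new Observation(mode, bytes, seconds);
    }

    /** Reads the byte-count field that each socket event type carries. */
    static Observation fromEvent(RecordedEvent event) {
        String name = event.getEventType().getName();
        long bytes = "jdk.SocketRead".equals(name)
                ? event.getLong("bytesRead")
                : event.getLong("bytesWritten");
        return fromFields(name, bytes, event.getDuration());
    }

    /** Streams socket events into the sink; the caller must close the returned stream. */
    static RecordingStream start(Duration threshold, Consumer<Observation> sink) {
        RecordingStream rs = new RecordingStream();
        for (String name : new String[] {"jdk.SocketRead", "jdk.SocketWrite"}) {
            rs.enable(name).withThreshold(threshold);
            rs.onEvent(name, event -> sink.accept(fromEvent(event)));
        }
        rs.startAsync();
        return rs;
    }
}
```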