add process.name attribute and adapt process.executable.name #1737

Open
wants to merge 8 commits into
base: main
from

Conversation

braydonk
Contributor

Fixes #1736

Changes

This PR adds a new attribute process.name that uses the description that used to apply to process.executable.name. The process.executable.name attribute's description is adjusted such that the value of the attribute will reliably contain the executable name.

Merge requirement checklist

@braydonk braydonk requested review from a team as code owners January 10, 2025 14:22
@braydonk braydonk requested a review from a team as a code owner January 10, 2025 14:26
Member

@christos68k christos68k left a comment

Thanks!

stability: experimental
brief: >
The name of the process. On Linux based systems, can be set
to the `Name` in `proc/[pid]/status`. On Windows, can be set to the
Member

@christos68k christos68k Jan 10, 2025

Suggested change
- to the `Name` in `proc/[pid]/status`. On Windows, can be set to the
+ to the value in `/proc/[pid]/comm` or to the (equivalent)
+ `Name` in `/proc/[pid]/status`. On Windows, can be set to the

We can also use /proc/[pid]/comm which requires no parsing, unlike extracting Name out of /proc/[pid]/status. The values should be equivalent.
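To make the difference in parsing effort concrete, here is a minimal sketch of both lookups. The function names and the `proc_root` parameter are hypothetical (the parameter exists only so the functions can be pointed at a test directory), and the sketch assumes ordinary names; names containing unusual characters such as newlines may be rendered differently in the two files.

```python
from pathlib import Path


def name_from_comm(pid: int, proc_root: str = "/proc") -> str:
    """Return the process name from <proc_root>/<pid>/comm; no parsing needed."""
    raw = Path(proc_root, str(pid), "comm").read_text(encoding="utf-8", errors="replace")
    # The value is terminated by a single newline.
    return raw.rstrip("\n")


def name_from_status(pid: int, proc_root: str = "/proc") -> str:
    """Return the value of the `Name:` field from <proc_root>/<pid>/status."""
    text = Path(proc_root, str(pid), "status").read_text(encoding="utf-8", errors="replace")
    for line in text.split("\n"):
        key, sep, value = line.partition(":")
        if sep and key == "Name":
            # The key and the value are separated by a tab.
            return value[1:] if value.startswith("\t") else value
    raise ValueError(f"no Name field in status file for pid {pid}")
```

A collector that already reads `status` for other fields can reuse that parse, while one that only needs the name can read `comm` directly.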

Also see:

Contributor Author

What do you think about suggesting both since they're the same anyway? For example, in the hostmetricsreceiver we'll already have parsed /proc/[pid]/status for other information, so just getting the name from that is fine in our case, but in other cases people might prefer to just get /proc/[pid]/comm.

Contributor Author

Done in 5a679fd

# Use pipe (|) for multiline entries.
subtext: |
The new `process.name` attribute uses the original guidance for `process.executable.name`,
suggesting use of the `Name` field from `/proc/[pid]/status` on Linux. `process.executable.name`
Member

@christos68k christos68k Jan 10, 2025

Suggested change
- suggesting use of the `Name` field from `/proc/[pid]/status` on Linux. `process.executable.name`
+ suggesting use of `/proc/[pid]/comm` or the equivalent `Name` field
+ from `/proc/[pid]/status` on Linux. `process.executable.name`

Contributor Author

Done in 5a679fd

model/process/registry.yaml (outdated)
- to the `Name` in `proc/[pid]/status`. On Windows, can be set to the
- base name of `GetProcessImageFileNameW`.
+ to the base name of the target of `/proc/[pid]/exe`. On Windows,
+ can be set to the base name of `GetProcessImageFileNameW`.
examples: ['otelcol']
Contributor

would it be possible to have an example that's different between process.name and process.executable.name ?

Contributor Author

Where would be the best place for that? As part of this description?

Contributor Author

Added it as a note on process.name in cd4c335

Contributor

I meant whether we could have a realistic example that's different on Linux in examples: ['otelcol']. It's minor.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to have an example that's different between process.name and process.executable.name ?

For Linux, one example where the process name isn't changed programmatically is using a symbolic link, for example ln -s foo bar. Running ./bar gives you bar from /proc/<PID>/comm, but the basename foo from readlink /proc/<PID>/exe.

A real example for this is unlz4 which is a link to the lz4 executable (on my Debian system). So as you say in your suggestion, process.name seems to be more relevant/precise here.

Unfortunately, there are other examples where process.name needs to be combined with process.executable.path:

UnicodeNameMappingGenerator-16 -> ../lib/llvm-16/bin/UnicodeNameMappingGenerator
UnicodeNameMappingGenerator-17 -> ../lib/llvm-17/bin/UnicodeNameMappingGenerator

process.name: UnicodeNameMap (in both cases)
process.executable.name: UnicodeNameMappingGenerator (in both cases)
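The symbolic-link case can be reproduced without inspecting a live process. The sketch below only approximates the behaviour described above: it assumes the initial process name is the base name of the path as invoked, cut to `TASK_COMM_LEN`-1 bytes, and that the executable name is the base name of the fully resolved path. The helper names and the long example name are hypothetical.

```python
import os
import tempfile
from typing import Tuple

TASK_COMM_LEN = 16  # Linux kernel constant; includes the terminating NUL byte


def expected_comm(invoked_path: str) -> str:
    """Approximate the initial process name: base name as invoked, truncated."""
    base = os.path.basename(invoked_path).encode("utf-8")
    # Truncation is byte-based; drop a trailing partial multi-byte character.
    return base[: TASK_COMM_LEN - 1].decode("utf-8", errors="ignore")


def executable_name(invoked_path: str) -> str:
    """Base name of the file the path resolves to, as with readlink on /proc/<PID>/exe."""
    return os.path.basename(os.path.realpath(invoked_path))


def demo() -> Tuple[str, str]:
    """Create foo and a symlink bar -> foo, then derive both names for ./bar."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "foo")
        link = os.path.join(tmp, "bar")
        open(target, "w").close()
        os.symlink("foo", link)  # same effect as: ln -s foo bar
        return expected_comm(link), executable_name(link)
```

Calling `demo()` yields `bar` as the process name and `foo` as the executable name, which is the divergence a realistic `examples` entry would need to show.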

- to the `Name` in `proc/[pid]/status`. On Windows, can be set to the
- base name of `GetProcessImageFileNameW`.
+ to the base name of the target of `/proc/[pid]/exe`. On Windows,
+ can be set to the base name of `GetProcessImageFileNameW`.
Contributor

@lmolkova lmolkova Jan 10, 2025

It seems process.name and process.executable.name are always the same on windows - is it the case?

In the spirit of a T-shaped API, do you think it could be one of these Linux-specific things? I.e. process.linux.exe|executable.name ?

Contributor Author

Given this specification, it is the case that process.name and process.executable.name will be the same on Windows. I dug as far as I could into Win32 and .NET APIs to see if there was ever a way to change a process's name at runtime like in Linux, and I could not find any way to do so. So I think it is fundamentally the case even outside of this specification that process name and executable name will always be the same on Windows.

I think there is a good use-case for maintaining process.executable.name as a cross-platform name, and that's how it's used in CLI. In this case, it seems the way it's used is for the attribute to be the executable name on both Linux and Windows. On Linux this distinction matters, whereas on Windows it doesn't. However, this ensures that on either platform the cli.internal span will always have a correct executable name, and doesn't need to worry about special attributes based on the platform.

Perhaps I could add a note in process.executable.name's description that on Windows it will always be the same, and can be excluded if you're already using process.name?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can generalize this a bit to be OS-agnostic?

  • OTEL is not only about Linux and Windows
  • missing features in Windows or other OSes may appear with the next version/upgrade

So, can we just say that the two fields may have the same value?
That is often true on Linux and always true for current Windows versions and below.

It still makes sense for Windows clients to send both fields; otherwise there should be a written hint/rule like "if process.name isn't set, fall back to process.executable.name", or the other way round.
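A consumer-side version of such a rule could look like the following sketch. It is only an illustration of the proposed hint, not behaviour defined by the conventions; `effective_process_name` is a hypothetical helper and the attributes are modelled as a plain string mapping.

```python
from typing import Mapping, Optional


def effective_process_name(attributes: Mapping[str, str]) -> Optional[str]:
    """Prefer process.name; fall back to process.executable.name if it is absent."""
    name = attributes.get("process.name")
    if name:
        return name
    # Returns None when neither attribute was sent.
    return attributes.get("process.executable.name")
```

The helper returns a derived value and leaves the mapping untouched, so the stored record still shows which attributes the producer actually sent.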

Contributor

@lmolkova lmolkova Jan 13, 2025

We use process.executable.name in CLI conventions because it was the only one that existed.

The question I feel we should address:

If I have some instrumentation that would benefit from having a process name, but does not need a lot of detail ("General Class"), which one of those attributes should I use? I'm going to say process.name is the first candidate just because it's shorter and looks very general.

In this sense, I would even suggest changing the process.name definition to be the best-known name (yes, you can get it using OS APIs, but maybe you have a smart way to generate better process names, or maybe you want to record a friendly process name when self-reporting it from within a process).

Let me try to phrase it (will leave a separate comment with suggestion)

Contributor

| <a id="process-working-directory" href="#process-working-directory">`process.working_directory`</a> | string | The working directory of the process. | `/root` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1] `process.args_count`:** This field can be useful for querying or performing bucket analysis on how many arguments were provided to start a process. More arguments may be an indication of suspicious activity.

**[2] `process.title`:** In many Unix-like systems, process title (proctitle), is the string that represents the name or command line of a running process, displayed by system monitoring tools like ps, top, and htop.
**[2] `process.name`:** The value of this attribute will be equivalent to `process.executable.name` on Windows, but may not be on Linux. On Linux, the process name from `/proc/[pid]/comm` is truncated if its name is longer than `TASK_COMM_LEN`-1, and it can be manually changed by the process itself via [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html). On Windows, it won't be necessary to have both `process.name` and `process.executable.name`, but it may be on Linux depending on your use case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should a backend operate if one of the fields is omitted? Take the value from the other field? What if one of the names isn't available for some reason?

What if on Linux both fields have the same value (a common case), is it OK to just send one of the two fields?

Also, is Windows the only OS where both fields are always the same? Can we have a complete list? If not, would it be better not to mention "Windows" here as the only affected OS?

Even if we say something like "In cases where process.name and process.executable.name are identical, only one of the fields is required.", there is room for interpretation. Why not drop the last sentence, which may lead to confusion?

Contributor

In Linux, when using memfd, there can be processes created with no executable file.

So I don't think the backend/SDKs should be copying the fields when missing or omitting an existent field, since it could cause confusion as to whether process.executable.name really existed or not.

On the other hand, I agree the fields would commonly be duplicated, so it would be nice to have a way to send only one and reduce data volumes. This matters especially because some workloads create a very large number of processes, so the volume of process telemetry can be very large.

Comment on lines +17 to +28
brief: >
The name of the process. On Linux based systems, this SHOULD be set to
the value of `/proc/[pid]/comm` or to the `Name` field in `proc/[pid]/status`
(these values are equivalent). On Windows, this SHOULD be set to the
base name of `GetProcessImageFileNameW`.
note: >
The value of this attribute will be equivalent to `process.executable.name`
on Windows, but may not be on Linux. On Linux, the process name from `/proc/[pid]/comm`
is truncated if its name is longer than `TASK_COMM_LEN`-1, and it can be manually
changed by the process itself via [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html).
On Windows, it won't be necessary to have both `process.name` and `process.executable.name`, but
it may be on Linux depending on your use case.
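The runtime rename and the truncation described in this note can be observed from within a process. The following is a minimal, Linux-only sketch that calls `prctl(2)` through `ctypes`; the option values are the ones defined in `<linux/prctl.h>`, and `set_and_get_comm` is a hypothetical helper name. It renames the calling thread, which for the main thread is the name reported by `/proc/[pid]/comm`.

```python
import ctypes
import sys

PR_SET_NAME = 15    # from <linux/prctl.h>
PR_GET_NAME = 16    # from <linux/prctl.h>
TASK_COMM_LEN = 16  # includes the terminating NUL byte


def set_and_get_comm(new_name: bytes) -> bytes:
    """Rename the calling thread via prctl(2) and return the name the kernel kept."""
    if not sys.platform.startswith("linux"):
        raise OSError("prctl(2) is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    name_arg = ctypes.create_string_buffer(new_name)
    if libc.prctl(PR_SET_NAME, name_arg, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_SET_NAME failed")
    # PR_GET_NAME needs a buffer of at least TASK_COMM_LEN bytes.
    buf = ctypes.create_string_buffer(TASK_COMM_LEN)
    if libc.prctl(PR_GET_NAME, buf, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "PR_GET_NAME failed")
    return buf.value
```

Passing a name longer than `TASK_COMM_LEN`-1 bytes returns only its first 15 bytes, while the base name of the `/proc/[pid]/exe` target is unaffected by the call.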
Contributor

@lmolkova lmolkova Jan 13, 2025

a suggestion for #1737 (comment)

Suggested change
- brief: >
- The name of the process. On Linux based systems, this SHOULD be set to
- the value of `/proc/[pid]/comm` or to the `Name` field in `proc/[pid]/status`
- (these values are equivalent). On Windows, this SHOULD be set to the
- base name of `GetProcessImageFileNameW`.
- note: >
- The value of this attribute will be equivalent to `process.executable.name`
- on Windows, but may not be on Linux. On Linux, the process name from `/proc/[pid]/comm`
- is truncated if its name is longer than `TASK_COMM_LEN`-1, and it can be manually
- changed by the process itself via [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html).
- On Windows, it won't be necessary to have both `process.name` and `process.executable.name`, but
- it may be on Linux depending on your use case.
+ brief: >
+ The name of the process.
+ note: >
+ The attribute represents the best-known friendly process name. When there is
+ no additional context about the process, the name SHOULD be obtained from an OS-specific API.
+ On Linux based systems, this SHOULD be set to
+ the value of `/proc/[pid]/comm` or to the `Name` field in `/proc/[pid]/status`
+ (these values are equivalent). On Windows, this SHOULD be set to the
+ base name of `GetProcessImageFileNameW`.
+ On Linux, the process name from `/proc/[pid]/comm`
+ is truncated if its name is longer than `TASK_COMM_LEN`-1, and it can be manually
+ changed by the process itself via [`prctl(2)`](https://man7.org/linux/man-pages/man2/prctl.2.html).
+ The value of the `process.name` frequently matches the value of the
+ `process.executable.name` attribute. Semantic conventions and
+ instrumentation authors that want to capture a general process name
+ SHOULD use `process.name` attribute and MAY also use `process.executable.name`
+ when additional details are important.

Contributor

I think we should avoid calling anything the "general process name", or suggesting that it belongs in process.name. There are at least three different things that might be considered the process name.

  1. process.name, the name of the actual process/ the runnable process struct in the kernel.
  2. process.executable.name, the name of the executable file that was used to create the process.
  3. process.title, the value from proctitle, or the human readable name/title.

It's better to ensure the correct field is used for the correct data, and not suggest it's optional which information goes into the different fields. I think part of the confusion before this change is because the differences between these weren't clear.

Which one is the most important might also depend on the use case. Sometimes the executable file name might be considered more important than the process name.
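One way to keep the three values apart on the producer side is to model them as separate optional fields and emit only the ones that were actually observed. This is a sketch under that assumption; the `ProcessNames` class is hypothetical, and the example values come from the `unlz4` case mentioned earlier in the thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ProcessNames:
    name: Optional[str] = None             # e.g. from /proc/<pid>/comm
    executable_name: Optional[str] = None  # e.g. base name of the /proc/<pid>/exe target
    title: Optional[str] = None            # e.g. proctitle

    def to_attributes(self) -> Dict[str, str]:
        """Emit one attribute per known value; never copy one field into another."""
        candidates = {
            "process.name": self.name,
            "process.executable.name": self.executable_name,
            "process.title": self.title,
        }
        return {key: value for key, value in candidates.items() if value is not None}
```

For example, `ProcessNames(name="unlz4", executable_name="lz4").to_attributes()` produces two attributes with different values, and a process with no resolvable executable file simply has no `process.executable.name` key.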

@arminru arminru changed the title add process.name attribute add process.name attribute and adapt process.executable.name Jan 14, 2025
@braydonk
Contributor Author

Seeing lots of good points in the discussions but having trouble figuring out where to take this. Here's my attempt to synthesize the current state:

  • It's unclear when each attribute should be used, and how consumers should handle the common case where process.executable.name and process.name will be the same
  • The notes are specific only to Linux and Windows right now
    • This is partially because I was adapting the original notes that only mentioned the two operating systems, but I am myself biased because they are the only two operating systems I know anything about
  • Descriptions of the attributes should provide examples of how process.executable.name and process.name could differ
    • There was a suggestion to make use of the examples field for this but I'm not sure how to codify it
  • process.title is another attribute that can be considered in this arena of the process's name
  • The idea of the user's interpretation of the best name for the process is a good candidate for what we consider the "General Class" of instrumentation within our group, however the names exist as they are today because the different ways to "name" the process (name, executable name, title) need to be instrumented in specific ways to avoid user confusion (as described in the notes, albeit not clear enough based on feedback)

For the most part I agree with every point of feedback I've seen, but unfortunately that means I agree with some points that conflict with each other. process.name feels like "General Class" instrumentation and yet its definition is inextricably linked with the exact concept of the "process name" from the OS. The descriptions do only mention Linux and Windows and should find ways to be more general to operating systems, and yet if we aren't somewhat prescriptive on instrumentation in the descriptions then the relationship between these attributes could become muddy if different instrumentation makes different decisions about what values to use.

So I hate to say it, but I'm kinda stuck. I don't have a good idea where to take these attributes from here. I think the only thing I'm confident on is that process.title should be represented somehow (I am conflicted on exact naming because proctitle is Linux-exclusive iirc and the equivalent on Windows would I guess be MainWindowTitle but idk about other OS's).

I'm open to suggestions on what to do with this one.

@github-actions github-actions bot added the enhancement New feature or request label Jan 23, 2025
@christos68k
Member

christos68k commented Jan 23, 2025

  • It's unclear when each attribute should be used, and how consumers should handle the common case where process.executable.name and process.name will be the same

Regarding avoiding duplication, we could specify that if process.name is not present, the consumer may retrieve it from process.executable.name. This should work for all cases presented in this issue, including memfd on Linux. The trade-off is some extra complexity and stateful processing on the consumer side. Alternatively, we could opt for simplicity, allow for the same value to be present in both attributes and rely on protocol-level deduplication mechanisms such as a string table which we use in OTLP profiling (outside of which I don't have any context as to the extent of the issue).
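The deduplication alternative can be illustrated with a toy string table. This sketch shows only the general idea of interning repeated strings; it is not the OTLP profiling wire format, and `StringTable` and `encode_attributes` are hypothetical names.

```python
from typing import Dict, List, Tuple


class StringTable:
    """Stores each distinct string once and hands out integer indices."""

    def __init__(self) -> None:
        self.strings: List[str] = []
        self._index: Dict[str, int] = {}

    def intern(self, value: str) -> int:
        idx = self._index.get(value)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(value)
            self._index[value] = idx
        return idx


def encode_attributes(attrs: Dict[str, str], table: StringTable) -> List[Tuple[int, int]]:
    """Encode attributes as (key index, value index) pairs into the shared table."""
    return [(table.intern(key), table.intern(value)) for key, value in attrs.items()]
```

With this scheme, sending the same value in both `process.name` and `process.executable.name` costs one extra index pair rather than a second copy of the string, and the consumer needs no fallback logic.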

@lmolkova
Contributor

lmolkova commented Jan 24, 2025

I don't have a strong opinion, a few things we could consider to get unstuck:

  • not adding process.name for now. Sticking with process.executable.name, just changing its description to fix "process.executable.name shouldn't use /proc/[pid]/status" (#1736)
  • going forward with process.name, but leaving its description open and soft so it can evolve after attribute is stabilized.
  • going forward with process.name in whatever form, but excluding it from initial stability scope.

For the most part I agree with every point of feedback I've seen, but unfortunately that means I agree with some points that conflict with each other. process.name feels like "General Class" instrumentation and yet its definition is inextricably linked with the exact concept of the "process name" from the OS. The descriptions do only mention Linux and Windows and should find ways to be more general to operating systems, and yet if we aren't somewhat prescriptive on instrumentation in the descriptions then the relationship between these attributes could become muddy if different instrumentation makes different decisions about what values to use.

If we go down this road ("process.name" is the best known process name), we'd be:

  • prescriptive on how to populate it if instrumentation has no additional context
  • say that instrumentations that know better than that MAY populate it in a different way. process.executable.name would be the precise one.
  • expect app developers to set whatever value they feel right.

Either way, prototyping and the final stabilization push are usually a good time to clean up descriptions. Also, if we don't believe that some of it is essential for stability, let's just not add it, or let's keep it experimental.

Labels
enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

process.executable.name shouldn't use /proc/[pid]/status
5 participants