Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(techStackGeneric): improve getProcNames #250

Merged
merged 18 commits into from
Mar 21, 2024

Conversation

ee7
Copy link
Contributor

@ee7 ee7 commented Mar 20, 2024

Description

The getProcNames procedure was doing a couple of strange things:

  • It was trying to read files at /proc/foo/status, where foo is the name of a file, not directory, in /proc.
  • It scanned each file at /proc/[pid]/status more than once (instead, once per digit of the pid).

This wasn't producing any incorrect results because:

  • It caught and ignored the exceptions from opening nonexistent files.
  • The name values are added to a HashSet[string], which deduplicates them.

It was also doing some further unnecessary work: it kept scanning lines of /proc/[pid]/status even after it found the Name value (which is specified to be on the first line).

This PR fixes those issues.

Note that /proc/[pid]/comm also exists in Linux since 2.6.33 (2010-02-24), but that's less portable.

Refs: #249 (to a tiny degree)

Testing

Add to plugins/techStackDetection.nim

when isMainModule:
  echo getProcNames()

and run nim r plugins/techStackDetection.nim

It'll be easier to add tests for this kind of thing when we're running unit tests in CI.

ee7 added 15 commits March 20, 2024 16:15
Each status file was previously scanned N times, where N is the length
of its parent directory name.
Clarify that p_path isn't used later.
I find this more readable, and it avoids allocating a string for `head`.

Examples for lastPathPart():

    doAssert lastPathPart("foo/bar") == "bar"
    doAssert lastPathPart("foo/bar/") == "bar"

Examples for splitPath():

    doAssert splitPath("foo/bar") == (head: "foo", tail: "bar")
    doAssert splitPath("foo/bar/") == (head: "foo/bar", tail: "")
Improve readability (well, at least for me), and allocate less.
Previously, every line of each status file was scanned even after
finding the Name value, which should be on the first line.
@ee7 ee7 requested a review from miki725 March 20, 2024 15:31
@ee7 ee7 requested a review from viega as a code owner March 20, 2024 15:31
nettrino
nettrino previously approved these changes Mar 20, 2024
result.incl(name)
except:
continue
for line in data.split("\n"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not splitLines stdlib function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also read the file one line at a time if you use the file stream from fd_cache so that we don't need to read the full file and then split by lines

Copy link
Contributor Author

@ee7 ee7 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not splitLines stdlib function?

Just to reduce the diff size.

I would personally have written splitLines though. Done in 2dbbf77.

You can also read the file one line at a time if you use the file stream from fd_cache so that we don't need to read the full file and then split by lines

Generally if we wanted to optimize for performance, we could use mmap or just read a single line (taking advantage of the fact that the Name value is specified to be on the first line). Both avoid the stream overhead, and the readability cost of the withFileStream template that injects a variable.

But in this case, the data at /proc/[pid]/status is in memory, not on disk. And it's not a significant speedup to read a line from memory instead of the full data (which is about 1 KiB here). The file reads here are a completely negligible contributor to the execution time of chalk insert. Let's stick with tryToLoadFile for now?

Comment on lines 182 to 184
for kind, path in walkDir(directory):
if inFileScope[category][subcategory]:
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following logic here. Why break out of the inner loop on a variable which is not controlled by the loop? Can we bypass loop if the condition doesn't match?

Copy link
Contributor Author

@ee7 ee7 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't touch the behavior here, and I'll have future PRs that continue to refactor this file. There's definitely some more things in this file that are unexpected and hard to read.

Can we leave this till later? This PR is aimed at getProcNames, and it's already a stretch to include the current walkDir refactoring in this PR unless I title the squashed commit something like "improve walkDir usage", which makes the changes sound stylistic only.

Edit: moved the walkDir changes to #252 to keep this PR focused on one thing.

if filePath.kind == pcDir:
scanDirectory(filePath.path, category, subcategory)
continue
if kind == pcFile:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the nim conventions are but maybe use switch here as it switched on the kind and there are no other conditions in the branches?

Copy link
Contributor Author

@ee7 ee7 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the existing

if kind == pcFile:
  foo
elif kind == pcDir:
  bar

mainly to minimize the diff size. Whether that's better than the case version:

case kind
of pcFile:
  foo
of pcDir:
  bar
of pcLinkToFile, pcLinkToDir:
  discard

depends mostly on personal taste: does the reader prefer to minimize the number of lines, or to see all the enum members?

Recall that in Nim, it is a compile-time error if a case statement is not exhaustative.

Some factors that make it more likely for me personally to use case:

  • I'm doing something with most of the enum members
  • The case-based version is not significantly longer in relative terms
  • The enum is not defined by the Nim stdlib
  • There is no other condition on the branch (as you mention), and there is not likely to be in the future
  • There's little readability benefit from trying to express the likelihood of branches

For performance-critical code (not here) there is some subtlety: we can use the linearScanEnd pragma to express which branches are most likely. The ordering of an if statements branches already suggest that, although Nim also provides likely and unlikely.

Let's just leave as-is for now? In the long term, I might propose a helper iterator in nimutils for walking with filtering.

@ee7 ee7 mentioned this pull request Mar 21, 2024
11 tasks
@ee7 ee7 merged commit 7cf3029 into main Mar 21, 2024
2 checks passed
@ee7 ee7 deleted the ee7/refactor-techStackGeneric branch March 21, 2024 17:28
@ee7
Copy link
Contributor Author

ee7 commented Mar 21, 2024

CI passed on the PR, but fails on the commit merged to main. I believe the failure is unrelated to the change:


main.go:62: error during command execution: unknown flag: --output-key-prefix

subprocess.CalledProcessError: Command '['cosign', 'generate-key-pair', '--output-key-prefix', 'chalk']' returned non-zero exit status 1.

[...]

Error: unknown flag: --tlog-upload

[...]

=========================== short test summary info >============================
FAILED test_command.py::test_setup_existing_keys[copy_files0-config1-False]
FAILED test_command.py::test_setup[config0-copy_files0] - subprocess.CalledPr...
FAILED test_command.py::test_setup[config1-copy_files0] - subprocess.CalledPr...
FAILED test_command.py::test_setup[config2-copy_files0] - subprocess.CalledPr...
FAILED test_command.py::test_setup_existing_keys[copy_files0-config0-True] - ...
======= 5 failed, 136 passed, 14 skipped, 1 warning in 273.03s (0:04:33) =======

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants