-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(techStackGeneric): improve getProcNames #250
Conversation
Each status file was previously scanned N times, where N is the length of its parent directory name.
Use the more natural construct.
Clarify that p_path isn't used later.
I find this more readable, and it avoids allocating a string for `head`. Examples for lastPathPart(): doAssert lastPathPart("foo/bar") == "bar" doAssert lastPathPart("foo/bar/") == "bar" Examples for splitPath(): doAssert splitPath("foo/bar") == (head: "foo", tail: "bar") doAssert splitPath("foo/bar/") == (head: "foo/bar", tail: "")
Improve readability (well, at least for me), and allocate less.
Previously, every line of each status file was scanned even after finding the Name value, which should be on the first line.
Improve readability.
src/plugins/techStackGeneric.nim
Outdated
result.incl(name) | ||
except: | ||
continue | ||
for line in data.split("\n"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not splitLines stdlib function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also read the file one line at a time if you use the file stream from fd_cache so that we don't need to read the full file and then split by lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not splitLines stdlib function?
Just to reduce the diff size.
I would personally have written splitLines
though. Done in 2dbbf77.
You can also read the file one line at a time if you use the file stream from fd_cache so that we don't need to read the full file and then split by lines
Generally if we wanted to optimize for performance, we could use mmap or just read a single line (taking advantage of the fact that the Name value is specified to be on the first line). Both avoid the stream overhead, and the readability cost of the withFileStream
template that injects a variable.
But in this case, the data at /proc/[pid]/status
is in memory, not on disk. And it's not a significant speedup to read a line from memory instead of the full data (which is about 1 KiB here). The file reads here are a completely negligible contributor to the execution time of chalk insert
. Let's stick with tryToLoadFile
for now?
src/plugins/techStackGeneric.nim
Outdated
for kind, path in walkDir(directory): | ||
if inFileScope[category][subcategory]: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following logic here. Why break out of the inner loop on a variable which is not controlled by the loop? Can we bypass loop if the condition doesn't match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't touch the behavior here, and I'll have future PRs that continue to refactor this file. There's definitely some more things in this file that are unexpected and hard to read.
Can we leave this till later? This PR is aimed at getProcNames
, and it's already a stretch to include the current walkDir
refactoring in this PR unless I title the squashed commit something like "improve walkDir usage", which makes the changes sound stylistic only.
Edit: moved the walkDir changes to #252 to keep this PR focused on one thing.
src/plugins/techStackGeneric.nim
Outdated
if filePath.kind == pcDir: | ||
scanDirectory(filePath.path, category, subcategory) | ||
continue | ||
if kind == pcFile: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the nim conventions are but maybe use switch here as it switched on the kind and there are no other conditions in the branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the existing
if kind == pcFile:
foo
elif kind == pcDir:
bar
mainly to minimize the diff size. Whether that's better than the case
version:
case kind
of pcFile:
foo
of pcDir:
bar
of pcLinkToFile, pcLinkToDir:
discard
depends mostly on personal taste: does the reader prefer to minimize the number of lines, or to see all the enum members?
Recall that in Nim, it is a compile-time error if a case statement is not exhaustative.
Some factors that make it more likely for me personally to use case
:
- I'm doing something with most of the enum members
- The
case
-based version is not significantly longer in relative terms - The enum is not defined by the Nim stdlib
- There is no other condition on the branch (as you mention), and there is not likely to be in the future
- There's little readability benefit from trying to express the likelihood of branches
For performance-critical code (not here) there is some subtlety: we can use the linearScanEnd
pragma to express which branches are most likely. The ordering of an if
statements branches already suggest that, although Nim also provides likely
and unlikely
.
Let's just leave as-is for now? In the long term, I might propose a helper iterator in nimutils for walking with filtering.
Let's handle this in a separate PR.
CI passed on the PR, but fails on the commit merged to main. I believe the failure is unrelated to the change:
|
Description
The
getProcNames
procedure was doing a couple of strange things:/proc/foo/status
, wherefoo
is the name of a file, not directory, in/proc
./proc/[pid]/status
more than once (instead, once per digit of the pid).This wasn't producing any incorrect results because:
HashSet[string]
, which deduplicates them.It was also doing some further unnecessary work: it kept scanning lines of
/proc/[pid]/status
even after it found theName
value (which is specified to be on the first line).This PR fixes those issues.
Note that
/proc/[pid]/comm
also exists in Linux since 2.6.33 (2010-02-24), but that's less portable.Refs: #249 (to a tiny degree)
Testing
Add to
plugins/techStackDetection.nim
and run
nim r plugins/techStackDetection.nim
It'll be easier to add tests for this kind of thing when we're running unit tests in CI.