-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: chalking large zip files #229
Conversation
Removing previous chalk FD caching mechanism which stored file stream on the chalk object which restricted where it can be used. Now most places use the fd_cache mechanism to interact with FDs except a few specific places which need to do something custom such as open FD to TTY which is not cached.
If we open by default with fmReadWriteExisting mode during all codec chalk scans, it leaves all FDs open in write mode which means any other program using flock on the same file will not work. For example we scan docker binary to determine if its chalked and docker seems to use flock and so as chalk would open it in write mode docker would not be executable in a subprocess. By explicitly specifying the file mode it allows the caller to determine how to open the file hence avoiding these type of conflicts, which are especially pronounced when multiple chalks run in parallel on the same machine.
otherwise for example running lots of plugins on non-valid files such as docker image name was not working as expected as there is no corresponding file
otherwise any subscan triggered within chalk command would reset context directories and therefore plugin collection was not working as expected this was impacting zip codec so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very well done. The only thing is that 'yield' no longer has a correct meaning, so I think to avoid future confusion, please do change the naming as I mention in other comments!~
src/commands/cmd_docker.nim
Outdated
trace("New docker file: \n" & newcontents) | ||
f.write(newcontents) | ||
f.close() | ||
let (f, path) = getNewTempFile() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have an analogue writeNewTempFile()
that does the stuff you do below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. added in crashappsec/nimutils@c523586
src/docker_git.nim
Outdated
@@ -54,8 +54,10 @@ proc createTempKnownHosts(data: string): string = | |||
if data == "": | |||
return "" | |||
let (f, path) = getNewTempFile() | |||
f.write(data) | |||
f.close() | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, should have a 'writeNewTempFile()`; Plus, it occurs to me we should have a semgrep rule to warn on new adds of getNewTempFile()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @indecisivedragon was just mentioning semgrep to me as well. good idea to add that to chalk CI
src/fd_cache.nim
Outdated
## for all opened file streams. | ||
## | ||
## Some things you can do: | ||
## * yield - create or get existing file stream from cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'yield' generally means "I'm done with this (for now)". Generally, I would take that to mean the system can decide whether it needs to be closed (whereas close generally asserts we need to close). 'Acquire' is the right word to use here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was going between the 2 but yeah the logic for acquire makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed in 411b868
(#229)
|
||
# ---------------------------------------------------------------------------- | ||
|
||
proc getOpenLimit(): int = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth limiting ourselves to some percentage of the rlimit (or subtract out a healthy number). Hard to know what 3rd party stuff we'll pull in that can't use the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. we already do. we still honor the cache_fd_limit
config:
Line 301 in caf6e27
limitFDCacheSize(chalkConfig.getCacheFdLimit()) |
and it checks that value is less than 1/2 of the rlimit:
Lines 223 to 234 in caf6e27
let | |
# dont use all FDs in the cache and allow other descriptors | |
# to be opened in external libs/etc | |
fdLimit = getOpenLimit() div 2 | |
fdCache = newFDCache(size = fdLimit) | |
proc limitFDCacheSize*(size: int) = | |
if size > fdLimit: | |
raise newException(OSError, | |
"attempting to set FD cache size limit to " & $size & | |
" which is too large given system limit of " & $fdLimit) | |
fdCache.limitSize(size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also for context default fd limit in the config is 50:
chalk/src/configs/chalk.c42spec
Lines 2524 to 2527 in 938e474
field cache_fd_limit { | |
type: int | |
default: 50 | |
range: (0, high()) |
so we should be ok for most systems. dont imagine well see anything much lower with rlimit of 1024
src/plugins/codecMacOs.nim
Outdated
"Replace the script or rename the executable") | ||
scanFail() | ||
# Drop down below for the chalk mark. | ||
# chalked mac binary is a macho binary wrapped as shell script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mach-O is not macho 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rofl 😂
|
||
return some(chalk) | ||
let |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someday we should really use an in-memory option here :/ Not now of course.
also using writeNewTempFile from nimtuils for more consise temp file handling in chalk
Issue
none of the codecs were using chalk fd caching machinery and some of them were leaking FDs which was causing issues when trying to chalk a jar file with ~3K files
fixes #225
fixes #230
Description
ensure consistent use of the FD cache mechanism across chalk codebase. previous each
ChalkObj
contained its ownstream: FileStream
attribute which was used for caching that paths FD. that had some issues such as it was impossible to use that mechanism outside ofChalkObj
such as in codecs as they are responsible for createChalkObj
in the first place. to make it all consistent:fd_cache
implementationstream
fromChalkObj
and instead use thefd_cache
from abovefd_cache
Testing
existing tests plus some tests on large zip files such as zap jar file which has ~3K files