-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #89 Base64 and hexadecimal entropy score override #223
Issue #89 Base64 and hexadecimal entropy score override #223
Conversation
…ore override Updated readme
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm curious -- what's the use case for this enhancement? I've always treated the original (now default) values as more or less magic, although thinking about it I guess I can see that a hex digit 0-f has 16 unique values and represents 4 bits of data. And base64 represents 24 bits of input in 4 encoded output characters, so each character we see represents 6 bits of data. The "magic" part to me is "why do we use thresholds that are 75% of the raw bit rate?" Is this "if the data looks at least 75% random it's a problem?" That's the lead-up to wondering why somebody would want to override these numbers and how they would make intelligent choices and what the implications of those choices would be. It also makes me think (although not strongly enough to withdraw my approval) that these are the wrong knobs for this problem space. The potential entropy of a hex digit string or a base64 character string is just math (see above) -- and changing those properties makes about as much sense as trying to legislate the value of pi. ;-) Perhaps this would be more manageable -- if not more understandable -- to have a "sensitivity" argument with a range of 0-100 and a default of 75 (so people don't have to mess with a decimal point) and then compute This way at least there's only one thing to twiddle, the effects are consistent (at least to a computer scientist or mathematician lol), and we can frame the discussion in terms of entropy generally without getting bogged down in the actual representation, which isn't supposed to matter anyway. |
FYI. I edited this after sending, and I think subscribers only saw the original version. So I'm spamming you all again. Apologies. |
It occurs to me that "sensitivity" is a bad name, since I think higher numbers are less sensitive than lower numbers, which is counterintuitive. My interpretation is that this number (0-100) is the probability that the string in question is totally random (aka encrypted, at a fuzzy level), and therefore higher numbers (above 75) make tartufo more reluctant to report a given string, while lower numbers make tartufo more likely to report that string. BTW, one reason this is nicer than tweaking scores -- it's easier to reason about. The range is 0-100 because it's a probability, and you can't talk about probabilities greater than 1.0 (aka 100%) or less than 0.0 (aka 0%). I still have a curious urge to understand where the original 75% came from, but both trufflehog and the original blog it references seem silent on the subject. |
We have repositories that have secrets that needs to be moved out of GitHub before we migrated it to GHEC. Tartufo could not catch all of the secrets. For example we had a password like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do really like the idea that @rbailey-godaddy presented of working with a percentage, rather than a mysterious number, and I would love to see that enhancement eventually. That said, this should be a very useful change for experimentation and learning more about effective values! Great work on this @namithasind!
To help us get this pull request reviewed and merged quickly, please be sure to include the following items:
PR Type
What kind of change does this PR introduce?
Backward Compatibility
Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?
Issue Linking
#89
What's new?
-b64
/--b64-entropy-score
and-hex
/--hex-entropy-score
which can be used to override the hard coded base64 and hexadecimal entropy scores