-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gather FQDN IPs using DNS lookups from previous chunks #676
Conversation
…hecking which hostnames are associated with the external IPs we saw in the current import run. Move fqdnInput out of hostnames package and into beaconfqdn package. Add ./.vscode to gitignore (vscode dlv debugger creates launch profiles here)
@@ -4,6 +4,7 @@ | |||
*.swp | |||
*.swo | |||
*.exe | |||
/.vscode/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used dlv
while developing this PR. VSCode stores dlv
configurations in .vscode
. VSCode is free to use for both open source and commercial development.
@@ -12,14 +12,14 @@ require ( | |||
github.com/activecm/mgosec v0.1.1 | |||
github.com/activecm/rita-bl v0.0.0-20200806232046-0db4a39fcf49 | |||
github.com/blang/semver v3.5.1+incompatible | |||
github.com/briandowns/spinner v1.16.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While running the query which gathers the hostnames associated with the newly parsed in host IPs, we display a busy spinner.
parser/fsimporter.go
Outdated
@@ -199,7 +199,7 @@ func (fs *FSImporter) Run(indexedFiles []*fpt.IndexedFile) { | |||
fs.buildBeacons(uconnMap) | |||
|
|||
// build or update the FQDN Beacons Table | |||
fs.buildFQDNBeacons(hostnameMap) | |||
fs.buildFQDNBeacons(hostMap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FQDNBeacons now takes in the list of hosts in the current parse set rather than the list of DNS queries.
conf *config.Config // contains details needed to access MongoDB | ||
analyzedCallback func(*update) // called on each analyzed result | ||
closedCallback func() // called when .close() is called and no more calls to analyzedCallback will be made | ||
analysisChannel chan *fqdnInput // holds unanalyzed data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fqdnInput
has been moved out of the hostname
package and into the beaconfqdn
package since it is never used in the hostname
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor comments/discussion ideas and a small bug on the first pass. Tested on gcat set and it seems to produce the same FQDN results as before, which is good!
To further test, I broke the gcat set out into two folders. In the first folder, I removed all of the conn logs. This meant that the first import would get all of the DNS entries but wouldn't get any of the conn info. We will call the imported dataset "gcat-test". The second folder had just the conn logs. I imported the conn logs into the gcat-test set that was created with the first import. This meant that the second import would have to rely solely on the DNS entries from the first import for beacon FQDN analysis. After the first import, there were no beaconsFQDN results. After the second import, the number of beaconsFQDN results matched the number we had previously for the gcat sample-data set. I think this is good evidence that it's working.
Might want to do some quick tests on the amount of time that this adds on to the beacons FQDN processing. I know we've talked about it and have each looked at it, but at some point we will probably want to give everything in beacons FQDN yet another look over for any more tricks we can do to speed everything up.
pkg/beaconfqdn/mongodb.go
Outdated
// roughly less than 200,000 external hosts. | ||
func (r *repo) affectedHostnameIPsSimple(hostMap map[string]*host.Input) ([]hostnameIPs, error) { | ||
// preallocate externalHosts slice assuming at least half of the observed hosts are external | ||
var externalHosts []data.UniqueIP = make([]data.UniqueIP, 0, len(hostMap)/2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would there be a major disadvantage to just pre-allocating all of the space other than the additional space requirements?
Looking at the gcat dataset and using dlv, I see that len(hostMap) == 5211. The length of externalHosts == 5189 after going through the isLocal check.
On a large data set, I see len(hostMap) == 15121 and len(externalHosts) == 8490. That is a bit closer to the less than half mark, but will still result in externalHosts needed to grow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only disadvantage is RAM usage as you said.
I went with /2
because I took a guess at how slice allocation works and assumed that it doubled the size of the backing array when reallocation happens. It looks like that Go does actually double the size of the underlying slice when you append
more items than the slice has capacity for: https://github.com/golang/go/blob/0a7f00ae239570d664488085f9c395919bc69066/src/runtime/slice.go#L164.
So, with /2
, we guarantee that we will only need to re-allocate once in the worst case. I could add that in as a comment if you'd like.
I'm happy to change it to len(hostMap)
however if you feel that best suits our needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nope, if there is just one re-allocation then I think it's worth it to go minimal. I was just concerned if it was going to be re-allocating with every new entry. This way is great though if it just does the one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chill, I'll add the explanation above as a comment.
@@ -117,9 +122,31 @@ func (r *repo) Upsert(hostnameMap map[string]*hostname.Input, minTimestamp, maxT | |||
writerWorker.start() | |||
} | |||
|
|||
s := spinner.New(spinner.CharSets[36], 200*time.Millisecond) | |||
s.Prefix = "\t[-] Gathering FQDNs for Beacon Analysis ...\t" | |||
s.Start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reproduce this issue in my terminal. In my terminal, the second spinner overwrites the first one.
I'll try adding a println
call after s.Stop()
, but I'll need you to let me know whether it helped or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be an ESXi thing. I've had other weird terminal things happen too that nobody else experienced.
parser/fsimporter.go
Outdated
@@ -575,7 +575,7 @@ func (fs *FSImporter) buildFQDNBeacons(hostnameMap map[string]*hostname.Input, m | |||
} | |||
|
|||
// send uconns to beacon analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this comment is from before, but it needs updated to read "hostMap" instead of uconns
pkg/beaconfqdn/mongodb.go
Outdated
func (r *repo) affectedHostnameIPsChunked(hostMap map[string]*host.Input) ([]hostnameIPs, error) { | ||
// preallocate externalHosts slice assuming at least half of the observed hosts are external | ||
// util.Min ensures that we don't preallocate more than the maximum allowed number of hosts in a 16MB BSON document | ||
var externalHosts []data.UniqueIP = make([]data.UniqueIP, 0, util.Min(200000, len(hostMap)/2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here as above
pkg/beaconfqdn/mongodb.go
Outdated
UniqueIPs. Additionally, all of the IPs associated with each hostname are returned. | ||
*/ | ||
func reverseDNSQueryWithIPs(uniqueIPs []data.UniqueIP) []bson.M { | ||
var uniqueIPBsonSelectors []bson.M = make([]bson.M, 0, len(uniqueIPs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal and will just comment in this one place, but I don't think the "var" and data type aren't needed on the left with the "make" on the right.
pkg/beaconfqdn/mongodb.go
Outdated
// and consuming more RAM | ||
|
||
// affectedHostnameIPMap maps hostnames to their respective ResolvedIPs | ||
var affectedHostnameIPMap map[string][]data.UniqueIP = make(map[string][]data.UniqueIP, len(hostMap)/10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe quick comment here explaining the /10 factor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that there would be roughly 10 IPs per observed hostname in the dataset. I'm not sure that's a good assumption to be honest. I didn't want to pre-allocate with len(hostMap)
since that would obviously be too big. But, I also didn't want to start from 0. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 10 is reasonable. Maybe just add a comment mentioning that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hang on though, this wouldn't be 10 IPs per host. This would be len(hostMap)/10 IPs per host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len(hostMap)/10 per hostname* or rather #IPs/10 : #FQDNs.
So, this assumes that for every 10 IPs in the parsed in set of logs, there would be 1 FQDN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, was reading the make statement wrong!
I've addressed the changes by adding comments, using short variable assignments, and adding a call to
In my testing, on small datasets, the additional query performed here cancelled out the performance gains in the FQDN Beaconing analysis made by PR #668. On large datasets, the time savings made in #668 outweighed the additional processing time required by this PR. However, FQDN beaconing still seems rather slow on large datasets. I believe that addressing issue #686 may help speed up the process as well. In general, I agree that in the future, time will need to be devoted towards speeding up the FQDN analysis specifically. |
Approving. The progress bar could very well be an issue just on my system. Up to you if you want to fix it. Otherwise, merge when you are ready! |
Fixes #669 by having the FQDN beaconing module extract all of the external hosts we saw in the parse results from the
hostMap
and query thehostnames
collection to obtain the list of FQDNs we need to update along with all of the associated IPs.Note: old FQDN -> IP mappings are removed from the
hostnames
collection by the remover package in the functionpkg/remover/mongodb.go:removeOutdatedCIDs(cid int) error