Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Refactor of codebase #66

Merged
merged 87 commits into from
Jan 6, 2022
Merged

Refactor of codebase #66

merged 87 commits into from
Jan 6, 2022

Conversation

perploug
Copy link
Contributor

This is a major refactor of the codebase to follow better practices, enable unit testing and using the evidence collector as an API rather than a standalone application.

  • backwards compatible, no breaking changes to the current application
  • abstracted all functions into 3 parts: collection, inspection and reporting
  • everything now depends on injectable input which enables testing
  • codebase is split into an application and a library
  • Fixed issues with shared resources (such as the logger) which prohibited running the collector multiple times in the same thread.

This version is currently used in a production setup and should provide the same results as the current code base
Unit testing is only partially done, but can be improved over time to increase coverage

- Wrapped lib in a cli so you can still run it in that odd way

Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
so we avoid storing on disk, but can go straight to memory

Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Passing in logger instance everywhere...

Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
move all collection functions into a seperate component to ease testing

Signed-off-by: Per Ploug <[email protected]>
easier testing and sharing of functions

Signed-off-by: Per Ploug <[email protected]>
this is work in progress, but does the basics

Signed-off-by: Per Ploug <[email protected]>
this makes it reusable and seperates resources use between runs and avoid thread spawning which is messy and slow

Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
rriemann and others added 9 commits October 29, 2021 16:37
Signed-off-by: Per Ploug <[email protected]>
- 3rd party cookies and frames load randomly
- browser did not close correctly, leading to memory leaks

Signed-off-by: Per Ploug <[email protected]>
+ closing session correctly

Signed-off-by: Per Ploug <[email protected]>
Signed-off-by: Per Ploug <[email protected]>
@perploug
Copy link
Contributor Author

perploug commented Nov 1, 2021

@rriemann-eu - can you check if the behaivior is as expected - also, with the change to the flags, is the quite flag really needed?

@@ -155,6 +155,8 @@ async function createBrowserSession(browser_args, browser_logger) {
waitUntil: "networkidle2",
});

await page.waitForNetworkIdle();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this is necessary on top of waitUntil option employed before.
https://pptr.dev/#?product=Puppeteer&version=v2.1.1&show=api-pagegotourl-options

Also, I did not find the method waitForNetworkIdle() in this docs at https://pptr.dev .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm actually not entirely sure this is necessary, but it was given us more stable test results, we saw inconsistent number of 3rd party cookies loaded when using youtube videos via an iframe for instance, adding this, gave some more stability.

I believe the problem is that puppeteer does not always know when all frame elements are loaded and closes the connection before the last cookies might have been set.

@ghost
Copy link

ghost commented Nov 15, 2021

@rriemann-eu

Yeah, I can set the json, yaml, screenshot to be on by default, using the argv method - for turning off STDOUT - could this be controlled by the quite flag?

I have already moved the log output to STDERR so that people can distinguish output from --yaml or alike from the logs.

In order to quickly merge to master, we can go forward without --quiet and if there is feedback to bring it back, I take care of it later.

I wait for some feedback regarding possible breaks with existing tooling and afterwards merge to master. I would then give a bit time for testing and release a new version few weeks later. :)

let uri_ins_https = new url.URL(uri);
uri_ins_https.protocol = "https:";

let testsslExecutable = args.testsslExecutable || "testssl.sh"; // set default location
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression this default value is never used, as an untrue value is already tested in line 10.

So with --testssl only, the entire function is skipped as far as I see. I'll try to look into this during the weekend or early next week. I hope we can merge to master afterwards. :)

@rriemann
Copy link
Contributor

rriemann commented Jan 5, 2022

Dear @perploug ,

I started fixing the failing tests from npm test. It seems there is a problem with the deletion of sessions.

FAIL  tests/collector.test.js (40.748 s)
  ● collector is correctly instantiated

    TypeError: Cannot read properties of null (reading 'end')

      130 |
      131 |   c.endSession = async function () {
    > 132 |     await c.browserSession.end();
          |                            ^
      133 |     c.browserSession = null;
      134 |     c.output.end_time = new Date();
      135 |   };

      at Object.endSession (collector/index.js:132:28)
      at Object.<anonymous> (tests/collector.test.js:38:11)

  ● collector browser session is correctly discarded

    expect(received).toEqual(expected) // deep equality

    Expected: null
    Received: {"browseSamples": [Function browseSamples], "browser": {"_closeCallback": [Function bound close], "_connection": [Connection], "_contexts": [Map], "_defaultContext": [BrowserContext], "_defaultViewport": [Object], "_ignoreHTTPSErrors": false, "_process": [ChildProcess], "_targetFilterCallback": [Function anonymous], "_targets": [Map], "emitter": [Object], "eventsMap": [Map]}, "gotoPage": [Function gotoPage], "har": {"captureMimeTypes": [Array], "client": [CDPSession], "inProgress": false, "mainFrame": [Frame], "network_events": [Array], "page": [Page], "page_events": [Array], "path": "output/requests.har", "response_body_promises": [Array], "saveResponse": false}, "hosts": {"beacons": [Object], "cookies": [Object], "links": [Object], "localStorage": [Object], "requests": [Object]}, "page": {"_accessibility": [Accessibility], "_client": [CDPSession], "_closed": true, "_coverage": [Coverage], "_emulationManager": [EmulationManager], "_fileChooserInterceptors": [Set], "_frameManager": [FrameManager], "_javascriptEnabled": true, "_keyboard": [Keyboard], "_mouse": [Mouse], "_pageBindings": [Map], "_screenshotTaskQueue": [ScreenshotTaskQueue], "_target": [Target], "_timeoutSettings": [TimeoutSettings], "_touchscreen": [Touchscreen], "_tracing": [Tracing], "_userDragInterceptionEnabled": false, "_viewport": [Object], "_workers": [Map], "emitter": [Object], "eventsMap": [Map]}, "refs_regexp": /^(github\.com)\b/i, "screenshot": [Function screenshot], "webSocketLog": {}}

      60 |
      61 |   expect(c.browserSession).toEqual(null);
    > 62 |   expect(c.pageSession).toEqual(null);
         |                         ^
      63 | });
      64 |
      65 | test("collector can test https + ssl connection", async () => {

      at Object.<anonymous> (tests/collector.test.js:62:25)

Could you please have a look? 🙂

- makes discarding of browser session depending on a session being defined
-adds a seperate test to check for page session
- page session should not be discarded as it is output expected for later user - tests updated to reflect this

- made end 2 end tests more robust and not depending on a changing externally dependant number
@perploug
Copy link
Contributor Author

perploug commented Jan 5, 2022

@rriemann tests should pass now

@rriemann rriemann merged commit 66da039 into EU-EDPS:master Jan 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants