Add commands to collect and retrieve response bodies #877

Open
wants to merge 3 commits into
base: main

juliandescottes
Contributor

@juliandescottes juliandescottes commented Feb 13, 2025

Overview of what this PR aims to add:

  • Concept of network body collector: a network body collector is similar to intercepts and event subscriptions. Clients can add and remove collectors. In theory this should be used for both requests and responses, but it is only applied to responses in this PR. A network body collector is a struct with contexts or userContexts, and urlPatterns. All are optional, so you can potentially define a collector which matches everything (to be discussed).

New BiDi session items:

  • BiDi session has a network body collector map, similar to the intercept map. Simply stores the active body collectors
  • BiDi session has a network maximum body size, js-uint to define the maximum size of collected bodies.
  • BiDi session has a network response map, which contains all the collected bodies, keyed by request id. This map is stored at session level because different sessions might have different configurations about what kind of network bodies can be collected (e.g. max size).
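
As a rough illustration of the collector struct and the session items listed above, here is a minimal TypeScript sketch. All type and field names (`BodyCollector`, `SessionNetworkState`, `ResponseInfo`, `collectorMatches`) are hypothetical, and URL patterns are simplified to plain URL prefixes, which is an assumption and not how the PR defines pattern matching.

```typescript
// Hypothetical shapes based on the PR description, not on the spec text.
interface BodyCollector {
  contexts?: string[];     // navigable ids
  userContexts?: string[]; // user context ids
  urlPatterns?: string[];  // simplified to URL prefixes (assumption)
}

interface SessionNetworkState {
  bodyCollectors: Map<string, BodyCollector>; // "network body collector map"
  maxBodySize: number;                        // "network maximum body size"
  responses: Map<string, Uint8Array>;         // "network response map", keyed by request id
}

interface ResponseInfo {
  requestId: string;
  url: string;
  context: string;
  userContext: string;
}

// A collector with no fields set matches every response.
function collectorMatches(c: BodyCollector, r: ResponseInfo): boolean {
  if (c.contexts !== undefined && c.contexts.indexOf(r.context) < 0) return false;
  if (c.userContexts !== undefined && c.userContexts.indexOf(r.userContext) < 0) return false;
  if (c.urlPatterns !== undefined && !c.urlPatterns.some((p) => r.url.startsWith(p))) return false;
  return true;
}
```

With this shape, `collectorMatches({}, response)` is true for any response, which is the "match everything" case flagged for discussion.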

New commands:

  • new command addBodyCollector to add a new network body collector
  • new command removeBodyCollector to remove an existing network body collector
  • new command setNetworkBodyCollectorConfiguration, which can be used to set the session's network maximum body size. We might add more configuration options here in the future, which is why this command sets a generic configuration.
  • new command getResponseBody, mostly identical to the one in Add a command to get response body #856. It defaults to base64 at the moment; we probably want to make it easier to receive the body as a string if possible? (But I preferred to leave this command as close to the existing PR as possible.)

New error:

  • new error no such body collector, for removeBodyCollector
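
The add/remove commands and the new error could be modelled as a small in-memory registry. This is only a sketch: the class names, the id format, and the error class are assumptions made for illustration, and the PR itself defines these as spec algorithms, not as code.

```typescript
// Hypothetical parameter shape, mirroring the collector struct described in the PR.
type CollectorParams = {
  contexts?: string[];
  userContexts?: string[];
  urlPatterns?: string[];
};

// Stand-in for the "no such body collector" error.
class NoSuchBodyCollectorError extends Error {
  constructor(id: string) {
    super("no such body collector: " + id);
    this.name = "NoSuchBodyCollectorError";
  }
}

class CollectorRegistry {
  private collectors = new Map<string, CollectorParams>();
  private nextId = 0;

  // Registers a collector and returns its id (the id format is an assumption).
  addBodyCollector(params: CollectorParams = {}): string {
    const id = "collector-" + this.nextId++;
    this.collectors.set(id, params);
    return id;
  }

  // Removes a collector, or fails if the id is unknown.
  removeBodyCollector(id: string): void {
    if (!this.collectors.delete(id)) {
      throw new NoSuchBodyCollectorError(id);
    }
  }

  size(): number {
    return this.collectors.size;
  }
}
```

Removing the same collector twice raises the error on the second call, which is the only case the PR lists for it.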

Updates to existing commands:

  • When a response is caught in network.responseCompleted, we attempt to collect the body if it is related to a navigable
  • On navigation committed we remove the bodies of all responses linked to this navigable
  • On context destroyed we also remove the bodies of all responses linked to this navigable
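
The lifecycle described in this list can be sketched as follows. `ResponseStore` and its method names are hypothetical, and skipping bodies larger than the configured maximum is an assumption about how the size limit is applied.

```typescript
interface CollectedBody {
  navigable: string; // id of the navigable the response belongs to
  body: Uint8Array;
}

class ResponseStore {
  private bodies = new Map<string, CollectedBody>(); // keyed by request id
  private maxBodySize: number;

  constructor(maxBodySize: number) {
    this.maxBodySize = maxBodySize;
  }

  // Called from a responseCompleted-style hook. Responses that are not
  // related to a navigable, or that exceed the size limit, are not collected.
  onResponseCompleted(requestId: string, navigable: string | null, body: Uint8Array): boolean {
    if (navigable === null || body.byteLength > this.maxBodySize) return false;
    this.bodies.set(requestId, { navigable, body });
    return true;
  }

  // Shared by the "navigation committed" and "context destroyed" steps.
  deleteForNavigable(navigable: string): void {
    this.bodies.forEach((entry, requestId) => {
      if (entry.navigable === navigable) this.bodies.delete(requestId);
    });
  }

  has(requestId: string): boolean {
    return this.bodies.has(requestId);
  }
}
```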

Note that I haven't added extra limitations on which responses are collected in responseCompleted, but we can definitely add them (e.g. no worker requests).



@juliandescottes
Contributor Author

@OrKoN @jgraham I was not sure how I could (or if I could?) update PR #856, so I just created a new one here.
Please take a look at the summary before looking at the patch, you might already have comments on the overview before diving into the details :)

@OrKoN
Contributor

OrKoN commented Feb 13, 2025

Thanks for the PR. I think we do not have clear requirements that any clients need the functionality provided by addBodyCollector so we could exclude it for now (unless someone needs it?). At least I would not add browsing contexts params in the same way as we have it in event subscriptions (when context id resolves to the top-level traversable). I think we need an ability to define the overall size limit instead of (in addition?) a per-request limit in setBodyCollectorConfiguration (instead of just not saving the freshest request we should probably evict earlier requests).

@OrKoN
Contributor

OrKoN commented Feb 13, 2025

Note that I haven't added extra limitations to which responses are collected in responseCompleted

I am thinking if in my initial draft I should have started collection in responseStarted (I think that would actually be required for interception use cases?)

@@ -5264,6 +5275,9 @@ given |navigable| and |navigation status|:

1. [=Resume=] with "<code>navigation committed</code>", |navigation id|, and |navigation status|.

1. For each |session| in [=active BiDi sessions=], [=delete collected response bodies=]
Contributor


By this point I believe the navigation request that loaded the document has already happened, and we want to retain it. If we really want to follow the CDP model, we should key the network data by document.

Contributor Author


Is the response already completed by that time? In any case, adding a reference to the document sounds fine to me; I almost included it in the initial design.

Contributor


I think the headers are read and the body starts being read in parallel. Not having our network hooks in the fetch spec makes it a bit more difficult to cross-check but I think using document's navigation ID would be more resilient (especially if we might be moving the collection to various hooks).

@juliandescottes
Contributor Author

Thanks for taking a look!

Thanks for the PR. I think we do not have clear requirements that any clients need the functionality provided by addBodyCollector so we could exclude it for now (unless someone needs it?). At least I would not add browsing contexts params in the same way as we have it in event subscriptions (when context id resolves to the top-level traversable).

I'll wait for feedback from James here, in case that doesn't align with his feedback from PR #856, but I thought that was one of the main required changes: having a way to clearly declare whether you want to record responses or not. And if we do, I think it makes sense to make it consistent with all our other similar APIs (events and intercepts). (Note: intercepts don't have user context support yet, but they really should.)

I think we need an ability to define the overall size limit instead of (in addition?) a per-request limit in setBodyCollectorConfiguration (instead of just not saving the freshest request we should probably evict earlier requests).

Yeah I'm happy to update the configuration bit with a total size + FIFO approach to evict requests, let's see if there are any other requested flags/limits.
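
A minimal sketch of the "total size + FIFO" idea, assuming unique request ids. `FifoBodyCache` and its methods are hypothetical names, and dropping a body that is larger than the whole budget is an assumption, not something agreed in this thread.

```typescript
class FifoBodyCache {
  private entries = new Map<string, Uint8Array>(); // keyed by request id
  private order: string[] = [];                    // request ids, oldest first
  private totalSize = 0;
  private maxTotalSize: number;

  constructor(maxTotalSize: number) {
    this.maxTotalSize = maxTotalSize;
  }

  // Stores a body and returns the ids of the requests evicted to make room.
  add(requestId: string, body: Uint8Array): string[] {
    const evicted: string[] = [];
    // A body larger than the whole budget can never fit, so it is skipped.
    if (body.byteLength > this.maxTotalSize) return evicted;

    // Evict the oldest entries until the new body fits.
    while (this.totalSize + body.byteLength > this.maxTotalSize && this.order.length > 0) {
      const oldest = this.order.shift() as string;
      const old = this.entries.get(oldest);
      if (old !== undefined) this.totalSize -= old.byteLength;
      this.entries.delete(oldest);
      evicted.push(oldest);
    }

    this.entries.set(requestId, body);
    this.order.push(requestId);
    this.totalSize += body.byteLength;
    return evicted;
  }

  has(requestId: string): boolean {
    return this.entries.has(requestId);
  }
}
```

With a budget of 10 bytes, adding two 6-byte bodies evicts the first one when the second arrives, so the freshest response is kept and not dropped.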

I am thinking if in my initial draft I should have started collection in responseStarted (I think that would actually be required for interception use cases?)

Maybe we should create the entry as early as beforeRequestSent, and have a "state" in the collected response (pending, available, evicted ...)
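
One way to picture the "state" idea is a tagged union per collected response. This is a sketch only: the three states come from the comment above, while the type name, the function signature, and the error messages are assumptions.

```typescript
// One entry per request id, created as early as beforeRequestSent.
type CollectedResponse =
  | { state: "pending" }
  | { state: "available"; body: Uint8Array }
  | { state: "evicted" };

// Hypothetical lookup used by a getResponseBody-style command.
function getResponseBody(
  responses: Map<string, CollectedResponse>,
  requestId: string
): Uint8Array {
  const entry = responses.get(requestId);
  if (entry === undefined) {
    throw new Error("no collected response for request " + requestId);
  }
  if (entry.state === "available") {
    return entry.body;
  }
  // "pending" and "evicted" are reported separately so clients can tell
  // "try again later" apart from "the data is gone".
  throw new Error("body is " + entry.state + " for request " + requestId);
}
```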

@juliandescottes
Contributor Author

One thing I wanted to mention re: contexts/userContexts in addBodyCollector.

On our side, considering our current implementation, it is important to have an API where clients can be selective upfront about which requests they are interested in. To record responses, Firefox duplicates them in another (parent) process. This means it's easier for us to control the availability of responses, but we probably use up more memory than Chrome does.

On the client side, if you are only interested in one class of requests coming from a specific tab and you can't define the contexts or userContexts to watch, then you have to fiddle with the "total size" configuration, hoping that the requests you are interested in are not going to be evicted first.

Puppeteer and other clients can still just call it without any argument in the beginning? But considering this API is consistent with our subscription and intercept APIs, and seems beneficial for clients, I would still like us to consider it.

2 participants