Add commands to collect and retrieve response bodies #877

juliandescottes · 2025-02-13T17:19:08Z

Overview of what this PR aims to add:

Concept of network body collector:

A network body collector is a concept similar to intercepts and event subscriptions. Clients can add and remove collectors. In theory, collectors should apply to both requests and responses, but this PR only applies them to responses. A network body collector is a struct with contexts or userContexts, and urlPatterns. All fields are optional, so you can potentially define a collector that matches everything (to be discussed).
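To make the matching rule concrete, here is a rough sketch of how a collector with all-optional fields could be evaluated. The field names mirror the summary above, but the `matches` method, the ids, and the use of glob strings in place of BiDi URL patterns are assumptions for illustration only, not what the patch specifies.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class BodyCollector:
    """Hypothetical collector shape: an unset field places no restriction."""
    contexts: Optional[frozenset] = None       # navigable ids to watch
    user_contexts: Optional[frozenset] = None  # user context ids to watch
    url_patterns: Optional[tuple] = None       # glob strings, a stand-in for BiDi URL patterns

    def matches(self, context_id: str, user_context_id: str, url: str) -> bool:
        if self.contexts is not None and context_id not in self.contexts:
            return False
        if self.user_contexts is not None and user_context_id not in self.user_contexts:
            return False
        if self.url_patterns is not None and not any(
            fnmatchcase(url, pattern) for pattern in self.url_patterns
        ):
            return False
        return True


# A collector with no fields set matches every response.
catch_all = BodyCollector()
# A collector limited to one navigable and one URL glob.
images_in_tab = BodyCollector(
    contexts=frozenset({"ctx-1"}),
    url_patterns=("https://example.com/*.png",),
)
```

With this shape, `catch_all` is the "match everything" case that the summary flags as still to be discussed.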

New BiDi session items:

BiDi session has a network body collector map, similar to the intercept map. It simply stores the active body collectors.
BiDi session has a network maximum body size, a js-uint that defines the maximum size of collected bodies.
BiDi session has a network response map, which contains all the collected bodies, keyed by request id. This map is stored at session level because different sessions might have different configurations about what kind of network bodies can be collected (e.g. max size).
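As a sketch of why these items live on the session, the snippet below models two sessions with different maximum body sizes seeing the same response. The class, attribute names, default size, and the `maybe_collect` helper are all hypothetical; collector matching is reduced to "at least one collector exists".

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session state for body collection."""
    body_collectors: dict = field(default_factory=dict)  # collector id -> collector
    max_body_size: int = 1024                             # illustrative default, in bytes
    response_map: dict = field(default_factory=dict)     # request id -> collected body

    def maybe_collect(self, request_id: str, body: bytes) -> bool:
        # Nothing is collected unless the session registered a collector.
        if not self.body_collectors:
            return False
        # Bodies above this session's limit are skipped.
        if len(body) > self.max_body_size:
            return False
        self.response_map[request_id] = body
        return True


small = Session(body_collectors={"c1": object()}, max_body_size=4)
large = Session(body_collectors={"c1": object()}, max_body_size=64)

body = b"hello world"
small_collected = small.maybe_collect("req-1", body)
large_collected = large.maybe_collect("req-1", body)
```

Here the same response ends up in `large.response_map` but not in `small.response_map`, which is the situation a shared, non-session map could not represent.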

New commands:

new command addBodyCollector to add a new network body collector
new command removeBodyCollector to remove an existing network body collector
new command setNetworkBodyCollectorConfiguration, which can be used to set the session's network maximum body size. In the future we might have more configuration available here, which is why this sets a generic configuration.
also getResponseBody, which is mostly identical to the one in Add a command to get response body #856. It defaults to base64 at the moment; we probably want to make it easier to receive the body as a string if possible? (But I preferred to leave this command as close to the existing PR as possible.)
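On the base64 point: until the command can return a string directly, a client could hide the encoding behind a small helper. This is only a sketch; the `{"body": ...}` result shape and the `decode_body` name are assumptions, not something defined by this PR.

```python
import base64


def decode_body(result: dict):
    """Decode a hypothetical getResponseBody result.

    Returns a str when the bytes are valid UTF-8, otherwise the raw bytes.
    """
    raw = base64.b64decode(result["body"])
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


text_result = {"body": base64.b64encode("héllo".encode("utf-8")).decode("ascii")}
binary_result = {"body": base64.b64encode(b"\xff\xfe\x00").decode("ascii")}

decoded_text = decode_body(text_result)
decoded_binary = decode_body(binary_result)
```

Binary payloads such as images fall through to the raw bytes, which is one reason base64 is a reasonable default on the wire.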

New error:

new error no such body collector, for removeBodyCollector

Updates to existing commands:

When a response is caught in network.responseCompleted, we attempt to collect the body if it is related to a navigable
On navigation committed we remove the bodies of all responses linked to this navigable
On context destroyed we also remove the bodies of all responses linked to this navigable
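The three lifecycle rules above can be sketched as a small store keyed by request id, where each entry remembers its navigable. The class and method names are made up for illustration, and collector matching and size limits are left out.

```python
class CollectedBodies:
    """Hypothetical store: request id -> (navigable id, body)."""

    def __init__(self):
        self._bodies = {}

    def on_response_completed(self, request_id, navigable_id, body):
        # Only responses related to a navigable are collected.
        if navigable_id is None:
            return
        self._bodies[request_id] = (navigable_id, body)

    def delete_for_navigable(self, navigable_id):
        # Drop every body linked to the given navigable.
        self._bodies = {
            request_id: entry
            for request_id, entry in self._bodies.items()
            if entry[0] != navigable_id
        }

    # Both lifecycle hooks perform the same cleanup.
    on_navigation_committed = delete_for_navigable
    on_context_destroyed = delete_for_navigable

    def get(self, request_id):
        entry = self._bodies.get(request_id)
        return None if entry is None else entry[1]


store = CollectedBodies()
store.on_response_completed("req-1", "ctx-1", b"<html>")
store.on_response_completed("req-2", "ctx-2", b"{}")
store.on_response_completed("req-3", None, b"worker")  # ignored: no navigable
store.on_navigation_committed("ctx-1")
```

After the navigation in `ctx-1` commits, only the body for `req-2` is still retrievable.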

Note that I haven't added extra limitations to which responses are collected in responseCompleted, but we can definitely add them (e.g. no worker requests, etc.).


juliandescottes · 2025-02-13T17:20:24Z

@OrKoN @jgraham I was not sure how (or if) I could update PR #856, so I just created a new one here.
Please take a look at the summary before looking at the patch; you might already have comments on the overview before diving into the details :)

OrKoN · 2025-02-13T18:19:27Z

Thanks for the PR. I don't think we have a clear requirement from any client for the functionality provided by addBodyCollector, so we could exclude it for now (unless someone needs it?). At the very least, I would not add browsing context params the same way we have them in event subscriptions (where the context id resolves to the top-level traversable). I think we need the ability to define an overall size limit instead of (or in addition to?) a per-request limit in setBodyCollectorConfiguration: instead of just not saving the freshest request, we should probably evict earlier requests.

OrKoN · 2025-02-13T18:24:17Z

Note that I haven't added extra limitations to which responses are collected in responseCompleted

I am wondering whether my initial draft should have started collection in responseStarted (I think that would actually be required for interception use cases?)

OrKoN · 2025-02-13T18:30:30Z

index.bs

@@ -5264,6 +5275,9 @@ given |navigable| and |navigation status|:

 1. [=Resume=] with "<code>navigation committed</code>", |navigation id|, and |navigation status|.

+1. For each |session| in [=active BiDi sessions=], [=delete collected response bodies=]


By this point, I believe the navigation request that loaded the document has already happened, and we want to retain it. If we really want to follow the CDP model, we should key the network data by document.

Is the response already completed by that time? In any case, adding a reference to the document sounds fine to me; I almost wanted to include it in the initial design.

I think the headers are read and the body starts being read in parallel. Not having our network hooks in the fetch spec makes it a bit more difficult to cross-check but I think using document's navigation ID would be more resilient (especially if we might be moving the collection to various hooks).
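To illustrate the difference this would make, here is a minimal sketch of cleanup keyed by navigation id: when a navigation commits, bodies in that navigable are dropped unless they belong to the navigation that just committed. The tuple layout, ids, and function name are hypothetical.

```python
def delete_stale_bodies(bodies, navigable_id, committed_navigation_id):
    """Return the bodies that survive a committed navigation.

    bodies maps request id -> (navigable id, navigation id, body).
    Entries from other navigables are untouched; entries from this
    navigable survive only if they belong to the committed navigation.
    """
    return {
        request_id: entry
        for request_id, entry in bodies.items()
        if entry[0] != navigable_id or entry[1] == committed_navigation_id
    }


bodies = {
    "old-doc": ("ctx-1", "nav-1", b"<html>old</html>"),
    "old-img": ("ctx-1", "nav-1", b"png"),
    "new-doc": ("ctx-1", "nav-2", b"<html>new</html>"),
    "other-tab": ("ctx-2", "nav-9", b"{}"),
}

remaining = delete_stale_bodies(bodies, "ctx-1", "nav-2")
```

The response that loaded the new document (`new-doc`) is retained, whereas deleting everything linked to the navigable would have removed it too.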

juliandescottes · 2025-02-13T20:33:38Z

Thanks for taking a look!

Thanks for the PR. I don't think we have a clear requirement from any client for the functionality provided by addBodyCollector, so we could exclude it for now (unless someone needs it?). At the very least, I would not add browsing context params the same way we have them in event subscriptions (where the context id resolves to the top-level traversable).

I'll wait for feedback from James here, in case that doesn't align with his feedback from PR #856, but I thought that was one of the main required changes: having a way to clearly declare whether you want to record responses or not. And if we do, I think it makes sense to make it consistent with all our other similar APIs (events and intercepts). Note: intercepts don't have user context support yet, but they really should.

I think we need the ability to define an overall size limit instead of (or in addition to?) a per-request limit in setBodyCollectorConfiguration: instead of just not saving the freshest request, we should probably evict earlier requests.

Yeah I'm happy to update the configuration bit with a total size + FIFO approach to evict requests, let's see if there are any other requested flags/limits.
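For reference, a total size plus FIFO approach could look roughly like the sketch below. The class name, the two limits as constructor arguments, and the choice to reject bodies that exceed either limit are assumptions, not agreed behavior.

```python
from collections import OrderedDict


class FifoBodyStore:
    """Hypothetical store with a per-body limit and a total-size budget."""

    def __init__(self, max_total_size: int, max_body_size: int):
        self.max_total_size = max_total_size
        self.max_body_size = max_body_size
        self.total_size = 0
        self._bodies = OrderedDict()  # request id -> body, oldest first

    def add(self, request_id: str, body: bytes):
        """Store a body; return the evicted request ids, or None if rejected."""
        if len(body) > min(self.max_body_size, self.max_total_size):
            return None
        # Replace any previous body stored for the same request.
        previous = self._bodies.pop(request_id, None)
        if previous is not None:
            self.total_size -= len(previous)
        evicted = []
        # Evict the oldest bodies until the new one fits in the budget.
        while self.total_size + len(body) > self.max_total_size:
            old_id, old_body = self._bodies.popitem(last=False)
            self.total_size -= len(old_body)
            evicted.append(old_id)
        self._bodies[request_id] = body
        self.total_size += len(body)
        return evicted

    def get(self, request_id: str):
        return self._bodies.get(request_id)


store = FifoBodyStore(max_total_size=10, max_body_size=6)
first = store.add("r1", b"aaaa")
second = store.add("r2", b"bbbb")
third = store.add("r3", b"cccc")     # does not fit: the oldest body (r1) is evicted
rejected = store.add("r4", b"x" * 7)  # over the per-body limit
```

The freshest response always wins over the oldest one, which matches the eviction order suggested in the review.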

I am wondering whether my initial draft should have started collection in responseStarted (I think that would actually be required for interception use cases?)

Maybe we should create the entry as early as beforeRequestSent, and have a "state" in the collected response (pending, available, evicted, ...).
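A minimal sketch of that idea, assuming the three states named above and hypothetical hook functions; nothing here is in the patch yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BodyState(Enum):
    PENDING = "pending"
    AVAILABLE = "available"
    EVICTED = "evicted"


@dataclass
class CollectedResponse:
    state: BodyState = BodyState.PENDING
    body: Optional[bytes] = None


responses = {}  # request id -> CollectedResponse


def on_before_request_sent(request_id):
    # The entry exists before any body has been received.
    responses[request_id] = CollectedResponse()


def on_response_completed(request_id, body):
    entry = responses[request_id]
    entry.state, entry.body = BodyState.AVAILABLE, body


def evict(request_id):
    # Keep the entry so clients can tell "evicted" apart from "never seen".
    entry = responses[request_id]
    entry.state, entry.body = BodyState.EVICTED, None


def describe(request_id):
    entry = responses.get(request_id)
    return "unknown request" if entry is None else entry.state.value
```

Keeping an explicit state would let getResponseBody report why a body is missing instead of returning one generic error.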

juliandescottes · 2025-02-14T08:11:35Z

One thing I wanted to mention re: contexts/userContexts in addBodyCollector.

On our side, considering our current implementation, it is important to have an API where clients can be selective upfront about which requests they are interested in. To record responses, Firefox duplicates them in another (parent) process. This means it's easier for us to control the availability of responses, but we probably use up more memory than Chrome does.

On the client side, if you are only interested in one class of requests coming from a specific tab and you can't define the contexts/userContexts to watch, then you have to fiddle with the "total size" configuration, hoping that the requests you are interested in are not going to be evicted first?

Puppeteer and other clients can still just call it without any argument in the beginning? But considering this API is consistent with our subscription and intercept APIs, and seems beneficial for clients, I would still like us to consider it.

OrKoN and others added 3 commits February 13, 2025 17:30

Add a command to get response body

b0d84ed

lifecycle constraints

500d8e7

Add command to add and remove collectors, set maximum response size

429d636

OrKoN reviewed Feb 13, 2025
