various edge-case loading optimizations #709
Conversation
ikreymer commented Oct 27, 2024
- if too many range requests for the same URL are being made, try skipping/failing them right away to reduce load (see the sketch after this list)
- assume the main browser context is used not just for service workers; always enable it
- check for false positive 'net-aborted' errors that may actually be ok for media, as well as documents
- improve logging
- possible fix for issues in #706 (Browser disconnected (crashed?))
- interrupt any pending requests (that may be loading via the browser context) after page timeout, and log dropped requests
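The first item could work roughly like the following minimal sketch; `MAX_RANGE_REQUESTS` and `rangeRequestCounts` are hypothetical names for illustration, not the crawler's actual identifiers:

```ts
// Hypothetical sketch: cap range requests per URL, then fail fast.
const MAX_RANGE_REQUESTS = 6; // illustrative threshold
const rangeRequestCounts = new Map<string, number>();

function shouldSkipRangeRequest(url: string): boolean {
  const count = (rangeRequestCounts.get(url) ?? 0) + 1;
  rangeRequestCounts.set(url, count);
  // Once the same URL has been range-requested too many times,
  // fail the request immediately instead of adding more load.
  return count > MAX_RANGE_REQUESTS;
}
```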
I'm wondering if we should also move the …
Looks good, a few small comments. I tested against a few of the examples in #706 (comment) and am not seeing any browser crashes, though these were admittedly smaller-scoped crawls.
This is ready for re-review, though we probably want to merge the test fixes from #710 first.
The main changes are actually streaming 206 responses (!), which were previously always loaded into memory, and also failing duplicate 206 responses that do not start from byte 0, to avoid additional load.
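A minimal sketch of that duplicate-206 check, assuming lowercase header keys; the helper name and wiring are illustrative, not the crawler's actual code:

```ts
// Hypothetical sketch: only a 206 response starting at byte 0 is kept;
// later partial ranges for the same resource are failed to cut load.
function isStreamablePartialResponse(
  status: number,
  headers: Record<string, string>,
): boolean {
  if (status !== 206) {
    return true; // non-partial responses are handled normally
  }
  // A Content-Range header looks like "bytes 0-1023/146515"
  const contentRange = headers["content-range"] ?? "";
  return /^bytes 0-/.test(contentRange);
}
```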
…amed by only checking 200, not 206 (ack!) status code

new logic:
- if content length > 25MB (text rewrite limit), always stream; won't do any rewriting (should be fairly rare)
- if content length > 5MB, always stream, unless it is an essential resource (eg. html/js/css) which requires an in-memory fetch for possible rewriting
- if content length is unknown, stream if non-error status code (<300), as error responses likely aren't very large
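The new logic above could be expressed roughly as follows; the function and threshold names are illustrative, not the actual implementation:

```ts
// Hypothetical sketch of the size-based streaming decision described above.
const TEXT_REWRITE_LIMIT = 25_000_000; // ~25MB, text rewrite limit
const STREAM_LIMIT = 5_000_000; // ~5MB

function shouldStream(
  contentLength: number, // -1 if unknown
  status: number,
  isEssentialResource: boolean, // eg. html/js/css that may need rewriting
): boolean {
  if (contentLength < 0) {
    // unknown length: stream non-error responses; error bodies are small
    return status < 300;
  }
  if (contentLength > TEXT_REWRITE_LIMIT) {
    return true; // too large to rewrite at all, always stream
  }
  if (contentLength > STREAM_LIMIT) {
    // stream unless the resource may need an in-memory rewrite
    return !isEssentialResource;
  }
  return false; // small enough for an in-memory fetch
}
```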
- add fetchContinued to avoid double-handling requestPaused if intercepted in both the browser context and the page context
- rename isServiceWorker -> isBrowserContext; write record in fetch response if there is no page context
- don't write 'no payload' requests in the browser context, as they may be redirects reusing the same requestId, and 204s will get skipped anyway
- don't auto-attempt aborted media, already handled via behavior fetch
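A rough sketch of the fetchContinued guard, assuming requests are keyed by their CDP requestId; the wiring is hypothetical:

```ts
// Hypothetical sketch: track requests already continued in one context so
// the same Fetch.requestPaused event isn't handled twice when intercepted
// in both the browser context and the page context.
const fetchContinued = new Set<string>();

async function handleRequestPaused(
  requestId: string,
  continueRequest: () => Promise<void>,
): Promise<void> {
  if (fetchContinued.has(requestId)) {
    return; // already handled in the other context
  }
  fetchContinued.add(requestId);
  await continueRequest();
}
```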
- interrupt pending requests when the page is finished, so the pageinfo record is written after
- add pageFinished flag to recorder, remove unused 'skipping' flag
- re-enable attempted refetch, should be using dedup
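One way the pageFinished flag could interact with pending requests, sketched with hypothetical names:

```ts
// Hypothetical sketch: a pageFinished flag on the recorder that interrupts
// any still-pending requests so the pageinfo record is written last.
class RecorderSketch {
  pageFinished = false;
  private pending = new Map<string, AbortController>();

  finishPage(): void {
    this.pageFinished = true;
    for (const [url, controller] of this.pending) {
      console.warn(`dropping pending request: ${url}`);
      controller.abort();
    }
    this.pending.clear();
    // ...pageinfo record is written only after pending requests are dropped
  }
}
```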
force-pushed from 5cacea5 to 5ea30d3
Nice! Great catch on the 206 streaming issue. Just left one small suggestion.
Co-authored-by: Tessa Walsh <[email protected]>
- fix: prefer streaming the current response via takeStream, not only when size is unknown
- ensure partial range requests are not async fetched, only full responses
- don't serialize zero-payload responses
- don't serialize 206 responses if there is a size mismatch
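For reference, Fetch.takeResponseBodyAsStream and IO.read are the CDP methods for streaming a paused response body instead of buffering it in memory; the wrapper below is an illustrative sketch, not the crawler's actual code:

```ts
import type { CDPSession } from "puppeteer-core";

// Hypothetical sketch: stream a paused response body via CDP in chunks,
// rather than loading the whole body into memory at once.
async function* takeStream(
  cdp: CDPSession,
  requestId: string,
): AsyncGenerator<Buffer> {
  // Detach the response body as an IO stream handle
  const { stream } = await cdp.send("Fetch.takeResponseBodyAsStream", {
    requestId,
  });
  let eof = false;
  while (!eof) {
    // Read the next chunk; data may be base64-encoded for binary content
    const { data, base64Encoded, eof: done } = await cdp.send("IO.read", {
      handle: stream,
      size: 65536,
    });
    eof = done;
    yield base64Encoded ? Buffer.from(data, "base64") : Buffer.from(data);
  }
  await cdp.send("IO.close", { handle: stream });
}
```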
various fixes for streaming, especially related to range requests (follow-up to #709)
- fix: prefer streaming the current response via takeStream, not only when size is unknown
- don't serialize async responses prematurely
- don't serialize 206 responses if there is a size mismatch