Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: send events when user leave the page #1146

Merged
merged 14 commits into from
Feb 10, 2022
Merged

feat: send events when user leave the page #1146

merged 14 commits into from
Feb 10, 2022

Conversation

devcorpio
Copy link
Contributor

@devcorpio devcorpio commented Jan 20, 2022

Context

Currently the RUM agent sends events to the APM server in intervals of 500ms. Despite so, there are still scenarios where no information is being sent.

Take into account the following scenarios

  • user closes the tab
  • user closes the browser
  • user navigates to another page
  • user reloads the page

If they take place before the interval is triggered the information collected until this moment will be lost. And that is precisely what we want to reduce as much as possible with the changes made in this PR.

Summary

A new logic that combines the lifecycle API events: visibilitychange, pagehide and the usage of fetch with the flag keepalive in order to send events when user leaves the page.

"The keepalive option can be used to allow the request to outlive the page. Fetch with the keepalive flag is a replacement for the Navigator.sendBeacon() API.". More info here

Why are not we using Beacon API?

  • Since the Content-Type of our requests is application/x-ndjson the browser performs a preflight request and in case of Beacon API the request sets the credentials mode to include. Hence, if we don't want to see the following error in the browser: Response to preflight request doesn't pass access control check: The value of the 'Access-Control-Allow-Credentials' header in the response is '' which must be 'true' when the request's credentials mode is 'include' the APM Server has to include Access-Control-Allow-Credentials: true in the response
  • Beacon API does not allow to add custom headers and we need to add “Content-Encoding” after compressing the payload with Compression Streams API

Thoughts about this

  • Could it make sense to stop sending the content-type in our requests? (At this moment is a mandatory field and value)
    Without that constraint navigator.sendBeacon() could be used in all browsers except Chromium browsers (because currently are the ones compressing the payload with GZIP).

What browsers do not support the keepalive flag in fetch?

  1. IE 11 (that's because this browser do not even support fetch)
  2. Firefox performs the request as if the flag was set to false
  3. Safari < 13 performs the request as if the flag was set to false

What browsers do support the keepalive flag in fetch?

Chrome, Edge (Chromium), and Safari >=13 support it properly

Unfortunately, there is an exception in Chromium browsers, there are two scenarios that are not covered:

  • page reload
  • navigation to a different page

Those scenarios are not covered because the compression process is not fast enough and the APM server do not receive the request.

In Chromium browsers there are two more interesting nuances (in this case good ones):

  • When closing tab or browser the compression process delay is not a problem and the request is being sent
  • When closing tab or browser even XHR are being sent too!

When is all this logic executed?

When the events visibilitychange or pagehide are triggered.

Why are not we using the unload event?

In this link the details are well explained. But mainly because are not reliable events (especially on mobile) and prevents the bfcache of working as expected. In fact, because of that we might need to change our web-vitals implementation

Why are we still using the 500ms interval?

In SPAs the information collected after navigating a few pages exceeds 64kb which is the limit for fetch with keepalive (beacon api has the very same limit). We need to bear in mind that at this moment we are only compressing the payload in Chromium browsers, so having payloads of 64kb would not be that uncommon. If we remove the interval we will not send information at all for the SPA scenario.

Is there any fallback?

XHR is the fallback in two different situations:

  • if payload exceeds the payload limit (in fact, I set the limit a bit lower since it may very depending on the browser)
  • Chrome < 81 had a bug with preflight requests when using keepalive. So, in a more general way, if there is a fetch network error, there is a fallback with XHR

But keep in mind that the XHR request will not be done when:

  • user closes the tab
  • user closes the browser
  • user navigates to another page
  • user reloads the page

@devcorpio devcorpio linked an issue Jan 20, 2022 that may be closed by this pull request
@apmmachine
Copy link
Contributor

apmmachine commented Jan 20, 2022

📦 Bundlesize report

Filename Size(bundled) Size(gzip) Diff(gzip)
elastic-apm-opentracing.umd.min.js 66.0 KiB 21.0 KiB ⚠️ 611 Bytes
elastic-apm-rum.umd.min.js 59.8 KiB 19.4 KiB ⚠️ 608 Bytes

@apmmachine
Copy link
Contributor

apmmachine commented Jan 20, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-02-10T12:46:30.348+0000

  • Duration: 97 min 36 sec

Test stats 🧪

Test Results
Failed 0
Passed 770
Skipped 8
Total 778

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark test.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@devcorpio devcorpio marked this pull request as ready for review January 25, 2022 15:07
TRANSACTION_SERVICE
])

const enabled = bootstrap(configService, transactionService)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason why bootstrap was called first before creating service factory is that we need to account for when the time the agent got initialized, we might need to split this with installing page visibility handlers. if we need service factories to be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your comment just make me realise that a better place to add the page visibility handlers could be inside ApmBase.init otherwise we would be adding listeners even with the agent not initialised

@@ -258,6 +260,18 @@ export default class PerformanceMonitoring {
const configService = this._configService
const transactionService = this._transactionService

// do not process calls to our own endpoints
if (task.data && task.data.url) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the request's url points to our own endpoints we will NOT process it. This new behaviour works for XHR and fetch.

P.S. Nevertheless, I'm not removing the XHR_IGNORE logic since it might be useful for other uses cases (although I would advocate to remove it in future PRs if there are not valid use cases)

P.S.2. This new logic also helps to cover the case where our users make use of fetch polyfills that rely on XHR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make an issue and get rid of XHR_IGNORE in the next PR's? I don't think its required as we have this logic which can handle both now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created! #1164

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work and the details are super helpful @devcorpio 🎉

Thanks for going extra mile in adding lots of tests. I just added couple of small points, Once we clear that, this PR is good to go.

* @param transactionService
*/
export function observePageVisibility(configService, transactionService) {
if (document.visibilityState === 'hidden') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually thinking, Do we want to capture the lastHiddenStart time when the rum agent got loaded vs when the rum agent gets initialized?

The listeners should still be when the agent is initialized though. Thoughts @devcorpio ?

Copy link
Contributor Author

@devcorpio devcorpio Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vigneshshanmugam it is true that before lastHiddenStart value was being updated even when the agent was not initialised and now is the opposite.

Umm, to be honest, I'm not able to see the benefit of allowing that value to be set with the agent uninitialised. Are you seeing any nuance that I'm missing?

And when it comes to the listeners maybe I'm a bit purist (or stubborn 😄 ) but I don't like the idea of having code within the agent that reacts to user interactions having the agent uninitialised

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldnt think of a solid case as well. Was just trying to remember why I did it this way before and wondering if it would cause any problems.

But lets move ahead, not a blocker for sure. We can revisit if we see any problems.

@@ -258,6 +260,18 @@ export default class PerformanceMonitoring {
const configService = this._configService
const transactionService = this._transactionService

// do not process calls to our own endpoints
if (task.data && task.data.url) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make an issue and get rid of XHR_IGNORE in the next PR's? I don't think its required as we have this logic which can handle both now.

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@devcorpio devcorpio merged commit 2429814 into elastic:main Feb 10, 2022
@devcorpio devcorpio deleted the send_beacon_unload branch February 10, 2022 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider using Beacon API for sending payload on unload event
3 participants