✨ [RUM-162] Truncate resources URL containing data URLs #2690
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2690      +/-   ##
==========================================
- Coverage   93.36%   93.36%   -0.01%
==========================================
  Files         240      240
  Lines        6981     6992      +11
  Branches     1539     1542       +3
==========================================
+ Hits         6518     6528      +10
- Misses        463      464       +1
/to-staging
🚂 Branch Integration: starting soon, merge in < 10m Commit 477629a771 will soon be integrated into staging-14. This build is going to start soon! (estimated merge in less than 10m)
…resources into staging-14 Co-authored-by: cy-moi <[email protected]>
🚂 Branch Integration: This commit was successfully integrated. Commit 477629a771 has been merged into staging-14 in merge commit 8ca824d45b. Check out the triggered pipeline on Gitlab 🦊
return attributeValue
// Truncate data:url to avoid performance impact
return findDataUrlAndTruncate(attributeValue) ?? attributeValue
❓ question: You are truncating all data: URLs, not only the ones longer than MAX_ATTRIBUTE_VALUE_CHAR_LENGTH. Is that intended? If so, could it have an impact on Replay?
Yes, it was intended. But indeed, the truncation would prevent some allowed data from showing up properly, so I'm fixing this right now.
I've added the length check back to the recorder, as well as to the resource URL.
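A minimal sketch of the length-gated behavior described above, assuming the attribute value itself starts with `data:` (the real `findDataUrlAndTruncate` may search inside the value). `DATA_URL_PREFIX_REGEX` and `serializeAttributeValue` are hypothetical names used only for illustration, and appending `[...]` on the recorder side is also an assumption; only the 24,000-character limit comes from the diff.

```typescript
// Limit quoted in the PR diff.
const MAX_ATTRIBUTE_VALUE_CHAR_LENGTH = 24_000

// Hypothetical: matches the "data:<mediatype>[;base64]," header, stopping at the first comma.
const DATA_URL_PREFIX_REGEX = /^data:[^,]*,/

// Returns a truncated value only for data: URLs longer than the limit, and
// undefined otherwise so that callers can fall back with `??`.
function findDataUrlAndTruncate(value: string): string | undefined {
  if (value.length <= MAX_ATTRIBUTE_VALUE_CHAR_LENGTH) {
    return undefined
  }
  const header = DATA_URL_PREFIX_REGEX.exec(value)
  // Assumption: "[...]" marks the truncation, as for resource URLs.
  return header ? `${header[0]}[...]` : undefined
}

// Hypothetical recorder-side serializer: short values, including short
// data: URLs, are kept intact so Replay can still render them.
function serializeAttributeValue(attributeValue: string): string {
  return findDataUrlAndTruncate(attributeValue) ?? attributeValue
}

const shortImage = "data:image/png;base64,iVBORw0KGgo="
const longImage = "data:image/png;base64," + "A".repeat(MAX_ATTRIBUTE_VALUE_CHAR_LENGTH)

console.log(serializeAttributeValue(shortImage) === shortImage) // true
console.log(serializeAttributeValue(longImage)) // data:image/png;base64,[...]
```

Returning `undefined` rather than the original value keeps the "nothing to truncate" case explicit and lets the call site decide the fallback with `??`.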
/to-staging
🚂 Branch Integration: starting soon, merge in < 0s Commit 9b3336f9e3 will soon be integrated into staging-15. This build is going to start soon! (estimated merge in less than 0s)
🚨 Branch Integration: The build pipeline contains failing jobs for this merge request We couldn't automatically merge the commit 9b3336f9e3 into staging-15. Since those jobs are not marked as being allowed to fail, the pipeline will most likely fail. You should have a look at the pipeline, wait for the build to finish and investigate the failures.
/create-fix-branch -b staging-15
🚂 Devflow: Created fix branch fix-merge-9b3336f9e3-into-staging-15 - #2698
🚂 Branch Integration: starting soon, merge in < 0s Commit 9b3336f9e3 will soon be integrated into staging-15. This build is going to start soon! (estimated merge in less than 0s)
🚂 Branch Integration: Commit 9b3336f9e3 has been merged into staging-15 in merge commit 2f8877703d. Check out the triggered pipeline on Gitlab 🦊
const DATA_URL_REGEX = /data:(.+)?(;base64)?,/g
export const MAX_ATTRIBUTE_VALUE_CHAR_LENGTH = 24_000
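A small sketch of what `DATA_URL_REGEX` captures. Because `(.+)?` is greedy, the match extends to the last comma in the string, so for a non-base64 data URL whose payload contains a comma, part of the payload survives the truncation. `DATA_URL_PREFIX_REGEX` (anchored, stops at the first comma) is a hypothetical alternative shown for comparison, not what the PR uses.

```typescript
// Regex as quoted in the PR diff.
const DATA_URL_REGEX = /data:(.+)?(;base64)?,/g

// Hypothetical alternative: anchored at the start, stops at the first comma.
const DATA_URL_PREFIX_REGEX = /^data:[^,]*,/

// Returns the first full match, or undefined when nothing matches.
// With the `g` flag, String.prototype.match returns all full matches
// (without capture groups), so index 0 is the first one.
function firstMatch(url: string, regex: RegExp): string | undefined {
  const matches = url.match(regex)
  return matches ? matches[0] : undefined
}

const base64Url = "data:image/png;base64,iVBORw0KGgo="
const plainUrl = "data:text/plain,hello,world"

console.log(firstMatch(base64Url, DATA_URL_REGEX)) // data:image/png;base64,
console.log(firstMatch(plainUrl, DATA_URL_REGEX)) // data:text/plain,hello,
console.log(firstMatch(plainUrl, DATA_URL_PREFIX_REGEX)) // data:text/plain,
```

Base64 payloads never contain commas, so both regexes agree on the common image case; they only differ for plain-text payloads.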
export function isDataUrlTooLong(url: string): boolean {
💬 suggestion: this function name suggests it takes a data URL as input and checks whether it's too long, while what it actually does is take a string as input and check whether it's a data URL and whether it's long.
I propose to either:
- rename it to something like `isLongDataUrl` (I think the `too` is not adding much value), or
- split the function in two, such as `isAboveLimit` and `isDataUrl`.
Thanks! I searched the codebase, and it seems we only use this length limit for data URLs, so splitting it into separate functions feels redundant. But the naming is indeed inaccurate, so I have renamed it to `isLongDataUrl`.
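A possible shape for `isLongDataUrl`, sketched under the assumption that it simply combines the length check with a `data:` prefix check; the PR's exact body is not shown in this thread.

```typescript
const MAX_ATTRIBUTE_VALUE_CHAR_LENGTH = 24_000

// True only when the string is longer than the limit AND starts with "data:".
// The cheap length comparison runs first, so most URLs skip the prefix test.
function isLongDataUrl(url: string): boolean {
  return url.length > MAX_ATTRIBUTE_VALUE_CHAR_LENGTH && url.startsWith("data:")
}

console.log(isLongDataUrl("data:image/png;base64,AAAA")) // false (short)
console.log(isLongDataUrl("https://example.com/" + "a".repeat(30_000))) // false (not data:)
console.log(isLongDataUrl("data:image/png;base64," + "A".repeat(30_000))) // true
```

Keeping both checks in one predicate follows the reply above: since the limit is only used for data URLs, a separate `isAboveLimit` would have a single caller.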
Shouldn't we also truncate the URL coming from performance resource entries?
url: entry.name,
Notes from our Slack discussion: we found that the Resource Timing specification says:
> If an HTML IMG element has a data: URI as its source [RFC2397], then this resource will not be included as a PerformanceResourceTiming object in the Performance Timeline. By definition data: URI contains embedded data and does not require a fetch.

So we do not sanitize data URLs in this case.
}
export function sanitizeDataUrl(url: string): string {
  return url.match(DATA_URL_REGEX)![0]
💬 suggestion:
It would be nice to have something indicating that the URL has been truncated, like `data:[<mediatype>] [...]` (the same way we do for action names).
I've added the `[...]` suffix, as we do for action names, to indicate the truncation.
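A sketch of `sanitizeDataUrl` with the `[...]` indicator. The version in the diff uses a non-null assertion (`!`), so it would throw a `TypeError` if the input contained no comma; this sketch instead falls back to a hypothetical `data:[...]` value. The anchored `DATA_URL_PREFIX_REGEX` is also an assumption, not the PR's regex.

```typescript
// Hypothetical: matches "data:<mediatype>[;base64]," up to the first comma.
const DATA_URL_PREFIX_REGEX = /^data:[^,]*,/

// Keeps only the data: URL header and appends "[...]" to signal truncation.
function sanitizeDataUrl(url: string): string {
  const header = DATA_URL_PREFIX_REGEX.exec(url)
  // Malformed input (no comma): keep only the scheme rather than throwing.
  return `${header ? header[0] : "data:"}[...]`
}

console.log(sanitizeDataUrl("data:image/png;base64,iVBORw0KGgo=")) // data:image/png;base64,[...]
console.log(sanitizeDataUrl("data:image/svg+xml,<svg></svg>")) // data:image/svg+xml,[...]
console.log(sanitizeDataUrl("data:no-comma")) // data:[...]
```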
Motivation
We currently collect the full URL of resources whose `resource.url` is a `data:` URL.
This is highly inefficient on the ingestion and indexing side, and these URLs may contain sensitive information. This data is always truncated to 24k characters when indexed, which results in partial images.
In this case, we want to truncate the data so that only the useful information (MIME type, encoding) is kept.
Changes
We truncate `data:` URLs down to their header (media type and encoding) when they exceed 24k characters.
![Screenshot 2024-04-08 at 18 21 51](https://private-user-images.githubusercontent.com/9922567/320574620-891e929b-3ec5-4118-9540-6c726f032c73.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDUyMzQsIm5iZiI6MTczOTMwNDkzNCwicGF0aCI6Ii85OTIyNTY3LzMyMDU3NDYyMC04OTFlOTI5Yi0zZWM1LTQxMTgtOTU0MC02YzcyNmYwMzJjNzMucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMjAxNTM0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Njg2MTQxNDA0OTM4NTliYjk2MjMxZTcyMWNiZmJmZTlhYTRmY2UzYWRiNjMwNTA1NDMwYzhjMDIwMjAyM2VmNCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.3AZLkz8KpuFnFRidu23Dn6OmgT5CHlpWLQEImux0mGA)
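Putting the pieces together, a minimal end-to-end sketch of the resource URL path: long `data:` URLs are reduced to their header plus `[...]`, while regular URLs and short data URLs pass through unchanged. `processResourceUrl` and `DATA_URL_PREFIX_REGEX` are hypothetical; only the names `isLongDataUrl` and `sanitizeDataUrl` and the 24,000-character limit come from this PR, and their bodies here are assumptions.

```typescript
const MAX_ATTRIBUTE_VALUE_CHAR_LENGTH = 24_000
// Hypothetical: header up to the first comma.
const DATA_URL_PREFIX_REGEX = /^data:[^,]*,/

function isLongDataUrl(url: string): boolean {
  return url.length > MAX_ATTRIBUTE_VALUE_CHAR_LENGTH && url.startsWith("data:")
}

function sanitizeDataUrl(url: string): string {
  const header = DATA_URL_PREFIX_REGEX.exec(url)
  return `${header ? header[0] : "data:"}[...]`
}

// Hypothetical entry point applied to a resource's URL before it is sent.
function processResourceUrl(rawUrl: string): string {
  return isLongDataUrl(rawUrl) ? sanitizeDataUrl(rawUrl) : rawUrl
}

const hugeImage = "data:image/png;base64," + "A".repeat(30_000)
const sanitized = processResourceUrl(hugeImage)

console.log(hugeImage.length, "->", sanitized.length) // 30022 -> 27
console.log(processResourceUrl("https://example.com/logo.png")) // https://example.com/logo.png
```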
Examples
Edit: added the truncation indicator `[...]`.
![Screenshot 2024-04-10 at 17 21 26](https://private-user-images.githubusercontent.com/9922567/321296551-1a8ce051-3b91-4157-8833-76dda621e248.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkzMDUyMzQsIm5iZiI6MTczOTMwNDkzNCwicGF0aCI6Ii85OTIyNTY3LzMyMTI5NjU1MS0xYThjZTA1MS0zYjkxLTQxNTctODgzMy03NmRkYTYyMWUyNDgucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMjAxNTM0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTMxNGMxOWQzYjY2NTg1NTM1MTg5OTg0YjRjNTZiNWQxN2ZjZmIyMDFmYjQ5YmJkNDdlNDNhMDAxZDlkZjQ5YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.KfHi1YM2mhqcx1j366eSFsowZBomjQPALivsvpbUT9Y)
Testing
I have gone over the contributing documentation.