Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert base64-encoded screenshots to bytes ASAP #28929

Open
foolip opened this issue May 10, 2021 · 4 comments
Open

Convert base64-encoded screenshots to bytes ASAP #28929

foolip opened this issue May 10, 2021 · 4 comments
Labels
infra priority:backlog wptrunner The automated test runner, commonly called through ./wpt run

Comments

@foolip
Copy link
Member

foolip commented May 10, 2021

This suggestion is a bit of a nice-to-have improvement to avoid extra conversions.

As it is, the screenshot implementations of RefTestExecutor return the screenshot as a Base64-encoded string. For example:

def _screenshot(self, protocol, url, timeout):
protocol.marionette.navigate(url)
protocol.base.execute_script(self.wait_script, asynchronous=True)
screenshot = protocol.marionette.screenshot(full=False)
# strip off the data:img/png, part of the url
if screenshot.startswith("data:image/png;base64,"):
screenshot = screenshot.split(",", 1)[1]
return screenshot

Later, when comparing the screenshots, those strings are decoded to bytes for hashing and comparing with Pillow, and it appears the same string might be decoded multiple times:

def hash_screenshots(screenshots):
"""Computes the sha1 checksum of a list of base64-encoded screenshots."""
return [hashlib.sha1(base64.b64decode(screenshot)).hexdigest()
for screenshot in screenshots]

def get_differences(self, screenshots, urls, page_idx=None):
from PIL import Image, ImageChops, ImageStat
lhs = Image.open(io.BytesIO(base64.b64decode(screenshots[0]))).convert("RGB")
rhs = Image.open(io.BytesIO(base64.b64decode(screenshots[1]))).convert("RGB")

Whether this matters at all for performance I have not investigated, but it seems like it would be simpler overall if the executor returned the screenshot as bytes.

One optimization that would be made impossible is, which I can't tell if it's currently used, is to first check for equality by comparing the base64-encoded strings, to avoid even decoding them if they are the same.

@jgraham has told me in the past of a screenshot optimization use by Marionette, so there's probably more nuance here.

@foolip foolip added infra wptrunner The automated test runner, commonly called through ./wpt run labels May 10, 2021
@foolip
Copy link
Member Author

foolip commented May 10, 2021

Although I haven't found a working example, I think https://docs.python.org/3/library/codecs.html#binary-transforms could be useful here too if we got the WebDriver response at bytes directly.

@jgraham
Copy link
Contributor

jgraham commented May 10, 2021

WebDriver is using JSON as the transport so "getting the response as bytes directly" seems like it would mean avoiding the normal JSON parsing library.

In the case that we get a base64 string from WebDriver, we do have to decode it to bytes for comparison, but we also need to have the base64-encoded version if we want to end up logging it. Based on that I'd expect the most effecient implementation would keep exactly one copy as bytes and one as base64 and use the appropriate one in each case, so we're only doing a single conversion from base64 string -> bytes. But that may be tricky in the actual code.

(The marionette optimisation here is basically "have a special endpoint for doing reftests, so that this all happens inside the browser process, and only send the screenshots over the wire in the case they'll end up in the logs").

@foolip
Copy link
Member Author

foolip commented May 10, 2021

Ah right, I was imagining the response body was something starting with "data:image/png;base64," but it's wrapped in JSON. There would be a stretch of bytes in the raw response which can be interpreted as base64, but that would be a rather elaborate thing to do while ensuring that invalid JSON doesn't work.

In any case, I was mainly thinking of simplifying internals here, motivated by the code I found in #28930, not optimization.

we also need to have the base64-encoded version if we want to end up logging it

Are the base64-encoded screenshots logged by default anywhere still? Is that just with --log-wptscreenshot currently, or do screenshots get logged as base64 in Gecko CI with --log-tbpl still? I'm thinking maybe it's still an overall simplification and performance improvement even if we'd convert back bytes to base64 when logging.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue May 14, 2021
…() for screenshot decoding, a=testonly

Automatic update from web-platform-tests
Replace six.ensure_str with just .decode() for screenshot decoding (#28930)

Part of web-platform-tests/wpt#28776.

See also web-platform-tests/wpt#28929.
--

wpt-commits: 9c9f86c4a4ca0c5842a046d7dad2525649f37ae1
wpt-pr: 28930
@jgraham
Copy link
Contributor

jgraham commented May 19, 2021

We log failing screenshots by default in gecko CI.

It definitely could be better overall to convert back to base64 just for logging, but it's at least not a slam-dunk win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infra priority:backlog wptrunner The automated test runner, commonly called through ./wpt run
Projects
None yet
Development

No branches or pull requests

2 participants