Convert base64-encoded screenshots to bytes ASAP #28929

foolip · 2021-05-10T10:07:43Z

This suggestion is a bit of a nice-to-have improvement to avoid extra conversions.

As it is, the screenshot implementations of RefTestExecutor return the screenshot as a Base64-encoded string. For example:

wpt/tools/wptrunner/wptrunner/executors/executormarionette.py

Lines 1023 to 1033 in 208e86d

    
           def _screenshot(self, protocol, url, timeout): 
        
               protocol.marionette.navigate(url) 
        
               protocol.base.execute_script(self.wait_script, asynchronous=True) 
        
               screenshot = protocol.marionette.screenshot(full=False) 
        
               # strip off the data:img/png, part of the url 
        
               if screenshot.startswith("data:image/png;base64,"): 
        
                   screenshot = screenshot.split(",", 1)[1] 
        
               return screenshot

Later, when comparing the screenshots, those strings are decoded to bytes for hashing and comparing with Pillow, and it appears the same string might be decoded multiple times:

wpt/tools/wptrunner/wptrunner/executors/base.py

Lines 88 to 91 in 208e86d

    
           def hash_screenshots(screenshots): 
        
               """Computes the sha1 checksum of a list of base64-encoded screenshots.""" 
        
               return [hashlib.sha1(base64.b64decode(screenshot)).hexdigest() 
        
                       for screenshot in screenshots]

wpt/tools/wptrunner/wptrunner/executors/base.py

Lines 464 to 468 in 208e86d

    
           def get_differences(self, screenshots, urls, page_idx=None): 
        
               from PIL import Image, ImageChops, ImageStat 
        
               lhs = Image.open(io.BytesIO(base64.b64decode(screenshots[0]))).convert("RGB") 
        
               rhs = Image.open(io.BytesIO(base64.b64decode(screenshots[1]))).convert("RGB")

Whether this matters at all for performance I have not investigated, but it seems like it would be simpler overall if the executor returned the screenshot as bytes.

One optimization that would be made impossible is, which I can't tell if it's currently used, is to first check for equality by comparing the base64-encoded strings, to avoid even decoding them if they are the same.

@jgraham has told me in the past of a screenshot optimization use by Marionette, so there's probably more nuance here.

The text was updated successfully, but these errors were encountered:

foolip · 2021-05-10T10:22:49Z

Although I haven't found a working example, I think https://docs.python.org/3/library/codecs.html#binary-transforms could be useful here too if we got the WebDriver response at bytes directly.

Part of #28776. See also #28929.

jgraham · 2021-05-10T10:40:18Z

WebDriver is using JSON as the transport so "getting the response as bytes directly" seems like it would mean avoiding the normal JSON parsing library.

In the case that we get a base64 string from WebDriver, we do have to decode it to bytes for comparison, but we also need to have the base64-encoded version if we want to end up logging it. Based on that I'd expect the most effecient implementation would keep exactly one copy as bytes and one as base64 and use the appropriate one in each case, so we're only doing a single conversion from base64 string -> bytes. But that may be tricky in the actual code.

(The marionette optimisation here is basically "have a special endpoint for doing reftests, so that this all happens inside the browser process, and only send the screenshots over the wire in the case they'll end up in the logs").

foolip · 2021-05-10T11:54:27Z

Ah right, I was imagining the response body was something starting with "data:image/png;base64," but it's wrapped in JSON. There would be a stretch of bytes in the raw response which can be interpreted as base64, but that would be a rather elaborate thing to do while ensuring that invalid JSON doesn't work.

In any case, I was mainly thinking of simplifying internals here, motivated by the code I found in #28930, not optimization.

we also need to have the base64-encoded version if we want to end up logging it

Are the base64-encoded screenshots logged by default anywhere still? Is that just with --log-wptscreenshot currently, or do screenshots get logged as base64 in Gecko CI with --log-tbpl still? I'm thinking maybe it's still an overall simplification and performance improvement even if we'd convert back bytes to base64 when logging.

Part of #28776. See also #28929.

…28930) Part of #28776. See also #28929.

…() for screenshot decoding, a=testonly Automatic update from web-platform-tests Replace six.ensure_str with just .decode() for screenshot decoding (#28930) Part of web-platform-tests/wpt#28776. See also web-platform-tests/wpt#28929. -- wpt-commits: 9c9f86c4a4ca0c5842a046d7dad2525649f37ae1 wpt-pr: 28930

jgraham · 2021-05-19T10:28:59Z

We log failing screenshots by default in gecko CI.

It definitely could be better overall to convert back to base64 just for logging, but it's at least not a slam-dunk win.

foolip added infra wptrunner The automated test runner, commonly called through ./wpt run labels May 10, 2021

foolip added a commit that referenced this issue May 10, 2021

Replace six.ensure_str with just .decode() for screenshot decoding

c7a7726

Part of #28776. See also #28929.

foolip mentioned this issue May 10, 2021

Replace six.ensure_str with just .decode() for screenshot decoding #28930

Merged

foolip added a commit that referenced this issue May 10, 2021

Replace six.ensure_str with just .decode() for screenshot decoding

5b6bf28

Part of #28776. See also #28929.

foolip added a commit that referenced this issue May 10, 2021

Replace six.ensure_str with just .decode() for screenshot decoding (#…

9c9f86c

…28930) Part of #28776. See also #28929.

foolip added the priority:backlog label Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert base64-encoded screenshots to bytes ASAP #28929

Convert base64-encoded screenshots to bytes ASAP #28929

foolip commented May 10, 2021

foolip commented May 10, 2021

jgraham commented May 10, 2021

foolip commented May 10, 2021

jgraham commented May 19, 2021

Convert base64-encoded screenshots to bytes ASAP #28929

Convert base64-encoded screenshots to bytes ASAP #28929

Comments

foolip commented May 10, 2021

foolip commented May 10, 2021

jgraham commented May 10, 2021

foolip commented May 10, 2021

jgraham commented May 19, 2021