
Chrome 100 results are ... interesting #1020

Closed
krausest opened this issue Mar 30, 2022 · 38 comments
@krausest
Owner

Results for Chrome 100 look pretty different.
The fastest strategy for clear rows has changed and results for create rows are significantly slower. I think I have to take a closer look before publishing those results as official.
If anyone finds the trick for clear rows please post it here ;-)

https://krausest.github.io/js-framework-benchmark/2022/table_chrome_100_preliminary.html

@krausest
Owner Author

Looks like #166
Within a RAF call vanillajs clears in ~ 50 msecs.

@krausest
Owner Author

Vanillajs can be improved significantly by wrapping create, replace and clear in RAF (left with RAF, right without). Not sure if I should commit this as it'll certainly spread into benchmark implementations.
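The RAF wrapping discussed here can be sketched roughly like this (an illustrative helper, not the actual vanillajs code; `raf` is injectable so the logic can run outside a browser):

```javascript
// Rough sketch of wrapping the clear operation in requestAnimationFrame,
// as discussed above. Not the actual vanillajs implementation.
function clearRowsInRaf(tbody, raf = globalThis.requestAnimationFrame) {
  raf(() => {
    // Clearing via textContent is the fast path mentioned in this thread.
    tbody.textContent = "";
  });
}
```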

@fabiospampinato
Contributor

fabiospampinato commented Mar 30, 2022

Personally I only see a bug in Chrome here. Simply wrapping clearing in a RAF is probably a bug in itself: if you clear and append fast enough you won't see the appended rows. I don't see any reason to push frameworks toward this workaround for something that should probably be fixed in Chrome.

@ryansolid
Contributor

Yeah, RAF has always been slightly faster, and the gap has varied at different times. We added a mark for it because it was clearly gaming the benchmark. That being said, there are a couple of frameworks with lower scores on clear, and I'm not sure it's just RAF in all cases. I looked at doohtml and didn't see an obvious RAF call.

Also worth noting is a slight hit on replace rows; most frameworks have taken one. Interestingly, sifr seems to do well on both of these, so I might try to see what the difference is.

@krausest
Owner Author

krausest commented Apr 1, 2022

Well, doohtml's implementation is particularly clear:


It's actually what vanillajs does. How come doohtml is 24 msecs faster?
I'll try to profile this weekend...

@krausest
Owner Author

krausest commented Apr 2, 2022

I actually have doubts that the webdriver measurement is correct.
First I removed the CPU slowdown. The webdriver results still show that doohtml is faster than vanillajs:

result vanillajs-keyed_09_clear1k_x8.json min 8.897 max 11.889 mean 10.4755 median 10.78 stddev 0.9645715514039265
result doohtml-keyed_09_clear1k_x8.json min 6.334 max 7.056 mean 6.6656 median 6.6165 stddev 0.3000385901105982

I prepared a branch for puppeteer. Here's how the results look there (the measurement isn't strictly comparable since it closes the browser after every benchmark loop):
result vanillajs-keyed_09_clear1k_x8.json min 7.516 max 10.225 mean 8.3141 median 8.1485 stddev 0.84825709546104
result doohtml-keyed_09_clear1k_x8.json min 6.162 max 11.239 mean 7.912800000000002 median 6.6165 stddev 2.207555299420606
doohtml is still a bit faster, but way closer than with webdriver.
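For reference, the min/max/mean/median/stddev figures above can be computed like this (a plain sketch; the assumption, suggested by the reported numbers, is a sample standard deviation dividing by n − 1):

```javascript
// Sketch of the summary statistics reported above.
// Assumption: sample standard deviation (divide by n - 1).
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const median = n % 2
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  return {
    min: sorted[0],
    max: sorted[n - 1],
    mean,
    median,
    stddev: Math.sqrt(variance),
  };
}
```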

If I profile with chrome I can't see any difference between the two. I chose a 4x CPU slowdown:
vanillajs:
Screenshot from 2022-04-02 13-09-56
doohtml
Screenshot from 2022-04-02 13-10-09

And I think that's how it should be. Both clear tbody.textContent. I changed vanillajs to clear textContent only and skip all other logic, but it didn't change the results.

@krausest
Owner Author

krausest commented Apr 4, 2022

This was nastier than I thought. I couldn't find the reason for the numbers reported by chromedriver and chrome's profiler showed a different picture.

Some time ago I prepared an alternative benchmark runner that uses puppeteer, but I didn't finalize the switch. I'm planning to do that now.
The biggest advantage is that it produces traces that can be loaded in the chrome profiler. This makes it much easier to check the results.

The first candidate can be seen here: puppeteer results

And here's an excerpt (left new puppeteer results, right old webdriver results):
Screenshot from 2022-04-04 20-46-34

One large difference can be seen for doohtml and the create benchmark. Let's take a look at a trace from puppeteer:
Screenshot from 2022-04-04 21-13-09
This trace shows (at least together with the others) that a median of 86 msecs appears to be plausible.

But there was another important change to get there. Some frameworks got worse after switching to puppeteer. The traces looked like that:
Screenshot from 2022-04-04 21-19-50
Here we see that after performing the create operation a timer fires which causes a redraw.
The old logic was to compute the duration from the start of the click event to the last paint event. And this would result in a value of 180 msecs for doohtml! And I guess a similar issue happened for the old results.

How can we measure the duration better? I propose the following: for all benchmarks except select row we compute the duration from the start of the click event to the end of the first paint after the last layout event (usually there's just one layout event).
Cases with multiple layout events happen e.g. for aurelia (replace rows):
Screenshot from 2022-04-04 21-45-00
There's a first layout event after the click event and a second after the actual dom update happened.
Or for ember (create rows):
Screenshot from 2022-04-04 21-53-03
There's a first layout event right after the click before a timer event that appears to cause the real redraw.

For select row no layout event should happen.
There are cases with multiple paint events, e.g. for fre:
Screenshot from 2022-04-04 22-06-24
The first paint event happens about 10 msecs after the click but the second one (after 417 msecs) is the correct paint event for our duration.
Or glimmer:
Screenshot from 2022-04-04 22-17-45
Or marko - which is interesting, since it happens only sometimes for marko:
Screenshot from 2022-04-04 22-23-01

There are even frameworks that happen to have a layout event (and three paint events) for select rows like reflex-dom:
Screenshot from 2022-04-04 22-13-43

So for select rows I'm staying with the old logic: the duration is from the start of the click event to the end of the last paint event. Please note that this leaves open the issue we solved above for the other benchmarks, but I have no other idea how to measure the duration better if there's no layout event.
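The two rules above can be sketched as follows (illustrative only, not the actual benchmark driver code; the assumption is that trace events have been normalized to objects with `type`, `ts` (start) and `end` timestamps in msecs):

```javascript
// Sketch of the proposed measurement rules. Assumption: events are
// normalized to { type, ts, end } with millisecond timestamps.
function computeDuration(events, benchmark) {
  const click = events.find(e => e.type === "click");
  const paints = events.filter(e => e.type === "paint" && e.ts > click.ts);
  if (benchmark === "select") {
    // Select row: start of click to end of the last paint event.
    return paints[paints.length - 1].end - click.ts;
  }
  // All other benchmarks: start of click to end of the first paint
  // after the last layout event.
  const layouts = events.filter(e => e.type === "layout" && e.ts > click.ts);
  const lastLayout = layouts[layouts.length - 1];
  const paint = paints.find(p => p.ts >= lastLayout.end);
  return paint.end - click.ts;
}
```

With the doohtml traces above, this rule picks the early paint (88.7 msecs) and ignores the late, unrelated one.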

@hman61 Sorry, I'm afraid to say that it currently seems like I measured doohtml incorrectly!
Do you have an idea what's causing the second paint in traces like these? Is it the test driver or are there any timers fired in doohtml? I think I've never seen them in the chrome profile, so it may be the test driver.
Screenshot from 2022-04-04 23-10-40

@krausest
Owner Author

krausest commented Apr 5, 2022

I ported the logic back to the webdriver to get a better understanding of the differences between webdriver and puppeteer results.

A few questions arise when comparing puppeteer and webdriver.

  1. Why is doohtml much faster for create rows with puppeteer?
  2. Why is svelte much faster for replace rows with puppeteer?
  3. What's the reason for most frameworks to be faster for replace rows with puppeteer (vanilla, solid, vue)?
  4. Is puppeteer slower for select row for react?
  5. Why are vanillajs and solid much faster for clear rows with puppeteer?

@krausest
Owner Author

krausest commented Apr 5, 2022

1. Why is doohtml much faster for create rows with puppeteer?

For svelte the new logic returns the same result for create rows, since there's just a single paint event.

For doohtml there's a difference:

more than one paint event found
paints event {
  type: 'paint',
  ts: 181465296692,
  dur: 629,
  end: 181465297321,
  evt: '{"method":"Tracing.dataCollected","params":{"args":{"data":{"clip":[-16777215,-16777215,16777216,-16777215,16777216,16777216,-16777215,16777216],"frame":"F318F3FB0BBBC36CD591D24FEF975201","layerId":0,"nodeId":212}},"cat":"devtools.timeline,rail","dur":629,"name":"Paint","ph":"X","pid":1938853,"tdur":622,"tid":1,"ts":181465296692,"tts":880680}}'
} 88.744
paints event {
  type: 'paint',
  ts: 181465333324,
  dur: 2633,
  end: 181465335957,
  evt: '{"method":"Tracing.dataCollected","params":{"args":{"data":{"clip":[-16777215,-16777215,16777216,-16777215,16777216,16777216,-16777215,16777216],"frame":"F318F3FB0BBBC36CD591D24FEF975201","layerId":0,"nodeId":212}},"cat":"devtools.timeline,rail","dur":2633,"name":"Paint","ph":"X","pid":1938853,"tdur":2622,"tid":1,"ts":181465333324,"tts":907592}}'
} 127.38
WARNING: New and old measurement code return different values. New  88.744 old 127.38

The mean for doohtml with the new logic is 86.45, i.e. comparable to the puppeteer result.
Seems like the old idea "measure from click till last paint" didn't work right for this framework, since there are two paint events.

@ryansolid
Contributor

Yeah it's interesting DooHTML does very well in the webdriver in areas you wouldn't expect. On average Svelte is a tiny bit better on puppeteer in relation to others but most things do stay in line so to speak other than those 2 (and React's slow select row).

As for 5: I think other than DooHTML, which is a bit of an enigma, VanillaJS and SolidJS are the only two doing el.textContent = "" to clear. Everything stays in line on clear, but I think the difference is more emphasized in puppeteer.

@krausest
Owner Author

krausest commented Apr 5, 2022

  2. Why is svelte much faster for replace rows with puppeteer?

That's somewhat different. The old and the new logic return the same result for svelte for replace rows, since there's just a single paint event and that event takes about 124 msecs.

The puppeteer trace also looks completely unsuspicious:
Screenshot from 2022-04-05 20-47-54

I have currently no idea where that large difference comes from.

When I perform replace rows manually and profile the 5th create rows I'm getting this profile in chrome:
Screenshot from 2022-04-05 20-53-32
Those numbers look closer to the webdriver results.

So I came back to the last resort: Adding some client side hack for measuring the duration ( 89b546f ).
The result is [105, 99.30, 94.30, 97.60, 95.40, 115.10, 111.7, 102.10, 94.80, 87.10, 94.60, 97.9] for puppeteer and [115.80, 145.10, 132, 141.10, 145.30, 133.60, 132.9, 131.9, 138.40, 134.20, 119.70, 139.30] for webdriver. This confirms that the two benchmark drivers observe different performance.
Running manually in chrome gives me numbers close to webdriver.
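The client-side hack referenced above ( 89b546f ) presumably times the operation directly in the page; a minimal sketch of the idea (a hypothetical helper, not the actual commit; `now` is injectable for testing and would default to performance.now in the browser):

```javascript
// Minimal sketch of a client-side measurement hack: time the operation
// and log it, producing lines like "run took …" as shown in this thread.
// Hypothetical helper, not the actual benchmark code.
function timed(label, fn, now = () => performance.now()) {
  return (...args) => {
    const start = now();
    const result = fn(...args);
    console.log(`${label} took ${now() - start}`);
    return result;
  };
}
```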

[Updated 4/9/22]
I'm getting the fast results for a very small list of categories like 'devtools.timeline,disabled-by-default-devtools.timeline'. Those categories are enough to compute the duration from the timeline and show event names.
If I enable full tracing with many categories (it seems like disabled-by-default-devtools.timeline.stack is the one that causes the slowdown) I'm getting much slower results, closer to webdriver-ts (but this is not an explanation, since the webdriver benchmark driver itself also uses just the small category list).
What's interesting to see what the client hack reveals:

BROWSER: JSHandle:run took 94.79999999701977
BROWSER: JSHandle:run took 96.70000000298023
BROWSER: JSHandle:run took 106.59999999403954
BROWSER: JSHandle:run took 84.90000000596046
BROWSER: JSHandle:run took 93.30000001192093
after initialized  02_replace1k 0 svelte
runBenchmark
BROWSER: JSHandle:run took 111.6000000089407

The replace rows benchmark invokes create rows and then 5 warm-up create rows (which replace the content) before the 6th create rows is called; the duration of this 6th operation is what's counted. All 5 warm-up invocations are faster than the final one, and I suppose this is due to chrome's profiler, which is active only for that last call. With the small list of enabled categories the last run is in the same range as the others:

BROWSER: JSHandle:run took 95
BROWSER: JSHandle:run took 99.5
BROWSER: JSHandle:run took 93
BROWSER: JSHandle:run took 95.30000001192093
BROWSER: JSHandle:run took 99.29999999701977
after initialized  02_replace1k 0 svelte
runBenchmark
BROWSER: JSHandle:run took 97.20000000298023

It seems like the impact of full profiling is much less for the other frameworks. Here's vanillajs and there's hardly a difference for the final run:

BROWSER: JSHandle:run took 85.40000000596046
BROWSER: JSHandle:run took 87.79999999701977
BROWSER: JSHandle:run took 79.59999999403954
BROWSER: JSHandle:run took 96.79999999701977
BROWSER: JSHandle:run took 79.90000000596046
after initialized  02_replace1k 0 vanillajs
runBenchmark
BROWSER: JSHandle:run took 85.5

or react hooks:

BROWSER: JSHandle:RUN took 107.6000000089407
BROWSER: JSHandle:RUN took 133
BROWSER: JSHandle:RUN took 93.20000000298023
BROWSER: JSHandle:RUN took 96.20000000298023
BROWSER: JSHandle:RUN took 99.29999999701977
after initialized  02_replace1k 0 react-hooks
runBenchmark
BROWSER: JSHandle:RUN took 95.09999999403954

=> So if using puppeteer we should use the smallest list of enabled trace categories.
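Starting a puppeteer trace with such a minimal category list could look roughly like this (a sketch; the category strings are taken from the ones named above, and `page` is assumed to be a puppeteer Page):

```javascript
// Sketch: tracing a benchmark run with a minimal set of trace categories,
// per the conclusion above. The category list is an assumption based on
// the categories named in this thread.
const MINIMAL_TRACE_CATEGORIES = [
  "devtools.timeline",
  "disabled-by-default-devtools.timeline",
];

async function traceRun(page, tracePath, run) {
  await page.tracing.start({ path: tracePath, categories: MINIMAL_TRACE_CATEGORIES });
  await run();
  await page.tracing.stop();
}
```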

@krausest
Owner Author

krausest commented Apr 5, 2022

[Updated 4/9/22]
3. What's the reason for most frameworks to be faster for replace rows with puppeteer (vanilla, solid, vue)?

When I first looked into it I ran into the usual trap: I used my default chrome with extensions (1password, ublock) enabled.
I really should remember to always use incognito mode for all profiling!

Vanillajs then gives me results that look close to the puppeteer results:
Screenshot from 2022-04-09 10-09-18
Most replace rows runs were close to 83 msecs.

=> Seems like puppeteer results are ok.

@krausest
Owner Author

krausest commented Apr 5, 2022

[Updated 4/9/22]
4. Is puppeteer slower for select row for react?

It seems like this was also caused by enabling many tracing categories. With the minimal trace category list it looks pretty similar.

webdriver react-hooks
mean 54.8273 median 55.2395
mean 53.99490000000001 median 53.8345

puppeteer react-hooks
mean 52.32250000000001 median 52.4035 stddev 2.6654604459434195
mean 53.0296 median 52.593500000000006 stddev 2.7626347810257763

@krausest
Owner Author

krausest commented Apr 5, 2022

  5. Why are vanillajs and solid much faster for clear rows with puppeteer?

As for clear rows, there is just 1 paint event for vanillajs and solid, so the new logic can't help.
The client side hack also reports numbers in the 70 msecs range.

@krausest
Owner Author

krausest commented Apr 6, 2022

I applied the new logic (duration = time span between start of click and last paint for the select row benchmark,
time span between start of click and first paint after a layout event for all other benchmarks) to the webdriver client.

Results differ for create 1k, create 10k, remove and select for a few frameworks.

First I took a look at the select rows benchmark.
Is the above logic (click until last paint) reasonable?

Most frameworks have only a single paint event and thus are fine.

Fre, glimmer, glimmer-2, maquette, marko, miso, react-recoil and reflex-dom have more than one paint event.

Let's take a closer look (CPU slowdown is 1 for the screenshots):
Most frameworks report the first paint after about 1 msec and the second paint event much later.
Like glimmer: (first zoomed in to show the first paint event)
Screenshot from 2022-04-06 21-36-08
Screenshot from 2022-04-06 21-36-32
glimmer-2:
Screenshot from 2022-04-06 21-38-23
For those frameworks it would be wrong to take the first paint! So the logic looks fine.

Maquette is different. The first paint is about 12 msecs after the click and the second about 18 msecs:
Screenshot from 2022-04-06 21-40-42

And reflex-dom is different too. It has three paints. One screenshot is zoomed in to show the first paint.
Screenshot from 2022-04-06 21-43-43
Screenshot from 2022-04-06 21-43-57
I also consider the logic to be correct for this case. The whole operation ends with the third paint event.

=> It seems fine to compute the duration for select row as the time span from the click event to the last paint event.

@krausest
Owner Author

krausest commented Apr 6, 2022

The second benchmark that differs is create rows.
But create 1k rows and create 10k rows only differ for doohtml. And we've seen above that it was important for doohtml to use the first paint after the layout event. No other frameworks were affected for those benchmarks.

@hman61
Contributor

hman61 commented Apr 6, 2022 via email

@krausest
Owner Author

krausest commented Apr 6, 2022

The third is remove row benchmark.

The following frameworks cause more than one paint: elm, endorphin, etch, hyperapp, imba, maquette, miso, reflex-dom,
skruv, ui5-webcomponents, vidom, mithril, reagent.

Most of them look very similar, like
endorphin
Screenshot from 2022-04-06 22-10-52
imba
Screenshot from 2022-04-06 22-14-27

Mithril and reagent can look a bit different, like that:
Screenshot from 2022-04-06 22-18-07
(there's sometimes a gap between the first and the second paint)

=> For remove rows we should continue with the old logic, i.e. duration to the last paint.

@krausest
Owner Author

krausest commented Apr 7, 2022

@hman61 I'm not completely sure that I fully got the idea (I'd have to add a client onblur-handler and measure up to this event, if I got the idea right). If you have some example code I'd like to take a look at it. It would be interesting to see how it compares with the chrome profile.
After looking at all those profiles I currently have the feeling that rendering duration varies between efficient and less efficient frameworks and thus should indeed be counted.

@hman61
Contributor

hman61 commented Apr 7, 2022 via email

@krausest
Owner Author

krausest commented Apr 9, 2022

Here's my conclusion.

  1. The old measuring logic discriminated against doohtml for create 1k and 10k rows. For some reason there's a second, late and probably unrelated paint event that caused overly slow results for doohtml. No other frameworks were affected. The measurement logic for create rows has been adjusted.
  2. Switching to puppeteer for the whole benchmark looks promising. The biggest advantage is that, thanks to the trace files written, it's much easier to justify our measurements and discuss potential bugs of this benchmark.

I'll run the benchmark with puppeteer (minimal trace categories) and we'll take a look at the results.
@hman61 Thanks - I'll take a look at the client side measurement hopefully soon.

@hman61
Contributor

hman61 commented Apr 10, 2022 via email

@krausest
Owner Author

Another night, another run. Left new puppeteer, right chromedriver:
Screenshot from 2022-04-10 09-46-09
All results are now from puppeteer traces. create 1/10k rows for doohtml looks fine and clear rows makes more sense.

@krausest
Owner Author

krausest commented Apr 10, 2022

One (hopefully) last finding. Some frameworks like ui5-webcomponents and vidom show bad results e.g. for swap rows with puppeteer.
ui5-webcomponents yields between 55 msecs and 120 msecs for swap rows.
It turns out that there's often a very long delay waiting for the RAF to fire:
4x slowdown:
Screenshot from 2022-04-10 20-43-34
2x slowdown:
Screenshot from 2022-04-10 20-41-02
1x slowdown
Screenshot from 2022-04-10 20-42-33

It seems like the delay depends on the CPU slowdown.

Webdriver does not show this behaviour. Results for ui5-webcomponents are all in the 55 msecs range for webdriver.

Just to give an impression:
Results for puppeteer:
117.377, 119.684, 116.923, 123.66, 54.985, 125.074, 116.245, 115.509, 117.372, 119.572, 119.117, 54.576

Results for chromedriver:
51.164, 51.94, 52.187,54.211, 58.01, 52.564, 54.324, 55.684, 53.655, 53.431, 58.224, 57.056

@krausest
Owner Author

Some other frameworks are also a bit slower with puppeteer, e.g. for remove rows. But this doesn't appear to be due to overly long RAF delays.
ui5-webcomponents looks correct with ~33 msecs:
Screenshot from 2022-04-10 21-28-08
The RAF-delay is 6 msecs and thus ok.

@krausest
Owner Author

I tried to find a workaround for the delayed animation frames with CPU throttling.

Among the many frameworks that use RAF I identified the following frameworks as having sometimes a long delay:
angular-nozone
bobril
choo
dojo
elm
endorphin
hyperapp
imba
maquette
mithril
reagent
skruv
ui5-webcomponents
vidom

I identify them with some frustratingly complex logic: one request animation frame within the click event, one fire animation frame at least 16 msecs later than the click and no layout event between the click and the fire animation frame.
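Expressed as code, that heuristic might look like this (illustrative only; the event shape { type, ts, end } with msec timestamps is an assumption about how the trace would be normalized):

```javascript
// Sketch of the delayed-RAF detection described above: a RAF is requested
// inside the click handler, the animation frame fires at least 16 msecs
// after the click, and no layout event happens in between.
function hasSuspiciousRafDelay(events, thresholdMs = 16) {
  const click = events.find(e => e.type === "click");
  const requested = events.some(e =>
    e.type === "requestAnimationFrame" && e.ts >= click.ts && e.ts <= click.end);
  if (!requested) return false;
  const fire = events.find(e => e.type === "fireAnimationFrame" && e.ts > click.ts);
  if (!fire || fire.ts - click.ts < thresholdMs) return false;
  const layoutBetween = events.some(e =>
    e.type === "layout" && e.ts > click.ts && e.ts < fire.ts);
  return !layoutBetween;
}
```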
Here's such an example with a long delay.
Screenshot from 2022-04-13 23-11-35
In this case the idle delay should not be counted.

But there are so many variations, like crui. In this case the duration should be measured from click to paint.
Screenshot from 2022-04-13 23-10-14

I have little idea why aurelia is doing things like this:
Screenshot from 2022-04-11 20-09-53

@krausest
Owner Author

So here's the suggested final result for chrome 100: https://krausest.github.io/js-framework-benchmark/2022/table_chrome_100_puppeteer_raf.html

Any comments?

@fabiospampinato
Contributor

It looks alright to me, a couple of things that I can spot:

  • vanillajs is ~19% slower in "remove row" than vanillajs-1, and one has to scroll relatively far down the table to find similar results, that seems very unlikely to be representative? And it's outside the confidence interval.
  • xania has the fastest "create many rows", but at the same time is 16% slower than vanilla at "create rows", that seems unlikely also, in the table for Chrome 99 it is only 3% off.

Other than that I can't see any super strange thing like we saw with the initial run for chrome 100.

If possible it'd be useful to output more statistically significant results, they seem to vary ~significantly with different runs 🤔

@hman61
Contributor

hman61 commented Apr 14, 2022 via email

@krausest
Owner Author

krausest commented Apr 14, 2022

Thanks. I made one more change. Previously I subtracted the whole delay until the animation frame fires, which actually gives an advantage to longer delays. I changed that to subtract the delay minus 16 msecs.
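In code the adjusted correction is simply (a sketch; the function and parameter names are hypothetical):

```javascript
// Sketch of the adjusted correction: subtract only the portion of the
// RAF delay that exceeds one frame (16 msecs), instead of the whole delay.
function correctedDuration(rawDuration, rafDelayMs, frameMs = 16) {
  const excess = Math.max(0, rafDelayMs - frameMs);
  return rawDuration - excess;
}
```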

@fabiospampinato
The best thing about switching to puppeteer is that I can answer such questions much better than for webdriver.
I can't explain why a framework is slower, but I can show some chrome traces and "prove" that the reported value actually happened on my machine, which I'll do now :)

vanillajs reports for remove 1k the following values [22.979,24.419,24.572,24.673,24.721,24.744,24.777,25.048,25.45,25.936].
Here's the trace for the 24.777 run:
Screenshot from 2022-04-14 21-23-40

...and for vanillajs-1 I measured [20.561,20.635,20.73,20.802,20.919,20.992,21.655,21.715,21.718,22.475]
And here's the trace for the 20.992 run:
Screenshot from 2022-04-14 21-25-05

xania is statistically significantly slower than vanillajs for create 1k, but not for create 10k.

duration for vanillajs and create 1k rows: [78.277,78.781,81.697,83.022,83.674,83.885,88.241,91.318,92.782,95.063]
The trace for 78.277:
Screenshot from 2022-04-14 21-28-10
And the trace for 92.782:
Screenshot from 2022-04-14 21-29-23

duration for xania and create 1k rows: [82.587,82.613,86.362,95.322,95.995,97.547,99.23,106.812,109.966,112.8]
Here's one of the slower runs with 109.966:
Screenshot from 2022-04-14 21-32-33

@fabiospampinato I'll re-run the whole benchmark tonight and then we compare those values to https://krausest.github.io/js-framework-benchmark/2022/table_chrome_100_puppeteer_raf.html. Please do not compare chrome 100 results with any other chrome 100 results above since I played with the measurement rules. Still variance for puppeteer is higher than for webdriver (but that doesn't help when webdriver does strange things for remove rows...).

@hman61 I agree, this is why I'm running those benchmarks in a live browser and not headless.

@krausest
Owner Author

2nd run is available: https://krausest.github.io/js-framework-benchmark/2022/table_chrome_100_puppeteer_raf2.html
Overall I'm not happy with the results. The statistically significant differences between vanillajs and xania are gone now. Mikado and fullweb-helpers are much faster for create 1k rows.

The standard error for most operations is worse than the older webdriver results.

I'm a bit afraid that puppeteer results will stay less stable than the old results. Just to make sure I didn't mix up anything else (there were many changes) I'll make another run with no other changes and then we'll see.

@fabiospampinato
Contributor

fabiospampinato commented Apr 15, 2022

Yeah the results seem to vary quite a bit, now "remove row" in the vanillajs implementations is about the same, while in the previous run it seemed as if vanillajs was significantly slower than vanillajs-1 for some reason.

I don't know how long it takes to update the entire table, and increasing that amount of time is probably a pain in the ass, but maybe the table should be updated 3~5 times and then those runs could be averaged out to produce the final table? I don't know how else we could get more statistically significant results.


Or maybe individual tests could be run 3~5x as much and only the fastest 2~4 results for each test could be averaged out.
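The "fastest k of n" idea could be sketched like this (a hypothetical helper, not part of the benchmark):

```javascript
// Sketch of the suggestion above: run each test more often and average
// only the fastest few results, discarding the slower outliers.
function averageFastest(results, keep) {
  const fastest = [...results].sort((a, b) => a - b).slice(0, keep);
  return fastest.reduce((sum, x) => sum + x, 0) / fastest.length;
}
```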

@krausest
Owner Author

krausest commented Apr 16, 2022

The tests run for about 8 1/2 hours.
Here's the 3rd run, and I paid careful attention to making no changes on my system (no kernel updates, chrome updates, graphics driver updates or configuration changes, and of course the same source code): https://krausest.github.io/js-framework-benchmark/2022/table_chrome_100_puppeteer_raf3.html
Results are imo pretty close to the second run, so maybe it's not as bad as I thought with puppeteer.
Unless someone finds a larger difference I'd take that for the official results.

@krausest
Owner Author

I published chrome 100 results: https://github.com/krausest/js-framework-benchmark/releases/tag/chrome100

@krausest
Owner Author

For the sake of completeness some more notes:

  1. I implemented four different benchmark drivers:
    webdriver: The one used for the other runs so far. Issue: Clear rows was (?) pretty different for chrome 100
    puppeteer: The latest official results use the puppeteer driver.
    playwright: A new driver very similar to puppeteer.
    webdriver-CDP: A new webdriver implementation that uses a CDPSession to manually start tracing, which allows using the same timeline calculation as puppeteer and playwright, writes the traces to disk and records a trace for each benchmark (webdriver has to use a single trace for the whole benchmark loop). Issue: Trace files are much bigger than with playwright or puppeteer for unknown reasons (the same tracing categories are chosen).

You can switch between those drivers for CPU benchmarks in benchmarkConfiguration.ts. Just use the benchmark class from the right import, recompile and run.

  2. Sadly the results differ between the test drivers for unknown reasons. Looking at the chrome flags in chrome://version or the tracing categories, I can't find a setting that explains the differences.
    Still the three new implementations have the great advantage that they write trace files that can be loaded in the chrome profiler if you don't trust the numbers.
    left webdriver, right webdriver with CDP
    webdriver_webdriverCDP
    left puppeteer, right playwright
    puppeteer_playwright
    Biggest differences: Svelte and elm create rows

For the time being I'm using puppeteer by default now.

@LifeIsStrange

Other test/automation frameworks would be selenium RC (without webdriver) and Cypress.
It's great that you tested that many already, and good to see that puppeteer and playwright perform better.

@krausest
Owner Author

I moved to playwright now that I have the option, since it causes very few errors (puppeteer sometimes doesn't record a paint event in the trace; maybe a longer wait would help. playwright doesn't show this issue). I noticed that the variance is much lower if I use the default categories for tracing, so I went with them.

@krausest
Owner Author

krausest commented Jul 3, 2022

I'm closing this issue. Migration to OSX and playwright / puppeteer is done.
