Performance of high-level pollEvent #178

chrisdone · 2018-04-26T16:03:18Z

This is a general performance issue where I'll discuss results and issues.

I thought I would do a very simple measurement of the standard SDL event polling loop, with no window or anything:

import SDL
import Weigh

main :: IO ()
main =
  mainWith
    (do setColumns [Case, Allocated, GCs]
        sequence_
          [ action ("pollEvents " ++ show i) (sdl i)
          | i <- [10, 100, 1000, 10000, 100000]
          ])

sdl :: Int -> IO ()
sdl iters = do
  initializeAll
  let go :: Int -> IO ()
      go 0 = pure ()
      go i = do
        _events <- pollEvents
        go (i - 1)
  go iters

It turns out to allocate quite a bit: 😱

Case                Allocated  GCs
pollEvents 10           4,632    0
pollEvents 100         38,416    0
pollEvents 1000       369,112    0
pollEvents 10000    3,680,088    3
pollEvents 100000  36,802,144   35

If instead I do this, i.e. allocate once and repeatedly write to the same pointer and peek when there's something available:

sdl :: Int -> IO ()
sdl iters = do
  initializeAll
  alloca
    (\(e) -> do
       let go :: Int -> IO ()
           go 0 = pure ()
           go i = do
             do n <- Raw.pollEvent e
                if n == 0
                  then return ()
                  else void (peek e)
             go (i - 1)
       go iters)

Allocations are constant, no GCs: 💪

Case               Allocated  GCs
pollEvents 10          1,400    0
pollEvents 100         1,400    0
pollEvents 1000        1,400    0
pollEvents 10000       1,400    0
pollEvents 100000      1,400    0

So currently the example program in the docs which uses pollEvents in a loop is pretty inefficient. ☝️

I suspect waitEvent is probably better, but the SDL docs don't endors it like pollEvent:

SDL_PollEvent() is the favored way of receiving system events since it can be done from the main loop and does not suspend the main loop while waiting on an event to be posted.

But using pollEvent in a loop is guaranteed to allocate whether or not there are any events to be processed.

I realise there are worse things in life than a bit of allocation; I just have a hobby project as part of haskell-perf called "gcless" Haskell, in which I try out little programs and see what useful toys I can make that happen to not allocate beyond a constant initial factor.

Anyway, one alternative is to tweak pollEvent so that it checks whether there is an event before allocating:

{-# LANGUAGE LambdaCase #-}
import Foreign
import SDL
import qualified SDL.Raw as Raw
import Criterion.Main

main :: IO ()
main = do
  initializeAll
  defaultMain
    [ bgroup
        "pollEvents"
        [ bench ("orig") (whnfIO orig)
        , bench ("check") (whnfIO check)
        ]
    ]

check :: IO ()
check = do
  ret <- Raw.pollEvent nullPtr
  if ret == 0
    then pure ()
    else alloca
           (\e ->
              (do Raw.pollEvent e
                  peek e >>= \case
                    Raw.AudioDeviceEvent {} -> pure ()
                    _ -> pure ()))

orig :: IO ()
orig = pollEvent

No discernible speed difference:

benchmarking pollEvents/orig
time                 11.43 μs   (11.37 μs .. 11.49 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 11.41 μs   (11.37 μs .. 11.48 μs)
std dev              167.3 ns   (122.4 ns .. 274.7 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking pollEvents/check
time                 11.05 μs   (11.00 μs .. 11.11 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 11.04 μs   (10.99 μs .. 11.13 μs)
std dev              216.5 ns   (154.8 ns .. 296.8 ns)
variance introduced by outliers: 19% (moderately inflated)

So we just avoid allocating where not necessary. 👍

The text was updated successfully, but these errors were encountered:

chrisdone · 2018-04-26T16:22:52Z

Just to summarize those numbers for future reference:

> let timeToPoll = 11 * 1e-6 -- from criterion
> let bytesPerPoll = 36802144/100000 -- from weigh
> let pollsPerSec = 1/timeToPoll 
> let bytesPerSecond = bytesPerPoll * pollsPerSec
> pollsPerSec
90909.09090909091
> printf "%f" (pollsPerSec * bytesPerPoll)
33456494.545454547

So that's 90,909 polls and 33.5MB per second of allocations. That's
consistent with the other findings. The GC count for 100,000 polls (36.8MB)
is 35 GCs, so that's roughly 1 GC per 1MB which corresponds to GHC's
default settings.

If your rendering function is in an SDL timer handled by pollEvent, then this might hinder your top performance.

chrisdone · 2018-04-26T16:30:15Z

More findings: waitEvent is apparently just a loop that has an interval. It doesn't, it would seem, use OS timers or wait for user input to trigger.

Just to summarize the numbers below for future reference: > let timeToPoll = 11 * 1e-6 -- from criterion > let bytesPerPoll = 36802144/100000 -- from weigh > let pollsPerSec = 1/timeToPoll > let bytesPerSecond = bytesPerPoll * pollsPerSec > pollsPerSec 90909.09090909091 > printf "%f" (pollsPerSec * bytesPerPoll) 33456494.545454547 So that's 90,909 polls and 33.5MB per second of allocations. That's consistent with the below findings. The GC count for 100,000 polls (36.8MB) is 35 GCs, so that's roughly 1 GC per 1MB which corresponds to GHC's default GC settings. Before: Case Allocated GCs pollEvents 10 4,632 0 pollEvents 100 38,416 0 pollEvents 1000 369,112 0 pollEvents 10000 3,680,088 3 pollEvents 100000 36,802,144 35 benchmarking pollEvents/orig time 11.43 μs (11.37 μs .. 11.49 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 11.41 μs (11.37 μs .. 11.48 μs) std dev 167.3 ns (122.4 ns .. 274.7 ns) variance introduced by outliers: 11% (moderately inflated) After Case Allocated GCs pollEvents 10 1,400 0 pollEvents 100 1,400 0 pollEvents 1000 1,400 0 pollEvents 10000 1,400 0 pollEvents 100000 1,400 0 benchmarking pollEvents/check time 11.05 μs (11.00 μs .. 11.11 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 11.04 μs (10.99 μs .. 11.13 μs) std dev 216.5 ns (154.8 ns .. 296.8 ns) variance introduced by outliers: 19% (moderately inflated)

Make pollEvent not allocate unnecessarily (#178)

chrisdone · 2018-04-26T22:44:46Z

Closing, I'll open different PRs for other operations.

chrisdone · 2018-04-27T09:49:39Z

Added regression tests for the allocations here: 2bd91b1

ocharles · 2018-04-30T10:46:46Z

Hi Chris,

Just got back from holiday but wanted to express my thanks for this work. The follow up commits seem obvious, but thanks for explaining what you're doing at the same time.

Crack on, I say!

chrisdone mentioned this issue Apr 26, 2018

Make pollEvent not allocate unnecessarily (#178) #179

Merged

chrisdone added a commit that referenced this issue Apr 26, 2018

Merge pull request #179 from haskell-game/poll-event-allocs

4bde0c1

Make pollEvent not allocate unnecessarily (#178)

chrisdone changed the title ~~Performance of sdl2 high-level bindings~~ Performance of high-level pollEvent Apr 26, 2018

chrisdone closed this as completed Apr 26, 2018

chrisdone added a commit that referenced this issue Apr 27, 2018

Add regression tests for #178 pollEvent space use

2bd91b1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance of high-level pollEvent #178

Performance of high-level pollEvent #178

chrisdone commented Apr 26, 2018

chrisdone commented Apr 26, 2018 •

edited

Loading

chrisdone commented Apr 26, 2018

chrisdone commented Apr 26, 2018

chrisdone commented Apr 27, 2018

ocharles commented Apr 30, 2018

Performance of high-level pollEvent #178

Performance of high-level pollEvent #178

Comments

chrisdone commented Apr 26, 2018

chrisdone commented Apr 26, 2018 • edited Loading

chrisdone commented Apr 26, 2018

chrisdone commented Apr 26, 2018

chrisdone commented Apr 27, 2018

ocharles commented Apr 30, 2018

chrisdone commented Apr 26, 2018 •

edited

Loading