Conclusions


Favourite?

  • All of the above!!
  • C++ is best when we can blend all its features

Performance

  • Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz
  • cpupower frequency-set --governor performance
  • Single threaded
  • GCC 9.2

Cornell box scene

A noisy image of the Cornell box scene
256x256 32spp (1 sphere, 38 triangles)

| Style | Render time (secs) |
|---|---|
| Object Oriented | 72 |
| Functional | 82 |
| Data-Oriented | 72 |

Owl scene

A noisy image of the Owl sphere scene

128spp (100 spheres, 12 triangles)

| Style | Render time (secs) |
|---|---|
| Object Oriented | 96 |
| Functional | 101 |
| Data-Oriented | 64 |

Suzanne scene

A noisy image of the Suzanne monkey scene

256x256 8spp (2 spheres, 970 triangles)

| Style | Render time (secs) |
|---|---|
| Object Oriented | 80 |
| Functional | 119 |
| Data-Oriented | 106 |

What on earth!?!


What happened?


Object oriented
   252,203,645,149   instructions    #    2.71  insn per cycle
    23,158,824,048   branches        #  845.438 M/sec
       139,741,218   branch-misses   #    0.60% of all branches
Functional
   238,866,691,159   instructions    #    1.78  insn per cycle
    21,881,070,251   branches        #  560.513 M/sec
     1,105,066,725   branch-misses   #    5.05% of all branches
Data-oriented Design
   154,821,779,748   instructions    #    1.34  insn per cycle
    10,353,392,805   branches        #  305.213 M/sec
     1,242,670,094   branch-misses   #   12.00% of all branches

12% of all branches!?!


Performance analysis screenshot showing 90% of the branch misses coming from just a couple of branches



for (/* all triangles */) {
  auto u = calcU(/*...*/);
  if (u < 0 || u > 1) { continue; }
  auto v = calcV(/*...*/);
  if (v < 0 || u + v > 1) { continue; }
  auto dist = calcD(/*...*/);
  if (dist < nearest) { nearest = dist; }
}

vcomisd xmm14, xmm0   ; compare 0 with u
ja skip               ; jump if 0 > u, i.e. u < 0
vcomisd xmm0, [1.0]   ; compare u with 1.0
ja skip               ; jump if u > 1.0

Branch Prediction

  • u < 0 unpredictable
  • u > 1 unpredictable
  • (u < 0 || u > 1) should be predictable*
  • (u < 0 || u > 1 || v < 0 || u + v > 1) even more so, since a ray misses almost every triangle
* Provided the compiler combines the conditions into a single branch:
(u < 0) | (u > 1) | ...

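// Before: short-circuit ||, each comparison can become its own branch
auto u = calcU(/*...*/);
if (u < 0 || u > 1) {
  continue;
}
auto v = calcV(/*...*/);
if (v < 0 || u + v > 1) {
  continue;
}

// After: bitwise |, all four comparisons combined into one test
auto u = calcU(/*...*/);
auto v = calcV(/*...*/);
if ((u < 0) | (u > 1)
    | (v < 0) | (u + v > 1)) {
  continue;
}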

FINAL STATS

Suzanne DoD: 106s → 68s (36% less render time)



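Render time (secs), before → after (OO = Object Oriented, FP = Functional, DoD = Data-Oriented):

| Scene | OO | FP | DoD |
|---|---|---|---|
| Cornell | 72 → 64 | 82 → 64 | 72 → 56 |
| Owl | 96 → 96 | 101 → 100 | 64 → 64 |
| Suzanne | 80 → 104 (!) | 119 → 82 | 106 → 68 |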

If I had more time...

  • Code on GitHub
  • Threading
  • Devirtualisation
  • Future directions...
  • DoD improvements
  • Thanks

GO WRITE SOMETHING COOL!

Some spheres