- Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz
- `cpupower frequency-set --governor performance`
- Single threaded
- GCC 9.2
Cornell: 256x256, 32spp (1 sphere, 38 triangles)
Style | Render time (secs) |
---|---|
Object Oriented | 72 |
Functional | 82 |
Data-Oriented | 72 |
Owl: 128spp (100 spheres, 12 triangles)
Style | Render time (secs) |
---|---|
Object Oriented | 96 |
Functional | 101 |
Data-Oriented | 64 |
Suzanne: 256x256, 8spp (2 spheres, 970 triangles)
Style | Render time (secs) |
---|---|
Object Oriented | 80 |
Functional | 119 |
Data-Oriented | 106 |
What on earth!?! On Suzanne, Data-Oriented (106 secs) is slower than Object Oriented (80 secs).
`perf stat` on the Suzanne scene:

Object Oriented:

```
252,203,645,149 instructions # 2.71 insn per cycle
23,158,824,048 branches # 845.438 M/sec
139,741,218 branch-misses # 0.60% of all branches
```

Functional:

```
238,866,691,159 instructions # 1.78 insn per cycle
21,881,070,251 branches # 560.513 M/sec
1,105,066,725 branch-misses # 5.05% of all branches
```

Data-Oriented:

```
154,821,779,748 instructions # 1.34 insn per cycle
10,353,392,805 branches # 305.213 M/sec
1,242,670,094 branch-misses # 12.00% of all branches
```

12% of all branches mispredicted!?!
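```cpp
for (/* all triangles */) {
    auto u = calcU(/*...*/);     // first barycentric coordinate
    if (u < 0 || u > 1) {        // outside the triangle: skip it
        continue;
    }
    auto v = calcV(/*...*/);     // second barycentric coordinate
    if (v < 0 || u + v > 1) {    // outside the triangle: skip it
        continue;
    }
    auto dist = calcD(/*...*/);  // distance along the ray
    if (dist < nearest) {        // keep the closest hit so far
        nearest = dist;
    }
}
```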
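- `u < 0`: unpredictable
- `u > 1`: unpredictable
- `(u < 0 || u > 1)`: should be predictable\*
- `(u < 0 || u > 1 || v < 0 || u + v > 1)`: more so

\* Provided the compiler combines the conditions into a single branch... which we can encourage with non-short-circuiting bitwise or: `(u < 0) | (u > 1) | ...`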
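Before:

```cpp
auto u = calcU(/*...*/);
if (u < 0 || u > 1) {
    continue;
}
auto v = calcV(/*...*/);
if (v < 0 || u + v > 1) {
    continue;
}
```

After:

```cpp
// Compute both coordinates up front, then test them together.
// Bitwise | evaluates every comparison (no short-circuiting),
// so the compiler can emit a single branch to predict.
auto u = calcU(/*...*/);
auto v = calcV(/*...*/);
if ((u < 0) | (u > 1)
    | (v < 0) | (u + v > 1)) {
    continue;
}
```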
Render time (secs), before → after combining the conditions:

Scene | Object Oriented | Functional | Data-Oriented |
---|---|---|---|
Cornell | 72 → 64 | 82 → 64 | 72 → 56 |
Owl | 96 → 96 | 101 → 100 | 64 → 64 |
Suzanne | 80 → 104 (!) | 119 → 82 | 106 → 68 |
- Code on GitHub
- Threading
- Devirtualisation
- Future directions...
- DoD improvements
- Thanks