We have several different implementations of FLIP (Python, PyTorch, C++, and CUDA) and we have tried to make them as similar as possible. However, several factors make it very hard to get perfect matches between the implementations. These include:
- Our computations are made using 32-bit floating-point arithmetic.
- The order of operations matters for the result.
- We are using different versions of functions in the different versions of FLIP, and these may not all use similar implementations. Even computing the mean of an array can give different results because of this.
- As an example, if a 2D filter implementation's outer loop is over `x` and the inner loop is over `y`, that will in the majority of cases give a different floating-point result compared to having the outer loop be over `y` and the inner over `x` (see the first code sketch after this list).
- GPUs attempt to use fused multiply-and-add (FMA) operations, i.e., `a*b+c`, as much as possible. These are faster, but the entire operation is also computed at higher precision. Since the CPU implementation may not use FMA, this is another source of difference between implementations (see the second code sketch after this list).
- Depending on compiler flags, `sqrt()` may be computed using lower precision on GPUs.
- For the C++ and CUDA implementations, we have changed to using separated filters for faster performance. This has given rise to small differences compared to previous versions. For our tests, we have therefore updated the `images/correct_{ldr|hdr}flip_{cpp|cuda}.{png|exr}` images (see the third code sketch after this list).
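
As a concrete illustration of the order-of-operations and mean-computation points above, the following Python/NumPy sketch (an illustration only, not part of FLIP) accumulates the same `float32` image row by row and column by column, which typically yields slightly different means.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((512, 512), dtype=np.float32)

# Accumulate in float32 to mimic a 32-bit implementation.
row_major = np.float32(0.0)
for row in image:        # outer loop over y, inner (implicit) loop over x
    row_major += row.sum(dtype=np.float32)

col_major = np.float32(0.0)
for col in image.T:      # outer loop over x, inner (implicit) loop over y
    col_major += col.sum(dtype=np.float32)

print(row_major / np.float32(image.size))   # mean, accumulated row by row
print(col_major / np.float32(image.size))   # mean, accumulated column by column
print(image.mean(dtype=np.float64))         # higher-precision reference mean
# The printed values typically differ in the last few decimals.
```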
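
The effect of FMA can also be demonstrated on the CPU. The sketch below is an illustration only (it relies on `math.fma`, which was added in Python 3.13): the fused operation keeps the product `a*b` at full precision before the addition, so it can give a different result than a separate multiply followed by an add.

```python
import math  # math.fma is available from Python 3.13

a = 1.0 + 2.0**-27
b = a
c = -(a * b)              # the product a*b, rounded to double precision, negated

# Separate multiply and add: the product is rounded before the addition,
# so its rounding error cancels exactly and the result is 0.0.
print(a * b + c)          # 0.0

# Fused multiply-add: the product is kept exact internally, so the
# rounding error of the separate multiplication becomes visible.
print(math.fma(a, b, c))  # 2**-54, i.e., 5.551115123125783e-17
```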
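
Regarding the separated filters, applying a separable 2D Gaussian as two 1D passes is mathematically equivalent to a direct 2D convolution, but the operations are rounded in a different order in 32-bit arithmetic. The simplified Python/NumPy sketch below (not FLIP's actual filter code; kernel size and sigma are chosen arbitrarily) shows the kind of tiny per-pixel differences this introduces.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((64, 64), dtype=np.float32)

# Small normalized Gaussian kernel (7 taps, sigma = 1.5, chosen arbitrarily).
x = np.arange(-3, 4, dtype=np.float32)
g1d = np.exp(-(x * x) / np.float32(2.0 * 1.5 * 1.5))
g1d = (g1d / g1d.sum(dtype=np.float32)).astype(np.float32)
g2d = np.outer(g1d, g1d).astype(np.float32)  # corresponding full 2D kernel

def filter_2d(img, kernel):
    """Direct 2D 'valid' filtering, accumulated in float32."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel, dtype=np.float32)
    return out

def filter_separated(img, kernel1d):
    """The same filter applied as two 1D passes (rows, then columns)."""
    rows = np.apply_along_axis(np.convolve, 1, img, kernel1d, mode="valid")
    cols = np.apply_along_axis(np.convolve, 0, rows.astype(np.float32), kernel1d, mode="valid")
    return cols.astype(np.float32)

direct = filter_2d(image, g2d)
separated = filter_separated(image, g1d)
print(np.max(np.abs(direct - separated)))  # tiny, but typically not exactly zero
```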
That said, we have tried to make the results of our different implementations as close to each other as we could. There may still be differences.
Furthermore, the Python version of FLIP, installed using `pip install flip_evaluator`, runs on Windows, Linux (tested on Ubuntu 24.04), and OS X. The tests in `flip_evaluator/tests/test.py` are made for Windows. While the mean tests (means compared up to six decimal points) pass on each mentioned operating system, not all error map pixels are identical.
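
For reference, a comparison of means "up to six decimal points" can be expressed as a simple rounding check; the values below are made-up placeholders, not actual FLIP output.

```python
# Hypothetical mean FLIP values from two platforms (made-up numbers).
mean_windows = 0.1523051
mean_linux = 0.1523048

# One way to compare up to six decimal points: round, then compare.
assert round(mean_windows, 6) == round(mean_linux, 6)
```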