
Add a FAQ explaining why there is no fast-math mode. #260

Merged
merged 4 commits on Jul 23, 2015
19 changes: 19 additions & 0 deletions FAQ.md
@@ -210,3 +210,22 @@ Note that this discussion applies to use of LLVM IR as a standardized format. LLVM's clang
and midlevel optimizers can still be used to generate WebAssembly code from C and C++, and will use
LLVM IR in their implementation similarly to how PNaCl and Emscripten do today.

## Why is there no fast-math mode with relaxed floating point semantics?

Optimizing compilers commonly have fast-math flags which permit the compiler to relax the rules around floating point in order to optimize more aggressively. This can include assuming that NaNs or infinities don't occur, ignoring the difference between negative zero and positive zero, making algebraic manipulations which change how rounding is performed or when overflow might occur, or replacing operations with approximations that are cheaper to compute.

These optimizations effectively introduce nondeterminism; it isn't possible to determine how the code will behave without knowing the specific choices made by the optimizer. This often isn't a serious problem in native code scenarios, because all the nondeterminism is resolved by the time native code is produced. Since most hardware doesn't have floating point nondeterminism, developers have an opportunity to test the generated code, and then count on it behaving consistently for all users thereafter.

WebAssembly implementations run on the user side, so there is no opportunity for developers to test the final behavior of the code. Nondeterminism at this level could cause distributed WebAssembly programs to behave differently in different implementations, or change over time. WebAssembly does have [some nondeterminism](Nondeterminism.md) in cases where the tradeoffs warrant it, but fast-math flags are not believed to be important enough:

* Many of the important fast-math optimizations happen in the mid-level optimizer of a compiler, before WebAssembly code is emitted. For example, loop vectorization that depends on floating point reassociation can still be done at this level if the user applies the appropriate fast-math flags, so WebAssembly programs can still enjoy these benefits. As another example, compilers can replace floating point division with floating point multiplication by a reciprocal in WebAssembly programs just as they do for other platforms.
Member
I'm not sure I understand what you're suggesting for vectorization. Would it be done in the dev's machine, or on-device? I think we want to allow both, though maybe at different dates.

Member Author
It could be on the dev-machine, or on-device in a JIT library, both of which would have the freedom to utilize fast-math flags, and the latter would have the ability to feature-test for things like available SIMD features.

Member
Maybe expand on this as we discussed yesterday? Reassociation can be a middle-end thing, but may also be something that requires precise target ISA information (e.g. vector width, number of registers, cache information). The middle end can ask for that information, or wasm backends can implement clever vectorization if they want (but I'd rather have the middle end do it).

Halide would be a great example to use here.

Member Author
Ok, I added a bullet point for this. It's an interesting question whether the number of registers or cache information are things we want to expose in feature tests (no doubt some developers will require us to provide them, but we have other requirements to consider as well).


* Mid-level compiler optimizations may also be augmented by implementing them in a JIT library in WebAssembly. This would allow them to perform optimizations that benefit from having [information about the target](FeatureTest.md) and information about the source program semantics such as fast-math flags at the same time. For example, if SIMD types wider than 128-bit are added, it's expected that there would be feature tests allowing WebAssembly code to determine which SIMD types to use on a given platform.

* When WebAssembly [adds an FMA operation](FutureFeatures.md#additional-floating-point-operations), folding multiply and add sequences into FMA operators will be possible.

* WebAssembly doesn't include its own math functions like `sin`, `cos`, `exp`, `pow`, and so on. WebAssembly's strategy for such functions is to allow them to be implemented as library routines in WebAssembly itself (note that x86's `sin` and `cos` instructions are slow and imprecise and are generally avoided these days anyway). Users wishing to use faster and less precise math functions on WebAssembly can simply select a math library implementation which does so.

* Most of the individual floating point operations that WebAssembly does have already map to individual fast instructions in hardware. Telling `add`, `sub`, or `mul` they don't have to worry about NaN for example doesn't make them any faster, because NaN is handled quickly and transparently in hardware on all modern platforms.

* WebAssembly has no floating point traps, status register, dynamic rounding modes, or signaling NaNs, so optimizations that depend on the absence of these features are all safe.
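The reciprocal transformation mentioned in the first bullet above can be sketched in C; this is what a producer compiler may effectively do under fast-math (the function names are hypothetical, not from any real compiler):

```c
/* Sketch of a fast-math strength reduction: division by a loop-invariant
 * constant rewritten as multiplication by its precomputed reciprocal.
 * Multiplication is typically much cheaper than division, but the
 * reciprocal is itself rounded, so results can differ in the last bit. */
void scale_div(const float *in, float *out, int n, float d) {
    for (int i = 0; i < n; i++)
        out[i] = in[i] / d;     /* one division per element */
}

void scale_recip(const float *in, float *out, int n, float d) {
    float r = 1.0f / d;         /* rounded once, outside the loop */
    for (int i = 0; i < n; i++)
        out[i] = in[i] * r;     /* one multiplication per element */
}
```

When the divisor is an exact power of two the reciprocal is exact and the two loops agree bit for bit; for other divisors the products can differ from true division in the last bit, which is why the rewrite requires a fast-math flag when performed by the producer.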