Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
To make full use of the repo, you need a modern processor that supports AVX512 instructions. | ||
If your processor only supports AVX2, change the target instruction set in the projects to AVX2 and do not generate AVX512 code | ||
in the projects, because your machine will not run it. | ||
|
||
The projects build with clang, IC2022 and VS2019. | ||
Select x64 and one of the solution configurations: IC2022, release, debug or clang. | ||
|
||
Getting started shows some example use cases for vectors, filters and views, together with an experimental | ||
vectorised forward AAD. | ||
|
||
Accumulate example shows some of the use cases given in the CppCon 2022 talk. | ||
Additionally, it gives an example of error correction using Kahan accumulation. | ||
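As a rough illustration of the error correction mentioned above, here is a minimal scalar sketch of Kahan (compensated) accumulation. It is not the repo's vectorised implementation; the function names naive_sum and kahan_sum are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Plain left-to-right summation, shown for comparison.
double naive_sum(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Kahan summation: carries a compensation term holding the low-order
// bits that were lost when the previous element was added.
double kahan_sum(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double sum = 0.0;
    double c = 0.0;                      // running compensation
    for (std::size_t i = 0; i < n; ++i) {
        const double y = data[i] - c;    // apply the correction to the input
        const double t = sum + y;        // low-order bits of y may be lost here
        c = (t - sum) - y;               // recover what was lost (negated)
        sum = t;
    }
    return sum;
}
```

Note that aggressive floating-point optimisation (for example a fast-math mode) can simplify the compensation away, so a build comparing accumulators should use a strict or precise floating-point model.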
|
||
The examples build and run with the VS2019, clang and Intel compilers. The target instruction set | ||
generated by the framework can be changed by changing the namespace. The namespaces cover double and float | ||
types; VecDb is a pair of doubles. Uncomment the namespace you want and build the example. | ||
|
||
//using namespace DRC::VecDb; | ||
//using namespace DRC::VecD2D; //sse2 double | ||
using namespace DRC::VecD4D; //avx2 double | ||
//using namespace DRC::VecF8F; // avx2 float | ||
//using namespace DRC::VecD8D; //avx512 double | ||
//using namespace DRC::VecF16F; //avx512 float | ||
|
||
For a machine supporting AVX512, ensure all the VC projects are set to use the enhanced instruction set: set | ||
Configuration Properties / C/C++ / Code Generation / Enable Enhanced Instruction Set to /arch:AVX512. | ||
If your machine doesn't support this, reduce to AVX2 or SSE2, and don't select a namespace in the code that requires a more advanced instruction | ||
set. | ||
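One way to confirm which instruction set a build actually targets is to inspect the predefined macros __AVX512F__ and __AVX2__, which clang, the Intel compilers and MSVC (with /arch:AVX512 or /arch:AVX2) define. The sketch below is an assumption about how such a check could look; the helper names doubles_per_register, compiled_instruction_set and require_lanes are hypothetical and not part of the repo.

```cpp
#include <cassert>
#include <cstddef>

// Number of doubles in the widest vector register the build targets.
constexpr std::size_t doubles_per_register() {
#if defined(__AVX512F__)
    return 8;   // 512-bit registers
#elif defined(__AVX2__)
    return 4;   // 256-bit registers
#else
    return 2;   // assume the 128-bit SSE2 baseline
#endif
}

// Human-readable name of the instruction set selected at compile time.
constexpr const char* compiled_instruction_set() {
#if defined(__AVX512F__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#else
    return "SSE2 (assumed baseline)";
#endif
}

// Hypothetical debug-build guard: fails if a namespace needing
// `double_lanes` doubles per register is wider than the build target.
inline void require_lanes(std::size_t double_lanes) {
    assert(double_lanes <= doubles_per_register());
}
```

For example, calling require_lanes(8) before using an AVX512 double namespace would trip the assertion in a debug build compiled for AVX2 only.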
|
||
Uncomment one of the using namespace lines to select the instruction set that you wish to run. | ||
Those ending in F have float as the underlying type; those ending in D have double. | ||
|
||
The project is set to compile using the AVX512 enhanced instruction set. The namespace selection | ||
chooses the type of the intrinsics that are used to instantiate lambdas. | ||
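The mechanism described above can be sketched without any intrinsics. In the example below every name (Lanes, the sketch namespace, the VecXX alias, square_plus, run_example) is hypothetical and only mirrors the naming pattern of the DRC namespaces; it shows how one using namespace line decides which type a generic lambda is instantiated with.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD register holding N elements of type T.
template <typename T, std::size_t N>
struct Lanes {
    std::array<T, N> v{};
    T& operator[](std::size_t i) { assert(i < N); return v[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return v[i]; }
};

// Element-wise multiply and add, as a vector type would provide.
template <typename T, std::size_t N>
Lanes<T, N> operator*(const Lanes<T, N>& a, const Lanes<T, N>& b) {
    Lanes<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

template <typename T, std::size_t N>
Lanes<T, N> operator+(const Lanes<T, N>& a, const Lanes<T, N>& b) {
    Lanes<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Each namespace exposes the same alias name bound to a different type.
namespace sketch {
namespace VecD4D { using VecXX = Lanes<double, 4>; }  // stands in for AVX2 double
namespace VecF8F { using VecXX = Lanes<float, 8>; }   // stands in for AVX2 float
}

// Enable exactly one line; enabling both makes VecXX ambiguous.
using namespace sketch::VecD4D;
//using namespace sketch::VecF8F;

// Generic lambda: compiled for whichever VecXX was selected above.
const auto square_plus = [](const auto& x, const auto& y) { return x * x + y; };

VecXX run_example() {
    VecXX a, b;
    a.v.fill(3);
    b.v.fill(1);
    return square_plus(a, b);  // every lane holds 3 * 3 + 1
}
```

Swapping the commented using line changes the element type and lane count of run_example without touching the lambda itself.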
|
||
If your hardware does not support AVX512, choose the next level down, AVX2, and avoid the namespaces | ||
DRC::VecD8D and DRC::VecF16F, which generate code with instructions that your computer doesn't support. | ||
|
||
Check Device Manager / Processors to determine which processor you have, and check it against the Intel website, for example: | ||
https://ark.intel.com/content/www/us/en/ark/products/123550/intel-xeon-silver-4114-processor-13-75m-cache-2-20-ghz.html | ||
or | ||
https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html | ||
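As an alternative to looking the processor up by hand, the CPU can be queried at run time. The sketch below is an assumption, not code from the repo: it uses __builtin_cpu_supports on GCC and clang and the __cpuid/__cpuidex intrinsics on MSVC-compatible compilers, and the helper names has_avx2, has_avx512f and widest_double_namespace are hypothetical. The MSVC path tests only the CPUID feature bits and does not verify operating-system support for the wider registers.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
// Read CPUID leaf 7, sub-leaf 0, and test one bit of EBX.
static bool cpuid_leaf7_ebx_bit(int bit) {
    assert(bit >= 0 && bit < 32);
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                 // regs[0] = highest supported leaf
    if (regs[0] < 7) return false;
    __cpuidex(regs, 7, 0);            // regs[1] = EBX feature flags
    return ((regs[1] >> bit) & 1) != 0;
}
#endif

bool has_avx2() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    return cpuid_leaf7_ebx_bit(5);    // EBX bit 5 = AVX2
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                     // not an x86 build
#endif
}

bool has_avx512f() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    return cpuid_leaf7_ebx_bit(16);   // EBX bit 16 = AVX512F
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;                     // not an x86 build
#endif
}

// Widest double-precision namespace from the list above that the CPU can run.
const char* widest_double_namespace() {
    if (has_avx512f()) return "DRC::VecD8D";
    if (has_avx2())    return "DRC::VecD4D";
    return "DRC::VecD2D";
}
```

Printing the result of widest_double_namespace() gives a quick hint about which using namespace line is safe to uncomment on the current machine.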
|
||
|
||
The getting started project shows the usage of vectors, lambdas and filters. | ||
|
||
The accumulateExample builds performance examples covered in the CppCon 2022 talk. | ||
They give the user the chance to change between ICC, clang and VS2019 builds, as well as changing the | ||
instruction set used via the using namespace directive. | ||
|
||
The inverseCumNormalExample gives the performance example shown at CppCon 2022, although there might be a slight | ||
performance regression on one or two of the examples. It's instructive to run the examples after building with the | ||
different compilers and choosing different instruction sets for the lambdas (via namespace). | ||
|
||
The AVX512Dance function runs a routine that finds the max value in an array, using AVX2 and AVX512. By monitoring the | ||
power usage with something like Open Hardware Monitor, it's possible to see that the AVX512 instructions use less | ||
energy to do the computation than the AVX2 ones (on this Xeon Silver 4114). | ||
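The AVX512Dance source is not reproduced here. As a rough, portable illustration of the kind of max reduction it runs, the sketch below keeps W independent running maxima, where W stands in for the number of doubles in a register (4 for AVX2, 8 for AVX512). It uses plain C++ rather than intrinsics, and the name lane_max is hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Lane-wise max: W running maxima updated in lock step, then reduced.
template <std::size_t W>
double lane_max(const std::vector<double>& data) {
    assert(!data.empty());
    std::array<double, W> lanes;
    lanes.fill(data[0]);                       // seed every lane with a valid value

    std::size_t i = 0;
    for (; i + W <= data.size(); i += W)       // full blocks of W elements
        for (std::size_t k = 0; k < W; ++k)
            lanes[k] = std::max(lanes[k], data[i + k]);

    // Horizontal reduction across the lanes.
    double result = *std::max_element(lanes.begin(), lanes.end());

    for (; i < data.size(); ++i)               // leftover tail elements
        result = std::max(result, data[i]);
    return result;
}
```

Calling lane_max<4> and lane_max<8> on the same data mimics the two register widths; both return the same maximum, only the number of elements handled per step differs.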
|
||
VectorTest is a selection of tests using googletest. | ||
The main library is Vectorisation. It references a local copy of the VCL2 library, which has a slight change to enable | ||
VCL2 to be used with the Intel IC2022 compiler. | ||
|
||
|
||
|
||
|