Fix compile issues when build on RHEL5_64 with gcc 4.9.4 #8
Closed
nebeid pushed a commit that referenced this pull request on Jul 29, 2022

Resolves parsing issues for ARMv8 assembly with clang7 on Ubuntu 20.04 in the FIPS static build (found through PR #566 for the SHA3 assembly implementation).

- Fix parsing issue in `delocate.peg` for ARM assembly.
- Edit rule `RegisterOrConstant` to allow shifting a register/constant by a two-digit value (e.g., the case of the ARMv8 mask for SHA3 hardware support) instead of just one digit.
- Add a new rule allowing addition, subtraction and multiplication in the offset (useful for looping address accesses, e.g., `#8*($i+2)`).
  - Add a set of `OffsetOperator` to define the operations allowed in the offset.
  - Add a new set of `Offset` rule operations, interpreted depending on the location of the parentheses, if present.

Note: The parentheses in the `Offset` rule should be either both included or both left out; i.e., the parenthesis set should be closed. The `OffsetOperator` includes addition, subtraction and multiplication only.

This change was tested successfully in PR #566.
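The operator and parenthesis constraints described in this commit can be illustrated with a small recognizer. This is only a sketch in Python and not the actual `delocate.peg` grammar: the function names are hypothetical, operands are assumed to be decimal integers (that is, a Perl variable such as `$i` is assumed to have been substituted already), and only `+`, `-` and `*` are accepted as operators.

```python
import re

# One token: a run of digits, an operator, or a parenthesis (assumed token set).
TOKEN = re.compile(r"\s*(\d+|[-+*()])")
OPERATORS = {"+", "-", "*"}


def tokenize(text):
    """Split an offset expression into tokens; raise ValueError on anything else."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_term(tokens, pos):
    """Term := integer | '(' expression ')'. Returns next index, or -1 on failure."""
    if pos >= len(tokens):
        return -1
    if tokens[pos].isdigit():
        return pos + 1
    if tokens[pos] == "(":
        pos = parse_expr(tokens, pos + 1)
        # The opening parenthesis must be matched by a closing one.
        if pos != -1 and pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
    return -1


def parse_expr(tokens, pos):
    """Expression := term (operator term)*. Returns next index, or -1 on failure."""
    pos = parse_term(tokens, pos)
    while pos != -1 and pos < len(tokens) and tokens[pos] in OPERATORS:
        pos = parse_term(tokens, pos + 1)
    return pos


def is_valid_offset(text):
    """Accept '#'-prefixed offsets built from integers, + - *, and closed parentheses."""
    if text.startswith("#"):
        text = text[1:]
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    return parse_expr(tokens, 0) == len(tokens)
```

Under these assumptions, `#8*(1+2)` and `#16` are accepted, while an unclosed `#8*(1+2` or an unsupported operator such as `#8/2` is rejected.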
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 15, 2024

Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such a common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations slightly for better performance: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.
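To make the 'Karatsuba preprocessing' concrete, here is a minimal Python sketch of how one 128-bit carry-less multiplication can be built from three 64-bit ones. It is an illustration only and not the AWS-LC code: `clmul` is a bit-by-bit stand-in for a polynomial multiply instruction, the function names are hypothetical, and the result is the unreduced 255-bit product (the GHASH modular reduction is left out).

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def split(value):
    """Return the (low, high) 64-bit halves of a 128-bit value."""
    return value & MASK64, value >> 64


def karatsuba_preprocess(h):
    """The value cached next to H: the XOR of its low and high halves."""
    h_lo, h_hi = split(h)
    return h_lo ^ h_hi


def karatsuba_clmul128(h, x, h_mid):
    """128x128 carry-less multiply using three 64x64 carry-less multiplies."""
    h_lo, h_hi = split(h)
    x_lo, x_hi = split(x)
    lo = clmul(h_lo, x_lo)            # H.lo * X.lo
    hi = clmul(h_hi, x_hi)            # H.hi * X.hi
    mid = clmul(h_mid, x_lo ^ x_hi)   # (H.lo + H.hi) * (X.lo + X.hi)
    mid ^= lo ^ hi                    # leaves H.lo * X.hi + H.hi * X.lo
    return lo ^ (mid << 64) ^ (hi << 128)
```

Because `h_mid` depends only on H, it can be computed once per key and cached, which is why it is a natural candidate for storage alongside the powers of H.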
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 15, 2024

Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 16, 2024

Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.

This commit modifies the H-table precomputation ghash_init_v8 in the simplest way possible to introduce the desired swaps, bracketing store instructions for H-table values X with `vext.8 X, X, X, #8`. The resulting initialization code is slightly slower than the original one and will be simplified in the next commit.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 16, 2024

This is the first in a series of commits aiming to rewrite gcm_ghash_v8 to work directly with the swapped H-table values, rather than swapping them back after loading and falling back to the old code.

As a first step, the swapping of A = {H,H2} is removed and all uses of

```
pmull.64 Y, A, X
```

are replaced by the equivalent

```
vext.8 X, X, X, #8
pmull2.64 Y, A, X
vext.8 X, X, X, #8
```

(and similarly for pmull2). This works so long as X and Y don't alias.

Of course, the above conversion makes the code much less efficient, and is not final. The next commit will eliminate `vext`.
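The equivalence claimed here can be checked with a small Python model. This is a sketch under stated assumptions, not the assembly itself: 128-bit registers are modelled as integers, `ext8` models `vext.8 X, X, X, #8` (swap of the 64-bit halves), `pmull` is assumed to multiply the low halves and `pmull2` the high halves, and all function names are hypothetical. Aliasing between X and Y cannot occur in this model, so that caveat is not exercised.

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def ext8(v):
    """Model of `vext.8 V, V, V, #8`: swap the low and high 64-bit halves."""
    return ((v & MASK64) << 64) | (v >> 64)


def pmull(a, b):
    """Assumed semantics: carry-less multiply of the low 64-bit halves."""
    return clmul(a & MASK64, b & MASK64)


def pmull2(a, b):
    """Assumed semantics: carry-less multiply of the high 64-bit halves."""
    return clmul(a >> 64, b >> 64)


def original(a, x):
    """Old code: A is in its original (unswapped) layout."""
    return pmull(a, x), x


def rewritten(a_swapped, x):
    """New code: A is stored swapped, so X is swapped around a pmull2."""
    x = ext8(x)
    y = pmull2(a_swapped, x)
    x = ext8(x)  # restore X for later uses
    return y, x
```

For any `a` and `x`, `rewritten(ext8(a), x)` returns the same product and the same final `x` as `original(a, x)`; the symmetric statement with `pmull` and `pmull2` exchanged holds for the same reason.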
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 16, 2024

`In` and `t1` are swapped versions of each other. Therefore,

```
vext.8 $In, $In, $In, #8
vpmull2.p64 $Xln,$H,$In @ H·Ii+1
vext.8 $In, $In, $In, #8
```

is equivalent to

```
vext.8 $In, $In, $In, #8
vpmull2.p64 $Xln,$H,$t1 @ H·Ii+1
vext.8 $In, $In, $In, #8
```

is equivalent to

```
vpmull2.p64 $Xln,$H,$t1 @ H·Ii+1
vext.8 $In, $In, $In, #8
vext.8 $In, $In, $In, #8
```

is equivalent to

```
vpmull2.p64 $Xln,$H,$t1 @ H·Ii+1
```
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jan 16, 2024

In the context of the change, t0 and IN are the same after

```
veor $IN,$t0,$t2 @ inp^=Xi
veor $t1,$t0,$t2 @ $t1 is rotated inp^Xi
```

Moreover, after all of

```
vpmull2.p64 $Xl,$H,$IN @ H.lo·Xi.lo
vext.8 $IN, $IN, $IN, #8
veor $t1,$t1,$IN @ Karatsuba pre-processing
vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
vext.8 $IN, $IN, $IN, #8
```

`IN` is unchanged because it was swapped twice, and t1 only feeds into the computation of Xm and is not used further afterwards. Hence, the above is equivalent to

```
vpmull2.p64 $Xl,$H,$IN @ H.lo·Xi.lo
vext.8 $t1, $IN, $IN, #8
veor $t1,$t1,$IN @ Karatsuba pre-processing
vpmull.p64 $Xm,$Hhl,$t1 @ (H.lo+H.hi)·(Xi.lo+Xi.hi)
```

removing one `vext`.
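The two instruction sequences can be compared in a small Python register model. This is a sketch only: registers are 128-bit integers, `ext8` models `vext.8` with `#8`, `vpmull.p64` and `vpmull2.p64` are assumed to multiply the low and high 64-bit halves respectively, and the model assumes that `t1` holds the same value as `IN` on entry, as the two `veor` instructions with identical source operands suggest. All function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def clmul(a, b):
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def ext8(v):
    """Swap the low and high 64-bit halves of a 128-bit value."""
    return ((v & MASK64) << 64) | (v >> 64)


def pmull(a, b):
    """Assumed semantics: carry-less multiply of the low halves."""
    return clmul(a & MASK64, b & MASK64)


def pmull2(a, b):
    """Assumed semantics: carry-less multiply of the high halves."""
    return clmul(a >> 64, b >> 64)


def sequence_before(h, hhl, inp, t1):
    """Five-instruction sequence with two swaps of IN."""
    xl = pmull2(h, inp)
    inp = ext8(inp)
    t1 ^= inp              # Karatsuba pre-processing
    xm = pmull(hhl, t1)
    inp = ext8(inp)        # second swap restores IN
    return xl, xm, inp


def sequence_after(h, hhl, inp, t1):
    """Four-instruction sequence: the swap is written to t1 instead of IN."""
    xl = pmull2(h, inp)
    t1 = ext8(inp)         # old value of t1 is overwritten
    t1 ^= inp              # Karatsuba pre-processing
    xm = pmull(hhl, t1)
    return xl, xm, inp
```

Calling both functions with `t1` equal to `inp` yields identical `Xl`, `Xm` and final `IN` values, since in either case `t1` ends up as `IN` XORed with its swapped form.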
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Mar 21, 2024

Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.

This commit modifies the H-table precomputation ghash_init_v8 in the simplest way possible to introduce the desired swaps, bracketing store instructions for H-table values X with `vext.8 X, X, X, #8`. The resulting initialization code is slightly slower than the original one and will be simplified in the next commit.
hanno-becker added a commit to hanno-becker/aws-lc that referenced this pull request on Jul 8, 2024

Implementations of AES-GCM in AWS-LC may use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.

This commit modifies the H-table precomputation ghash_init_v8 in the simplest way possible to introduce the desired swaps, bracketing store instructions for H-table values X with `vext.8 X, X, X, #8`. The resulting initialization code is slightly slower than the original one and will be simplified in the next commit.
nebeid pushed a commit that referenced this pull request on Jul 11, 2024

AArch64 assembly implementations of AES-GCM in AWS-LC use an "H-Table" to precompute and cache common computations across multiple invocations of AES-GCM using the same key, thereby improving performance. The main example of such common precomputation is the computation of powers of the H-value used in the GHASH algorithm -- giving the H-Table its name. However, despite the name, the structure of the H-Table is opaque to the code invoking AES-GCM, and implementations are free to populate it with arbitrary data.

This freedom is already being leveraged: Currently, the AArch64 implementation of AES-GCM not only stores powers of H in the HTable (H1-H8 in the code), but also their 'Karatsuba preprocessing' values, which are the EORs of the low and high halves. Those naturally occur when using Karatsuba's algorithm to reduce a 128-bit polynomial multiplication over GF(2) to three 64-bit polynomial multiplications.

This commit changes the structure of the H-Table for AArch64 implementations of AES-GCM slightly to obtain a small performance gain: It is observed that every time a power of H is loaded from the H-Table (H1-H8), the first operation that happens to it in both aesv8-gcm-armv8.pl and aesv8-gcm-armv8-unroll8.pl is to swap low and high halves via `ext arg.16b, arg.16b, arg.16b, #8`. Those swaps can be precomputed, and the H{1-8} values stored in swapped form in the HTable, thereby eliminating the swaps from the critical loop of AES-GCM.

This gives a small performance gain for AES-GCM on Graviton3, at the cost of slightly slower one-off initialization. For Graviton2, the AES-GCM AArch64 assembly loads the H-table only once, outside of the critical loop; hence, there is no performance benefit.
dkostic pushed a commit to dkostic/aws-lc that referenced this pull request on Dec 5, 2024

Typo fix in CodeBuild job names
This change fixes compile issues when building on RHEL5_64 with gcc 4.9.4.