AES-GCM AArch64: Store swapped Htable values #1403

Merged: 4 commits, Jul 11, 2024
144 changes: 0 additions & 144 deletions crypto/fipsmodule/modes/asm/aesv8-gcm-armv8-unroll8.pl

Large diffs are not rendered by default.

8 changes: 0 additions & 8 deletions crypto/fipsmodule/modes/asm/aesv8-gcm-armv8.pl
@@ -353,18 +353,15 @@
ldr $rk5q, [$cc, #80] // load rk5
aese $ctr1b, $rk1 \n aesmc $ctr1b, $ctr1b // AES block 1 - round 1
ldr $h3q, [$Htable, #48] // load h3l | h3h
ext $h3b, $h3b, $h3b, #8
aese $ctr3b, $rk0 \n aesmc $ctr3b, $ctr3b // AES block 3 - round 0
aese $ctr2b, $rk1 \n aesmc $ctr2b, $ctr2b // AES block 2 - round 1
ldr $rk4q, [$cc, #64] // load rk4
aese $ctr1b, $rk2 \n aesmc $ctr1b, $ctr1b // AES block 1 - round 2
ldr $h2q, [$Htable, #32] // load h2l | h2h
ext $h2b, $h2b, $h2b, #8
aese $ctr3b, $rk1 \n aesmc $ctr3b, $ctr3b // AES block 3 - round 1
ldr $rk12q, [$cc, #192] // load rk12
aese $ctr2b, $rk2 \n aesmc $ctr2b, $ctr2b // AES block 2 - round 2
ldr $h4q, [$Htable, #80] // load h4l | h4h
ext $h4b, $h4b, $h4b, #8
aese $ctr1b, $rk3 \n aesmc $ctr1b, $ctr1b // AES block 1 - round 3
ldr $rk11q, [$cc, #176] // load rk11
aese $ctr3b, $rk2 \n aesmc $ctr3b, $ctr3b // AES block 3 - round 2
@@ -391,7 +388,6 @@
ldr $rk9q, [$cc, #144] // load rk9
aese $ctr0b, $rk6 \n aesmc $ctr0b, $ctr0b // AES block 0 - round 6
ldr $h1q, [$Htable] // load h1l | h1h
ext $h1b, $h1b, $h1b, #8
aese $ctr2b, $rk6 \n aesmc $ctr2b, $ctr2b // AES block 2 - round 6
ldr $rk10q, [$cc, #160] // load rk10
aese $ctr1b, $rk7 \n aesmc $ctr1b, $ctr1b // AES block 1 - round 7
@@ -962,13 +958,10 @@
ldr $rk1q, [$cc, #16] // load rk1
aese $ctr0b, $rk0 \n aesmc $ctr0b, $ctr0b // AES block 0 - round 0
ldr $h3q, [$Htable, #48] // load h3l | h3h
ext $h3b, $h3b, $h3b, #8
aese $ctr3b, $rk0 \n aesmc $ctr3b, $ctr3b // AES block 3 - round 0
ldr $h4q, [$Htable, #80] // load h4l | h4h
ext $h4b, $h4b, $h4b, #8
aese $ctr1b, $rk0 \n aesmc $ctr1b, $ctr1b // AES block 1 - round 0
ldr $h2q, [$Htable, #32] // load h2l | h2h
ext $h2b, $h2b, $h2b, #8
aese $ctr2b, $rk0 \n aesmc $ctr2b, $ctr2b // AES block 2 - round 0
ldr $rk2q, [$cc, #32] // load rk2
aese $ctr0b, $rk1 \n aesmc $ctr0b, $ctr0b // AES block 0 - round 1
@@ -982,7 +975,6 @@
ldr $rk12q, [$cc, #192] // load rk12
aese $ctr0b, $rk2 \n aesmc $ctr0b, $ctr0b // AES block 0 - round 2
ldr $h1q, [$Htable] // load h1l | h1h
ext $h1b, $h1b, $h1b, #8
aese $ctr2b, $rk2 \n aesmc $ctr2b, $ctr2b // AES block 2 - round 2
ldr $rk10q, [$cc, #160] // load rk10
aese $ctr3b, $rk2 \n aesmc $ctr3b, $ctr3b // AES block 3 - round 2
75 changes: 43 additions & 32 deletions crypto/fipsmodule/modes/asm/ghashv8-armx.pl
@nebeid (Contributor) commented:

Thank you @hanno-becker for this change. Since you dove into the details of this implementation, I suggest adding comments at the beginning to explain what is calculated and where it is stored in the H table, maybe using an ASCII representation of the table.
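For reference, a rough sketch of the layout as it can be read off the vst1.64 stores in this diff (16-byte entries; which value lands where is inferred from the store comments, so treat it as an approximation rather than authoritative documentation):

    Htable[0]  #0     twisted H                               (now stored pre-swapped)
    Htable[1]  #16    packed Karatsuba value for H | H^2
    Htable[2]  #32    H^2                                     (pre-swapped)
    Htable[3]  #48    H^3                                     (pre-swapped)
    Htable[4]  #64    packed Karatsuba value for H^3 | H^4
    Htable[5]  #80    H^4                                     (pre-swapped)
    Htable[6]  #96    H^5                                     (pre-swapped)
    Htable[7]  #112   packed Karatsuba value for H^5 | H^6
    Htable[8]  #128   H^6                                     (pre-swapped)

The stores for H^7, the packed Karatsuba value for H^7 | H^8, and H^8 follow the same pattern but are outside the hunks shown here.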

@hanno-becker (Contributor, Author) commented on Jul 8, 2024:

@nebeid It is a good idea to document better what is being stored in the Htable. However, I do not think that is necessary in order to vet this PR: the main point is that certain Htable entries are always swapped right after loading, so one may just store the swapped versions to begin with. This does not rely on knowledge of what it is that is being stored.
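As a concrete illustration of that pattern (a sketch assembled from the hunks in this PR, not a separate change): previously the unrolled AES-GCM kernels swapped each Htable entry immediately after loading it, e.g.

    ldr  $h3q, [$Htable, #48]    // load h3l | h3h
    ext  $h3b, $h3b, $h3b, #8    // swap the two 64-bit halves

With this PR, the table-setup code in gcm_init_v8 applies the vext.8 once before storing each power of H, so the ldr alone suffices in the hot loops. Conversely, the generic gcm_gmult_v8/gcm_ghash_v8 paths, which expect the previous ordering, now undo the swap once after loading:

    vld1.64 {$H-$Hhl},[$Htbl]    @ load twisted H, ...
    vext.8  $H,$H,$H,#8          @ restore the ordering this path expects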

@@ -113,13 +113,14 @@
vand $t0,$t0,$t1
vorr $IN,$IN,$t2 @ H<<<=1
veor $H,$IN,$t0 @ twisted H
vext.8 $H, $H, $H, #8
vst1.64 {$H},[x0],#16 @ store Htable[0]

@ calculate H^2
@ calculate H^2
vext.8 $t0,$H,$H,#8 @ Karatsuba pre-processing
vpmull.p64 $Xl,$H,$H
vpmull2.p64 $Xl,$H,$H
veor $t0,$t0,$H
vpmull2.p64 $Xh,$H,$H
vpmull.p64 $Xh,$H,$H
vpmull.p64 $Xm,$t0,$t0

vext.8 $t1,$Xl,$Xh,#8 @ Karatsuba post-processing
@@ -135,23 +136,25 @@
vext.8 $t2,$Xl,$Xl,#8 @ 2nd phase
vpmull.p64 $Xl,$Xl,$xC2
veor $t2,$t2,$Xh
veor $H2,$Xl,$t2
veor $t1,$Xl,$t2

vext.8 $t1,$H2,$H2,#8 @ Karatsuba pre-processing
vext.8 $H2,$t1,$t1,#8 @ Karatsuba pre-processing
veor $t1,$t1,$H2
vext.8 $Hhl,$t0,$t1,#8 @ pack Karatsuba pre-processed
vst1.64 {$Hhl-$H2},[x0],#32 @ store Htable[1..2]
vst1.64 {$Hhl},[x0],#16 @ store Htable[1..2]
vst1.64 {$H2},[x0],#16 @ store Htable[1..2]
___
if ($flavour =~ /64/) {
my ($t3,$Yl,$Ym,$Yh) = map("q$_",(4..7));
my ($H3,$H34k,$H4,$H5,$H56k,$H6,$H7,$H78k,$H8) = map("q$_",(15..23));

$code.=<<___;

@ calculate H^3 and H^4
vpmull.p64 $Xl,$H, $H2
vpmull.p64 $Yl,$H2,$H2
vpmull2.p64 $Xh,$H, $H2
vpmull2.p64 $Yh,$H2,$H2
vpmull2.p64 $Xl,$H, $H2
vpmull2.p64 $Yl,$H2,$H2
vpmull.p64 $Xh,$H, $H2
vpmull.p64 $Yh,$H2,$H2
vpmull.p64 $Xm,$t0,$t1
vpmull.p64 $Ym,$t1,$t1

@@ -180,23 +183,23 @@
veor $t2,$t2,$Xh
veor $t3,$t3,$Yh

veor $H3, $Xl,$t2 @ H^3
veor $H4,$Yl,$t3 @ H^4
veor $t0, $Xl,$t2 @ H^3
veor $t1, $Yl,$t3 @ H^4

vext.8 $t0,$H3, $H3,#8 @ Karatsuba pre-processing
vext.8 $t1,$H4,$H4,#8
vext.8 $H3,$t0,$t0,#8 @ Karatsuba pre-processing
vext.8 $H4,$t1,$t1,#8
vext.8 $t2,$H2,$H2,#8
veor $t0,$t0,$H3
veor $t1,$t1,$H4
veor $t2,$t2,$H2
vext.8 $H34k,$t0,$t1,#8 @ pack Karatsuba pre-processed
vext.8 $H34k,$t0,$t1,#8 @ pack Karatsuba pre-processed
vst1.64 {$H3-$H4},[x0],#48 @ store Htable[3..5]

@ calculate H^5 and H^6
vpmull.p64 $Xl,$H2, $H3
vpmull.p64 $Yl,$H3,$H3
vpmull2.p64 $Xh,$H2, $H3
vpmull2.p64 $Yh,$H3,$H3
vpmull2.p64 $Xl,$H2, $H3
vpmull2.p64 $Yl,$H3,$H3
vpmull.p64 $Xh,$H2, $H3
vpmull.p64 $Yh,$H3,$H3
vpmull.p64 $Xm,$t0,$t2
vpmull.p64 $Ym,$t0,$t0

@@ -223,12 +226,13 @@
vpmull.p64 $Xl,$Xl,$xC2
vpmull.p64 $Yl,$Yl,$xC2
veor $t2,$t2,$Xh
veor $t3,$t3,$Yh
veor $H5,$Xl,$t2 @ H^5
veor $H6,$Yl,$t3 @ H^6
veor $t3,$t3,$Yh

veor $t0,$Xl,$t2 @ H^5
veor $t1,$Yl,$t3 @ H^6

vext.8 $t0,$H5, $H5,#8 @ Karatsuba pre-processing
vext.8 $t1,$H6,$H6,#8
vext.8 $H5, $t0, $t0,#8 @ Karatsuba pre-processing
vext.8 $H6, $t1, $t1,#8
vext.8 $t2,$H2,$H2,#8
veor $t0,$t0,$H5
veor $t1,$t1,$H6
Expand All @@ -237,10 +241,10 @@
vst1.64 {$H5-$H6},[x0],#48 @ store Htable[6..8]

@ calculate H^7 and H^8
vpmull.p64 $Xl,$H2,$H5
vpmull.p64 $Yl,$H2,$H6
vpmull2.p64 $Xh,$H2,$H5
vpmull2.p64 $Yh,$H2,$H6
vpmull2.p64 $Xl,$H2,$H5
vpmull2.p64 $Yl,$H2,$H6
vpmull.p64 $Xh,$H2,$H5
vpmull.p64 $Yh,$H2,$H6
vpmull.p64 $Xm,$t0,$t2
vpmull.p64 $Ym,$t1,$t2

@@ -268,11 +272,11 @@
vpmull.p64 $Yl,$Yl,$xC2
veor $t2,$t2,$Xh
veor $t3,$t3,$Yh
veor $H7,$Xl,$t2 @ H^7
veor $H8,$Yl,$t3 @ H^8
veor $t0,$Xl,$t2 @ H^7
veor $t1,$Yl,$t3 @ H^8

vext.8 $t0,$H7,$H7,#8 @ Karatsuba pre-processing
vext.8 $t1,$H8,$H8,#8
vext.8 $H7,$t0,$t0,#8 @ Karatsuba pre-processing
vext.8 $H8,$t1,$t1,#8
veor $t0,$t0,$H7
veor $t1,$t1,$H8
vext.8 $H78k,$t0,$t1,#8 @ pack Karatsuba pre-processed
@@ -299,6 +303,7 @@
vld1.64 {$t1},[$Xi] @ load Xi
vmov.i8 $xC2,#0xe1
vld1.64 {$H-$Hhl},[$Htbl] @ load twisted H, ...
vext.8 $H,$H,$H,#8
vshl.u64 $xC2,$xC2,#57
#ifndef __ARMEB__
vrev64.8 $t1,$t1
@@ -375,8 +380,10 @@
@ loaded twice, but last
@ copy is not processed
vld1.64 {$H-$Hhl},[$Htbl],#32 @ load twisted H, ..., H^2
vext.8 $H,$H,$H,#8
vmov.i8 $xC2,#0xe1
vld1.64 {$H2},[$Htbl]
vext.8 $H2,$H2,$H2,#8
cclr $inc,eq @ is it time to zero $inc?
vext.8 $Xl,$Xl,$Xl,#8 @ rotate Xi
vld1.64 {$t0},[$inp],#16 @ load [rotated] I[0]
@@ -513,8 +520,12 @@
.Lgcm_ghash_v8_4x:
vld1.64 {$Xl},[$Xi] @ load [rotated] Xi
vld1.64 {$H-$H2},[$Htbl],#48 @ load twisted H, ..., H^2
vext.8 $H,$H,$H,#8
vext.8 $H2,$H2,$H2,#8
vmov.i8 $xC2,#0xe1
vld1.64 {$H3-$H4},[$Htbl] @ load twisted H^3, ..., H^4
vext.8 $H3,$H3,$H3,#8
vext.8 $H4,$H4,$H4,#8
vshl.u64 $xC2,$xC2,#57 @ compose 0xc2.0 constant

vld1.64 {$I0-$j3},[$inp],#64