Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cryptest: armv7l: failed to generate expected vector load/store instructions #481

Closed
anonimal opened this issue Aug 30, 2017 · 13 comments
Closed

Comments

@anonimal
Copy link
Contributor

cryptest-result_armv7l-rev4.txt

processor       : 0
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 50.52
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 4
Testing: ARM NEON code generation
g++ -DNDEBUG -g2 -O2 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
ERROR: failed to generate expected vector load instructions
ERROR: failed to generate expected vector store instructions

Referencing #480. Possibly related to monero-project/kovri#699?

@noloader
Copy link
Collaborator

noloader commented Aug 30, 2017

I believe this failure is common. When I see this one I usually ignore it. To say more, I would need to see the disassembly.

You can produce the disassembly by hand:

g++ -DNDEBUG -g2 -O2 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
objdump --disassemble aria-simd.o

The test is performed in cryptest.sh : 1440. What we are looking for is, in ARIA_UncheckedSetKey_Schedule_NEON, look at ARIA_GSRK_NEON. In the past ARIA_GSRK_NEON used to perform a vector load and save. There used to be over 50 of them.

Now, we perform the vector load ARIA_UncheckedSetKey_Schedule_NEON, so only 4 loads occur. ARIA_GSRK_NEON still performs the saves, so there's about 20 of them. So cryptest.sh is out-of-sync:

# ARIA::UncheckedKeySet: 8 vld1q.32
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
if [[ ("$COUNT" -lt "8") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected vector load instructions" ...

Its is also possible vld1q.32 is using a different mnemonic than we are searching for. I usually see that from Clang.

With that said, this looks wrong:

g++ -DNDEBUG -g2 -O2 ...

We should be at -O3. That's when GCC applies vectorization (or more heavily applies it).

noloader added a commit that referenced this issue Aug 30, 2017
cryptest.sh still needs some tweaking from the recent changes
noloader referenced this issue Aug 30, 2017
This change should have been made with Commit 18a0565
@anonimal
Copy link
Contributor Author

To say more, I would need to see the disassembly.

aria-simd.objdump.txt

We should be at -O3. That's when GCC applies vectorization (or more heavily applies it).

1aecb3d confirmed. Logs attached in #480 (comment).

@noloader
Copy link
Collaborator

noloader commented Aug 30, 2017

aria-simd.objdump.txt

Here another example that causes the failures. Its called Multiple Register Transfer. You have one at offset 0x20, 0x3e, 0x4e, 0x5e, ... A single load instruction is actually executing multiple loads and incrementing pointers behind the scenes.

   0:	b5f0      	push	{r4, r5, r6, r7, lr}
   2:	f101 0420 	add.w	r4, r1, #32
   6:	ed2d 8b10 	vpush	{d8-d15}
   a:	f101 0330 	add.w	r3, r1, #48	; 0x30
   e:	f101 0740 	add.w	r7, r1, #64	; 0x40
  12:	f100 0610 	add.w	r6, r0, #16
  16:	f100 0520 	add.w	r5, r0, #32
  1a:	f100 0e60 	add.w	lr, r0, #96	; 0x60
  1e:	2a10      	cmp	r2, #16
  20:	f964 4a8f 	vld1.32	{d20-d21}, [r4]
  ...
  3e:	f961 0a8f 	vld1.32	{d16-d17}, [r1]
  ...
  4e:	f963 2a8f 	vld1.32	{d18-d19}, [r3]
  ...

For whatever reason the compiler determined it was more profitable to perform the double-D loads rather than a 1-Q load. Or maybe NEON cannot load a Q register due to lack of instruction support. Or maybe they are the same loads, but the mnemonics are different depending on the version of Binutils.

As I understand things, 2D = 1Q (and also 16B = 8S = 4W = 2D =1Q). For the load at 0x20, D20-21 is just Q10. They are just different views of the same register bank. Also see 1.3.2. NEON registers in the ARM ARM.

I think we can safely ignore this one. I opened a question on Stack Overflow at Difference between vld1.32 {d20-d21} and vld1q q10?.

noloader added a commit that referenced this issue Aug 30, 2017
The tests were already present; they just needed some tuning
@noloader
Copy link
Collaborator

I think Commit de8478af2ab5 cleared the issue. It did two things. First, it added vld1d.32 {d1,d2} awareness. Second, it reduced the minimum count required for success.

@anonimal
Copy link
Contributor Author

I think Commit de8478a cleared the issue.

cryptest-result_armv7l-rev4.txt

Is it still safe to ignore?

@noloader
Copy link
Collaborator

noloader commented Aug 31, 2017

That's weird. Did you pull the latest?

According to the objdump, the gadget is producing 6 vld1.32 {d1,d2} insns. We added this test:

# ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "6") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected vector load instructions"
fi

The objdump also indicates 19 vst1.32 {d1,d2} insns. And we added this test:

# ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
if [[ ("$COUNT" -lt "16") ]]; then
    FAILED=1
    echo "ERROR: failed to generate expected NEON store instructions"
fi

Both should succeed.

@anonimal
Copy link
Contributor Author

Did you pull the latest?

Sorry, my mistake: I did pull latest but that was on a separate machine (which this machine then needed to then pull from).

Both should succeed.

But they do, don't they? This patch:

diff --git a/cryptest.sh b/cryptest.sh
index 71bfaa5..4c1acbd 100755
--- a/cryptest.sh
+++ b/cryptest.sh
@@ -1308,6 +1308,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                if [[ ("$HAVE_ARMV8A") ]]; then
                        # ARIA::UncheckedKeySet: 8 vld1q.32
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vld')
+                        echo "ARIA::UncheckedKeySet: 8 vld1q.32 - our count: " $COUNT
                        if [[ ("$COUNT" -lt "8") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1315,6 +1316,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                else  # ARMv7
                        # ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2}
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vld1.32[[:space:]]*{')
+                        echo "ARIA::UncheckedKeySet: 6 vld1.32 {d1,d2} - our count: " $COUNT
                        if [[ ("$COUNT" -lt "6") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON load instructions" | tee -a "$TEST_RESULTS"
@@ -1324,6 +1326,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                if [[ ("$HAVE_ARMV8A") ]]; then
                        # ARIA::UncheckedKeySet: 20 vstr1q.32
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c 'vst')
+                        echo "ARIA::UncheckedKeySet: 20 vstr1q.32 - our count: " $COUNT
                        if [[ ("$COUNT" -lt "20") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"
@@ -1331,6 +1334,7 @@ if [[ ("$HAVE_DISASS" -ne "0" && ("$IS_ARM32" -ne "0" || "$IS_ARM64" -ne "0")) ]
                else
                        # ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2}
                        COUNT=$(echo -n "$DISASS_TEXT" | "$GREP" -i -c -E 'vst1.32[[:space:]]*{')
+                        echo "ARIA::UncheckedKeySet: 16 vstr1.32 {d1,d2} - our count: " $COUNT
                        if [[ ("$COUNT" -lt "16") ]]; then
                                FAILED=1
                                echo "ERROR: failed to generate NEON store instructions" | tee -a "$TEST_RESULTS"

with: Git branch: master (commit 5cd854b2d3cff4f8)

produces:

************************************
Testing: ARM NEON code generation

g++ -DNDEBUG -g2 -O3 -fPIC -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
ARIA::UncheckedKeySet: 8 vld1q.32 - our count:  7
ERROR: failed to generate NEON load instructions
ARIA::UncheckedKeySet: 20 vstr1q.32 - our count:  19
ERROR: failed to generate NEON store instructions

************************************

@noloader
Copy link
Collaborator

noloader commented Aug 31, 2017

@anonimal,

Both should succeed.

But they do, don't they?

Does that mean if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]] are broke? In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block and echo "ERROR: failed to generate NEON....

@anonimal
Copy link
Contributor Author

Does that mean if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]] are broke?

No, not that I can see.

In the universe I live in, 7 is greater than 6; and 19 is greater than 16. We should never enter the block

And we don't. The if [[ ("$HAVE_ARMV8A") ]]; then blocks are always executed, not the else blocks which contain if [[ ("$COUNT" -lt "6") ]] and if [[ ("$COUNT" -lt "16") ]].

@noloader
Copy link
Collaborator

noloader commented Sep 2, 2017

The if [[ ("$HAVE_ARMV8A") ]]; then blocks are always executed, not the else blocks

Ack. Try Commit 5b12be29e673.

@anonimal
Copy link
Contributor Author

anonimal commented Sep 2, 2017

Commit 5b12be2.

************************************
Testing: ARM NEON code generation

g++ -DNDEBUG -g2 -O3 -fPIC -pthread -pipe -march=armv7-a -mfloat-abi=hard -mfpu=neon -c aria-simd.cpp
Verified NEON load, store, shfit left, shift right, xor machine instructions

************************************

👍 thanks @noloader, should we close as resolved?

@noloader
Copy link
Collaborator

noloader commented Sep 2, 2017

thanks @noloader, should we close as resolved?

Yeah, I think we are OK.

I tested locally on my BananaPi (ARMv7-a) and CubieTruck (ARMv7-a), and my HiKey (ARMv8-a).

Previously I was just testing under my BananaPi. I don't know why my BananaPi was returning success when you witnessed failures.

@anonimal
Copy link
Contributor Author

anonimal commented Sep 6, 2017

Thank you for testing 😄

BananaPi

I look forward to getting my hands on one.

@anonimal anonimal closed this as completed Sep 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants