Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable valgrind tests on AArch64 by setting ARMCAP via static defines to avoid reading from MIDR_EL1. #978

Merged
merged 24 commits into from
May 5, 2023

Conversation

nebeid
Copy link
Contributor

@nebeid nebeid commented Apr 25, 2023

Issues:

V886376127

Description of changes:

Enable Valgrind tests on Arm as a follow-up to #970.
Valgrind was choking (illegal opcode) on

__asm__ volatile("mrs %0, MIDR_EL1" : "=r" (val));

Despite that this issue is said to be resolved, versions 3.19 and 3.20 were outputting the same error of "illegal opcode". (Note: The newer versions solved the issue we had with reading the capabilities at runtime via getauxval(). But then we added to that getting the CPU part as mentioned above.)

Our workaround was to avoid compiling cpu_aarch64_linux.c by passing -DOPENSSL_STATIC_ARMCAP to the test build and setting all capabilities via the STATIC_ARMCAP macros (except SHA512 which wasn't used before with valgrind and was also causing an "illegal opcode" error when set).
Note: Setting -DOPENSSL_STATIC_ARMCAP without specifying any capabilities disables all of them as in the second
set of tests.

Also, the ssl runner valgrind tests are here made to run in parallel to the valgrind tests (only with the build where all capabilities are set).

Call-outs:

  • Timeout for ssl tests was increased to 60 min (from 10 min), due to valgrind requiring more time.
  • ssl tests are performed only with all capabilities enabled; and not when they are disabled.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and
the ISC license.

@nebeid
Copy link
Contributor Author

nebeid commented Apr 26, 2023

I disabled SHA512 after I tried running tests/ci/run_valgrind_tests.sh on v 1.8.0 with the following line
https://github.com/aws/aws-lc/blame/b78d997395d543e557b1309dd9f4b3522b3a4118/util/all_tests.json#L27
changed to

"env": ["OPENSSL_armcap=0x7D"]

and getting the following response

crypto/crypto_test [shard 4/4]
crypto/crypto_test failed to complete: exit status 99
crypto/crypto_test [shard 3/4]
crypto/crypto_test [shard 1/4] (custom environment)
crypto/crypto_test [shard 2/4] (custom environment)
crypto/crypto_test [shard 3/4] (custom environment)
crypto/crypto_test [shard 4/4] (custom environment)
Fatal Error: HW capability found: 0x3D, but HW capability requested: 0x7D.
OPENSSL_armcap=0x7D crypto/crypto_test [shard 1/4]
crypto/crypto_test failed to complete: exit status 1
Fatal Error: HW capability found: 0x3D, but HW capability requested: 0x7D.
OPENSSL_armcap=0x7D crypto/crypto_test [shard 2/4]
crypto/crypto_test failed to complete: exit status 1
Fatal Error: HW capability found: 0x3D, but HW capability requested: 0x7D.
OPENSSL_armcap=0x7D crypto/crypto_test [shard 3/4]
crypto/crypto_test failed to complete: exit status 1
Fatal Error: HW capability found: 0x3D, but HW capability requested: 0x7D.
OPENSSL_armcap=0x7D crypto/crypto_test [shard 4/4]

@nebeid nebeid changed the title Set armcap with static defines to avoid reading from MIDR_EL1. Enable valgrind tests on AArch64 by setting ARMCAP via static defines to avoid reading from MIDR_EL1. Apr 27, 2023
@nebeid nebeid marked this pull request as ready for review April 27, 2023 22:23
@nebeid

This comment was marked as duplicate.

@nebeid nebeid marked this pull request as draft April 28, 2023 20:02
@nebeid
Copy link
Contributor Author

nebeid commented May 1, 2023

As expected, when the buffer overread code was re-introduced in 5e4b920, linux-arm build failed with the following messages:

Line 577

[----------] 1 test from XTSTest
[ RUN      ] XTSTest.TestVectors
==5279== Invalid read of size 16
==5279==    at 0x8D0314: aes_hw_xts_decrypt (aesv8-armx.S:1875)
==5279==  Address 0x50579c0 is 0 bytes after a block of size 144 alloc'd
==5279==    at 0x48A3FCC: operator new(unsigned long) (vg_replace_malloc.c:422)
==5279==    by 0x4EC3D3: __gnu_cxx::new_allocator<unsigned char>::allocate(unsigned long, void const*) (new_allocator.h:127)
==5279==    by 0x4EA7D3: std::allocator_traits<std::allocator<unsigned char> >::allocate(std::allocator<unsigned char>&, unsigned long) (alloc_traits.h:464)
==5279==    by 0x4E7F47: std::_Vector_base<unsigned char, std::allocator<unsigned char> >::_M_allocate(unsigned long) (stl_vector.h:346)
==5279==    by 0x7D0BD3: std::vector<unsigned char, std::allocator<unsigned char> >::reserve(unsigned long) (vector.tcc:78)
==5279==    by 0x7D0973: DecodeHex(std::vector<unsigned char, std::allocator<unsigned char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (test_util.cc:47)
==5279==    by 0x68EB33: XTSTest_TestVectors_Test::TestBody() (xts_test.cc:630)
==5279==    by 0x94465F: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2605)
==5279==    by 0x93EF9F: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2660)
==5279==    by 0x9236AF: testing::Test::Run() (gtest.cc:2680)
==5279==    by 0x923DA7: testing::TestInfo::Run() (gtest.cc:2858)
==5279==    by 0x92448B: testing::TestSuite::Run() (gtest.cc:3012)
==5279==
[       OK ] XTSTest.TestVectors (109 ms)
[----------] 1 test from XTSTest (109 ms total)

Line 1005

1 of 19 tests failed:
    crypto/crypto_test [shard 1/8]
exit status 1
FAILED: CMakeFiles/run_tests_valgrind /codebuild/output/src053248141/src/github.com/aws/aws-lc/test_build_dir/CMakeFiles/run_tests_
cd /codebuild/output/src053248141/src/github.com/aws/aws-lc && /usr/local/go/bin/go run util/all_tests.go -build-dir /codebuild/output/src053248141/src/github.com/aws/aws-lc/test_build_dir -valgrind=true -valgrind-supp-dir tests/ci
ninja: build stopped: subcommand failed.

[Container] 2023/05/01 14:02:08 Command did not exit successfully ./tests/ci/run_valgrind_tests.sh exit status 1
[Container] 2023/05/01 14:02:08 Phase complete: BUILD State: FAILED
[Container] 2023/05/01 14:02:08 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: ./tests/ci/run_valgrind_tests.sh. Reason: exit status 1

@nebeid nebeid marked this pull request as ready for review May 1, 2023 21:28
@nebeid nebeid requested a review from skmcgrail May 1, 2023 21:29
@nebeid nebeid requested review from andrewhop and samuel40791765 May 1, 2023 21:29
@nebeid nebeid merged commit 4edaef5 into aws:main May 5, 2023
WillChilds-Klein pushed a commit to WillChilds-Klein/aws-lc that referenced this pull request May 8, 2023
This was only possible by setting OPENSSL_armcap via static defines in order to avoid reading from MIDR_EL1.
Valgrind was outputting an illegal opcode error on 
__asm__ volatile("mrs %0, MIDR_EL1" : "=r" (val));

This commit avoids compiling cpu_aarch64_linux.c in valgrind build by passing -DOPENSSL_STATIC_ARMCAP to it and setting all capabilities via the STATIC_ARMCAP macros (except SHA512 which wasn't used before with valgrind and was also causing an "illegal opcode" error when set).
Note: Setting -DOPENSSL_STATIC_ARMCAP without specifying any capabilities disables all of them as in the second
set of tests.

The ssl runner valgrind tests are here made to run in parallel to the valgrind tests (only with the build where all capabilities are set).
@nebeid
Copy link
Contributor Author

nebeid commented May 16, 2023

Related to this PR: PR #1006 Check HWCAP_CPUID before reading MIDR_EL1 on aarch64. It seems that Valgrind's fix to the issue, was to mask HWCAP_CPUID as was suggested in the workaround here.
That PR would have fixed the issue we had here with Valgrind but wouldn't enable the code that runs on CPUs for which we check the parts (Neoverse V1, so far). To enable this code, this PR sets the CPU part at compile time bypassing runtime checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants