Segfault during tests on FreeBSD #204

Closed
ararslan opened this issue May 11, 2018 · 14 comments

@ararslan
Contributor

FreeBSD 11.1-RELEASE-p10 amd64 (x86-64), cc is Clang 4.0.0. Running on Intel Core i5-2520M @ 2.50GHz. BLIS commit 4b72a46. No changes made to any code.

Steps:

./configure --enable-blas --enable-cblas --enable-shared CC=cc auto
gmake
gmake test

Error:

Running test_libblis.x with output redirected to 'output.testsuite'
bash: line 2: 17434 Segmentation fault      (core dumped) ./test_libblis.x -g ./testsuite/input.general -o ./testsuite/input.operations > output.testsuite
gmake: *** [Makefile:730: testsuite-run] Error 139

Backtrace from LLDB:

(lldb) thread backtrace all
* thread #1, name = 'test_libblis.x', stop reason = signal SIGSEGV
  * frame #0: 0x000000000047cba4 test_libblis.x`bli_sgemm_sandybridge_asm_8x8 + 1140
    frame #1: 0x00000000004634c6 test_libblis.x`bli_gemm_ukernel + 486
    frame #2: 0x0000000000411bed test_libblis.x`libblis_test_gemm_ukr_experiment + 973
    frame #3: 0x000000000041a869 test_libblis.x`libblis_test_op_driver + 2089
    frame #4: 0x0000000000411811 test_libblis.x`libblis_test_gemm_ukr + 273
    frame #5: 0x0000000000416bc5 test_libblis.x`main + 613
    frame #6: 0x000000000040b08f test_libblis.x`_start + 383

The full configuration, build, and test log, as well as more thorough output from LLDB, is here: https://gist.github.com/ararslan/aeab688c2eb2a7432d64c6e2f5169f23.

@devinamatthews
Member

@fgvanzee if you want to spin an OpenBSD VM feel free to debug or I can take a look next week.

@ararslan
Contributor Author

> if you want to spin an OpenBSD VM

Just to clarify, this issue is about FreeBSD, whereas #201 and #202 are about OpenBSD. I've been trying BLIS out on both. 🙂

@fgvanzee
Member

@devinamatthews I don't know how to do that (yet). :) Knock yourself out.

@ararslan
Contributor Author

Here's one way to do it:

  • Get a hypervisor, e.g. https://www.virtualbox.org/
  • Download a FreeBSD 11.1 virtual machine image from https://download.freebsd.org/ftp/releases/VM-IMAGES/11.1-RELEASE/amd64/Latest/
  • Import the virtual machine image into VirtualBox
  • Once you have the VM set up, log in as root and do:
    pkg update # this will bootstrap pkg, the package manager
    pkg install git gmake bash # bash is required due to /usr/bin/env bash calls in blis
    
  • As a user, do:
    git clone https://github.com/flame/blis.git
    cd blis
    ./configure --enable-blas --enable-cblas --enable-shared auto
    gmake
    gmake test
    
  • This will likely segfault, and you can inspect the coredump with
    lldb test_libblis.x -c test_libblis.x.core
    

@fgvanzee
Member

@ararslan We found some integer-related bugs in various microkernels, one of them being sandybridge. Ideally, this segfault would be related to those bugs. I'll let you know when it's fixed.

@fgvanzee
Member

@ararslan Would you mind trying the latest commit, 2e31dd7? We may have fixed your issue after stumbling upon it in a different context.

@ararslan
Contributor Author

That seems to have done the trick! All tests pass on FreeBSD now with no segfaults. Thanks!

@fgvanzee
Member

More good news! :)

@devinamatthews
Member

@ararslan How big is long int on FreeBSD amd64? I think a plain ./configure would cause problems before 2e31dd7 only if inc_t == long int were smaller than 64 bits.

@ararslan
Contributor Author

sizeof(long int) says 8 on my system, so I guess 64 bits.
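
For anyone who wants to repeat the check above on their own system, here is a minimal sketch in C. The function names are made up for illustration and are not part of BLIS; the only assumption is a hosted C99 compiler such as the system cc.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Width of long int in bits on the platform this file is compiled for. */
int long_int_bits(void)
{
    return (int)(sizeof(long int) * CHAR_BIT);
}

/* Print the size and return the width so callers can also test it.
   Hypothetical helper, named here for illustration only. */
int report_long_int_bits(void)
{
    int bits = long_int_bits();

    /* The C standard guarantees that long is at least 32 bits wide. */
    assert(bits >= 32);

    printf("sizeof(long int) = %zu bytes (%d bits)\n",
           sizeof(long int), bits);
    return bits;
}
```

Calling report_long_int_bits() from a small main and building it with cc should report 8 bytes on an LP64 system such as FreeBSD amd64, which matches the answer above.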

@devinamatthews
Member

@fgvanzee I can't see why this would fix any problems on x86_64 BSD or Linux, as inc_t is already 64-bit. Ideas?

@fgvanzee
Member

fgvanzee commented May 16, 2018

@devinamatthews Did you see 12dfa95, which was committed just before 2e31dd7? The details of the commit log will hopefully explain the answer to your question.

TL;DR: Basically, I messed up the order of #include for some of the headers, which inadvertently forced 32-bit integers on systems that should actually have 64-bit integers (but only when the user did not specify an integer size via configure).
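
To illustrate the kind of mistake described in the TL;DR, here is a rough sketch of how include order alone can select the wrong integer width. It is not the actual BLIS code: the DEMO_* macros and demo_inc_t are hypothetical stand-ins, and the two "headers" are simulated as sections of one file. See the commit log of 12dfa95 for what really changed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* --- Stand-in for a "defaults" header, processed too early ------------ */
/* DEMO_INT_SIZE is what a configure option would set; it is unset here.
   DEMO_ARCH_64 (hypothetical) has not been defined yet at this point,
   so the fallback branch silently picks 32 bits. */
#ifndef DEMO_INT_SIZE
  #ifdef DEMO_ARCH_64
    #define DEMO_INT_SIZE 64
  #else
    #define DEMO_INT_SIZE 32
  #endif
#endif

/* --- Stand-in for a "system detection" header, processed too late ----- */
#if UINTPTR_MAX > 0xFFFFFFFFu
  #define DEMO_ARCH_64
#endif

/* --- Type definitions that depend on the choice made above ------------ */
#if DEMO_INT_SIZE == 64
typedef int64_t demo_inc_t;
#else
typedef int32_t demo_inc_t;
#endif

/* Size in bytes of the integer type that was actually selected. */
size_t demo_inc_bytes(void)
{
    /* The typedef always agrees with the macro, even when the macro
       itself was chosen from incomplete information. */
    assert(sizeof(demo_inc_t) * 8 == DEMO_INT_SIZE);
    return sizeof(demo_inc_t);
}
```

With the sections in this order, demo_inc_bytes() returns 4 even on a 64-bit machine. Moving the detection section above the defaults section, or defining DEMO_INT_SIZE explicitly, would give 8 there, which mirrors why the problem only appeared when no integer size was passed to configure.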

@ararslan
Contributor Author

Yes, it seems more likely that 12dfa95 was indeed the one that fixed it, but I can bisect if it's important.

@devinamatthews
Member

Ah, that explains it. Good changes to the ukrs for possible Windows support though.
