misc/cgo/test: morestack on g0 on Solaris under "unlimit stacksize" #12210

Closed
fazalmajid opened this issue Aug 19, 2015 · 53 comments

@fazalmajid

At least on my SmartOS box running joyent_20150514T133314Z:

##### ../misc/cgo/test
SIGTRAP: trace trap (not reset when caught)
PC=0x4a8236 m=0
signal arrived during cgo execution

goroutine 25 [running, locked to thread]:
runtime.morestack()
        /usr/local/go/src/runtime/asm_amd64.s:302 +0x26 fp=0xfffffd7fffdeed88 sp=0xfffffd7fffdeed80
created by testing.RunTests
        /usr/local/go/src/testing/testing.go:561 +0x86d

goroutine 1 [chan receive]:
runtime.gopark(0x69fe68, 0xc82009e418, 0x6535a0, 0xc, 0x17, 0x3)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820059b28 sp=0xc820059b00
runtime.goparkunlock(0xc82009e418, 0x6535a0, 0xc, 0x17, 0x3)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820059b60 sp=0xc820059b28
runtime.chanrecv(0x5bb9a0, 0xc82009e3c0, 0xc820059d78, 0xc820059c01, 0x4c0000)
        /usr/local/go/src/runtime/chan.go:448 +0x47b fp=0xc820059c20 sp=0xc820059b60
runtime.chanrecv1(0x5bb9a0, 0xc82009e3c0, 0xc820059d78)
        /usr/local/go/src/runtime/chan.go:349 +0x2b fp=0xc820059c50 sp=0xc820059c20
testing.RunTests(0x69fb08, 0x76ec20, 0x38, 0x38, 0x481401)
        /usr/local/go/src/testing/testing.go:562 +0x8ad fp=0xc820059dd0 sp=0xc820059c50
testing.(*M).Run(0xc820059ef8, 0xc820014140)
        /usr/local/go/src/testing/testing.go:494 +0x70 fp=0xc820059e58 sp=0xc820059dd0
main.main()
        _/home/majid/build/go-1.5/misc/cgo/test/_test/_testmain.go:166 +0x116 fp=0xc820059f50 sp=0xc820059e58
runtime.main()
        /usr/local/go/src/runtime/proc.go:111 +0x2cb fp=0xc820059fa0 sp=0xc820059f50
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820059fa8 sp=0xc820059fa0

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820048fb8 sp=0xc820048fb0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x69fe68, 0x7807f0, 0x653e40, 0xf, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820034758 sp=0xc820034730
runtime.goparkunlock(0x7807f0, 0x653e40, 0xf, 0xc820000114, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820034790 sp=0xc820034758
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:152 +0xc1 fp=0xc8200347c0 sp=0xc820034790
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc8200347c8 sp=0xc8200347c0
created by runtime.init.4
        /usr/local/go/src/runtime/proc.go:141 +0x2b

goroutine 3 [GC sweep wait]:
runtime.gopark(0x69fe68, 0x780980, 0x64c4d0, 0xd, 0x46ac14, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820034f48 sp=0xc820034f20
runtime.goparkunlock(0x780980, 0x64c4d0, 0xd, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820034f80 sp=0xc820034f48
runtime.bgsweep(0xc82001e070)
        /usr/local/go/src/runtime/mgcsweep.go:51 +0xba fp=0xc820034fb8 sp=0xc820034f80
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820034fc0 sp=0xc820034fb8
created by runtime.gcenable
        /usr/local/go/src/runtime/mgc.go:205 +0x53

goroutine 4 [finalizer wait]:
runtime.gopark(0x69fe68, 0x7a4470, 0x653ab0, 0xe, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820035718 sp=0xc8200356f0
runtime.goparkunlock(0x7a4470, 0x653ab0, 0xe, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820035750 sp=0xc820035718
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:154 +0xb3 fp=0xc8200357c0 sp=0xc820035750
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc8200357c8 sp=0xc8200357c0
created by runtime.createfing
        /usr/local/go/src/runtime/mfinal.go:135 +0x60

goroutine 5 [syscall]:
runtime.notetsleepg(0x7a4660, 0xffffffffffffffff, 0x1)
        /usr/local/go/src/runtime/lock_sema.go:264 +0x8f fp=0xc820035f40 sp=0xc820035f00
runtime.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:111 +0x132 fp=0xc820035f78 sp=0xc820035f40
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18 fp=0xc820035fc0 sp=0xc820035f78
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820035fc8 sp=0xc820035fc0
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37

goroutine 6 [syscall, locked to thread]:
runtime.cgocall(0x44d620, 0xc8200367b0, 0x0)
        /usr/local/go/src/runtime/cgocall.go:120 +0x11d fp=0xc820036778 sp=0xc820036748
_/home/majid/build/go-1.5/misc/cgo/test._Cfunc_usleep(0xc800002710, 0xc800000000)
        ??:0 +0x35 fp=0xc8200367b0 sp=0xc820036778
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc8200367b8 sp=0xc8200367b0
created by _/home/majid/build/go-1.5/misc/cgo/test.lockOSThreadCallback
        /home/majid/build/go-1.5/misc/cgo/test/issue3775.go:35 +0x100

rax    0xc8200bc000
rbx    0x7812e0
rcx    0xffffdd7fffaa2a40
rdx    0xc820030588
rdi    0x49e4c0
rsi    0x780d80
rbp    0xc8200305d0
rsp    0xfffffd7fffdeed80
r8     0xc8200bc000
r9     0xc820030668
r10    0x1
r11    0xfffffffffbc05648
r12    0x10
r13    0x69ab51
r14    0x1e
r15    0x8
rip    0x4a8236
rflags 0x246
cs     0x53
fs     0x0
gs     0x0
exit status 2
FAIL    _/home/majid/build/go-1.5/misc/cgo/test 0.009s
2015/08/19 15:07:27 Failed: exit status 1
@fazalmajid
Author

The hardware is an HP Z230, 3.5GHz Intel Xeon E3-1270v3 Quad Core (8 with Hyperthreading), 8GB of RAM allocated to the zone.

@ianlancetaylor ianlancetaylor changed the title from 'misc/cgo/test fails on Solaris using Go 1.5' to 'cmd/cgo: misc/cgo/test fails on Solaris using Go 1.5' on Aug 19, 2015
@ianlancetaylor ianlancetaylor added this to the Go1.5.1 milestone Aug 19, 2015
@binarycrusader
Contributor

For what it's worth, this works on Oracle Solaris, so there's likely something subtle here.

@binarycrusader
Contributor

@fazalmajid do you have gcc installed? If so, what version?

@minux
Member

minux commented Aug 25, 2015 via email

@fazalmajid
Author

Yes, GCC 5.2.0 built from source, defaulting to amd64. I built the bootstrap Go-1.4.2 using:
gcc -O2 -std=gnu11 -D__EXTENDED__=1 -D_XPG6=1
(the XPG and EXTENDED flags are required because GCC 5.2 defaults to C11, and some Solaris headers reject that unless XPG6 compliance is stipulated).

@fazalmajid
Author

I can try and bisect the tests to find out which one is triggering the SIGTRAP, if you'd like.

@rsc
Contributor

rsc commented Aug 25, 2015

Bisecting would be great. Thanks.

@binarycrusader
Contributor

@fazalmajid is this failure in comparison to the release version of Go 1.4 or did this test pass previously at some point with Go 1.5 while it was in development?

I ask, because Go 1.5 is the first version of Go where cgo support (and thus the cgo test that's failing) was enabled for Solaris:

commit 2b90c3e8edf36c0a76545fa13c195bca68a29420
Author: Aram Hăvărneanu <[email protected]>
Date:   Mon Mar 30 23:15:51 2015 +0200

    go/build: enable cgo by default on solaris/amd64

    Change-Id: I0110b01fe4c64851ac2cfb5a92c31ce156831bc8
    Reviewed-on: https://go-review.googlesource.com/8265
    Reviewed-by: Minux Ma <[email protected]>

Also, for the record, why are you building Go with -std=c11? It isn't required, and it's unlikely that C11 or C++11 will work as expected on SmartOS. You should have been able to build Go 1.4 with gcc using -std=c89 or -std=gnu89 and without the other special flags you added.

@fazalmajid
Author

Really strange: running the tests manually works:
env GOPATH=`pwd` go tool dist test -no-rebuild
It is only when run under all.bash that it fails.

@binarycrusader: I am passing those flags to the bootstrap Go 1.4.2. Without them, I get the error below (this is on an OpenIndiana oi151a1 machine, but I get the same results on SmartOS). If you use an older version of GCC than 5.2, you can probably do without. -std=gnu11 is the default for GCC 5.x.
Go 1.5 itself I build using just env CC=gcc, but if I understand correctly, Go 1.5 does not use gcc at all but its own toolchain (written in Go and compiled using the bootstrap).

(cd go-1.4.2/src; env GOROOT=`pwd`/go-1.4.2 CC="gcc" ./all.bash)
# Building C bootstrap tool.
cmd/dist

# Building compilers and Go bootstrap tool for host, solaris/amd64.
lib9
In file included from /usr/include/inttypes.h:41:0,
                 from /home/majid/build/go-1.4.2/include/u.h:60,
                 from /home/majid/build/go-1.4.2/src/lib9/await.c:31:
/usr/local/lib/gcc/x86_64-pc-solaris2.11/5.2.0/include-fixed/sys/feature_tests.h:362:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications  and pre-2001 POSIX applications"
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  ^
In file included from /usr/include/inttypes.h:41:0,
                 from /home/majid/build/go-1.4.2/include/u.h:60,
                 from /home/majid/build/go-1.4.2/src/lib9/_p9dir.c:28:
/usr/local/lib/gcc/x86_64-pc-solaris2.11/5.2.0/include-fixed/sys/feature_tests.h:362:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications  and pre-2001 POSIX applications"
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  ^
In file included from /usr/include/inttypes.h:41:0,
                 from /home/majid/build/go-1.4.2/include/u.h:60,
                 from /home/majid/build/go-1.4.2/src/lib9/_exits.c:28:
/usr/local/lib/gcc/x86_64-pc-solaris2.11/5.2.0/include-fixed/sys/feature_tests.h:362:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications  and pre-2001 POSIX applications"
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  ^
In file included from /usr/include/inttypes.h:41:0,
                 from /home/majid/build/go-1.4.2/include/u.h:60,
                 from /home/majid/build/go-1.4.2/src/lib9/atoi.c:28:
/usr/local/lib/gcc/x86_64-pc-solaris2.11/5.2.0/include-fixed/sys/feature_tests.h:362:2: error: #error "Compiler or options invalid for pre-UNIX 03 X/Open applications  and pre-2001 POSIX applications"
 #error "Compiler or options invalid for pre-UNIX 03 X/Open applications \
  ^
go tool dist: FAILED: gcc -Wall -Wstrict-prototypes -Wextra -Wunused -Wno-sign-compare -Wno-missing-braces -Wno-parentheses -Wno-unknown-pragmas -Wno-switch -Wno-comment -Wno-missing-field-initializers -Werror -fno-common -ggdb -pipe -Wuninitialized -O2 -fmessage-length=0 -c -m64 -I /home/majid/build/go-1.4.2/include -DPLAN9PORT -I /home/majid/build/go-1.4.2/src/lib9 -o $WORK/atoi.o /home/majid/build/go-1.4.2/src/lib9/atoi.c
go tool dist: FAILED: gcc -Wall -Wstrict-prototypes -Wextra -Wunused -Wno-sign-compare -Wno-missing-braces -Wno-parentheses -Wno-unknown-pragmas -Wno-switch -Wno-comment -Wno-missing-field-initializers -Werror -fno-common -ggdb -pipe -Wuninitialized -O2 -fmessage-length=0 -c -m64 -I /home/majid/build/go-1.4.2/include -DPLAN9PORT -I /home/majid/build/go-1.4.2/src/lib9 -o $WORK/_exits.o /home/majid/build/go-1.4.2/src/lib9/_exits.c
go tool dist: FAILED: gcc -Wall -Wstrict-prototypes -Wextra -Wunused -Wno-sign-compare -Wno-missing-braces -Wno-parentheses -Wno-unknown-pragmas -Wno-switch -Wno-comment -Wno-missing-field-initializers -Werror -fno-common -ggdb -pipe -Wuninitialized -O2 -fmessage-length=0 -c -m64 -I /home/majid/build/go-1.4.2/include -DPLAN9PORT -I /home/majid/build/go-1.4.2/src/lib9 -o $WORK/await.o /home/majid/build/go-1.4.2/src/lib9/await.c
go tool dist: FAILED: gcc -Wall -Wstrict-prototypes -Wextra -Wunused -Wno-sign-compare -Wno-missing-braces -Wno-parentheses -Wno-unknown-pragmas -Wno-switch -Wno-comment -Wno-missing-field-initializers -Werror -fno-common -ggdb -pipe -Wuninitialized -O2 -fmessage-length=0 -c -m64 -I /home/majid/build/go-1.4.2/include -DPLAN9PORT -I /home/majid/build/go-1.4.2/src/lib9 -o $WORK/_p9dir.o /home/majid/build/go-1.4.2/src/lib9/_p9dir.c

@binarycrusader
Contributor

@fazalmajid right, I was suggesting that instead of building Go 1.4 with -std=gnu11, you build it with -std=gnu89. I know that gnu11/c11 is the default for GCC 5.2, I was suggesting you specify the older standard explicitly to see what the result was.

Also, as I asked before, what version of Go did you successfully run all tests with before this failure? Was 1.4 the last version you tested?

@fazalmajid
Author

Yes, 1.4.2 was the last version I tested. Go 1.4.2 builds successfully with gcc -std=gnu99 but misc/cgo/test has the same failure when building Go 1.5 using the gnu89 Go-1.4.2.

As for running the test manually, I found out what is going wrong (or right): it's testing $GOROOT_FINAL/misc/cgo/test/cgo_test.go (that I patched to comment out all cgo tests so I could install), not the unpatched one in the build directory which fails. When I copy the unpatched one back to $GOROOT_FINAL, I can reproduce the error and will get back to bisecting the cgo tests.

@fazalmajid
Author

@rsc any of these tests in misc/cgo/test/cgo_test.go will cause the failure on OI-151a1/GCC-5.2:

func TestCallback(t *testing.T)              { testCallback(t) }
func TestCallbackGC(t *testing.T)            { testCallbackGC(t) }
func TestCallbackPanic(t *testing.T)         { testCallbackPanic(t) }
func TestCallbackPanicLoop(t *testing.T)     { testCallbackPanicLoop(t) }
func TestCallbackPanicLocked(t *testing.T)   { testCallbackPanicLocked(t) }
func TestBlocking(t *testing.T)              { testBlocking(t) }

@ianlancetaylor
Member

The trend there is pretty obvious: calls from Go -> C -> Go fail.

@ianlancetaylor
Member

Does the SIGTRAP always occur at the same PC value? Can you run nm on the binary to see what function that PC is in?

@fazalmajid
Author

The PC is different for each test. Not sure how to get to the executable, as it seems to be deleted when the test concludes.

@ianlancetaylor
Member

You can get the executable by "cd misc/cgo/test; go test -c". You can run it by "GOTRACEBACK=2 ./test.test".

@fazalmajid
Author

According to GDB, the problem is in runtime.morestack in asm_amd64.s line 302:

local64 ~/build/go-1.5/misc/cgo/test>env GOTRACEBACK=2 gdb ./test.test
GNU gdb (GDB) 7.8.2
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-solaris2.11".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test.test...done.
Loading Go Runtime support.
(gdb) r
Starting program: /export/home/majid/build/go-1.5/misc/cgo/test/test.test 
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 1 (LWP 1)]
runtime.morestack () at /usr/local/go/src/runtime/asm_amd64.s:302
302             MOVQ    m_gsignal(BX), SI

and that matches what the Go trace reports:

local64 ~/build/go-1.5/misc/cgo/test>env GOTRACEBACK=2 ./test.test
SIGTRAP: trace trap (not reset when caught)
PC=0x4a7d26 m=0
signal arrived during cgo execution

goroutine 25 [running, locked to thread]:
runtime.morestack()
        /usr/local/go/src/runtime/asm_amd64.s:302 +0x26 fp=0xfffffd7fffdef488 sp=0xfffffd7fffdef480
created by testing.RunTests
        /usr/local/go/src/testing/testing.go:561 +0x86d

goroutine 1 [chan receive]:
runtime.gopark(0x69e4f8, 0xc820098418, 0x651ed0, 0xc, 0x17, 0x3)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820057b28 sp=0xc820057b00
runtime.goparkunlock(0xc820098418, 0x651ed0, 0xc, 0x17, 0x3)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820057b60 sp=0xc820057b28
runtime.chanrecv(0x5ba4a0, 0xc8200983c0, 0xc820057d78, 0xc820057c01, 0x4c0000)
        /usr/local/go/src/runtime/chan.go:448 +0x47b fp=0xc820057c20 sp=0xc820057b60
runtime.chanrecv1(0x5ba4a0, 0xc8200983c0, 0xc820057d78)
        /usr/local/go/src/runtime/chan.go:349 +0x2b fp=0xc820057c50 sp=0xc820057c20
testing.RunTests(0x69e1a0, 0x76c7e0, 0x33, 0x33, 0x480f01)
        /usr/local/go/src/testing/testing.go:562 +0x8ad fp=0xc820057dd0 sp=0xc820057c50
testing.(*M).Run(0xc820057ef8, 0xc82001a040)
        /usr/local/go/src/testing/testing.go:494 +0x70 fp=0xc820057e58 sp=0xc820057dd0
main.main()
        _/home/majid/build/go-1.5/misc/cgo/test/_test/_testmain.go:156 +0x116 fp=0xc820057f50 sp=0xc820057e58
runtime.main()
        /usr/local/go/src/runtime/proc.go:111 +0x2cb fp=0xc820057fa0 sp=0xc820057f50
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820057fa8 sp=0xc820057fa0

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820046fb8 sp=0xc820046fb0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x69e4f8, 0x77e350, 0x652770, 0xf, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820032758 sp=0xc820032730
runtime.goparkunlock(0x77e350, 0x652770, 0xf, 0xc820000114, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820032790 sp=0xc820032758
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:152 +0xc1 fp=0xc8200327c0 sp=0xc820032790
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc8200327c8 sp=0xc8200327c0
created by runtime.init.4
        /usr/local/go/src/runtime/proc.go:141 +0x2b

goroutine 3 [GC sweep wait]:
runtime.gopark(0x69e4f8, 0x77e4e0, 0x64ae30, 0xd, 0x46a714, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820032f48 sp=0xc820032f20
runtime.goparkunlock(0x77e4e0, 0x64ae30, 0xd, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820032f80 sp=0xc820032f48
runtime.bgsweep(0xc820018070)
        /usr/local/go/src/runtime/mgcsweep.go:51 +0xba fp=0xc820032fb8 sp=0xc820032f80
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820032fc0 sp=0xc820032fb8
created by runtime.gcenable
        /usr/local/go/src/runtime/mgc.go:205 +0x53

goroutine 4 [finalizer wait]:
runtime.gopark(0x69e4f8, 0x7a1fd0, 0x6523e0, 0xe, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:185 +0x179 fp=0xc820033718 sp=0xc8200336f0
runtime.goparkunlock(0x7a1fd0, 0x6523e0, 0xe, 0x14, 0x1)
        /usr/local/go/src/runtime/proc.go:191 +0x54 fp=0xc820033750 sp=0xc820033718
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:154 +0xb3 fp=0xc8200337c0 sp=0xc820033750
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc8200337c8 sp=0xc8200337c0
created by runtime.createfing
        /usr/local/go/src/runtime/mfinal.go:135 +0x60

goroutine 5 [syscall]:
runtime.notetsleepg(0x7a21c0, 0xffffffffffffffff, 0x1)
        /usr/local/go/src/runtime/lock_sema.go:264 +0x8f fp=0xc820033f40 sp=0xc820033f00
runtime.signal_recv(0x0)
        /usr/local/go/src/runtime/sigqueue.go:111 +0x132 fp=0xc820033f78 sp=0xc820033f40
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:22 +0x18 fp=0xc820033fc0 sp=0xc820033f78
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820033fc8 sp=0xc820033fc0
created by os/signal.init.1
        /usr/local/go/src/os/signal/signal_unix.go:28 +0x37

rax    0xc8200b6000
rbx    0x77ee40
rcx    0xffffdd7fffa92a40
rdx    0xc82002e598
rdi    0x49dfb0
rsi    0x77e8e0
rbp    0xc82002e5e0
rsp    0xfffffd7fffdef480
r8     0xc8200b6000
r9     0xc82002e678
r10    0x0
r11    0x0
r12    0x6
r13    0x699263
r14    0x3
r15    0x8
rip    0x4a7d26
rflags 0x246
cs     0x53
fs     0x0
gs     0x0

@ianlancetaylor
Member

Thanks. Now I see that this means that runtime·morestack was called while running on the g0 stack. Unfortunately, I have no idea how that could happen. And the backtrace doesn't make much sense.

@binarycrusader
Contributor

@fazalmajid is it possible for you to try this with a different version of gcc such as 4.7.3?

To do so, I believe you'd need to install that older version and then ensure it's first in $PATH both when you build go itself and when you run the tests.

It'd be helpful to know which version of gcc the Solaris builder has installed and try with that same version.

For the record, I'm using gcc 4.7.3 on a very recent build of Oracle Solaris without any issues.

@minux
Member

minux commented Aug 26, 2015 via email

@fazalmajid
Author

I tried with gcc 4.7.0, with the same results. Let me disable gcc 5.2 altogether and try again.

@fazalmajid
Author

Nope, even with GCC 5.2 disabled altogether to force cgo to use GCC 4.7.0 to compile, I am still getting the same error. Let me try building GCC 4.7.3.
My GCC configure flags are:

--gas --with-gnu-as --with-as=/usr/local/bin/gas \
--enable-shared --without-gnu-ld --with-ld=/usr/ccs/bin/ld \
--disable-multilib --enable-ssp --prefix=/usr/local \
--enable-languages=c,c++,fortran \
--with-gmp=/usr/local --with-mpfr=/usr/local --with-mpc=/usr/local

I also have the OpenSSL patch applied: https://www.openssl.org/~appro/values.c

@fazalmajid
Author

I tried again with GCC 4.7.3, and disabled SSP in GCC, to no avail.

@ianlancetaylor
Member

I gather that this passes on Solaris and fails on SmartOS. What is the difference between the two?

The nature of the failure makes me suspect that something in the call from Go to C to Go is changing the value of the TLS variable g. That could be due to differences in the system linker.

@fazalmajid
Author

@minux: could you share the output of gcc -v on the solaris builder?

@minux
Member

minux commented Aug 26, 2015 via email

@fazalmajid
Author

@minux
I rebuilt gcc 4.7.3 with the same options (adjusted for file paths), to no avail. Are you using a standard Joyent image for the solaris builder? If so, may I have the UUID for it? Is the GCC you are using one supplied by the image, one from pkgsrc or one you built yourself?

@4ad
Member

4ad commented Aug 28, 2015

Yes, standard image, base64 14.2.0, not sure how to get the UUID, but I run it on many image versions. The GCC is from pkgsrc.

@fazalmajid
Author

I rebuilt using a zone with base-64/15.2.0, UUID 5c7d0d24-3475-11e5-8e67-27953a8b237e and the pkgin gcc (which is configured with /usr/bin/ld), and I am still experiencing the error. I also see it on an OpenIndiana oi151a1 machine, so it's not a regression introduced by one of the newer SmartOS/Illumos kernels. The zone has the stock /etc, I only modified /etc/{passwd,shadow,group,user_attr} and used crle to set the 64-bit p

golang ~/build/go-1.5>gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/local/gcc49/libexec/gcc/x86_64-sun-solaris2.11/4.9.2/lto-wrapper
Target: x86_64-sun-solaris2.11
Configured with: ../gcc-4.9.2/configure --enable-languages='c obj-c++ objc go fortran c++' --enable-shared --enable-long-long --with-local-prefix=/opt/local/gcc49 --enable-libssp --enable-threads=posix --with-boot-ldflags='-static-libstdc++ -static-libgcc -Wl,-R/opt/local/lib ' --disable-nls --with-gxx-include-dir=/opt/local/gcc49/include/c++/ --without-gnu-ld --with-ld=/usr/bin/ld --with-gnu-as --with-as=/opt/local/bin/gas --prefix=/opt/local/gcc49 --build=x86_64-sun-solaris2.11 --host=x86_64-sun-solaris2.11 --infodir=/opt/local/gcc49/info --mandir=/opt/local/gcc49/man
Thread model: posix
gcc version 4.9.2 (GCC) 

@fazalmajid
Author

I made some progress: I temporarily removed my .tcshrc and .login files to have a clean slate in terms of environment variables and other settings, and the test now passes. I need to figure out what exactly is the root cause: environment variable, plimit, etc.?

@fazalmajid fazalmajid changed the title from 'cmd/cgo: misc/cgo/test fails on Solaris using Go 1.5' to 'On Solaris, "unlimit stacksize" causes misc/cgo/test to fails' on Aug 28, 2015
@fazalmajid
Author

OK, I finally got it: my .login does (among other things) unlimit stacksize. With unlimited stacksize, the test crashes. With a stacksize limit in place, it works fine.

@4ad
Member

4ad commented Aug 28, 2015

Thanks for the report, just for further reference, gcc 5.2 does work:

[root@71c9cb18-79f8-e424-a65a-f881d1d8d224 ~/go/src]# go tool dist test -no-rebuild cgo_test

##### ../misc/cgo/test
scatter = 447380
hello from C
sqrt is: 0
PASS
ok      _/root/go/misc/cgo/test 1.211s
scatter = 552570
hello from C
sqrt is: 0
PASS
ok      _/root/go/misc/cgo/test 1.223s

ALL TESTS PASSED (some were excluded)
[root@71c9cb18-79f8-e424-a65a-f881d1d8d224 ~/go/src]# 
[root@71c9cb18-79f8-e424-a65a-f881d1d8d224 ~/go/src]# go version 
go version devel +63862af Fri Aug 28 16:34:52 2015 +0000 solaris/amd64
[root@71c9cb18-79f8-e424-a65a-f881d1d8d224 ~/go/src]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/gcc52/libexec/gcc/x86_64-sun-solaris2.11/5.2.0/lto-wrapper
Target: x86_64-sun-solaris2.11
Configured with: /root/gcc-5.2.0/configure --enable-languages=c --disable-bootstrap --with-diagnostics-color=never --enable-shared --enable-long-long --with-local-prefix=/opt/local --disable-libssp --disable-multilib --enable-threads=posix --with-boot-ldflags='-static-libstdc++ -static-libgcc -Wl,-R/opt/local/lib ' --disable-nls --without-gnu-ld --with-ld=/usr/bin/ld --with-gnu-as --with-as=/opt/local/bin/gas --prefix=/opt/gcc52 --build=x86_64-sun-solaris2.11 --host=x86_64-sun-solaris2.11
Thread model: posix
gcc version 5.2.0 (GCC) 
[root@71c9cb18-79f8-e424-a65a-f881d1d8d224 ~/go/src]# 

@fazalmajid
Author

Yes, I confirmed my initial build using GCC 5.2 works when a stacksize limit is set.

@ianlancetaylor ianlancetaylor changed the title from 'On Solaris, "unlimit stacksize" causes misc/cgo/test to fails' to 'misc/cgo/test: On Solaris, "unlimit stacksize" causes misc/cgo/test to fails' on Aug 28, 2015
@ianlancetaylor ianlancetaylor modified the milestones: Go1.6, Go1.5.1 Aug 28, 2015
@ianlancetaylor
Member

Let me confirm: you are saying that the test works when there is a limit on stack size, and that it fails when there is no limit on stack size?

How exactly are you changing the stack size limit?

@fazalmajid
Author

Yes.

I change the stack size limit using "unlimit stacksize" in tcsh, which in turn uses setrlimit(2) to set RLIMIT_STACK to RLIM_INFINITY (which on Solaris is -3). According to Stevens this is available on pretty much any UNIX OS (XSI, Linux, FreeBSD, Mac OS X, Solaris).
On Bourne shell and similar environments, you would use "ulimit -s unlimited" instead.
To view the current limits, use "limit" in csh/tcsh or "ulimit -a" in sh/ksh/bash.
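
For anyone who wants to check what the process itself sees, here is a minimal sketch that reads RLIMIT_STACK from Go. It only illustrates the point about the RLIM_INFINITY encoding. The helper names (describeLimit, infinityFor) and the two sentinel constants are made up for this example. The Solaris value follows the -3 mentioned above, the Linux value (-1) is an assumption added for comparison, and every other OS is simply treated like Linux, which may well be wrong.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// Raw RLIM_INFINITY encodings as seen in an unsigned 64-bit rlim_t.
// solarisInfinity follows the "-3" quoted in this thread; linuxInfinity (-1)
// is an assumption added for comparison.
const (
	solarisInfinity = ^uint64(2) // (rlim_t)-3 == 0xfffffffffffffffd
	linuxInfinity   = ^uint64(0) // (rlim_t)-1 == 0xffffffffffffffff
)

// describeLimit renders a soft limit in kB, treating the given sentinel
// value as "unlimited". Hypothetical helper for this example.
func describeLimit(cur, infinity uint64) string {
	if cur == infinity {
		return "unlimited"
	}
	return fmt.Sprintf("%d kB", cur/1024)
}

// infinityFor picks the sentinel for a GOOS value. Anything that is not
// Solaris or illumos falls back to the Linux encoding (an assumption).
func infinityFor(goos string) uint64 {
	if goos == "solaris" || goos == "illumos" {
		return solarisInfinity
	}
	return linuxInfinity
}

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_STACK, &lim); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	// Convert explicitly: the field type of Rlimit.Cur differs between platforms.
	cur := uint64(lim.Cur)
	fmt.Printf("RLIMIT_STACK soft limit: %#x (%s)\n",
		cur, describeLimit(cur, infinityFor(runtime.GOOS)))
}
```

Running it once after "unlimit stacksize" and once with a finite limit shows the raw value the kernel hands back in each case, which is the number any code sizing a stack from the rlimit would have to interpret.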

@binarycrusader
Contributor

For what it's worth, executing "ulimit -s unlimited" on an amd64 Linux distribution and then running "go test" in the misc/cgo/test directory is successful, so this is likely related to some of the subtle differences in Solaris when you set stack size to be unlimited:

$ ulimit -s
8192
$ ulimit -s unlimited
$ ulimit -s
unlimited
$ pwd
.../go/misc/cgo/test
$ go test
scatter = 0x406380
hello from C
sqrt is: 0
PASS
ok      _/.../go/misc/cgo/test  1.596s

I was able to produce a similar failure on Linux only by setting the stack limit to a very small amount (64 kB).

@binarycrusader
Contributor

@fazalmajid I've been unable to reproduce this issue on a recent build of Oracle Solaris. That suggests this bug might be in SmartOS itself and older versions of Solaris or that we still don't understand the real issue.

By the way, Solaris doesn't generally allow memory overcommit. How much memory did you have free at the time you ran the tests?

@fazalmajid
Author

As reported by top in the zone:
Memory: 8192M phys mem, 3779M free mem, 8192M total swap, 8161M free swap
The physical machine itself has 32GB.

I don't think it has to do with the actual size of the limit: setting the stacksize limit to a ridiculous value like 20GB (apparently the maximum you can set before the value you supply is considered unlimited) does not cause the error.

@rsc
Contributor

rsc commented Nov 30, 2015

@fazalmajid I assume this is still happening at the current development head? In your Aug 25 comment you ran and caught the crash under gdb, showing that it was in morestack. Can you run 'where' to see what called morestack?

FWIW, although gdb pins the blame on that line of morestack, the problem is actually the previous line, an INT $3 instruction which causes the SIGTRAP. The processor behavior is to advance the PC during the trap, which is likely why you see the line after it being given in the trace. But either way the caller is what we want to know more about.

The relevant code is:

TEXT runtime·morestack(SB),NOSPLIT,$0-0
    // Cannot grow scheduler stack (m->g0).
    get_tls(CX)
    MOVQ    g(CX), BX
    MOVQ    g_m(BX), BX
    MOVQ    m_g0(BX), SI
    CMPQ    g(CX), SI
    JNE 2(PC)
    INT $3

    // Cannot grow signal stack (m->gsignal).
    MOVQ    m_gsignal(BX), SI

and the problem is therefore that somehow morestack has been called on a system stack. The question is why. It would also be interesting to see the output of x/100xg $rsi at the point where the fault happens. That should be the first 100 words of the g == g->m->g0 struct.

Thanks.
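
For readers less familiar with the runtime, the guard in the assembly above can be modeled in plain Go roughly as follows. This is only a sketch: the g and m types here are hypothetical, stripped-down stand-ins for the real runtime structures, and checkMorestack is an invented name. The gsignal branch is inferred from the "Cannot grow signal stack" comment; the quoted snippet does not show that check in full.

```go
package main

import (
	"errors"
	"fmt"
)

// g and m are simplified, hypothetical stand-ins for the runtime's goroutine
// and OS-thread structures; only the fields the guard reads are kept.
type g struct {
	name string
	m    *m
}

type m struct {
	g0      *g // goroutine that owns the scheduler (system) stack
	gsignal *g // goroutine that owns the signal-handling stack
}

var (
	errOnG0      = errors.New("morestack on g0")
	errOnGsignal = errors.New("morestack on gsignal")
)

// checkMorestack models the guards at the top of runtime·morestack: the
// stacks of g0 and gsignal cannot be grown, so the real code traps there.
func checkMorestack(cur *g) error {
	switch cur {
	case cur.m.g0:
		// Corresponds to CMPQ g(CX), SI followed by INT $3.
		return errOnG0
	case cur.m.gsignal:
		// Inferred from the second comment in the assembly.
		return errOnGsignal
	}
	return nil // an ordinary goroutine: growing its stack is allowed
}

func main() {
	mp := &m{}
	mp.g0 = &g{name: "g0", m: mp}
	mp.gsignal = &g{name: "gsignal", m: mp}
	user := &g{name: "user goroutine", m: mp}

	for _, gp := range []*g{user, mp.g0, mp.gsignal} {
		fmt.Printf("%-15s -> %v\n", gp.name, checkMorestack(gp))
	}
}
```

In this model the crash in the report corresponds to the g0 case: the current g equals m->g0 when morestack is entered, so the guard fires instead of growing the stack.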

@rsc rsc changed the title from 'misc/cgo/test: On Solaris, "unlimit stacksize" causes misc/cgo/test to fails' to 'misc/cgo/test: morestack on g0 on Solaris under "unlimit stacksize"' on Nov 30, 2015
@fazalmajid
Author

Yes, it's still happening with the Git HEAD.

Here is the output of "where" and "x/100xg $rsi" as requested:

tsurumah ~/go-head/misc/cgo/test>env GOTRACEBACK=2 gdb ./test.test
GNU gdb (GDB) 7.8.2
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-solaris2.11".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test.test...done.
Loading Go Runtime support.
(gdb) set pagination off
(gdb) r
Starting program: /home/majid/go-head/misc/cgo/test/test.test 
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
[New LWP    5        ]

Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 1 (LWP 1)]
runtime.morestack () at /home/majid/go-head/src/runtime/asm_amd64.s:331
331             MOVQ    m_gsignal(BX), SI
(gdb) where
#0  runtime.morestack () at /home/majid/go-head/src/runtime/asm_amd64.s:331
#1  0x000000000049ccd3 in runtime.deferproc.func1 () at /home/majid/go-head/src/runtime/panic.go:74
#2  0xfffffd7fffdef4d0 in ?? ()
#3  0x0000000000000008 in ?? ()
#4  0x000000000069f2e8 in _/home/majid/go-head/misc/cgo/test._cgoexpwrap_8a15248c4e43_gc.f ()
#5  0x000000000057e4ad in crosscall2 () at /home/majid/go-head/src/runtime/cgo/asm_amd64.s:36
#6  0xfffffd7fffdef4d0 in ?? ()
#7  0x0000000000000008 in ?? ()
#8  0x000000c8200940d8 in ?? ()
#9  0xfffffd7fffdef4f0 in ?? ()
#10 0x0000000000000012 in ?? ()
#11 0x000000000069e24e in runtime.gcbits.* ()
#12 0x000000000000000f in ?? ()
#13 0x0000000000000008 in ?? ()
#14 0x0000000000000012 in ?? ()
#15 0x000000c8200940d8 in ?? ()
#16 0xfffffd7fffdef4f0 in ?? ()
#17 0x000000000059eb3c in goCallback (p0=0x7862a0 <runtime.m0>) at /tmp/go-build993730418/_/home/majid/go-head/misc/cgo/test/_test/_obj_test/_cgo_export.c:16
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) x/100xg $rsi
0x785f00 <runtime.g0>:  0xfffffd7fffdfd000      0xfffffd7fffdff580
0x785f10 <runtime.g0+16>:       0xfffffd7fffdfd2d0      0xfffffd7fffdfd2d0
0x785f20 <runtime.g0+32>:       0x0000000000000000      0x0000000000000000
0x785f30 <runtime.g0+48>:       0x00000000007862a0      0x0000000000000000
0x785f40 <runtime.g0+64>:       0xfffffd7fffdef418      0xffffffffffffffff
0x785f50 <runtime.g0+80>:       0x0000000000785f00      0x0000000000000000
0x785f60 <runtime.g0+96>:       0x0000000000000000      0x0000000000000000
0x785f70 <runtime.g0+112>:      0x0000000000785f00      0x0000000000000000
0x785f80 <runtime.g0+128>:      0x0000000000000000      0x0000000000000000
0x785f90 <runtime.g0+144>:      0x0000000000000000      0x0000000000000000
0x785fa0 <runtime.g0+160>:      0x0000000000000000      0x0000000000000000
0x785fb0 <runtime.g0+176>:      0x0000000000000000      0x0000000000000000
0x785fc0 <runtime.g0+192>:      0x0000000000000000      0x0000000000000000
0x785fd0 <runtime.g0+208>:      0x0000000000000000      0x0000000000000000
0x785fe0 <runtime.g0+224>:      0x0000000000000000      0x0000000000000000
0x785ff0 <runtime.g0+240>:      0x0000000000000000      0x0000000000000000
0x786000 <runtime.g0+256>:      0x0000000000000000      0x0000000000000000
0x786010 <runtime.g0+272>:      0x0000000000000000      0x0000000000000000
0x786020 <runtime.g0+288>:      0x0000000000000000      0x0000000000000000
0x786030 <runtime.g0+304>:      0x0000000000000000      0x0000000000000000
0x786040 <runtime.g0+320>:      0x0000000000000000      0x0000000000000000
0x786050 <runtime.g0+336>:      0x0000000000000000      0x0000000000000000
0x786060 <runtime.g0+352>:      0x0000000000000000      0x0000000000000000
0x786070:       0x0000000000000000      0x0000000000000000
0x786080 <os/signal.handlers>:  0x0000000000000000      0x0000000000000000
0x786090 <os/signal.handlers+16>:       0x0000000000000000      0x0000000000000000
0x7860a0 <os/signal.handlers+32>:       0x0000000000000000      0x0000000000000000
0x7860b0 <os/signal.handlers+48>:       0x0000000000000000      0x0000000000000000
0x7860c0 <os/signal.handlers+64>:       0x0000000000000000      0x0000000000000000
0x7860d0 <os/signal.handlers+80>:       0x0000000000000000      0x0000000000000000
0x7860e0 <os/signal.handlers+96>:       0x0000000000000000      0x0000000000000000
0x7860f0 <os/signal.handlers+112>:      0x0000000000000000      0x0000000000000000
0x786100 <os/signal.handlers+128>:      0x0000000000000000      0x0000000000000000
0x786110 <os/signal.handlers+144>:      0x0000000000000000      0x0000000000000000
0x786120 <os/signal.handlers+160>:      0x0000000000000000      0x0000000000000000
0x786130 <os/signal.handlers+176>:      0x0000000000000000      0x0000000000000000
0x786140 <os/signal.handlers+192>:      0x0000000000000000      0x0000000000000000
0x786150 <os/signal.handlers+208>:      0x0000000000000000      0x0000000000000000
0x786160 <os/signal.handlers+224>:      0x0000000000000000      0x0000000000000000
0x786170 <os/signal.handlers+240>:      0x0000000000000000      0x0000000000000000
0x786180 <os/signal.handlers+256>:      0x0000000000000000      0x0000000000000000
0x786190 <os/signal.handlers+272>:      0x0000000000000000      0x0000000000000000
0x7861a0 <os/signal.handlers+288>:      0x0000000000000000      0x0000000000000000
0x7861b0 <os/signal.handlers+304>:      0x0000000000000000      0x0000000000000000
0x7861c0 <os/signal.handlers+320>:      0x0000000000000000      0x0000000000000000
0x7861d0 <os/signal.handlers+336>:      0x0000000000000000      0x0000000000000000
0x7861e0 <os/signal.handlers+352>:      0x0000000000000000      0x0000000000000000
0x7861f0 <os/signal.handlers+368>:      0x0000000000000000      0x0000000000000000
0x786200 <os/signal.handlers+384>:      0x0000000000000000      0x0000000000000000
0x786210 <os/signal.handlers+400>:      0x0000000000000000      0x0000000000000000
(gdb) 

@rsc
Contributor

rsc commented Dec 4, 2015

Thanks for the extra information. I got access to a Solaris box and was able to reproduce this. It looks like when the stack is "unlimited", asking Solaris how big the stack is returns the current stack size (in this case, 0x3000 bytes), not its maximum size. Then Go tries to stay within that size and triggers the call to morestack on a cgo callback because the C code in the middle has taken up all of the original 0x3000 bytes and then some. Will send a CL making Go less gullible.

@gopherbot
Contributor

CL https://golang.org/cl/17452 mentions this issue.

@rsc rsc closed this as completed in cd58f44 Dec 5, 2015
@fazalmajid
Author

POSIX defines ss_size to be the stack size (i.e. currently allocated, not maximum allocatable):

It looks like when RLIMIT_STACK is set to a specific value, the stack size is set to that value (after all, this only reserves virtual address space; actual memory is allocated only when a page fault occurs). When RLIMIT_STACK is unlimited, it obviously can't do that, and so it allocates the minimum (2*PAGESIZE, which should be 8KB by default, but I'm guessing that by the time ld.so finishes its work the stack has grown by one extra page, giving your 0x3000 value).

I'm not sure what purpose g->stacklo serves in the Go runtime, and the default RLIMIT_STACK is 10MB. Your change would probably cause a segfault if RLIMIT_STACK were set under 1MB and the stack grew that large. Perhaps a better approach is to getrlimit RLIMIT_STACK, and if it is RLIM_INFINITY, use 1MB (or 2MB, or 10MB), otherwise use ctx.uc_stack.ss_size.

Go is in good company: Java had the same issue (note the lame response):

@rsc
Contributor

rsc commented Dec 5, 2015

Go checks for impending stack overflow at the beginning of most functions. To do that on the system stacks, it needs to know how big the stack is allowed to be. Whatever POSIX happens to say, Solaris seems to be the only system that reports such a small size. It's fine.

@binarycrusader
Contributor

The only thing that concerns me is that I was never able to reproduce this on Solaris proper; this issue seems to be unique to Illumos. I don't think the workaround put in place will break anything, but I intend to look into why there might be a difference.

@fazalmajid
Author

@binarycrusader
Can you compile this program and run it, with and without unlimited stacksize?

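#include <ucontext.h>
#include <stdio.h>

int main(void) {
  ucontext_t uc;

  /* getcontext fills uc, including the current stack bounds in uc_stack */
  if (getcontext(&uc) != 0) {
    perror("getcontext");
    return 1;
  }
  /* ss_size is a size_t, so print it with %zu */
  printf("ss_size = %zu\n", uc.uc_stack.ss_size);
  return 0;
}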

Here are my results (compiled -m64, I don't have a functional -m32 compiler any more):

  • with limit stacksize 10240KB: ss_size = 10485760
  • with unlimited stacksize: ss_size = 12288

@binarycrusader
Contributor

@fazalmajid

As 64-bit:
$ ulimit -s
8192
$ cc -m64 ./main.c && ./a.out
ss_size = 8388608
$ ulimit -s unlimited
$ ulimit -s
unlimited
$ cc -m64 ./main.c && ./a.out
ss_size = 1098437885952

As 32-bit:
$ ulimit -s
8192
$ cc -m32 ./main.c && ./a.out
ss_size = 8388608
$ ulimit -s unlimited
$ ulimit -s
unlimited
$ cc -m32 ./main.c && ./a.out
ss_size = 2139090944

@binarycrusader
Contributor

@fazalmajid if I had to guess, I'd say the difference might be due to a fix Solaris has for stack_inbounds() being broken for the main thread.

Historically, stack_inbounds() just checked for the given address greater-than-or-equal-to the stack base and less than base + curthread->ul_stack.ss_size.

We fixed libc_init() so that if the stack is unlimited, it tries reading the stack size from /proc/self/rmap. If that fails, we just set it to 8 Mbytes.

That's a wild guess at the moment, so I think the Go fix that was put in place likely remains the right workaround for now.

@fazalmajid
Author

Fair enough. Your results confirm the Oracle Solaris 11 behavior is different from the Illumos one (and presumably Solaris 10), which explains why the test wasn't failing on your machines.

On an unrelated note, I am really impressed with Go's scalability on Solaris, despite how recently it has been fully supported as a platform - I had a nsq_to_http process running yesterday with 600+ LWPs on a 32-core 64-thread machine using the equivalent of 30 cores running flat out.
