This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

SIGSEGV immediately after zookeeper session initiation (with both OpenJDK and Oracle JDK) #1352

Closed
wwalker opened this issue Mar 31, 2015 · 5 comments

Comments

@wwalker

wwalker commented Mar 31, 2015

Immediately after the ZooKeeper session is established, the JVM crashes with a SIGSEGV. This happens with both OpenJDK and the Oracle JDK.

marathon-0.8.1-1.0.171.el6.x86_64
jdk-1.7.0_75-fcs.x86_64

(almost identical messages with java-1.8.0-openjdk-1.8.0.31-5.b13.fc21.x86_64)

Mar 31 11:36:56 control marathon[25108]: [2015-03-31 11:36:56,850] INFO Starting Marathon 0.8.1 (mesosphere.marathon.Main$:20)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,172] INFO Connecting to Zookeeper... (mesosphere.marathon.Main$:39)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,179] INFO Client environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,179] INFO Client environment:host.name= (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,179] INFO Client environment:java.version=1.7.0_75 (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,179] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,179] INFO Client environment:java.home=/usr/java/jdk1.7.0_75/jre (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:java.class.path=/usr/bin/marathon2 (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:java.library.path=/usr/local/lib:/usr/lib:/usr/lib64 (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:java.compiler= (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:os.version=2.6.32-504.12.2.el6.x86_64 (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,180] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,181] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,182] INFO Client environment:user.dir=/var/log (org.apache.zookeeper.ZooKeeper:97)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,183] INFO Initiating client connection, connectString=172.18.15.152:2181 sessionTimeout=10000 watcher=com.twitter.common.zookeeper.ZooKeeperClient$3@64e6b46f (org.apache.zookeeper.ZooKeeper:379)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,191] INFO Opening socket connection to server /172.18.15.152:2181 (org.apache.zookeeper.ClientCnxn:1061)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,195] INFO Socket connection established to 172.18.15.152/172.18.15.152:2181, initiating session (org.apache.zookeeper.ClientCnxn:950)
Mar 31 11:36:58 control marathon[25108]: [2015-03-31 11:36:58,199] INFO Session establishment complete on server 172.18.15.152/172.18.15.152:2181, sessionid = 0x14c6f8e034901d0, negotiated timeout = 10000 (org.apache.zookeeper.ClientCnxn:739)
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # A fatal error has been detected by the Java Runtime Environment:
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # SIGSEGV (0xb) at pc=0x00007ffd6b74553c, pid=25108, tid=140726418175744
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build 1.7.0_75-b13)
Mar 31 11:36:58 control marathon[25108]: # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops)
Mar 31 11:36:58 control marathon[25108]: # Problematic frame:
Mar 31 11:36:58 control marathon[25108]: # C [libc.so.6+0x7b53c] cfree+0x1c
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # Core dump written. Default location: /var/log/core or core.25108
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # An error report file with more information is saved as:
Mar 31 11:36:58 control marathon[25108]: # /var/log/hs_err_pid25108.log
Mar 31 11:36:58 control marathon[25108]: #
Mar 31 11:36:58 control marathon[25108]: # If you would like to submit a bug report, please visit:
Mar 31 11:36:58 control marathon[25108]: # http://bugreport.sun.com/bugreport/crash.jsp
Mar 31 11:36:58 control marathon[25108]: #
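The crash header above gives both the faulting program counter and the problematic frame as a library plus offset. As a rough sketch, assuming the bracketed offset is relative to the address at which the library was loaded, subtracting it from the pc gives the load base of `libc.so.6` in this process. The helper name `parse_crash_header` is hypothetical, and the two input lines are copied from the log with the syslog prefix removed.

```python
import re

# Two lines copied from the crash report above (syslog prefix removed).
SIGNAL_LINE = "# SIGSEGV (0xb) at pc=0x00007ffd6b74553c, pid=25108, tid=140726418175744"
FRAME_LINE = "# C [libc.so.6+0x7b53c] cfree+0x1c"


def parse_crash_header(signal_line, frame_line):
    """Extract pc, library, offset and symbol from the two header lines.

    Hypothetical helper for illustration; it is not part of any JDK tooling.
    """
    pc = int(re.search(r"pc=(0x[0-9a-fA-F]+)", signal_line).group(1), 16)
    m = re.search(
        r"\[(?P<lib>[^\]+]+)\+(?P<off>0x[0-9a-fA-F]+)\]\s+(?P<sym>\S+)",
        frame_line,
    )
    offset = int(m.group("off"), 16)
    return {
        "pc": pc,
        "library": m.group("lib"),
        "offset": offset,
        "symbol": m.group("sym"),
        # Assumption: the offset is measured from the library's load address.
        "load_base": pc - offset,
    }


info = parse_crash_header(SIGNAL_LINE, FRAME_LINE)
print(info["library"], info["symbol"], hex(info["load_base"]))
```

The pc computed this way can be compared against frame addresses in a core-file backtrace to confirm that both refer to the same instruction.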

@wwalker
Author

wwalker commented Mar 31, 2015

Simple bt on the core file:

Core was generated by `java -Djava.library.path=/usr/local/lib:/usr/lib:/usr/lib64 -Djava.util.logging'.
Program terminated with signal 6, Aborted.
#0 0x00007ffd6b6fc625 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install jdk-1.7.0_75-fcs.x86_64
(gdb) bt
#0 0x00007ffd6b6fc625 in raise () from /lib64/libc.so.6
#1 0x00007ffd6b6fde05 in abort () from /lib64/libc.so.6
#2 0x00007ffd6b071c55 in os::abort(bool) () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#3 0x00007ffd6b1f3cd7 in VMError::report_and_die() () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#4 0x00007ffd6b1f425e in crash_handler(int, siginfo*, void*) () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x00007ffd6b0686f1 in os::is_first_C_frame(frame*) () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#7 0x00007ffd6b1f23cd in VMError::report(outputStream*) () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#8 0x00007ffd6b1f38da in VMError::report_and_die() () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#9 0x00007ffd6b076b6f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#10 <signal handler called>
#11 0x00007ffd6b74553c in free () from /lib64/libc.so.6
#12 0x00007ffd6b799630 in freeaddrinfo () from /lib64/libc.so.6
#13 0x00007ffd59070cb0 in process::initialize(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libmesos-0.22.0.so
#14 0x00007ffd59071390 in process::ProcessBase::ProcessBase(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libmesos-0.22.0.so
#15 0x00007ffd5901c333 in mesos::internal::state::ZooKeeperStorageProcess::ZooKeeperStorageProcess(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Duration const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Option<zookeeper::Authentication> const&) () from /usr/lib/libmesos-0.22.0.so
#16 0x00007ffd5901c590 in mesos::internal::state::ZooKeeperStorage::ZooKeeperStorage(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Duration const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, Option<zookeeper::Authentication> const&) () from /usr/lib/libmesos-0.22.0.so
#17 0x00007ffd590c3059 in Java_org_apache_mesos_state_ZooKeeperState_initialize__Ljava_lang_String_2JLjava_util_concurrent_TimeUnit_2Ljava_lang_String_2 () from /usr/lib/libmesos-0.22.0.so
#18 0x00007ffd61012d98 in ?? ()
#19 0x00000000dcbe15a3 in ?? ()
#20 0x00007ffd000000b6 in ?? ()
#21 0x00007ffd64009098 in ?? ()
#22 0x00007ffd61060e85 in ?? ()
#23 0x00007ffd64008800 in ?? ()
#24 0x00007ffd610610f8 in ?? ()
#25 0x00007ffd6c2a4980 in ?? ()
#26 0x00000000dcbe17e0 in ?? ()
#27 0x00007ffd6c2a4a08 in ?? ()
#28 0x00000000dcbe83d0 in ?? ()
#29 0x0000000000000000 in ?? ()
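Most of the top frames in this backtrace are the JVM's own crash handling, so it helps to find the frame whose address equals the pc from the SIGSEGV line and read the callers from there. The following is a minimal sketch of that lookup. `faulting_chain` and `FRAME_RE` are hypothetical names, only a few frames are copied in, and the C++ argument list of frame #13 is abbreviated to `(...)` for readability.

```python
import re

PC = 0x00007ffd6b74553c  # pc reported on the SIGSEGV line of the crash log

# A few frames copied from the backtrace; "(...)" abbreviates an argument list.
BACKTRACE = """\
#9 0x00007ffd6b076b6f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so
#11 0x00007ffd6b74553c in free () from /lib64/libc.so.6
#12 0x00007ffd6b799630 in freeaddrinfo () from /lib64/libc.so.6
#13 0x00007ffd59070cb0 in process::initialize(...) () from /usr/lib/libmesos-0.22.0.so
#17 0x00007ffd590c3059 in Java_org_apache_mesos_state_ZooKeeperState_initialize__Ljava_lang_String_2JLjava_util_concurrent_TimeUnit_2Ljava_lang_String_2 () from /usr/lib/libmesos-0.22.0.so
"""

# Matches gdb frames of the form "#N 0xADDR in FUNC () from LIB".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in (.+?) \(\) from (\S+)$")


def faulting_chain(pc, backtrace):
    """Return (frame number, function, library) from the frame at pc downwards."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16), m.group(3), m.group(4)))
    # Locate the frame whose address is exactly the faulting pc.
    start = next((i for i, f in enumerate(frames) if f[1] == pc), None)
    if start is None:
        return []
    return [(num, func, lib) for num, _addr, func, lib in frames[start:]]


for num, func, lib in faulting_chain(PC, BACKTRACE):
    print(f"#{num} {func} [{lib}]")
```

For this core file the chain starts at `free`, called from `freeaddrinfo`, called from `process::initialize` in `libmesos-0.22.0.so`, which is reached through the JNI entry point for `ZooKeeperState.initialize`.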

@wwalker
Author

wwalker commented Mar 31, 2015

zookeeper-client on the same machine seems to connect just fine.

@jrnt30

jrnt30 commented May 13, 2015

I believe I am experiencing a similar issue in a few environments. Based on which environments do and do not work, I suspect it is related to the empty host name shown in the log line `INFO Client environment:host.name= (org.apache.zookeeper.ZooKeeper:97)`.

Cody from the Mesosphere team also pointed me to https://issues.apache.org/jira/browse/MESOS-2636, which may be directly related.
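One way to test the host name hypothesis on an affected machine is to check whether the local host name resolves to any address. This is only a diagnostic sketch: the function name `resolve` is hypothetical, and nothing in this thread confirms that host name resolution is the actual cause of the crash.

```python
import socket


def resolve(hostname=None):
    """Return (name, addresses); addresses is empty if the name does not resolve.

    With no argument, the machine's own host name is checked.
    """
    name = hostname if hostname is not None else socket.gethostname()
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        # The resolver could not map the name to any address.
        return name, []
    # Each entry ends with a sockaddr tuple whose first element is the address.
    return name, sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name, addrs = resolve()
    print(f"hostname={name!r} addresses={addrs}")
```

An empty address list on a machine that crashes, and a non-empty one on a machine that works, would be consistent with the observation above, though it would not prove it.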

@kolloch
Contributor

kolloch commented May 18, 2015

This has hopefully been resolved by running Marathon 0.8.2-RC3 with Mesos 0.22.1 (Master+Libraries). Please reopen if the problem persists.
