orcmd crashes with a segfault when we run an nmap scan on its open TCP port. #754
Comments
All I can say is that you have a stale copy of orcm. From the public master:

    /* socket is non-blocking so handle errors */
    if (retval < 0) {
        if (opal_socket_errno != EINTR &&
            opal_socket_errno != EAGAIN &&
            opal_socket_errno != EWOULDBLOCK) {
            if (NULL == peer) {
                /* protect against things like port scanners */
                CLOSE_THE_SOCKET(sd);
                return false;
            } else if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
                /* If we overflow the listen backlog, it's
                   possible that even though we finished the three
                   way handshake, the remote host was unable to
                   transition the connection from half connected
                   (received the initial SYN) to fully connected
                   (in the listen backlog). We likely won't see
                   the failure until we try to receive, due to
                   timing and the like. The first thing we'll get
                   in that case is a RST packet, which receive
                   will turn into a connection reset by peer
                   errno. In that case, leave the socket in
                   CONNECT_ACK and propogate the error up to
                   recv_connect_ack, who will try to establish the
                   connection again */
                opal_output_verbose(OOB_TCP_DEBUG_CONNECT, orte_oob_base_framework.framework_output,
                                    "%s connect ack received error %s from %s",
                                    ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                                    strerror(opal_socket_errno),
                                    (NULL == peer) ? "UNKNOWN" : ORTE_NAME_PRINT(&(peer->name)));
                return false;

I suggest you folks update.
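The ordering of checks in that excerpt is what matters: the NULL test on peer comes before any read of peer->state. A minimal sketch of just that decision logic follows, pulled out into a pure function so it can be exercised without a socket. Everything named here (peer_t, recv_action_t, on_recv_error, self_check) is a hypothetical stand-in and not part of ORTE, and the final "fail the peer" branch is an assumption, since the excerpt above ends before that case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ORTE peer type and its states. */
typedef enum { PEER_CONNECT_ACK, PEER_CONNECTED } peer_state_t;
typedef struct { peer_state_t state; } peer_t;

/* What the caller should do after recv() returned < 0. */
typedef enum {
    ACT_RETRY,        /* transient error on a non-blocking socket */
    ACT_CLOSE_SOCKET, /* no known peer: drop the connection */
    ACT_PROPAGATE,    /* stay in CONNECT_ACK, let the caller reconnect */
    ACT_FAIL_PEER     /* assumption: not shown in the quoted excerpt */
} recv_action_t;

recv_action_t on_recv_error(const peer_t *peer, int err)
{
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK) {
        return ACT_RETRY;
    }
    /* Test for NULL before touching peer->state; a stale copy that
     * skips this test dereferences a NULL pointer here. */
    if (peer == NULL) {
        return ACT_CLOSE_SOCKET;
    }
    if (peer->state == PEER_CONNECT_ACK) {
        return ACT_PROPAGATE;
    }
    return ACT_FAIL_PEER;
}

/* Small self-check covering each branch. */
void self_check(void)
{
    peer_t acking = { PEER_CONNECT_ACK };
    peer_t connected = { PEER_CONNECTED };

    assert(on_recv_error(NULL, EAGAIN) == ACT_RETRY);
    assert(on_recv_error(NULL, ECONNRESET) == ACT_CLOSE_SOCKET);
    assert(on_recv_error(&acking, ECONNRESET) == ACT_PROPAGATE);
    assert(on_recv_error(&connected, ECONNRESET) == ACT_FAIL_PEER);
}
```

Calling self_check() from a test program exercises the NULL-peer path with a non-transient errno, which is the case a port scanner would presumably trigger.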
Thanks :) sorry.. my mistake.
jsquyres pushed a commit to jsquyres/ompi that referenced this issue on Aug 23, 2016: opal_free_list: fix strange size check
When we scan the open port with nmap locally on the same machine (loopback address), orcmd crashes with a segfault.
The specific nmap command I used was "nmap -sT -p T:50000-60000 127.0.0.1".
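For context, nmap -sT is a TCP connect scan: each probe completes an ordinary connect() and then drops the connection without sending any application data. A rough sketch of one such probe is given here, assuming plain POSIX sockets on IPv4. The names probe_port and open_loopback_listener are hypothetical, and the listener helper exists only so the probe can be tried without a real daemon; this is not how nmap itself is implemented.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a TCP connection to ip:port is accepted, 0 if it is
 * not, and -1 on a local error. The socket is closed immediately, so
 * the server sees a connection that never sends any data. */
int probe_port(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return -1;
    }

    int sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return -1;
    }
    int is_open = (connect(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0);
    close(sd);
    return is_open;
}

/* Hypothetical test helper: listens on an ephemeral loopback port and
 * reports the chosen port. Returns the listening descriptor or -1. */
int open_loopback_listener(uint16_t *port_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    int ld = socket(AF_INET, SOCK_STREAM, 0);
    if (ld < 0) {
        return -1;
    }
    if (bind(ld, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(ld, 8) != 0 ||
        getsockname(ld, (struct sockaddr *)&addr, &len) != 0) {
        close(ld);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return ld;
}
```

Looping probe_port over ports 50000 to 60000 on 127.0.0.1 would approximate the scan in the report. A server that accepts such a connection has no known peer associated with it yet, which is presumably the situation the receive path has to tolerate.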
Per the core dump below, the crash occurs in "orcm/orte/mca/oob/tcp/oob_tcp_connection.c". The variable "peer" is NULL in the "recv_handler" function and is passed on to mca_oob_tcp_peer_recv_connect_ack (pr=0x0) and then to tcp_peer_recv_blocking (peer=0x0). tcp_peer_recv_blocking dereferences the pointer at this line: "if (peer->state == MCA_OOB_TCP_CONNECT_ACK)"
Core dump of the segfault:
"
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
(gdb) rc
Target multi-thread does not support this command.
(gdb) bt
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
#1 0x00007ffff443320f in mca_oob_tcp_peer_recv_connect_ack (pr=0x0, sd=32, dhdr=0x7fffffffc9d0) at oob_tcp_connection.c:664
#2 0x00007ffff442e42c in recv_handler (sd=32, flg=2, cbdata=0x6d80b0) at oob_tcp.c:564
#3 0x00007ffff761142c in event_process_active_single_queue (activeq=0x64dd80, base=0x64dee0) at event.c:1370
#4 event_process_active (base=) at event.c:1440
#5 opal_libevent2022_event_base_loop (base=0x64dee0, flags=1) at event.c:1641
#6 0x0000000000402f0c in main (argc=1, argv=0x7fffffffce88) at orcmd.c:272
(gdb) frame
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
"