orcmd crashes with a segfault when we do a nmap scan on its open tcp port. #754

sushmith · 2015-07-27T23:27:44Z

When we scan the open port with nmap locally on the same machine (loopback address), orcmd crashes with a segfault.
The specific command that I used for nmap was "nmap -sT -p T:50000-60000 127.0.0.1"
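For readers without nmap at hand, the effect of a TCP connect scan (-sT) can be approximated with a few lines of C: complete the handshake on each port, send nothing, and close immediately. This is only a rough sketch; the helper names probe_port and scan_range are made up for illustration, and nmap's real timing and teardown behaviour may differ.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a full TCP connection to ip:port
 * succeeds, 0 otherwise. The socket is closed right away and no payload
 * is ever sent, which is what the listening daemon has to survive. */
int probe_port(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int sd, is_open;

    assert(NULL != ip);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        return 0;               /* not a valid IPv4 address */
    }

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return 0;
    }
    is_open = (connect(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0);
    close(sd);
    return is_open;
}

/* Hypothetical helper: probe every port in [first, last] and return the
 * number of ports that accepted a connection. */
int scan_range(const char *ip, uint16_t first, uint16_t last)
{
    int found = 0;
    uint32_t p;                 /* wider than uint16_t so the loop ends at 65535 */

    for (p = first; p <= last; p++) {
        found += probe_port(ip, (uint16_t)p);
    }
    return found;
}
```

Calling scan_range("127.0.0.1", 50000, 60000) covers the same port range as the command above.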

According to the backtrace, the variable "peer" is NULL in the "recv_handler" function (oob_tcp.c). That NULL is passed to mca_oob_tcp_peer_recv_connect_ack (pr=0x0) and then to tcp_peer_recv_blocking (peer=0x0), both in "orcm/orte/mca/oob/tcp/oob_tcp_connection.c". The NULL pointer is finally dereferenced in tcp_peer_recv_blocking by this line of code: "if (peer->state == MCA_OOB_TCP_CONNECT_ACK)"

Below is the core dump of the segfault:
"
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
(gdb) rc
Target multi-thread does not support this command.
(gdb) bt
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
#1 0x00007ffff443320f in mca_oob_tcp_peer_recv_connect_ack (pr=0x0, sd=32, dhdr=0x7fffffffc9d0) at oob_tcp_connection.c:664
#2 0x00007ffff442e42c in recv_handler (sd=32, flg=2, cbdata=0x6d80b0) at oob_tcp.c:564
#3 0x00007ffff761142c in event_process_active_single_queue (activeq=0x64dd80, base=0x64dee0) at event.c:1370
#4 event_process_active (base=) at event.c:1440
#5 opal_libevent2022_event_base_loop (base=0x64dee0, flags=1) at event.c:1641
#6 0x0000000000402f0c in main (argc=1, argv=0x7fffffffce88) at orcmd.c:272
(gdb) frame
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
"

rhc54 · 2015-07-28T02:35:16Z

All I can say is that you have a stale copy of orcm - from the public master:

        /* socket is non-blocking so handle errors */
        if (retval < 0) {
            if (opal_socket_errno != EINTR &&
                opal_socket_errno != EAGAIN &&
                opal_socket_errno != EWOULDBLOCK) {
                if (NULL == peer) {
                    /* protect against things like port scanners */
                    CLOSE_THE_SOCKET(sd);
                    return false;
                } else if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
                    /* If we overflow the listen backlog, it's
                       possible that even though we finished the three
                       way handshake, the remote host was unable to
                       transition the connection from half connected
                       (received the initial SYN) to fully connected
                       (in the listen backlog).  We likely won't see
                       the failure until we try to receive, due to
                       timing and the like.  The first thing we'll get
                       in that case is a RST packet, which receive
                       will turn into a connection reset by peer
                       errno.  In that case, leave the socket in
                       CONNECT_ACK and propogate the error up to
                       recv_connect_ack, who will try to establish the
                       connection again */
                    opal_output_verbose(OOB_TCP_DEBUG_CONNECT, orte_oob_base_framework.framework_output,
                                        "%s connect ack received error %s from %s",
                                        ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                                        strerror(opal_socket_errno),
                                        (NULL == peer) ? "UNKNOWN" : ORTE_NAME_PRINT(&(peer->name)));
                    return false;

I suggest you folks update.
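To show the guard in isolation, here is a minimal, self-contained sketch of the same idea: a blocking receive that tolerates a NULL peer, as happens when the remote end (a port scanner, say) never identifies itself. The types peer_t and peer_state_t and the function recv_blocking_sketch are hypothetical stand-ins, not the actual ORTE definitions, and plain errno and close() are used in place of opal_socket_errno and CLOSE_THE_SOCKET.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real OOB/TCP peer types. */
typedef enum { PEER_CONNECT_ACK, PEER_FAILED } peer_state_t;
typedef struct { peer_state_t state; } peer_t;

/* Read exactly `size` bytes from `sd` into `data`.
 * `peer` may be NULL when the sender has not been identified yet. */
bool recv_blocking_sketch(peer_t *peer, int sd, void *data, size_t size)
{
    unsigned char *ptr = data;
    size_t cnt = 0;

    assert(NULL != data);
    while (cnt < size) {
        ssize_t rc = recv(sd, ptr + cnt, size - cnt, 0);

        if (0 == rc) {
            /* remote end closed the connection before sending everything */
            if (NULL != peer) {
                peer->state = PEER_FAILED;
            }
            close(sd);
            return false;
        }
        if (rc < 0) {
            if (EINTR == errno || EAGAIN == errno || EWOULDBLOCK == errno) {
                continue;       /* transient condition, try again */
            }
            if (NULL == peer) {
                /* unknown sender: drop the socket, never touch peer->state */
                close(sd);
                return false;
            }
            if (PEER_CONNECT_ACK != peer->state) {
                peer->state = PEER_FAILED;
                close(sd);
            }
            /* in CONNECT_ACK the socket is left alone so the caller can retry */
            return false;
        }
        cnt += (size_t)rc;
    }
    return true;
}
```

The key point is that every access to peer->state is preceded by a NULL check, so a connection that is opened and reset by a scanner ends in a closed socket and a false return rather than a segfault.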

sushmith · 2015-07-28T18:51:36Z

Thanks :) sorry.. my mistake.

sushmith closed this as completed Jul 28, 2015

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Aug 23, 2016

Merge pull request open-mpi#754 from hjelmn/v2.x_opal_flist_fix

7fb2b07

opal_free_list: fix strange size check
