
orcmd crashes with a segfault when we do a nmap scan on its open tcp port. #754

Closed
sushmith opened this issue Jul 27, 2015 · 2 comments

Comments

@sushmith

When we scan the open port with nmap locally on the same machine (loopback address), orcmd crashes with a segfault.
The specific nmap command I used was "nmap -sT -p T:50000-60000 127.0.0.1".
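
For context, the "-sT" option asks nmap for a TCP connect scan: each probe is roughly an ordinary connect() followed by a close, with no application data sent. A minimal sketch of one such probe, assuming IPv4 and a POSIX system (the helper name probe_tcp_port is hypothetical and is not part of nmap or orcm):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: make one full TCP connection to ipv4:port and close
 * it again without sending any data. Returns 1 if the connection was
 * accepted, 0 otherwise. */
static int probe_tcp_port(const char *ipv4, uint16_t port)
{
    struct sockaddr_in addr;
    int sd;
    int ok;

    assert(ipv4 != NULL);

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return 0;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        close(sd);          /* not a valid dotted-quad address */
        return 0;
    }

    ok = (connect(sd, (struct sockaddr *)&addr, sizeof(addr)) == 0);
    close(sd);              /* hang up immediately, like a scanner would */
    return ok;
}
```

From the daemon's side, a probe like this looks like a client that connects and then disappears before sending the expected connection handshake.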

According to the core dump below, the crash is in the file "orcm/orte/mca/oob/tcp/oob_tcp_connection.c". The "recv_handler" function (oob_tcp.c) passes "peer" as "NULL" to mca_oob_tcp_peer_recv_connect_ack (pr=0x0), which passes it on to tcp_peer_recv_blocking (peer=0x0). The NULL pointer is then dereferenced in tcp_peer_recv_blocking by this line of code: "if (peer->state == MCA_OOB_TCP_CONNECT_ACK)".
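
The call chain described above can be modelled in a few lines. This is only a sketch: every type and function name here is a hypothetical, heavily simplified stand-in, not the orcm code, and the idea that the handler passes NULL because the sender has not identified itself yet is an assumption drawn from the backtrace. The model adds a NULL check at the point where the reported crash reads peer->state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the orcm peer types. */
typedef enum { MODEL_UNCONNECTED, MODEL_CONNECT_ACK, MODEL_CONNECTED } model_state_t;
typedef struct { model_state_t state; } model_peer_t;
typedef enum { RECV_OK, RECV_DROP_UNKNOWN, RECV_RETRY_CONNECT, RECV_FAIL } model_result_t;

/* Frame #0 in the backtrace: the reported crash is the equivalent of
 * reading peer->state here while peer is NULL. */
static model_result_t model_recv_blocking(const model_peer_t *peer, bool recv_failed)
{
    if (!recv_failed) {
        return RECV_OK;                 /* data arrived, nothing to handle */
    }
    if (peer == NULL) {
        return RECV_DROP_UNKNOWN;       /* unknown sender, e.g. a port scanner */
    }
    if (peer->state == MODEL_CONNECT_ACK) {
        return RECV_RETRY_CONNECT;      /* known peer, handshake may be retried */
    }
    return RECV_FAIL;
}

/* Frame #1: forwards whatever peer pointer it was given. */
static model_result_t model_recv_connect_ack(const model_peer_t *pr, bool recv_failed)
{
    return model_recv_blocking(pr, recv_failed);
}

/* Frame #2: a new inbound connection has no peer record yet (assumption),
 * so NULL travels down the chain. */
static model_result_t model_recv_handler(bool recv_failed)
{
    return model_recv_connect_ack(NULL, recv_failed);
}

/* Small self-check showing the outcome for each kind of caller. */
static void model_self_check(void)
{
    model_peer_t known = { MODEL_CONNECT_ACK };
    assert(model_recv_handler(false) == RECV_OK);
    assert(model_recv_handler(true) == RECV_DROP_UNKNOWN);
    assert(model_recv_connect_ack(&known, true) == RECV_RETRY_CONNECT);
}
```

Without the `peer == NULL` branch, the failed-receive path for an unknown sender would reach `peer->state` directly, which matches frame #0 of the dump.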

Below is the core dump of the segfault:
"
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
(gdb) rc
Target multi-thread does not support this command.
(gdb) bt
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
#1 0x00007ffff443320f in mca_oob_tcp_peer_recv_connect_ack (pr=0x0, sd=32, dhdr=0x7fffffffc9d0) at oob_tcp_connection.c:664
#2 0x00007ffff442e42c in recv_handler (sd=32, flg=2, cbdata=0x6d80b0) at oob_tcp.c:564
#3 0x00007ffff761142c in event_process_active_single_queue (activeq=0x64dd80, base=0x64dee0) at event.c:1370
#4 event_process_active (base=) at event.c:1440
#5 opal_libevent2022_event_base_loop (base=0x64dee0, flags=1) at event.c:1641
#6 0x0000000000402f0c in main (argc=1, argv=0x7fffffffce88) at orcmd.c:272
(gdb) frame
#0 0x00007ffff44342d8 in tcp_peer_recv_blocking (peer=0x0, sd=32, data=0x7fffffffc930, size=28) at oob_tcp_connection.c:1012
1012 if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
"

@rhc54
Contributor

rhc54 commented Jul 28, 2015

All I can say is that you have a stale copy of orcm - from the public master:

        /* socket is non-blocking so handle errors */
        if (retval < 0) {
            if (opal_socket_errno != EINTR &&
                opal_socket_errno != EAGAIN &&
                opal_socket_errno != EWOULDBLOCK) {
                if (NULL == peer) {
                    /* protect against things like port scanners */
                    CLOSE_THE_SOCKET(sd);
                    return false;
                } else if (peer->state == MCA_OOB_TCP_CONNECT_ACK) {
                    /* If we overflow the listen backlog, it's
                       possible that even though we finished the three
                       way handshake, the remote host was unable to
                       transition the connection from half connected
                       (received the initial SYN) to fully connected
                       (in the listen backlog).  We likely won't see
                       the failure until we try to receive, due to
                       timing and the like.  The first thing we'll get
                       in that case is a RST packet, which receive
                       will turn into a connection reset by peer
                       errno.  In that case, leave the socket in
                       CONNECT_ACK and propogate the error up to
                       recv_connect_ack, who will try to establish the
                       connection again */
                    opal_output_verbose(OOB_TCP_DEBUG_CONNECT, orte_oob_base_framework.framework_output,
                                        "%s connect ack received error %s from %s",
                                        ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                                        strerror(opal_socket_errno),
                                        (NULL == peer) ? "UNKNOWN" : ORTE_NAME_PRINT(&(peer->name)));
                    return false;

I suggest you folks update.
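
For readers who want to try the guard in isolation, here is a minimal, self-contained sketch of the same idea on plain POSIX sockets. It is not the orcm implementation: sketch_peer_t and sketch_recv_exact are hypothetical names, and unlike the excerpt above this sketch treats a remote close (recv returning 0) the same way as a hard error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical minimal peer record; the real structure has many more fields. */
typedef struct { int state; } sketch_peer_t;

/* Read exactly `size` bytes from sd into data. Returns true on success.
 * If the read fails or the remote side closes first and no peer is
 * associated with the socket, the socket is closed and false is returned
 * without ever touching `peer`. For a known peer the socket is left open
 * so the caller can decide what to do. */
static bool sketch_recv_exact(const sketch_peer_t *peer, int sd, void *data, size_t size)
{
    unsigned char *ptr = data;
    size_t cnt = 0;

    assert(data != NULL);

    while (cnt < size) {
        ssize_t rc = recv(sd, ptr + cnt, size - cnt, 0);
        if (rc > 0) {
            cnt += (size_t)rc;
            continue;
        }
        if (rc < 0 && (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)) {
            continue;                   /* transient condition: try again */
        }
        if (peer == NULL) {
            close(sd);                  /* protect against things like port scanners */
            return false;
        }
        return false;                   /* known peer: leave the socket to the caller */
    }
    return true;
}
```

A quick way to exercise it is a socketpair(): close one end without writing anything and call sketch_recv_exact(NULL, ...) on the other end, which should return false and leave the descriptor closed rather than crash.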

@sushmith
Author

Thanks :) sorry.. my mistake.

jsquyres pushed a commit to jsquyres/ompi that referenced this issue Aug 23, 2016
opal_free_list: fix strange size check