ovn-controller show 100% cpu #270

Open
gujun4990 opened this issue Feb 6, 2025 · 2 comments

Comments

@gujun4990
Contributor

Environment

ovs: 2.16.2
ovn: branch-21.09

Description

We ran an ovn-appctl command and it returned no output. We then ran top:

top - 16:51:07 up 15 days, 23:13,  0 users,  load average: 11.70, 14.03, 14.79
Tasks: 103 total,   2 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 18.8 us, 10.8 sy,  0.0 ni, 68.6 id,  0.2 wa,  0.8 hi,  0.8 si,  0.0 st
KiB Mem : 16393728+total, 43273656 free, 95894120 used, 24769496 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 64888424 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                
  140 root      20   0  440652 167656   7252 R 100.0  0.1  10163:14 ovn-controller  

We used strace to trace the process's system calls:

()[root@node-1 /]# strace -tt -T -s 256 -ff -p 140
strace: Process 140 attached with 4 threads
strace: [ Process PID=140 runs in x32 mode. ]
[pid   145] 16:50:09.849063 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid   142] 16:50:09.849410 restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid   141] 16:50:09.849444 futex(0x558fe9833dc0, FUTEX_WAIT_PRIVATE, 2, NULL

^Cstrace: Process 140 detached
strace: Process 141 detached
 <detached ...>
strace: Process 142 detached
strace: Process 145 detached

We then listed the threads of process 140 to identify thread 141:

()[root@node-1 /]# ps -T -p 140
  PID  SPID TTY          TIME CMD
  140   140 ?        7-01:24:47 ovn-controller
  140   141 ?        00:00:00 ovn_pinctrl0
  140   142 ?        00:00:00 urcu1
  140   145 ?        00:00:00 stopwatch2

We also listed the open files and sockets of ovn-controller:

()[root@node-1 ~]# lsof -p 140
COMMAND   PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
ovn-contr 140 root  cwd       DIR             0,1851       74  509253456 /
ovn-contr 140 root  rtd       DIR             0,1851       74  509253456 /
ovn-contr 140 root  txt       REG             0,1851  3122480  170200702 /usr/bin/ovn-controller
ovn-contr 140 root  mem       REG              252,1           170200702 /usr/bin/ovn-controller (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851640 /usr/lib64/libpcre.so.1.2.0 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851680 /usr/lib64/libselinux.so.1 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851723 /usr/lib64/libutil-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851666 /usr/lib64/libresolv-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851560 /usr/lib64/libkeyutils.so.1.5 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851568 /usr/lib64/libkrb5support.so.0.1 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851661 /usr/lib64/libpython2.7.so.1.0 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           439395477 /usr/lib64/libevent-2.0.so.5.1.9 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851732 /usr/lib64/libz.so.1.2.7 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851493 /usr/lib64/libdl-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851556 /usr/lib64/libk5crypto.so.3.1 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851475 /usr/lib64/libcom_err.so.2.1 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851566 /usr/lib64/libkrb5.so.3.3 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851542 /usr/lib64/libgssapi_krb5.so.2.2 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851466 /usr/lib64/libc-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           441237570 /usr/lib64/libunbound.so.2.5.5 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851580 /usr/lib64/libm-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851676 /usr/lib64/librt-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851657 /usr/lib64/libpthread-2.17.so (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851469 /usr/lib64/libcap-ng.so.0.0.0 (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851480 /usr/lib64/libcrypto.so.1.0.2k (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851694 /usr/lib64/libssl.so.1.0.2k (path dev=0,1851)
ovn-contr 140 root  mem       REG              252,1           201851378 /usr/lib64/ld-2.17.so (path dev=0,1851)
ovn-contr 140 root    0u      CHR                1,3      0t0 1128893916 /dev/null
ovn-contr 140 root    1w     FIFO               0,13      0t0 1128863476 pipe
ovn-contr 140 root    2w     FIFO               0,13      0t0 1128863477 pipe
ovn-contr 140 root    3w      REG             0,1851        0  203210529 /var/log/ovn/ovn-controller.log
ovn-contr 140 root    4uW     REG             0,1851        4  237506068 /run/ovn/ovn-controller.pid
ovn-contr 140 root    5r     FIFO               0,13      0t0 1128911971 pipe
ovn-contr 140 root    6w     FIFO               0,13      0t0 1128911971 pipe
ovn-contr 140 root    7u     unix 0xffff944b93ae2880      0t0 1128911972 /var/run/ovn/ovn-controller.140.ctl
ovn-contr 140 root    8r     FIFO               0,13      0t0 1128911973 pipe
ovn-contr 140 root    9w     FIFO               0,13      0t0 1128911973 pipe
ovn-contr 140 root   10r     FIFO               0,13      0t0 1128896882 pipe
ovn-contr 140 root   11w     FIFO               0,13      0t0 1128896882 pipe
ovn-contr 140 root   12r     FIFO               0,13      0t0 1128896883 pipe
ovn-contr 140 root   13w     FIFO               0,13      0t0 1128896883 pipe
ovn-contr 140 root   14u     unix 0xffff944b93ae5100      0t0 1128911974 socket
ovn-contr 140 root   15r     FIFO               0,13      0t0 1128902933 pipe
ovn-contr 140 root   16w     FIFO               0,13      0t0 1128902933 pipe
ovn-contr 140 root   17r     FIFO               0,13      0t0 1128896890 pipe
ovn-contr 140 root   18w     FIFO               0,13      0t0 1128896890 pipe
ovn-contr 140 root   19u     unix 0xffff9444167e8900      0t0 1128896891 socket
ovn-contr 140 root   20u     unix 0xffff9444167e9680      0t0 1128896892 socket
ovn-contr 140 root   21u     unix 0xffff9444167eda00      0t0 1128896893 socket
ovn-contr 140 root   22u     unix 0xffff9444167e9200      0t0 1128896894 socket
ovn-contr 140 root   23u     IPv4         1128892267      0t0        TCP node-1:48300->ovn-ovsdb-sb-relay.openstack.svc.cluster.local:6642 (CLOSE_WAIT)
ovn-contr 140 root   24u     unix 0xffff944af18ba400      0t0 1128892370 socket
ovn-contr 140 root   25u     unix 0xffff944af18bda00      0t0 1128892371 socket
ovn-contr 140 root   26r     FIFO               0,13      0t0 1128892372 pipe
ovn-contr 140 root   27w     FIFO               0,13      0t0 1128892372 pipe
ovn-contr 140 root   28u     sock                0,9      0t0 1128898179 protocol: UDP
ovn-contr 140 root   29u  netlink                         0t0 1128898187 ROUTE
ovn-contr 140 root   30u  netlink                         0t0 1128898188 GENERIC
ovn-contr 140 root   31u     unix 0xffff944b93ae4c80      0t0 1128912127 socket
()[root@node-1 ~]# netstat -tnulpa|grep 6642
tcp        0      0 192.168.10.3:57790      10.222.218.160:6642     ESTABLISHED -                   
tcp        0      0 192.168.10.3:46216      10.222.218.160:6642     TIME_WAIT   -                   
tcp        0      0 192.168.10.3:58640      10.222.218.160:6642     ESTABLISHED -                   
tcp        0      0 192.168.10.3:60284      10.222.218.160:6642     TIME_WAIT   -                   
tcp   251042      0 192.168.10.3:48300      10.222.218.160:6642     CLOSE_WAIT  140/ovn-controller  
tcp        0      0 192.168.10.3:59722      10.222.218.160:6642     ESTABLISHED -                   
tcp        0      0 192.168.10.3:57978      10.222.218.160:6642     ESTABLISHED - 

We found that the connection to sb-relay is stuck in the CLOSE_WAIT state. Based on the above information, we are not sure which is the cause: does the stuck connection cause the ovn_pinctrl0 thread to block on its lock, or does the blocked ovn_pinctrl0 thread cause the connection to sb-relay to stay in CLOSE_WAIT?

@gujun4990
Contributor Author

We also generated a flame graph:

Image

@gujun4990
Contributor Author

We used gdb to debug the process:

Loaded symbols for /lib64/libpcre.so.1
0x000055cc98218201 in hmap_next_with_hash__ (hash=915080580, node=0x55cc9c39ea50) at include/openvswitch/hmap.h:323
323	    while (node != NULL && node->hash != hash) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.centos.es.x86_64 keyutils-libs-1.5.8-3.el7.centos.es.x86_64 krb5-libs-1.15.1-37.el7_6.5es.x86_64 libcap-ng-0.7.5-4.el7.centos.es.x86_64 libcom_err-1.42.9-16.el7.centos.es.1.x86_64 libevent-2.0.21-4.el7.centos.es.x86_64 libselinux-2.5-14.1.el7.centos.es.x86_64 openssl-libs-1.0.2k-19.el7.centos.es.x86_64 pcre-8.32-17.el7.centos.es.x86_64 python-libs-2.7.5-86.el7.centos.es.x86_64 unbound-libs-1.6.6-1.el7.centos.es.x86_64 zlib-1.2.7-18.el7.centos.es.x86_64
(gdb) bt
#0  0x000055cc98218201 in hmap_next_with_hash__ (hash=915080580, node=0x55cc9c39ea50) at include/openvswitch/hmap.h:323
#1  hmap_first_with_hash (hmap=hmap@entry=0x55cc9e39bd18, hmap=hmap@entry=0x55cc9e39bd18, hash=915080580) at include/openvswitch/hmap.h:334
#2  smap_find__ (smap=smap@entry=0x55cc9e39bd18, key=key@entry=0x55cc98309a56 "peer", key_len=4, hash=915080580) at lib/smap.c:418
#3  0x000055cc982187e6 in smap_get_node (smap=smap@entry=0x55cc9e39bd18, key=key@entry=0x55cc98309a56 "peer") at lib/smap.c:217
#4  0x000055cc98218849 in smap_get_def (def=0x0, key=key@entry=0x55cc98309a56 "peer", smap=smap@entry=0x55cc9e39bd18) at lib/smap.c:208
#5  smap_get (smap=smap@entry=0x55cc9e39bd18, key=key@entry=0x55cc98309a56 "peer") at lib/smap.c:200
#6  0x000055cc9815da41 in prepare_ipv6_ras (sbrec_port_binding_by_name=0x55cc987d4000, local_active_ports_ras=0x55cc987d87c8) at controller/pinctrl.c:4086
#7  pinctrl_run (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x55cc9a84d5b0, sbrec_datapath_binding_by_key=sbrec_datapath_binding_by_key@entry=0x55cc987d46e0, 
    sbrec_port_binding_by_datapath=sbrec_port_binding_by_datapath@entry=0x55cc987d4380, sbrec_port_binding_by_key=sbrec_port_binding_by_key@entry=0x55cc987d41b0, 
    sbrec_port_binding_by_name=sbrec_port_binding_by_name@entry=0x55cc987d4000, sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0x55cc987d4890, 
    sbrec_igmp_groups=sbrec_igmp_groups@entry=0x55cc987d4c10, sbrec_ip_multicast_opts=sbrec_ip_multicast_opts@entry=0x55cc987d4a60, sbrec_fdb_by_dp_key_mac=sbrec_fdb_by_dp_key_mac@entry=0x55cc987d4fa0, 
    dns_table=0x55cc987d0dd0, ce_table=ce_table@entry=0x55cc987d0dd0, svc_mon_table=svc_mon_table@entry=0x55cc987d0dd0, bfd_table=bfd_table@entry=0x55cc987d0dd0, br_int=br_int@entry=0x55cc9883b890, 
    chassis=chassis@entry=0x55cc991adef0, local_datapaths=local_datapaths@entry=0x55cc987d8660, active_tunnels=active_tunnels@entry=0x55cc987d8720, 
    local_active_ports_ipv6_pd=local_active_ports_ipv6_pd@entry=0x55cc987d87a8, local_active_ports_ras=local_active_ports_ras@entry=0x55cc987d87c8) at controller/pinctrl.c:3613
#8  0x000055cc9813452f in main (argc=8, argv=0x7ffeb7f51b88) at controller/ovn-controller.c:3769

We found that the hmap data is not correct.
Image
Image
Image
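Frame #0 of the backtrace sits in the `while (node != NULL && node->hash != hash)` loop at include/openvswitch/hmap.h:323. One way a loop of that shape can spin at 100% CPU is a bucket chain whose `next` pointers form a cycle that contains no node with the requested hash. That the corruption here is a cycle is an assumption, not something confirmed above. The sketch below uses a simplified stand-in node type (`struct chain_node`) and two hypothetical helpers, neither of which is part of OVS: a bounded version of the same walk, and a cycle check that could be used for diagnosis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a hash-map chain node (hypothetical type). */
struct chain_node {
    size_t hash;
    struct chain_node *next;
};

/* Same loop shape as the one shown at hmap.h:323, plus a step budget so
 * that a cyclic chain cannot spin forever.  Sets '*gave_up' to true if the
 * budget ran out before reaching a match or the end of the chain. */
static struct chain_node *
find_with_hash_bounded(struct chain_node *node, size_t hash,
                       size_t max_steps, bool *gave_up)
{
    size_t steps = 0;

    assert(gave_up != NULL);
    *gave_up = false;
    while (node != NULL && node->hash != hash) {
        if (++steps > max_steps) {
            *gave_up = true;
            return NULL;
        }
        node = node->next;
    }
    return node;
}

/* Floyd's "tortoise and hare" check: returns true if following 'next'
 * pointers from 'head' never reaches NULL. */
static bool
chain_has_cycle(const struct chain_node *head)
{
    const struct chain_node *slow = head;
    const struct chain_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}
```

With a chain a -> b -> c -> a and a hash that none of the three nodes carries, the unbounded loop never terminates, while `find_with_hash_bounded` gives up after `max_steps` and `chain_has_cycle` returns true. In gdb, the same check can be done by hand by following `node->next` from the address in frame #0 and watching for a repeated pointer.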
