kernel bug (1.8.2) #40

Closed
vtolstov opened this issue Jun 17, 2015 · 9 comments

@vtolstov

I had a kernel panic that took down two servers.
I think this is already fixed, but could @aabc check? Thanks.

dmesg from the servers before they died:
Jun 17 23:27:46 relay01 kernel: [7864672.083674] BUG: unable to handle kernel NULL pointer dereference at (null)
Jun 17 23:27:46 relay01 kernel: [7864672.086084] IP: [] netflow_target+0x5c1/0xb90 [ipt_NETFLOW]
Jun 17 23:27:46 relay01 kernel: [7864672.088555] PGD 0
Jun 17 23:27:46 relay01 kernel: [7864672.090974] Oops: 0000 [#1] SMP
Jun 17 23:27:46 relay01 kernel: [7864672.093393] Modules linked in: nfnetlink_log ip6table_mangle ip6table_raw iptable_mangle iptable_raw xt_u32 xt_pkttype xt_tcpudp xt_set xt_multiport ip_set_hash_net ip_set_hash_netport ip6table_filter ip6_tables iptable_filter ip_tables ip_set nfnetlink 8021q garp stp mrp llc ib_ucm ib_uverbs ib_addr ib_umad ib_ipoib ib_srp scsi_transport_srp ib_cm scsi_tgt mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core ipmi_devintf ipt_NETFLOW(O) x_tables md_mod dm_multipath scsi_dh scsi_mod radeon mperf coretemp kvm_intel ttm dm_mod drm_kms_helper kvm snd_pcm drm snd_page_alloc snd_timer psmouse snd soundcore iTCO_wdt iTCO_vendor_support i2c_algo_bit ipmi_si crc32c_intel i2c_core hpilo hpwdt lpc_ich ipmi_msghandler mfd_core pcspkr evdev serio_raw i7core_edac edac_core acpi_power_meter button processor squashfs loop aufs(C) hid_generic usbhid hid uhci_hcd ehci_pci ehci_hcd microcode usbcore usb_common bnx2 ixgbe dca mdio e1000e ptp pps_core thermal thermal_sys [last unloaded: nf_defrag_ipv4]
Jun 17 23:27:46 relay01 kernel: [7864672.118365] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G CIO 3.10-3-amd64 #1 Debian 3.10.61-1+020141126043804.83+wheezy1.gbp631db1
Jun 17 23:27:46 relay01 kernel: [7864672.125605] Hardware name: HP ProLiant DL380 G6, BIOS P62 07/02/2013
Jun 17 23:27:46 relay01 kernel: [7864672.129300] task: ffff8806070c60c0 ti: ffff88060710c000 task.ti: ffff88060710c000
Jun 17 23:27:46 relay01 kernel: [7864672.133111] RIP: 0010:[] [] netflow_target+0x5c1/0xb90 [ipt_NETFLOW]
Jun 17 23:27:46 relay01 kernel: [7864672.136993] RSP: 0018:ffff880a1fac39a0 EFLAGS: 00010246
Jun 17 23:27:46 relay01 kernel: [7864672.140929] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
Jun 17 23:27:46 relay01 kernel: [7864672.144919] RDX: ffff880a1fac39ec RSI: 0000000000000014 RDI: ffff880a08090880
Jun 17 23:27:46 relay01 kernel: [7864672.148908] RBP: ffff8809dd4f1054 R08: ffffc9000d89a368 R09: ffff880a08090880
Jun 17 23:27:46 relay01 kernel: [7864672.152954] R10: 0000000000000001 R11: 00000000ffffffff R12: 0000000000000000
Jun 17 23:27:46 relay01 kernel: [7864672.157008] R13: 0000000000000014 R14: ffff880a08090880 R15: ffff8805c3b81c00
Jun 17 23:27:46 relay01 kernel: [7864672.161061] FS: 0000000000000000(0000) GS:ffff880a1fac0000(0000) knlGS:0000000000000000
Jun 17 23:27:46 relay01 kernel: [7864672.165172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 17 23:27:46 relay01 kernel: [7864672.169337] CR2: 0000000000000000 CR3: 000000000160c000 CR4: 00000000000007e0
Jun 17 23:27:46 relay01 kernel: [7864672.173537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 23:27:46 relay01 kernel: [7864672.177745] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 17 23:27:46 relay01 kernel: [7864672.181886] Stack:
Jun 17 23:27:46 relay01 kernel: [7864672.186040] ffffffffa04f11ae 0000000000000014 0100000000000002 0000000000000001
Jun 17 23:27:46 relay01 kernel: [7864672.190336] ffffffffa02332cc ffff880a1fac3b80 ffff8806015b5300 ffff880605405e40
Jun 17 23:27:46 relay01 kernel: [7864672.194823] ffff8809dd4f1054 ffff880a1fac3a80 ffff880605405e60 ffff880a08090880
Jun 17 23:27:46 relay01 kernel: [7864672.199160] Call Trace:
Jun 17 23:27:46 relay01 kernel: [7864672.203483]
Jun 17 23:27:46 relay01 kernel: [7864672.203517] [] ? hash_net4_test+0x83/0x228 [ip_set_hash_net]
Jun 17 23:27:46 relay01 kernel: [7864672.212189] [] ? ftrace_raw_event_irq_handler_entry+0x47/0xe9
Jun 17 23:27:46 relay01 kernel: [7864672.216618] [] ? ip_set_test+0x7b/0xdb [ip_set]
Jun 17 23:27:46 relay01 kernel: [7864672.221117] [] ? ipt_do_table+0x518/0x5a7 [ip_tables]
Jun 17 23:27:46 relay01 kernel: [7864672.225567] [] ? ipt_do_table+0x578/0x5a7 [ip_tables]
Jun 17 23:27:46 relay01 kernel: [7864672.230012] [] ? dev_hard_start_xmit+0x272/0x3ed
Jun 17 23:27:46 relay01 kernel: [7864672.234445] [] ? dst_mtu+0xa/0xa
Jun 17 23:27:46 relay01 kernel: [7864672.238897] [] ? nf_iterate+0x42/0x80
Jun 17 23:27:46 relay01 kernel: [7864672.243325] [] ? nf_hook_slow+0x69/0xfe
Jun 17 23:27:46 relay01 kernel: [7864672.247772] [] ? dst_mtu+0xa/0xa
Jun 17 23:27:46 relay01 kernel: [7864672.252176] [] ? ip_forward+0x2af/0x38b
Jun 17 23:27:46 relay01 kernel: [7864672.256613] [] ? __netif_receive_skb_core+0x447/0x4bf
Jun 17 23:27:46 relay01 kernel: [7864672.261060] [] ? netif_receive_skb+0x4c/0x7d
Jun 17 23:27:46 relay01 kernel: [7864672.265526] [] ? napi_gro_receive+0x35/0x76
Jun 17 23:27:46 relay01 kernel: [7864672.269918] [] ? bnx2_poll_work+0x913/0xa07 [bnx2]
Jun 17 23:27:46 relay01 kernel: [7864672.274324] [] ? bnx2_poll_msix+0x28/0x6f [bnx2]
Jun 17 23:27:46 relay01 kernel: [7864672.278614] [] ? net_rx_action+0xa7/0x1dc
Jun 17 23:27:46 relay01 kernel: [7864672.282996] [] ? enqueue_hrtimer+0x36/0x6d
Jun 17 23:27:46 relay01 kernel: [7864672.287215] [] ? add_interrupt_randomness+0x39/0x16f
Jun 17 23:27:46 relay01 kernel: [7864672.291388] [] ? __do_softirq+0xec/0x209
Jun 17 23:27:46 relay01 kernel: [7864672.295429] [] ? call_softirq+0x1c/0x30
Jun 17 23:27:46 relay01 kernel: [7864672.299351] [] ? do_softirq+0x3a/0x78
Jun 17 23:27:46 relay01 kernel: [7864672.303122] [] ? irq_exit+0x3f/0x83
Jun 17 23:27:46 relay01 kernel: [7864672.306757] [] ? do_IRQ+0x81/0x97
Jun 17 23:27:46 relay01 kernel: [7864672.310337] [] ? common_interrupt+0x6d/0x6d
Jun 17 23:27:46 relay01 kernel: [7864672.313752]
Jun 17 23:27:46 relay01 kernel: [7864672.313785] [] ? arch_local_irq_enable+0x4/0x8
Jun 17 23:27:46 relay01 kernel: [7864672.320300] [] ? cpuidle_enter_state+0x46/0xb1
Jun 17 23:27:46 relay01 kernel: [7864672.323503] [] ? cpuidle_idle_call+0xd6/0x147
Jun 17 23:27:46 relay01 kernel: [7864672.326675] [] ? arch_cpu_idle+0x6/0x1a
Jun 17 23:27:46 relay01 kernel: [7864672.329885] [] ? cpu_startup_entry+0x125/0x1a5
Jun 17 23:27:46 relay01 kernel: [7864672.332854] [] ? _raw_spin_unlock_irqrestore+0xc/0xd
Jun 17 23:27:46 relay01 kernel: [7864672.335873] [] ? start_secondary+0x1e6/0x1ec
Jun 17 23:27:46 relay01 kernel: [7864672.338856] Code: 40 04 eb 29 48 8d 4c 24 4c ba 04 00 00 00 44 89 ee 4c 89 f7 e8 57 e2 ff ff 48 85 c0 74 0d 8b 10 c1 ea 10 66 89 94 24 80 00 00 00 <8b> 00 66 89 84 24 82 00 00 00 31 ed 31 db 48 c7 c7 d0 25 23 a0
Jun 17 23:27:46 relay01 kernel: [7864672.345493] RIP [] netflow_target+0x5c1/0xb90 [ipt_NETFLOW]
Jun 17 23:27:46 relay01 kernel: [7864672.348733] RSP
Jun 17 23:27:46 relay01 kernel: [7864672.351940] CR2: 0000000000000000

@vtolstov
Author

The kernel module was compiled for the 3.10 kernel (binary: http://bb.selfip.ru/ipt_NETFLOW.ko)

@aabc
Owner

aabc commented Jun 18, 2015

Thanks. However, your module was compiled without debug info, which makes it very hard to identify the location of the crash.

If your system has not been upgraded much since then (gcc, etc.), you could recompile the same version of the module with debug info and send me the binary. That would be helpful.

There is a manual on how to recompile the module: #27 (comment)

@sewer2

sewer2 commented Jun 19, 2015

binary with debug info: http://storage-208744-1.cs.clodoserver.ru/ipt_NETFLOW.ko

@aabc
Owner

aabc commented Jun 19, 2015

Thanks. I was unable to locate the exact version of the source for your binary, though. Sorry, I forgot to ask you to pack the source too. Please provide the source you used to build the module (the second one, with debug info).

@aabc
Owner

aabc commented Jun 19, 2015

Roughly speaking, I suspect that the crash is in this ipsec/esp code:

                        if (likely(hp = skb_header_pointer(skb, ptr, 4, &_hdr)))
                                tuple.s_port = hp->spi >> 16;
                                tuple.d_port = hp->spi;
                        }

This one was fixed long ago.

But for a precise answer I need the exact source that was compiled into the .ko, together with the .ko.

@sewer2

sewer2 commented Jun 19, 2015

@aabc
Owner

aabc commented Jun 19, 2015

Jun 17 23:27:46 relay01 kernel: [7864672.338856] Code: 40 04 eb 29, 48 8d 4c 24 4c, ba 04 00 00 00, 44 89 ee, 4c 89 f7, e8 57 e2 ff ff, 48 85 c0, 74 0d, 8b 10, c1 ea 10, 66 89 94 24 80 00 00 00, <8b> 00, 66 89 84 24 82 00 00 00, 31 ed, 31 db 48 c7 c7 d0 25 23 a0

Corresponding asm:

    2775:       48 8d 4c 24 4c          lea    0x4c(%rsp),%rcx
    277a:       ba 04 00 00 00          mov    $0x4,%edx
    277f:       44 89 ee                mov    %r13d,%esi
    2782:       4c 89 f7                mov    %r14,%rdi
    2785:       e8 57 e2 ff ff          callq  9e1 <skb_header_pointer>
    278a:       48 85 c0                test   %rax,%rax
    278d:       74 0d                   je     279c <netflow_target+0x5c1>
    278f:       8b 10                   mov    (%rax),%edx
    2791:       c1 ea 10                shr    $0x10,%edx
    2794:       66 89 94 24 80 00 00    mov    %dx,0x80(%rsp)
    279b:       00
    279c:      <8b> 00                  mov    (%rax),%eax
    279e:       66 89 84 24 82 00 00    mov    %ax,0x82(%rsp)
    27a5:       00
    27a6:       31 ed                   xor    %ebp,%ebp
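
A note on reading the `Code:` line: the byte in angle brackets is the first byte of the instruction at the faulting RIP, which is how the `<8b> 00` above lines up with `mov (%rax),%eax`. Below is a minimal sketch of locating that marker in a hex dump. `faulting_byte_index` is a hypothetical helper written only for illustration; it is not part of the kernel or of any existing tool.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the space-separated hex bytes of an oops
 * "Code:" line, return the zero-based index of the byte wrapped in <..>
 * (the first byte of the faulting instruction), or -1 if none is marked.
 */
static int faulting_byte_index(const char *code)
{
    int index = 0;

    assert(code != NULL);
    while (*code) {
        /* Skip separators between byte tokens. */
        while (*code == ' ')
            code++;
        if (*code == '\0')
            break;
        /* A token starting with '<' is the marked byte. */
        if (*code == '<')
            return index;
        /* Otherwise skip over this token and count it. */
        while (*code && *code != ' ')
            code++;
        index++;
    }
    return -1;
}
```

The returned index is the number of bytes printed before the faulting instruction, which helps when matching the dump against `objdump` output by hand.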

@aabc
Owner

aabc commented Jun 19, 2015

repo with source.

Thanks much for this!

                    case IPPROTO_ESP: {
                        struct ip_esp_hdr _hdr, *hp;

                        if (likely(hp = skb_header_pointer(skb, ptr, 4, &_hdr)))
    2c95:       48 8d 4c 24 4c          lea    0x4c(%rsp),%rcx
    2c9a:       ba 04 00 00 00          mov    $0x4,%edx
    2c9f:       44 89 ee                mov    %r13d,%esi
    2ca2:       4c 89 f7                mov    %r14,%rdi
    2ca5:       e8 19 df ff ff          callq  bc3 <skb_header_pointer>
    2caa:       48 85 c0                test   %rax,%rax
    2cad:       74 0d                   je     2cbc <netflow_target+0x5c1>
                                tuple.s_port = hp->spi >> 16;
    2caf:       8b 10                   mov    (%rax),%edx
    2cb1:       c1 ea 10                shr    $0x10,%edx
    2cb4:       66 89 94 24 80 00 00    mov    %dx,0x80(%rsp)
    2cbb:       00
                                tuple.d_port = hp->spi;
    2cbc:      <8b> 00                  mov    (%rax),%eax
    2cbe:       66 89 84 24 82 00 00    mov    %ax,0x82(%rsp)
    2cc5:       00

Yes, this is exactly the code I suspected above. And it's already fixed. Thanks for the help.
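
For readers following the listing: the `je` after the NULL check jumps straight to the second `mov (%rax),%eax`, so the `d_port` assignment runs even when `skb_header_pointer` returns NULL (RAX is 0 in the register dump). That is what an unbraced `if` followed by two statements compiles to. The actual fix is not shown in this thread, so the sketch below only illustrates the guarded pattern in plain userspace C. `header_pointer`, `esp_hdr`, `flow_tuple`, and `fill_esp_ports` are hypothetical stand-ins, not the module's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct esp_hdr { uint32_t spi; };
struct flow_tuple { uint16_t s_port; uint16_t d_port; };

/*
 * Stand-in for skb_header_pointer(): returns NULL when fewer than `len`
 * bytes are available at `offset`, otherwise copies them into `buf`
 * and returns `buf`.
 */
static const void *header_pointer(const uint8_t *pkt, size_t pkt_len,
                                  size_t offset, size_t len, void *buf)
{
    if (offset > pkt_len || len > pkt_len - offset)
        return NULL;
    memcpy(buf, pkt + offset, len);
    return buf;
}

/*
 * Braced version: both assignments sit inside the NULL check, so a
 * truncated packet leaves the tuple untouched instead of dereferencing
 * a NULL pointer. Returns 1 if the SPI was read, 0 otherwise.
 */
static int fill_esp_ports(const uint8_t *pkt, size_t pkt_len, size_t offset,
                          struct flow_tuple *tuple)
{
    struct esp_hdr hdr;
    const struct esp_hdr *hp;

    assert(tuple != NULL);
    hp = header_pointer(pkt, pkt_len, offset, 4, &hdr);
    if (hp) {
        tuple->s_port = (uint16_t)(hp->spi >> 16);
        tuple->d_port = (uint16_t)hp->spi;
        return 1;
    }
    return 0;
}
```

With the braces in place, a packet too short to contain the 4-byte SPI simply skips both assignments, which is the behaviour the indentation in the original source suggests was intended.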

@vtolstov
Author

Thanks!
