I've got a really strange problem running XenServer 7.0 (I tried 7.1 as well) on a Dell M620 with a BCM57810 network card.
The whole setup runs flawlessly while there is no traffic. A Windows Server 2016 VM is running, and I can reach it over RDC through a VyOS firewall. On another virtual machine I want to run an ownCloud instance, add another IP to the network interface, and forward the traffic to it. As soon as I access the ownCloud HTTP interface, the whole server crashes with a kernel panic and error messages relating to the Broadcom network driver:
device tap13.0 left promiscuous mode
device vif13.0 left promiscuous mode
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1a4/0x280()
NETDEV WATCHDOG: eth0 (bnx2x): transmit queue 0 timed out
Modules linked in: btrfs zlib_deflate raid6_pq xor xfs tun nfsv3 nfs fscache bnx2fc(O) cnic(O) uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt openvswitch(O) gre 8021q garp mrp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_multiport dm_multipath xt_conntrack nf_conntrack iptable_filter ipmi_devintf coretemp crc32_pclmul aesni_intel aes_x86_64 ablk_helper cryptd lrw lpc_ich mfd_core sg ipmi_si ipmi_msghandler wmi sb_edac edac_core hed shpchp nfsd auth_rpcgss oid_registry nfs_acl lockd nls_utf8 isofs sunrpc ip_tables x_tables hid_generic usbhid hid sd_mod ahci libahci libata bnx2x(O) ehci_pci ehci_hcd mdio libcrc32c ptp megaraid_sas(O) pps_core scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh scsi_mod ipv6 autofs4
CPU: 6 PID: 0 Comm: swapper/6 Tainted: G O 3.10.0+10 #1
Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.4 01/27/2016
0000000000000009 ffff8801354c3d58 ffffffff815427c7 ffff8801354c3d90
ffffffff81054da1 ffff88012e210000 0000000000000000 0000000000000006
ffff88012efe7100 ffff88012efe7080 ffff8801354c3df0 ffffffff81054e0c
Call Trace:
<IRQ> [<ffffffff815427c7>] dump_stack+0x19/0x1b
[<ffffffff81054da1>] warn_slowpath_common+0x61/0x80
[<ffffffff81054e0c>] warn_slowpath_fmt+0x4c/0x50
[<ffffffff8149cd44>] dev_watchdog+0x1a4/0x280
[<ffffffff8149cba0>] ? dev_deactivate_queue.constprop.29+0x60/0x60
[<ffffffff81063cd3>] call_timer_fn+0x53/0x130
[<ffffffff8149cba0>] ? dev_deactivate_queue.constprop.29+0x60/0x60
[<ffffffff810658fd>] run_timer_softirq+0x22d/0x290
[<ffffffff8105d48b>] __do_softirq+0xfb/0x240
[<ffffffff8155255c>] call_softirq+0x1c/0x30
[<ffffffff81014203>] do_softirq+0x43/0x80
[<ffffffff8105d6d9>] irq_exit+0x49/0xa0
[<ffffffff81384b55>] xen_evtchn_do_upcall+0x35/0x50
[<ffffffff815525be>] xen_do_hypervisor_callback+0x1e/0xa0
<EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff8100a340>] ? xen_safe_halt+0x10/0x30
[<ffffffff8101a844>] ? default_idle+0x44/0xd0
[<ffffffff8101b038>] ? arch_cpu_idle+0x18/0x30
[<ffffffff810a3532>] ? cpu_startup_entry+0x1c2/0x280
[<ffffffff8152e11d>] ? cpu_bringup_and_idle+0x13/0x15
---[ end trace 3267d319304e6e4c ]---
ULP_STOP
bnx2fc: ERROR:bnx2fc_destroy_timer - Destroy compl not received!!
bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
bnx2x: [bnx2x_stats_comp:211(eth0)]timeout waiting for stats finished
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
[bnx2x_state_wait:329(eth0)]timeout waiting for state 0
bnx2x: [bnx2x_del_all_macs:9335(eth0)]Failed to delete MACs: -16
bnx2x: [bnx2x_chip_cleanup:10164(eth0)]Failed to schedule DEL commands for UC MACs list: -16
[bnx2x_state_wait:329(eth0)]timeout waiting for state 9
[bnx2x_state_wait:329(eth0)]timeout waiting for state 2
bnx2x: [bnx2x_func_stop:9935(eth0)]FUNC_STOP ramrod failed. Running a dry transaction
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
bnx2x: [bnx2x_issue_dmae_with_comp:757(eth0)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:806(eth0)]DMAE returned failure -1
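The bnx2x_clean_tx_queue lines above show which transmit queues stalled and how far the producer index ran ahead of the consumer index. As a rough sketch for anyone comparing several crash logs, the following pulls those numbers out. The function name stuck_tx_queues is made up for illustration, and the 16-bit wrap-around handling is an assumption about how the driver stores these counters.

```python
import re

# Matches lines such as:
# [bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
TX_QUEUE_RE = re.compile(
    r"bnx2x_clean_tx_queue:\d+\((?P<dev>[^)]+)\)\]timeout waiting for "
    r"queue\[(?P<queue>\d+)\]: txdata->tx_pkt_prod\((?P<prod>\d+)\) != "
    r"txdata->tx_pkt_cons\((?P<cons>\d+)\)"
)


def stuck_tx_queues(log_text):
    """Return {(device, queue): packets still outstanding} for stalled TX queues."""
    stuck = {}
    for line in log_text.splitlines():
        match = TX_QUEUE_RE.search(line)
        if match:
            # Assumption: the counters are 16-bit and may wrap, hence the mask.
            outstanding = (int(match["prod"]) - int(match["cons"])) & 0xFFFF
            # Repeated log lines for the same queue simply overwrite each other.
            stuck[(match["dev"], int(match["queue"]))] = outstanding
    return stuck


# The two distinct queue messages from the log above.
SAMPLE_LOG = """\
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[0]: txdata->tx_pkt_prod(17962) != txdata->tx_pkt_cons(17955)
[bnx2x_clean_tx_queue:1624(eth0)]timeout waiting for queue[24]: txdata->tx_pkt_prod(49476) != txdata->tx_pkt_cons(49474)
"""

if __name__ == "__main__":
    for (dev, queue), count in sorted(stuck_tx_queues(SAMPLE_LOG).items()):
        print(f"{dev} queue {queue}: {count} packet(s) never completed")
```

For this log it reports 7 outstanding packets on queue 0 and 2 on queue 24, so only a handful of transmits were stuck when the watchdog fired.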
The network diagram is as follows:
Unfortunately I cannot install the vendor driver since I don't have the kernel headers to compile the driver manually.
I tried disabling the virtual interfaces in the NIC configuration, without success. Setting disable_tpa and other module parameters did not help either.
I hope someone has an idea.
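When experimenting with module parameters such as disable_tpa, it can help to confirm that the loaded driver actually picked them up. A minimal sketch, assuming the driver exposes its parameters under /sys/module/bnx2x/parameters (whether disable_tpa shows up there depends on the driver build); the helper name loaded_module_params and its sysfs_root argument are made up for illustration:

```python
from pathlib import Path


def loaded_module_params(module, sysfs_root="/sys/module"):
    """Return {name: value} for the parameters a loaded module exposes in sysfs."""
    params_dir = Path(sysfs_root) / module / "parameters"
    if not params_dir.is_dir():
        # Module not loaded, or it exposes no parameters in sysfs.
        return {}
    params = {}
    for entry in sorted(params_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Parameter exists but is not readable by the current user.
            params[entry.name] = None
    return params


if __name__ == "__main__":
    for name, value in loaded_module_params("bnx2x").items():
        print(f"{name} = {value}")
```

If a parameter that was set at load time is missing from the output or still shows its default, the option was probably not applied to the running driver.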
I recently had the same trouble with XenServer 7.1 and an Ubuntu VM.
Server: Dell R730
NIC: Broadcom Limited NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
In my case the trouble was in the VLAN handling.
When I handled the VLANs in Xen and attached four virtual NICs, each on a selected VLAN, from XenServer to a VM, the whole hardware server crashed repeatedly, 7-10 minutes after starting that VM.
A workaround was to pass the whole eth0 interface to the VM and handle the VLANs inside the VM itself (eth0.100, eth0.200, etc.).
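For the in-VM part of this workaround, the VLAN sub-interfaces can be created with the standard iproute2 `ip link` commands. A small sketch that only builds the command lines for a list of VLAN IDs, so they can be reviewed before being run as root inside the VM; the helper name vlan_setup_commands is made up, and the IDs 100 and 200 are just the ones from the example names above:

```python
def vlan_setup_commands(parent, vlan_ids):
    """Build iproute2 commands that create and enable <parent>.<id> VLAN interfaces."""
    commands = []
    for vlan_id in vlan_ids:
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"VLAN ID out of range (1-4094): {vlan_id}")
        name = f"{parent}.{vlan_id}"
        # Create the tagged sub-interface on top of the parent device.
        commands.append(
            ["ip", "link", "add", "link", parent, "name", name,
             "type", "vlan", "id", str(vlan_id)]
        )
        # Bring the new sub-interface up.
        commands.append(["ip", "link", "set", "dev", name, "up"])
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for command in vlan_setup_commands("eth0", [100, 200]):
        print(" ".join(command))
```

Interfaces created this way do not survive a reboot, so a permanent setup would still need the distribution's own network configuration.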