CentOS release 5.4 (Final) + Nic Bonding + machine goes out of network randomly + 2.6.18-164.6.1.el5 #1 SMP kernel

Hi,

We have been facing a strange problem for the past few days. The logs are attached below for reference.
We are using NIC bonding on our Dell server. It runs CentOS 5.4 and has 4 NICs. The machine had been working well for the past few months, but over the last 2-3 days it has dropped off the network by itself.
Any pointers or help would be much appreciated.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
cat /proc/cpuinfo
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
stepping : 5
============================================================
[root@S log]# uname -a
Linux ABC.NETXXXXXX 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
============================================================
[root@S log]# cat /etc/redhat-release
CentOS release 5.4 (Final)
[root@SJC-SRCH-03-R ~]# dmesg |grep eth | more
eth0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem d6000000, IRQ 90, node addr 00219b8fd3bc
eth1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem d8000000, IRQ 98, node addr 00219b8fd3be
eth2: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem da000000, IRQ 106, node addr 00219b8fd3c0
eth3: Broadcom NetXtreme II BCM5709 1000Base-T (C0) PCI Express found at mem dc000000, IRQ 114, node addr 00219b8fd3c2
cnic: Added CNIC device: eth0
cnic: Added CNIC device: eth1
cnic: Added CNIC device: eth2
cnic: Added CNIC device: eth3
bonding: bond0: Adding slave eth0.
bnx2: eth0: using MSIX
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: enslaving eth0 as a backup interface with a down link.
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: Adding slave eth1.
bnx2: eth1: using MSIX
bnx2i: iSCSI not supported, dev=eth1
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: enslaving eth1 as a backup interface with a down link.
bnx2i: iSCSI not supported, dev=eth1
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: link status definitely down for interface eth1, disabling it
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
bnx2: eth2: using MSIX
ADDRCONF(NETDEV_UP): eth2: link is not ready
bnx2i: iSCSI not supported, dev=eth2
bnx2i: iSCSI not supported, dev=eth2
bnx2: eth2: using MSIX
ADDRCONF(NETDEV_UP): eth2: link is not ready
bnx2i: iSCSI not supported, dev=eth2
bnx2i: iSCSI not supported, dev=eth2
bnx2: eth3: using MSIX
ADDRCONF(NETDEV_UP): eth3: link is not ready
bnx2i: iSCSI not supported, dev=eth3
bnx2i: iSCSI not supported, dev=eth3
bnx2: eth3: using MSIX
ADDRCONF(NETDEV_UP): eth3: link is not ready
bnx2i: iSCSI not supported, dev=eth3
bnx2i: iSCSI not supported, dev=eth3
bnx2: eth2: using MSIX
ADDRCONF(NETDEV_UP): eth2: link is not ready
bnx2i: iSCSI not supported, dev=eth2
bnx2: eth2: using MSIX
ADDRCONF(NETDEV_UP): eth2: link is not ready
bnx2i: iSCSI not supported, dev=eth2
bnx2i: iSCSI not supported, dev=eth2
bnx2: eth3: using MSIX
ADDRCONF(NETDEV_UP): eth3: link is not ready
bnx2i: iSCSI not supported, dev=eth3
bnx2: eth3: using MSIX
ADDRCONF(NETDEV_UP): eth3: link is not ready
bnx2i: iSCSI not supported, dev=eth3
bnx2i: iSCSI not supported, dev=eth3
bonding: bond0: Removing slave eth1
bonding: bond0: releasing active interface eth1
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: Removing slave eth0
bonding: bond0: releasing active interface eth0
bonding: unable to remove non-existent slave eth1 for bond bond0.
bonding: bond0: Adding slave eth0.
bnx2: eth0: using MSIX
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: enslaving eth0 as a backup interface with a down link.
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: Adding slave eth1.
bnx2: eth1: using MSIX
bnx2i: iSCSI not supported, dev=eth1
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: enslaving eth1 as a backup interface with a down link.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bnx2i: iSCSI not supported, dev=eth1
bonding: bond0: link status definitely down for interface eth1, disabling it
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
bonding: bond0: Removing slave eth0
bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:21:9B:8F:D3:BC - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
bonding: bond0: releasing backup interface eth0
bonding: bond0: Removing slave eth1
bonding: bond0: releasing active interface eth1
bonding: bond0: Adding slave eth0.
bnx2: eth0: using MSIX
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: enslaving eth0 as a backup interface with a down link.
bnx2i: iSCSI not supported, dev=eth0
bonding: bond0: Adding slave eth1.
bnx2: eth1: using MSIX
bnx2i: iSCSI not supported, dev=eth1
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: enslaving eth1 as a backup interface with a down link.
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: link status definitely up for interface eth1.
bnx2i: iSCSI not supported, dev=eth1
bonding: bond0: link status definitely down for interface eth1, disabling it
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
============================================================
[root@S ~]# modinfo e1000 | more
filename: /lib/modules/2.6.18-164.6.1.el5/kernel/drivers/net/e1000/e1000.ko
version: 7.3.20-k2-NAPI
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics(a)intel.com>
srcversion: 26DD82C709EB760C93D4103
alias: pci:v00008086d00001000sv*sd*bc*sc*i*
depends:
vermagic: 2.6.18-164.6.1.el5 SMP mod_unload gcc-4.1
parm: TxDescriptors:Number of transmit descriptors (array of int)
parm: TxDescPower:Binary exponential size (2^X) of each transmit descriptor (array of int)
parm: RxDescriptors:Number of receive descriptors (array of int)
parm: Speed:Speed setting (array of int)
parm: Duplex:Duplex setting (array of int)
parm: AutoNeg:Advertised auto-negotiation setting (array of int)
parm: FlowControl:Flow Control setting (array of int)
parm: XsumRX:Disable or enable Receive Checksum offload (array of int)
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm: InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm: SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm: KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm: debug:Debug level (0=none,...,16=all) (int)
============================================================
[root@ ~]# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
============================================================
[root@S ~]# ethtool -i eth0
driver: bnx2
version: 1.9.3
firmware-version: 4.6.4 NCSI 1.0.6
bus-info: 0000:01:00.0
============================================================
[root@ ~]# ethtool -i eth1
driver: bnx2
version: 1.9.3
firmware-version: 4.6.4 NCSI 1.0.6
bus-info: 0000:01:00.1
============================================================
[root@S ~]# ethtool -i bond0
driver: bonding
version: 3.4.0
firmware-version: 2
bus-info:
============================================================

This is a Red Hat/CentOS kernel bug:

https://bugzilla.redhat.com/show_bug.cgi?id=520888
https://rhn.redhat.com/errata/RHSA-2010-0398.html

The erratum describes it as follows:

    in certain circumstances, under heavy load, certain network interface cards using the bnx2 driver and configured to use MSI-X, could stop processing interrupts and then network connectivity would cease. (BZ#587799)

Your dmesg output shows the bnx2 driver using MSIX on eth0 through eth3, so your setup fits that description.

On Wed, Nov 24, 2010 at 5:23 PM, Narender <narender.hooda(a)gmail.com> wrote:
Hi,
We have been facing a strange problem for the past few days. The logs are attached below for reference.
We are using NIC bonding on our Dell server. It runs CentOS 5.4 and has 4 NICs. The machine had been working well for the past few months, but over the last 2-3 days it has dropped off the network by itself.
Any pointers or help would be much appreciated.
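To check whether a machine meets the conditions named in the erratum (bnx2 driver with MSI-X in use), the dmesg output can be filtered for the driver's "using MSIX" lines. The following is only a minimal sketch: the function name msix_ifaces is made up for illustration, and the sample text is a few lines copied from the dmesg output posted in this thread.

```shell
# Hypothetical helper (name chosen for illustration): read kernel log text
# on stdin and print each interface that bnx2 reports as using MSI-X, once.
msix_ifaces() {
    sed -n 's/^bnx2: \(eth[0-9]*\): using MSIX$/\1/p' | sort -u
}

# Sample input: lines taken from the dmesg output posted in this thread.
sample='bnx2: eth0: using MSIX
bnx2i: iSCSI not supported, dev=eth0
bnx2: eth1: using MSIX
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
bnx2: eth0: using MSIX'

printf '%s\n' "$sample" | msix_ifaces
```

On a live system the same function could be fed with `dmesg | msix_ifaces`, and the output of `uname -r` compared against the kernel version named in the erratum. Since `ethtool -i` reports bnx2 as the driver for eth0 and eth1, `modinfo bnx2` rather than `modinfo e1000` would show the parameters of the driver actually in use.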