Primary module parameter passed to bonding is pernament. That means if you
release the primary slave and enslave it again, it becomes the primary slave
again. But if you set primary slave via sysfs, the primary slave is only set
once and it's not remembered in bond->params structure. Therefore the setting is
lost after releasing the primary slave. This simple one-liner fixes this.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I was implementing primary_passive option (formely named primary_lazy) I've
run into troubles with ab_arp. This is the only mode which is not using
bond_select_active_slave() function to select active slave and instead it
selects it itself. This seems to be not the right behaviour and it would be
better to do it in bond_select_active_slave() for all cases. This patch makes
this happen. Please review.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:
*. It assumes tha the device is UP when calling dev_close(), or otherwise
dev_close() has no affect. It is worth to mention that initscripts (Redhat)
and sysconfig (Suse) doesn't act the same in this matter.
*. dev_close() has other side affects, like deleting entries from the routing
table, which might be unnecessary.
The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can speedup ether addresses compares using compare_ether_addr_64bits()
instead of memcmp(). We make sure all operands are at least 8 bytes long and
16bits aligned (or better, long word aligned if possible)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are all drivers that don't touch real hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: Have bond_check_dev_link examine netif_running
Some network devices do not call netif_carrier_off when they
are set administratively down. Have the bonding link check function
also inspect the netif_running state. Ignore netif_running if the
bond_check_dev_link function is called with "reporting" set, as in that
case it's inspecting the capabilities of the non-netif_carrier device
driver.
Signed-off-by: Petri Gynther <pgynther@google.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
max_bonds is of type int and cannot be greater than INT_MAX.
Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding can use compare_ether_addr() in bond_release.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propogate the vlan_features of the slave devices to the bonding
master device, using the same logic as for regular features.
Tested by Or Gerlitz <ogerlitz@voltaire.com>, who also removed
the debug logic from the original test patch.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I did not introduce new lines over 80 chars. I even eliminated some of
them.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding device forbids slave device of different types under the same
master.
However, it is possible for a bonding master to change type during its
lifetime. This can be either from ARPHRD_ETHER to ARPHRD_INFINIBAND
or the other way arround. The change of type requires device level
multicast address cleanup because device level multicast addresses
depend on the device type.
The patch adds a call to dev_close() before the bonding master changes
type and dev_open() just after that.
In the example below I enslaved an IPoIB device (ib0) under
bond0. Since each bonding master starts as device of type ARPHRD_ETHER
by default, a change of type occurs when ib0 is enslaved.
This is how /proc/net/dev_mcast looks like without the patch
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 01005e000116
5 bond0 1 0 01005e7ffffd
5 bond0 1 0 01005e000001
5 bond0 1 0 333300000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 333300000001
6 ib0 1 0 01005e000001
6 ib0 1 0 01005e7ffffd
6 ib0 1 0 01005e000116
6 ib0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff00000000000000000001
and this is how it looks like after the patch.
5 bond0 1 0 00ffffffff12601bffff000000000001ff96ca05
5 bond0 1 0 00ffffffff12601bffff00000000000000000001
5 bond0 1 0 00ffffffff12401bffff0000000000000ffffffd
5 bond0 1 0 00ffffffff12401bffff00000000000000000116
5 bond0 1 0 00ffffffff12401bffff00000000000000000001
6 ib0 1 0 00ffffffff12601bffff000000000001ff96ca05
6 ib0 1 0 00ffffffff12401bffff00000000000000000116
6 ib0 1 0 00ffffffff12401bffff0000000000000ffffffd
6 ib0 2 0 00ffffffff12401bffff00000000000000000001
6 ib0 2 0 00ffffffff12601bffff00000000000000000001
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AD_SHORT_TIMEOUT and AD_STATE_LACP_ACTIVITY have the same value, but
AD_STATE_LACP_ACTIVITY better reflects the intended semantics.
[ J adds: AD_STATE_LACP_ACTIVITY is a value defined by the standard, and
should be set here in accordance with 802.3ad 43.4.12; AD_SHORT_TIMEOUT
is a constant specific to the Linux 802.3ad implementation that happens
to have the same value ]
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
struct port_params p;
@@
* p.port_state |= AD_SHORT_TIMEOUT
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the remaining occurences of raw return values to their
symbolic counterparts in ndo_start_xmit() functions that were missed by the
previous automatic conversion.
Additionally code that assumed the symbolic value of NETDEV_TX_OK to be zero
is changed to explicitly use NETDEV_TX_OK.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need to rework how bonding devices are initialized to make it more
amenable to creating bonding devices via netlink.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove bogus non-portable possibly unaligned way of testing
for zero addres..
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding device acts unlike all other Linux network device functions
in that it ignores case of device names. The developer must have come
from windows!
Cleanup the management of names and use standard routines where possible.
Flag places where bonding device still doesn't work right with network
namespaces.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "expected_refcount" stuff in bonding sysfs module is a mistake.
Sysfs does proper refcounting, and it is okay to remove a bond device
that has some user process holding the file open.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve some of the complaints from checkpatch, and remove "magic emacs format"
comments, and useless MODULE_SUPPORTED_DEVICE(). But should not
change actual code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use a network device destructor that is a function in
the module, since it can be called after module is unloaded if sysfs
handle is open.
When eventually using netlink, the device cleanup code needs to be done
via uninit function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The whole read/write semaphore locking can be removed. It doesn't add any
protection that isn't already done by using the RTNL mutex properly.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid a unnecessary carrier state transistion that happens when device
is registered.
Lockdep works better if initialization is done before registration as well.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_create() is always called with same parameters so move the argument
down.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some users still load bond module multiple times to create bonding
devices. This accidentally was broken by a later patch about
the time sysfs was fixed. According to Jay, it was broken
by:
commit b8a9787edd
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Fri Jun 13 18:12:04 2008 -0700
bonding: Allow setting max_bonds to zero
Note: sysfs and procfs still produce WARN() messages when this is done
so the sysfs method is the recommended API.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sysfs files for a network device can not unconditionally take the
rtnl_lock as the bonding sysfs files do. If someone accesses those
sysfs files while the network device is being unregistered with the
rtnl_lock held we will deadlock.
So use trylock and restart_syscall to avoid this problem.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the purposes of bonding is to allow for redundant links, and failover
correctly if the cable is pulled. If all the members of a bonded device have
no carrier present, the bonded device itself needs to report no carrier present
to user space so management tools (like routing daemons) can respond.
Bonding in 802.3ad mode does not work correctly for this because it incorrectly
chooses a link that is down as a possible aggregator.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net_device trans_start field is a hot spot on SMP and high performance
devices, particularly multi queues ones, because every transmitter dirties
it. Is main use is tx watchdog and bonding alive checks.
But as most devices dont use NETIF_F_LLTX, we have to lock
a netdev_queue before calling their ndo_start_xmit(). So it makes
sense to move trans_start from net_device to netdev_queue. Its update
will occur on a already present (and in exclusive state) cache line, for
free.
We can do this transition smoothly. An old driver continue to
update dev->trans_start, while an updated one updates txq->trans_start.
Further patches could also put tx_bytes/tx_packets counters in
netdev_queue to avoid dirtying dev->stats (vlan device comes to mind)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If module initialisation failed (e.g. because the bonding sysfs entry
cannot be created), kernel panics:
IP: [<ffffffff8024910a>] destroy_workqueue+0x2d/0x146
Call Trace:
[<ffffffff808268c4>] bond_destructor+0x28/0x78
[<ffffffff80b64471>] netdev_run_todo+0x231/0x25a
[<ffffffff80b6dbcd>] rtnl_unlock+0x9/0xb
[<ffffffff81567907>] bonding_init+0x83e/0x84a
Remove the calls to bond_work_cancel_all() and destroy_workqueue();
both are also called/scheduled via bond_free_all().
bond_destroy_sysfs is unecessary because the sysfs entry has
not been created in the error case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETH_P_SLOW is already defined in include/linux/if_ether.h.
There's no need to define BOND_ETH_P_LACPDU in drivers/net/bonding/bond_3ad.h
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove CONFIG_PROC_FS ifdefs from the code by adding void functions.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 30 ++++++++++++++++++++----------
1 files changed, 20 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix locking issue in alb MAC address management; removed
incorrect locking and replaced with correct locking. This bug was
introduced in commit 059fe7a578
("bonding: Convert locks to _bh, rework alb locking for new locking")
Bug reported by Paul Smith <paul@mad-scientist.net>, who also
tested the fix.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the cleanup in bond_create nicer :) Also now the forgotten
free_netdev is called.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_slave_info_query() should keep a read lock while accessing slave info,
or risk accessing stale data and corruption.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointed out by Sean E. Millichamp.
Quote from Documentation/networking/bonding.txt:
"Note that when a bonding interface has no active links, the
driver will immediately reuse the first link that goes up, even if the
updelay parameter has been specified (the updelay is ignored in this
case). If there are slave interfaces waiting for the updelay timeout
to expire, the interface that first went into that state will be
immediately reused. This reduces down time of the network if the
value of updelay has been overestimated, and since this occurs only in
cases with no connectivity, there is no additional penalty for
ignoring the updelay."
This patch actually changes the behaviour in this way.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch only changes the order of interfaces to use for checking slave link
status in bond_check_dev_link() to priorize ethtool interface. Should safe some
troubles as ethtool seems to be more supported.
Jirka
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bonding/bond_main.c | 26 ++++++++++++--------------
1 files changed, 12 insertions(+), 14 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove debug printk I accidently left in as part of commit:
commit 6146b1a4da
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue Nov 4 17:51:15 2008 -0800
bonding: Fix ALB mode to balance traffic on VLANs
Reported by Duncan Gibb <duncan.gibb@siriusit.co.uk>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a zero address hole bug in the bonding arp_ip_target list
that was causing the bond to ignore ARP replies (bugz 13006).
Instead of just setting the array entry to zero, we now
copy any additional entries down one slot, putting the
zero entry at the end. With this change we can now have
all the loops that walk the array stop when they hit a zero
since there will be no addresses after it.
Changes are based in part on code fragment provided in kernel:
bugzilla 13006:
http://bugzilla.kernel.org/show_bug.cgi?id=13006
by Steve Howard <steve@astutenetworks.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
I've hit an issue on my system when I've been using RealTek RTL8139D cards in
bonding interface in mode balancing-alb. When I enslave a card, the current
active slave (bond->curr_active_slave) is not set and the link is therefore
not functional.
----
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: None
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:1f:01:2f:22
----
The thing that gets it right is when I unplug the cable and then I put it back
into the NIC. Then the current active slave is set to eth1 and link is working
just fine. Here is dmesg log with bonding DEBUG messages turned on:
----
ADDRCONF(NETDEV_UP): bond0: link is not ready
event_dev: bond0, event: 1
IFF_MASTER
event_dev: bond0, event: 8
IFF_MASTER
bond_ioctl: master=bond0, cmd=35216
slave_dev=cac5d800:
slave_dev->name=eth1:
eth1: ! NETIF_F_VLAN_CHALLENGED
event_dev: eth1, event: 8
eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 1
event_dev: eth1, event: 8
IFF_SLAVE
Initial state of slave_dev is BOND_LINK_UP
bonding: bond0: enslaving eth1 as an active interface with an up link.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
event_dev: bond0, event: 4
IFF_MASTER
bond0: no IPv6 routers present
<<<<cable unplug>>>>
eth1: link down
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely down for interface eth1, disabling it
event_dev: bond0, event: 4
IFF_MASTER
<<<<cable plug>>>>
eth1: link up, 100Mbps, full-duplex, lpa 0xC5E1
event_dev: eth1, event: 4
IFF_SLAVE
bonding: bond0: link status definitely up for interface eth1.
bonding: bond0: making interface eth1 the new active one.
event_dev: eth1, event: 8
IFF_SLAVE
event_dev: eth1, event: 8
IFF_SLAVE
bonding: bond0: first active interface up!
event_dev: bond0, event: 4
IFF_MASTER
----
The current active slave is set by calling bond_select_active_slave() function
from bond_miimon_commit() function when the slave (eth1) link goes to state up.
I also tested this on other machine with Broadcom NetXtreme II BCM5708
1000Base-T NIC and there all works fine. The thing is that this adapter is down
and goes up after few seconds after it is enslaved.
This patch calls bond_select_active_slave() in bond_enslave() function for modes
alb and tlb and makes sure that the current active slave is set up properly even
when the slave state is already up. Tested on both systems, works fine.
Notice: The same problem can maybe also occrur in mode 8023AD but I'm unable to
test that.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects an omission from the following commit:
commit f0c76d6177
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Wed Jul 2 18:21:58 2008 -0700
bonding: refactor mii monitor
The un-refactored code checked the link speed and duplex of
every slave on every pass; the refactored code did not do so.
The 802.3ad and balance-alb/tlb modes utilize the speed and
duplex information, and require it to be kept up to date. This patch
adds a notifier check to perform the appropriate updating when the slave
device speed changes.
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>