Commit Graph

37885 Commits

Author SHA1 Message Date
Johannes Berg
30686bf7f5 mac80211: convert HW flags to unsigned long bitmap
As we're running out of hardware capability flags pretty quickly,
convert them to use the regular test_bit() style unsigned long
bitmaps.

This introduces a number of helper functions/macros to set and to
test the bits, along with new debugfs code.

The occurrences of an explicit __clear_bit() are intentional, the
drivers were never supposed to change their supported bits on the
fly. We should investigate changing this to be a per-frame flag.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 16:05:36 +02:00
Johannes Berg
206c59d1d7 Merge remote-tracking branch 'net-next/master' into mac80211-next
Merge back net-next to get wireless driver changes (from Kalle)
to be able to create the API change across all trees properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 12:45:09 +02:00
Alexis Green
5ec596c41b mac80211: Fix a case of incorrect metric used when forwarding a PREQ
This patch fixes a bug in hwmp_preq_frame_process where the wrong metric
can be used when forwarding a PREQ. This happens because the code uses
the same metric variable to record the value of the metric to the source
of the PREQ and the value of the metric to the target of the PREQ.

This comes into play when both reply and forward are set which happens
when IEEE80211_PREQ_PROACTIVE_PREP_FLAG is set and when MP_F_DO | MP_F_RF
is set. The original code had a special case to handle the first case
but not the second.

The patch uses distinct variables for the two metrics which makes the
code flow much clearer and removes the need to restore the original
value of metric when forwarding.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-10 11:52:59 +02:00
David S. Miller
c3eee1fb1d Included changes:
- use common Jenkins hash instead of private implementation
 - extend internal routing API
 - properly re-arrange header files inclusion
 - clarify precedence between '&' and '?'
 - remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
 - ensure per-VLAN structs are updated upon MAC change
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVdaxwAAoJEOb/4TMchkvffJsQAK33RTmcnFYVpS09AywTbxfC
 kNo5SAbQJ1MsNui8TmyHsnLBW/l59WnP8CITVYDlwLyW4cEKWVooGsFPv+MCityM
 3vOqghdi7jC1b0LT6kTcoSr7P04q+bBV9atj5Z5nnXUpvjmW0b/a6qCIZmqGKTtQ
 WIIayEs0bVPJ6y1Z6CDpFQY6k9j938BaH3VgP7NL4zXRA3OJGGQKaZTRwVuYnVu4
 Swt/zj+B7lutBSP2d4ngjU2CXS35OzDWl4GybQuIiNzdSXsZwm4inTyuNjA4Eqdg
 vE87ZqbFqD4T2PqR1wcTPKMJegd+nPQrYDdeNfz91KdjVKBN+LrdSCX3IozEynpY
 7584+eROCyCKlQGi7OcRze5THbZ+BNKxUd3Ds4ek2NTU7vG2rMWADFj3wZ0yzX6F
 e/JunQ39BHXPSc6UlaTgdYiVpTNME9+A2I0AXMOpU6urpc4fAyFVGUORJqjM8BBm
 jLZ8QakmXdfgBYi7zqefeCW9de9L1ekGQs/XB1AmKmx5xY7IKzKEZbkUV3nAxtqI
 YKP2yQUhqjoke6bzllz72CPChO4O11fYPCDPL0sarn/UxY8SzwmHeKi8UeGaN6gt
 FZhR17kQGm4c2WF7082HEIrOW137YogEOFWMixqpqix5ICBmmRMPqo8wqQBxllk3
 mC+4jiwdn4kT1EVAqHba
 =u+G3
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- use common Jenkins hash instead of private implementation
- extend internal routing API
- properly re-arrange header files inclusion
- clarify precedence between '&' and '?'
- remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
- ensure per-VLAN structs are updated upon MAC change
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-09 20:23:52 -07:00
Alexis Green
e060e7adc2 mac80211: Always check rates and capabilities in mesh mode
In mesh mode there is a race between establishing links and processing
rates and capabilities in beacons. This is very noticeable with slow
beacons (e.g. beacon intervals of 1s) and manifested for us as stations
using minstrel when minstrel_ht should be used. Fixed by changing
mesh_sta_info_init so that it always checks rates and such if it has not
already done so.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 22:05:06 +02:00
Chun-Yeow Yeoh
8df734e865 mac80211: fix the beacon csa counter for mesh and ibss
The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.

Fixes: af296bdb8d ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 22:04:25 +02:00
Alexis Green
afd2efb919 mac80211: Fix incorrectly named last_hop_metric variable in mesh_rx_path_sel_frame
The last hop metric should refer to link cost (this is how
hwmp_route_info_get uses it for example). But in mesh_rx_path_sel_frame
we are not dealing with link cost but with the total cost to the origin
of a PREQ or PREP.

Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:53:35 +02:00
Sara Sharon
74d803b602 mac80211: ignore invalid scan RSSI values
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).

The scan result ignores those invalid values and the
station last signal should not be updated as well.

In case the scan determines the signal to be invalid turn on
the flag so the station last signal will not be updated with
the value and thus user space probing for NL80211_STA_INFO_SIGNAL
and NL80211_STA_INFO_SIGNAL_AVG will not get this invalid RSSI
value.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:51:59 +02:00
Michal Kazior
d01f858c78 mac80211: release channel on auth failure
There were a few rare cases when upon
authentication failure channel wasn't released.
This could cause stale pointers to remain in
chanctx assigned_vifs after interface removal and
trigger general protection fault later.

This could be triggered, e.g. on ath10k with the
following steps:

 1. start an AP
 2. create 2 extra vifs on ath10k host
 3. connect vif1 to the AP
 4. connect vif2 to the AP
    (auth fails because ath10k firmware isn't able
     to maintain 2 peers with colliding AP mac
     addresses across vifs and consequently
     refuses sta_info_insert() in
     ieee80211_prep_connection())
 5. remove the 2 extra vifs
 6. goto step 2; at step 3 kernel was crashing:

 general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
 Modules linked in: ath10k_pci ath10k_core ath
 ...
 Call Trace:
  [<ffffffff81a2dabb>] ieee80211_check_combinations+0x22b/0x290
  [<ffffffff819fb825>] ? ieee80211_check_concurrent_iface+0x125/0x220
  [<ffffffff8180f664>] ? netpoll_poll_disable+0x84/0x100
  [<ffffffff819fb833>] ieee80211_check_concurrent_iface+0x133/0x220
  [<ffffffff81a0029e>] ieee80211_open+0x3e/0x80
  [<ffffffff817f2d26>] __dev_open+0xb6/0x130
  [<ffffffff817f3051>] __dev_change_flags+0xa1/0x170
 ...
 RIP  [<ffffffff81a23140>] ieee80211_chanctx_radar_detect+0xa0/0x170

 (gdb) l * ieee80211_chanctx_radar_detect+0xa0
 0xffffffff81a23140 is in ieee80211_chanctx_radar_detect (/devel/src/linux/net/mac80211/util.c:3182).
 3177             */
 3178            WARN_ON(ctx->replace_state == IEEE80211_CHANCTX_REPLACES_OTHER &&
 3179                    !list_empty(&ctx->assigned_vifs));
 3180
 3181            list_for_each_entry(sdata, &ctx->assigned_vifs, assigned_chanctx_list)
 3182                    if (sdata->radar_required)
 3183                            radar_detect |= BIT(sdata->vif.bss_conf.chandef.width);
 3184
 3185            return radar_detect;

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:51:01 +02:00
Johannes Berg
472be00d04 mac80211: handle aggregation session timeout on fast-xmit path
The conversion to the fast-xmit path lost proper aggregation session
timeout handling - the last_tx wasn't set on that path and the timer
would therefore incorrectly tear down the session periodically (with
those drivers/rate control algorithms that have a timeout.)

In case of iwlwifi, this was every 5 seconds and caused significant
throughput degradation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-06-09 21:38:40 +02:00
David S. Miller
941742f497 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-08 20:06:56 -07:00
Willem de Bruijn
bbbf2df003 net: replace last open coded skb_orphan_frags with function call
Commit 70008aa50e ("skbuff: convert to skb_orphan_frags") replaced
open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
to helper function skb_orphan_frags. Apply that to the last remaining
open coded site.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:15:13 -07:00
Josh Hunt
0243508edd ipv6: Fix protocol resubmission
UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.

This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.

With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: "Tom Herbert" <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:13:17 -07:00
Robert Shearman
27e41fcfa6 ipv6: fix possible use after free of dev stats
The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.

Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-08 12:12:45 -07:00
Firo Yang
f38b24c905 fib_trie: coding style: Use pointer after check
As Alexander Duyck pointed out that:
struct tnode {
        ...
        struct key_vector kv[1];
}
The kv[1] member of struct tnode is an arry that refernced by
a null pointer will not crash the system, like this:
struct tnode *p = NULL;
struct key_vector *kv = p->kv;
As such p->kv doesn't actually dereference anything, it is simply a
means for getting the offset to the array from the pointer p.

This patch make the code more regular to avoid making people feel
odd when they look at the code.

Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 23:45:40 -07:00
Nikolay Aleksandrov
c4c832f89d bridge: disable softirqs around br_fdb_update to avoid lockup
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:44:13 -07:00
David S. Miller
7ff46e79fb Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup"
This reverts commit 1d7c49037b.

Nikolay Aleksandrov has a better version of this fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:43:47 -07:00
Robert Shearman
25cc8f0763 mpls: fix possible use after free of device
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.

Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.

Fixes: 03c57747a7 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 19:37:27 -07:00
Wilson Kok
1d7c49037b bridge: use _bh spinlock variant for br_fdb_update to avoid lockup
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 15:24:54 -07:00
Eric Dumazet
b80c0e7858 tcp: get_cookie_sock() consolidation
IPv4 and IPv6 share same implementation of get_cookie_sock(),
and there is no point inlining it.

We add tcp_ prefix to the common helper name and export it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 15:19:52 -07:00
Antonio Quartulli
94d1dd8731 batman-adv: change the MAC of each VLAN upon ndo_set_mac_address
The MAC address of the soft-interface is used to initialise
the "non-purge" TT entry of each existing VLAN. Therefore
when the user invokes ndo_set_mac_address() all the
"non-purge" TT entries have to be updated, not only the one
belonging to the non-tagged network.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:20 +02:00
Sven Eckelmann
7dac6d9391 batman-adv: Remove unused post-VLAN ethhdr in batadv_gw_dhcp_recipient_get
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:20 +02:00
Sven Eckelmann
a2f2b6cd41 batman-adv: Clarify calculation precedence for '&' and '?'
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:19 +02:00
Sven Eckelmann
1e2c2a4fe4 batman-adv: Add required includes to all files
The header files could not be build indepdent from each other. This is
happened because headers didn't include the files for things they've used.
This was problematic because the success of a build depended on the
knowledge about the right order of local includes.

Also source files were not including everything they've used explicitly.
Instead they required that transitive includes are always stable. This is
problematic because some transitive includes are not obvious, depend on
config settings and may not be stable in the future.

The order for include blocks are:

 * primary headers (main.h and the *.h file of a *.c file)
 * global linux headers
 * required local headers
 * extra forward declarations for pointers in function/struct declarations

The only exceptions are linux/bitops.h and linux/if_ether.h in packet.h.
This header file is shared with userspace applications like batctl and must
therefore build together with userspace applications. The header
linux/bitops.h is not part of the uapi headers and linux/if_ether.h
conflicts with the musl implementation of netinet/if_ether.h. The
maintainers rejected the use of __KERNEL__ preprocessor checks and thus
these two headers are only in main.h. All files using packet.h first have
to include main.h to work correctly.

Reported-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:19 +02:00
Antonio Quartulli
bcef1f3c49 batman-adv: add bat_neigh_free API
This API has to be used to let any routing protocol free
neighbor specific allocated resources

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:18 +02:00
Antonio Quartulli
6f70eb7552 batman-adv: split name from variable for uint mesh attributes
Some mesh attributes are behind substructs in the
batadv_priv object and for this reason the name cannot be
used anymore to refer to them.

This patch allows to specify the variable name where the
attribute is stored inside batadv_priv instead of using the
name

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:18 +02:00
Sven Eckelmann
36fd61cb80 batman-adv: Use common Jenkins Hash implementation
An unoptimized version of the Jenkins one-at-a-time hash function is used
and partially copied all over the code wherever an hashtable is used.
Instead the optimized version shared between the whole kernel should be
used to reduce code duplication and use better optimized code.

Only the DAT code must use the old implementation because it is used as
distributed hash function which has to be common for all nodes.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-07 17:07:17 +02:00
Alexei Starovoitov
d691f9e8d4 bpf: allow programs to write to certain skb fields
allow programs read/write skb->mark, tc_index fields and
((struct qdisc_skb_cb *)cb)->data.

mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call() which can
be seen in sockex3_kern.c example.

All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
mark, tc_index are writeable from tc_cls_act only.
cb[0]-cb[4] are writeable by both sockets and tc_cls_act.

Add verifier tests and improve sample code.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 02:01:33 -07:00
Alexei Starovoitov
3431205e03 bpf: make programs see skb->data == L2 for ingress and egress
eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
For ingress L2 header is already pulled, whereas for egress it's present.
This is known to program writers which are currently forced to use
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers it is safe to do
pull/push right around invocation of the program and earlier taps and
later pt->func() will not be affected.
Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
around run_filter/BPF_PROG_RUN even if skb_shared.

This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
       w/o JIT   w/JIT
before  20.5     23.6 Mpps
after   21.8     26.6 Mpps

Old programs with BPF_LL_OFF will still work as-is.

We can now undo most of the earlier workaround commit:
a166151cbe ("bpf: fix bpf helpers to use skb->mac_header relative offsets")

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 02:01:33 -07:00
Eric Dumazet
98da81a426 tcp: remove redundant checks II
For same reasons than in commit 12e25e1041 ("tcp: remove redundant
checks"), we can remove redundant checks done for timewait sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 01:55:01 -07:00
Eric Dumazet
90c337da15 inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations
When an application needs to force a source IP on an active TCP socket
it has to use bind(IP, port=x).

As most applications do not want to deal with already used ports, x is
often set to 0, meaning the kernel is in charge to find an available
port.
But kernel does not know yet if this socket is going to be a listener or
be connected.
It has very limited choices (no full knowledge of final 4-tuple for a
connect())

With limited ephemeral port range (about 32K ports), it is very easy to
fill the space.

This patch adds a new SOL_IP socket option, asking kernel to ignore
the 0 port provided by application in bind(IP, port=0) and only
remember the given IP address.

The port will be automatically chosen at connect() time, in a way
that allows sharing a source port as long as the 4-tuples are unique.

This new feature is available for both IPv4 and IPv6 (Thanks Neal)

Tested:

Wrote a test program and checked its behavior on IPv4 and IPv6.

strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
connect().
Also getsockname() show that the port is still 0 right after bind()
but properly allocated after connect().

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0

IPv6 test :

socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0

I was able to bind()/connect() a million concurrent IPv4 sockets,
instead of ~32000 before patch.

lpaa23:~# ulimit -n 1000010
lpaa23:~# ./bind --connect --num-flows=1000000 &
1000000 sockets

lpaa23:~# grep TCP /proc/net/sockstat
TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66

Check that a given source port is indeed used by many different
connections :

lpaa23:~# ss -t src :40000 | head -10
State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06 23:57:12 -07:00
Tom Herbert
b3baa0fbd0 mpls: Add MPLS entropy label in flow_keys
In flow dissector if an MPLS header contains an entropy label this is
saved in the new keyid field of flow_keys. The entropy label is
then represented in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
1fdd512c92 net: Add GRE keyid in flow_keys
In flow dissector if a GRE header contains a keyid this is saved in the
new keyid field of flow_keys. The GRE keyid is then represented
in the flow hash function input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
87ee9e52ff net: Add IPv6 flow label to flow_keys
In flow_dissector set the flow label in flow_keys for IPv6. This also
removes the shortcircuiting of flow dissection when a non-zero label
is present, the flow label can be considered to provide additional
entropy for a hash.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
d34af823ff net: Add VLAN ID to flow_keys
In flow_dissector set vlan_id in flow_keys when VLAN is found.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
45b47fd00c net: Get rid of IPv6 hash addresses flow keys
We don't need to return the IPv6 address hash as part of flow keys.
In general, using the IPv6 address hash is risky in a hash value
since the underlying use of xor provides no entropy. If someone
really needs the hash value they can get it from the full IPv6
addresses in flow keys (e.g. from flow_get_u32_src).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
9f24908901 net: Add keys for TIPC address
Add a new flow key for TIPC addresses.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:31 -07:00
Tom Herbert
c3f8324188 net: Add full IPv6 addresses to flow_keys
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.

We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.

With this patch, Ethertype and IP protocol are now included in the
flow hash input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert
42aecaa9bb net: Get skb hash over flow_keys structure
This patch changes flow hashing to use jhash2 over the flow_keys
structure instead just doing jhash_3words over src, dst, and ports.
This method will allow us take more input into the hashing function
so that we can include full IPv6 addresses, VLAN, flow labels etc.
without needing to resort to xor'ing which makes for a poor hash.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert
c468efe2c7 net: Remove superfluous setting of key_basic
key_basic is set twice in __skb_flow_dissect which seems unnecessary.
Remove second one.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Tom Herbert
ce3b535547 net: Simplify GRE case in flow_dissector
Do break when we see routing flag or a non-zero version number in GRE
header.

Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00
Alexei Starovoitov
94db13fe5f bpf: fix build due to missing tc_verd
fix build error:
net/core/filter.c: In function 'bpf_clone_redirect':
net/core/filter.c:1429:18: error: 'struct sk_buff' has no member named 'tc_verd'
  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)

Fixes: 3896d655f4 ("bpf: introduce bpf_clone_redirect() helper")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 11:45:59 -07:00
Wei Liu
c39c4c6abb tcp: double default TSQ output bytes limit
Xen virtual network driver has higher latency than a physical NIC.
Having only 128K as limit for TSQ introduced 30% regression in guest
throughput.

This patch raises the limit to 256K. This reduces the regression to 8%.
This buys us more time to work out a proper solution in the long run.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 01:09:36 -07:00
Eric Dumazet
12e25e1041 tcp: remove redundant checks
tcp_v4_rcv() checks the following before calling tcp_v4_do_rcv():

if (th->doff < sizeof(struct tcphdr) / 4)
    goto bad_packet;
if (!pskb_may_pull(skb, th->doff * 4))
    goto discard_it;

So following check in tcp_v4_do_rcv() is redundant
and "goto csum_err;" is wrong anyway.

if (skb->len < tcp_hdrlen(skb) || ...)
	goto csum_err;

A second check can be removed after no_tcp_socket label for same reason.

Same tests can be removed in tcp_v6_do_rcv()

Note : short tcp frames are not properly accounted in tcpInErrs MIB,
because pskb_may_pull() failure simply drops incoming skb, we might
fix this in a separate patch.

Signed-off-by: Eric Dumazet  <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 01:04:40 -07:00
Shawn Bohrer
6e54030932 ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()
421b3885bf "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.

Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 00:46:26 -07:00
Scott Feldman
7616dcbb21 switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops
Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
David S. Miller
cf71f43e44 Included changes:
- code re-arrangement for better reading and understanding
 - code style fixups
 - comments corrections
 - remove unnecessary NULL check in batadv_iv_ogm_update_seqnos()
 - make boolean functions explicitly return a bool result
 - remove unnecessary variables in algo_register() and algo_select()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVbwnkAAoJEOb/4TMchkvf6iAP/3MyKJDF9ubmOLkZpWKyBq+/
 MsTEN4PFRxQQ7Q2+Cct1MshuD0DBBznG+Nu1UwYUB5ahUPUmpntJ8hQoD982jT3u
 K4h/tHlyEtRVxPzYwW79woE/Q+hjdGqE745eKMHury0K+SkNR4jX3yJ7bjVRwQiC
 Sdk6uProCCgK5JHX++bxjbTnJobCvqCSy045hjMxuwFuTG4S+5le60m+tVe21D3C
 tnyT3y6L4OdbhKpBRMMAFkxYUzQONxiEWMYffubM6gk+ziIAttAJemLyE+ViHAH4
 Y7ItGd9Z/5+mPaO0OF3Q3jfN1jhGf3IxoYgKy9rL5JWIy6qomx0TTfPoPTDRYFR+
 2iQX59FIayaa9CgYbauHopEiDOJQ/nQ437haPO25xT9ICZbnPNWshdv9Z+zLNV/A
 uuUQrN+aWNLo9j40iD01s7AfPcYNDYklqygb9hSLTa7yeH/rPCG/RqJJ7zse4IQa
 /QMl1lUl484gPHFqMTVB7/75KL5G5B+KQdwON3AqnyRR3RrlOm7NbtcvuDTDheeW
 BAU5g7y/RG3DSoGtwPvFG6MyyPK8C2+niLY7EWUrs1EBWc5DGH+/oeVBR6SL46Fv
 KY1TiFrzvczjUKA0NyLw3w/jeE3SGxiVEBGN2Wv7veVwuV2Jc3MLGxNZGoKPum/k
 Vz7vG3ghIRM3aA1dO6Nx
 =m9Yd
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
pull request: batman-adv 20150603

here you have our second batch of patches intended for net-next.

In this patchset you won't find any new features, but quite some code
cleanup work, a bunch of code style fixes and also comments corrections
by Markus Pargmann.

Moreover you have a patch from Sven Eckelmann removing an unnecessary
NULL check in batadv_iv_ogm_update_seqnos().

Please pull or let me know of any problem!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:22:46 -07:00
Alexei Starovoitov
3896d655f4 bpf: introduce bpf_clone_redirect() helper
Allow eBPF programs attached to classifier/actions to call
bpf_clone_redirect(skb, ifindex, flags) helper which will
mirror or redirect the packet by dynamic ifindex selection
from within the program to a target device either at ingress
or at egress. Can be used for various scenarios, for example,
to load balance skbs into veths, split parts of the traffic
to local taps, etc.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:16:58 -07:00
Jiri Benc
640b2b107c openvswitch: disable LRO
Currently, openvswitch tries to disable LRO from the user space. This does
not work correctly when the device added is a vlan interface, though.
Instead of dealing with possibly complex stacked cross name space relations
in the user space, do the same as bridging does and call dev_disable_lro in
the kernel.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 19:39:35 -07:00
Markus Pargmann
f372d09059 batman-adv: Remove unnecessary ret variable in algo_register
Remove ret variable and all jumps.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-06-03 15:57:25 +02:00