Commit Graph

602400 Commits

Author SHA1 Message Date
Tom Herbert
c1e48af796 gue: Implement direction IP encapsulation
This patch implements direct encapsulation of IPv4 and IPv6 packets
in UDP. This is done a version "1" of GUE and as explained in I-D
draft-ietf-nvo3-gue-03.

Changes here are only in the receive path, fou with IPxIPx already
supports the transmit side. Both the normal receive path and
GRO path are modified to check for GUE version and check for
IP version in the case that GUE version is "1".

Tested:

IPIP with direct GUE encap
  1 TCP_STREAM
    4530 Mbps
  200 TCP_RR
    1297625 tps
    135/232/444 90/95/99% latencies

IP4IP6 with direct GUE encap
  1 TCP_STREAM
    4903 Mbps
  200 TCP_RR
    1184481 tps
    149/253/473 90/95/99% latencies

IP6IP6 direct GUE encap
  1 TCP_STREAM
   5146 Mbps
  200 TCP_RR
    1202879 tps
    146/251/472 90/95/99% latencies

SIT with direct GUE encap
  1 TCP_STREAM
    6111 Mbps
  200 TCP_RR
    1250337 tps
    139/241/467 90/95/99% latencies

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 23:51:14 -07:00
Linus Torvalds
c8ae067f26 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Fixes for crap of assorted ages: EOPENSTALE one is 4.2+, autofs one is
  4.6, d_walk - 3.2+.

  The atomic_open() and coredump ones are regressions from this window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coredump: fix dumping through pipes
  fix a regression in atomic_open()
  fix d_walk()/non-delayed __d_free() race
  autofs braino fix for do_last()
  fix EOPENSTALE bug in do_last()
2016-06-07 20:41:36 -07:00
Mateusz Guzik
1607f09c22 coredump: fix dumping through pipes
The offset in the core file used to be tracked with ->written field of
the coredump_params structure. The field was retired in favour of
file->f_pos.

However, ->f_pos is not maintained for pipes which leads to breakage.

Restore explicit tracking of the offset in coredump_params. Introduce
->pos field for this purpose since ->written was already reused.

Fixes: a008393951 ("get rid of coredump_params->written").

Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07 22:07:09 -04:00
Al Viro
a01e718f72 fix a regression in atomic_open()
open("/foo/no_such_file", O_RDONLY | O_CREAT) on should fail with
EACCES when /foo is not writable; failing with ENOENT is obviously
wrong.  That got broken by a braino introduced when moving the
creat_error logics from atomic_open() to lookup_open().  Easy to
fix, fortunately.

Spotted-by: "Yan, Zheng" <ukernel@gmail.com>
Tested-by: "Yan, Zheng" <ukernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07 21:53:51 -04:00
Al Viro
3d56c25e3b fix d_walk()/non-delayed __d_free() race
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay.  Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover.  Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Cc: stable@vger.kernel.org # v3.2+ (and watch out for __d_materialise_dentry())
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-06-07 21:26:55 -04:00
Srinivas Pandruvada
983e600e88 cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
When turbo is disabled, the ->set_policy() interface is broken.

For example, when turbo is disabled and cpuinfo.max = 2900000 (full
max turbo frequency), setting the limits results in frequency less
than the requested one:
Set 1000000 KHz results in 0700000 KHz
Set 1500000 KHz results in 1100000 KHz
Set 2000000 KHz results in  1500000 KHz

This is because the limits->max_perf fraction is calculated using
the max turbo frequency as the reference, but when the max P-State is
capped in intel_pstate_get_min_max(), the reference is not the max
turbo P-State. This results in reducing max P-State.

One option is to always use max turbo as reference for calculating
limits. But this will not be correct. By definition the intel_pstate
sysfs limits, shows percentage of available performance. So when
BIOS has disabled turbo, the available performance is max non turbo.
So the max_perf_pct should still show 100%.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog, rewrite in fewer lines of code ]
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-08 03:22:40 +02:00
Srinivas Pandruvada
2c2c1af449 cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
The limits->max_perf is rounded_up but immediately overwritten by
another assignment to limits->max_perf.

Move that operation to the correct location.

While here also added a pr_debug() call in ->set_policy to aid in
debugging.

Fixes: 785ee27881 (cpufreq: intel_pstate: Fix limits->max_perf rounding error)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw : Subject & changelog ]
Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-08 03:22:39 +02:00
David S. Miller
3256564458 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:

1) Fix missing alignment in next offset calculation for standard
   targets, introduced in the previous merge window, patch from
   Florian Westphal.

2) Fix to correct the handling of outgoing connections which use the
   SIP-pe such that the binding of a real-server is updated when needed.
   This was an omission from changes introduced by Marco Angaroni in
   the previous merge window too, to allow handling of outgoing
   connections by the SIP-pe. Patch and report came via Simon Horman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 17:14:10 -07:00
Yuchung Cheng
ce3cf4ec03 tcp: record TLP and ER timer stats in v6 stats
The v6 tcp stats scan do not provide TLP and ER timer information
correctly like the v4 version . This patch fixes that.

Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c6 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 17:12:22 -07:00
Daniel Borkmann
92c075dbde net: sched: fix tc_should_offload for specific clsact classes
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.

Fixes: 10cbc68434 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842 ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7f ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:59:53 -07:00
WANG Cong
a03e6fe569 act_police: fix a crash during removal
The police action is using its own code to initialize tcf hash
info, which makes us to forgot to initialize a->hinfo correctly.
Fix this by calling the helper function tcf_hash_create() directly.

This patch fixed the following crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
 IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 PGD d3c34067 PUD d3e18067 PMD 0
 Oops: 0000 [#1] SMP
 CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
 RIP: 0010:[<ffffffff810c099f>]  [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 RSP: 0000:ffff88011b203c80  EFLAGS: 00010002
 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
 RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
 R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
 Stack:
  ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
  0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
  ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
 Call Trace:
  <IRQ>
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
  [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
  [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
  [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
  [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
  [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
  [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
  [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
  [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
  [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
  [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4

Fixes: ddf97ccdd7 ("net_sched: add network namespace support for tc actions")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:38:59 -07:00
David S. Miller
34fe76abbe Merge branch 'net-sched-fast-stats'
Eric Dumazet says:

====================
net: sched: faster stats gathering

A while back, I sent one RFC patch using lockless stats gathering
on 64bit arches.

This patch series does it more cleanly, using a seqcount.

Since qdisc/class stats are written at dequeue() time,
we can ask the dequeue to change the seqcount, so that
stats readers can avoid taking the root qdisc lock,
and instead the typical read_seqcount_{begin|retry} guarded
loop.

This does not change fast path costs, as the seqcount
increments are not more expensive than the bit manipulation,
and allows readers to not freeze the fast path anymore.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:37:14 -07:00
Eric Dumazet
edb09eb17e net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:37:14 -07:00
Eric Dumazet
f9eb8aea2a net_sched: transform qdisc running bit into a seqcount
Instead of using a single bit (__QDISC___STATE_RUNNING)
in sch->__state, use a seqcount.

This adds lockdep support, but more importantly it will allow us
to sample qdisc/class statistics without having to grab qdisc root lock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:37:13 -07:00
Eric Dumazet
aafddbf0cf fq_codel: return non zero qlen in class dumps
We properly scan the flow list to count number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.

Fixes: 6401585366 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:28:11 -07:00
David S. Miller
064d5e6f8e Merge branch 'u32-hwoffload-fixes'
Jakub Kicinski says:

====================
cls_u32 hardware offload fixes

This set fixes two small issues with error codes I noticed
in cls_u32.  Second patch could be viewed as user space API
change but that portion of API is not part of any release,
yet.

Compile tested only.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:27:15 -07:00
Jakub Kicinski
d47a0f387f net: cls_u32: be more strict about skip-sw flag
Return an error if user requested skip-sw and the underlaying
hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:27:14 -07:00
Jakub Kicinski
1a0f7d2984 net: cls_u32: fix error code for invalid flags
'err' variable is not set in this test, we would return whatever
previous test set 'err' to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:27:14 -07:00
Colin Ian King
7b01b8e847 gtp: #define _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__
Fix clang build warning:

./include/uapi/linux/gtp.h:1:9: warning: '_UAPI_LINUX_GTP_H_' is
used as a header guard here, followed by #define of a different
macro [-Wheader-guard]

fix by defining  _UAPI_LINUX_GTP_H_ and not _UAPI_LINUX_GTP_H__

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:25:49 -07:00
Linus Torvalds
2051877c4c This finally removes the CLK_IS_ROOT flag by picking up the last few
stragglers that didn't get merged by anyone this time around. Better to
 do it now than wait for another one to pop up. There's also a minor
 maintainers update and a Kconfig fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXVoIpAAoJEK0CiJfG5JUlk/wP/Ro3dPTTJW8tf1kabMWwYRym
 PRsKeBNUbiAbbiDdYFcDVgVrxMpkeRQX+qoTPT37FypMbyDnu+rIEeWqHyyNCdzR
 4+di548c8XzStBMPNGaKG+WWVDOU/rRWGrun1vc2NR8JohgWFBx8ciV9Kht4g+Ss
 5ggm0E/ZKV5Hj7SuiBVbzMsZ/jufDM/V9NeIHy5Gnz6dPuRBkzrvwu9obJ/QLCWE
 mh7eRug4C+6xYaQrPbXzgxTXqRJQkk/M27ArodVhvZy16gPr70HC+oNGUJwHk+Fs
 yiqx9wicQuxNQqibgOC087RjUTDfFcGLdV71ouIQQhWuZFdQlHr9RfKaq+v1g/DB
 s3n8whjHJAukU4i34btG3Mq1UcoLTL4vkOYMW+2yjvUfdUdY5BtKGphrkPO5xKMP
 4hpAKkNW3ViTLn3cJQMuk5OgzPr0XrVjd++GtU7XjczzDKx8j9vTbhyZL0mRl+6s
 jx8GU4hGuEkuhBIfWENNe2W2rf4TBrfQeiLsJt9nLFY4yqJRNByplkMmL75in/cD
 PzbF647286PJYJdhjP0n70E2jyZbfyGYaUdZ9rbuwbEtA3XpOq4ZiWG0ZqPi7aOf
 UickP3QW0AoY4Y0QhZ+thTcNxAZPPq6IfEzFNvzGXArR6msQLYzF9Y1dQ/HuXmZP
 +tyYKKBCZbKObv463cM3
 =JewH
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "This finally removes the CLK_IS_ROOT flag by picking up the last few
  stragglers that didn't get merged by anyone this time around.

  Better to do it now than wait for another one to pop up.  There's also
  a minor maintainers update and a Kconfig fix"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: nxp: Select MFD_SYSCON for creg driver
  MAINTAINERS: Add file patterns for clock device tree bindings
  clk: Remove CLK_IS_ROOT flag
  clk: microchip: Remove CLK_IS_ROOT
  powerpc/512x: clk: Remove CLK_IS_ROOT
  vexpress/spc: Remove CLK_IS_ROOT
2016-06-07 16:24:44 -07:00
David S. Miller
64151ae36e Merge branch 'be2net-noncrit-fixes'
Sathya Perla says:

====================
be2net: patch set

Hi David, the following patch set contains three non-critical fixes that
can go into the net-next tree.

Patch 1 fixes the logic for provisioning queue pairs on VFs to take into
account the limit on number of TXQs too as in some profiles the number
of TXQs is less than that of RXQs.

Patch 2 enables WoL support from shutdown on Skyhawk.

Patch 3 enhances the logic for provisioning queue pairs on VFs on
SR-IOV over multi-partition configs. Each PF (partition) on a port has to
compute the number of RSS tables it's VFs can use.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:18:20 -07:00
Somnath Kotur
de2b1e0366 be2net: Fix provisioning of RSS for VFs in multi-partition configurations
Currently, we do not distribute queue resources to enable RSS for VFs
in multi-channel/partition configurations.
Fix this by having each PF(SRIOV capable) calculate it's share of the
15 RSS Policy Tables available per port before provisioning resources for
all the VFs.
This  proportional share calculation is done based on division of the
PF's MAX VFs with the Total MAX VFs on that port. It also needs to
learn about the no: of NIC PFs on the port and subtract that from
the 15 RSS Policy Tables on the port.

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:18:20 -07:00
Sriharsha Basavapatna
45f13df75f be2net: Enable Wake-On-LAN from shutdown for Skyhawk
Skyhawk does support wake-up from ACPI shutdown state - S5, provided the
platform supports it (like Auxiliary power source etc). The changes listed
below are done to fix this.

1) There's no need to defer the HW configuration of WOL to be_suspend().
Remove this in be_suspend() and move it to be_set_wol() ethtool function
so it is configured directly in the context of ethtool. This automatically
takes care of the shutdown case.

2) The driver incorrectly uses WOL_CAP field in the FW response to
get_acpi_wol_cap() command, to determine if WOL is enabled. Instead the
driver must rely on the macaddr field in the response to infer WOL state.

3) In be_get_config() during init, if we find that WOL is enabled in FW,
call pci_enable_wake() to enable pmcsr.pme_en bit. This is needed to
support persistent WOL configuration provided by the FW in some platforms.

4) Remove code in be_set_wol() that writes to PCICFG_PM_CONTROL_OFFSET
to set pme_en bit; pci_enable_wake() sets that.

Fixes: 028991e49 ("Enabling Wake-on-LAN is not supported in S5 state")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:18:19 -07:00
Suresh Reddy
b9263cbf21 be2net: use max-TXQs limit too while provisioning VF queue pairs
When the PF driver provisions resources for VFs, it currently only looks
at max RSS queues available to calculate the number of VF queue pairs.
This logic breaks when there are less number of TX-queues than RSS-queues.
This patch fixes this problem by using the max-TXQs available in the
PF-pool in the calculations. As a part of this change the
be_calculate_vf_qs() routine is renamed as be_calculate_vf_res() and the
code that calculates limits on other related resources is moved here to
contain all resource calculation code inside one routine.

Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:18:19 -07:00
Colin Ian King
9f647a6de9 net: fec: fix spelling mistakes and add missing newline
trivial fix to spelling mistakes and add missing newline in pr_err
messages

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:15:59 -07:00
David S. Miller
71743ffa15 Merge branch 'bnxt_en-fixes'
Michael Chan says:

====================
bnxt_en: Bug fixes.

Fix a race condition and VLAN rx acceleration logic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:02:04 -07:00
Michael Chan
8852ddb4dc bnxt_en: Simplify VLAN receive logic.
Since both CTAG and STAG rx acceleration must be enabled together, we
only need to check one feature flag (NETIF_F_HW_VLAN_CTAG_RX) before
calling __vlan_hwaccel_put_tag().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:02:03 -07:00
Michael Chan
5a9f6b238e bnxt_en: Enable and disable RX CTAG and RX STAG VLAN acceleration together.
The hardware can only be set to strip or not strip both the VLAN CTAG and
STAG.  It cannot strip one and not strip the other.  Add logic to
bnxt_fix_features() to toggle both feature flags when the user is toggling
one of them.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:02:03 -07:00
Michael Chan
b9a8460a08 bnxt_en: Fix tx push race condition.
Set the is_push flag in the software BD before the tx data is pushed to
the chip.  It is possible to get the tx interrupt as soon as the tx data
is pushed.  The tx handler will not handle the event properly if the
is_push flag is not set and it will crash.

Signed-off-by: Michael Chan <michael.chan@broadocm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:02:03 -07:00
Zhao Qiang
c19b6d246a drivers/net: support hdlc function for QE-UCC
The driver add hdlc support for Freescale QUICC Engine.
It support NMSI and TSA mode.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:56:31 -07:00
Zhao Qiang
35ef1c20fd fsl/qe: Add QE TDM lib
QE has module to support TDM, some other protocols
supported by QE are based on TDM.
add a qe-tdm lib, this lib provides functions to the protocols
using TDM to configurate QE-TDM.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:56:31 -07:00
Zhao Qiang
19163ac312 fsl/qe: Make regs resouce_size_t
Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:56:31 -07:00
Zhao Qiang
bb8b2062af fsl/qe: setup clock source for TDM mode
Add tdm clock configuration in both qe clock system and ucc
fast controller.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:56:30 -07:00
Zhao Qiang
68f047e3d6 fsl/qe: add rx_sync and tx_sync for TDM mode
Rx_sync and tx_sync are used by QE-TDM mode,
add them to struct ucc_fast_info.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:56:30 -07:00
Jamal Hadi Salim
0b0f43fe2e net sched: indentation and other OCD stylistic fixes
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
2016-06-07 15:53:54 -07:00
David S. Miller
be11991368 Merge branch 'sch-action-tstamp'
Jamal Hadi Salim says:

====================
net sched action timestamp improvements

Various aggregations of duplicated code, fixes and introduction of firstused
timestamp

v2: add const for source time info per suggestion from Cong
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:53:44 -07:00
Jamal Hadi Salim
48d8ee1694 net sched actions: aggregate dumping of actions timeinfo
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:53:43 -07:00
Jamal Hadi Salim
53eb440f4a net sched actions: introduce timestamp for firsttime use
Useful to know when the action was first used for accounting
(and debugging)

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:53:43 -07:00
Jamal Hadi Salim
9c4a4e488b net sched: actions use tcf_lastuse_update for consistency
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:53:43 -07:00
Amir Vadai
e69985c67c net/sched: cls_flower: Introduce support in SKIP SW flag
In order to make a filter processed only by hardware, skip_sw flag
should be supplied. This is an addition to the already existing skip_hw
flag (filter will be processed by software only). If no flag is
specified, filter will be processed by both software and hardware.

If only hardware offloaded filters exist, fl_classify() will return
without doing anything.

A following userspace patch will be sent once kernel patch is accepted.

Example:

tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \
	flower \
		ip_proto 6 \
		indev enp0s9 \
		skip_sw \
	action skbedit mark 0x1234

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:49:53 -07:00
David S. Miller
919f274fd6 Merge branch 'qed-iov-fw-reqs'
Yuval Mintz says:

====================
qed: IOV series - relax firmware requirements

In order for VFs to work, current implementation demands that the VF's
requried storm firmware would be exactly the version that was loaded by
the PF, which is a very harsh requirement.
This patch series is intended to relax this -
the recently submitted firmware is intended to be forward/backward
compatible in its fastpath [slowpath is configured by PF on behalf of VF],
and so VFs would only be required of having the same major faspath HSI in
order to work.

Most of the other patches in this series extend current forward
compatibilty of driver to reduce chance of breaking PF/VF compatibility
in the future. A few are unrelated IOV changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:12 -07:00
Yuval Mintz
54fdd80f6f qed: PF to reply to unknown messages
If a future VF would send the PF an unknown message, the PF today would
not send a reply. This would have 2 bad effects:
  a. VF would have to timeout on the request.
  b. If VF were to send an additional message to PF, firmware would mark
     it as malicious.

Instead, if there's some valid reply-address on the message - let the PF
answer and tell the VF it doesn't know the message.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:12 -07:00
Yuval Mintz
8246d0b48b qed: PF enforce MAC limitation of VFs
The only limitation relating to MACs the PF enforce today on its VFs
is in case it has a forced-unicast MAC address for them, in which case
they can't configure other unicast addresses.
Specifically, the PF isn't enforcing the number of MAC addresse a VF can
configure regardless of the nubmer of such filters agreed upon by PF and
VF during the acquisition process.

PF's shadow-config is now extended to also contain information about its
VFs' unicast addresses configuration, allowing such enforcement.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:12 -07:00
Yuval Mintz
5040acf537 qed: Move doorbell calculation from VF to PF
Today, the VF is aware of its queues context-ids, and calculates the
doorbell address when opening its queues on its own.
The configuration of doorbells in HW can sometime in the future be changed
by the PF [hw has several configurable features that might affect doorbell
addresses, e.g., dpm support], this would break compatibility with older
VFs as their calculated doorbell addresses would be incorrect for such a
configuration.

In order to avoid such a backward compatibility failure, let the PF make
the calculation of the doorbell offset based on the context-id, and pass
that to the VF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:11 -07:00
Yuval Mintz
41086467d6 qed: Make PF more robust against malicious VF
There are several requests the VF can make toward the PF which the driver
would pass to firmware without checking the validity first - specifically,
opening queues and updating vports. Such configurations might cause the
firmware to assert.

This adds validation of the legality of said configurations on the PF side
before passing it onward via ramrod to firmware.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:11 -07:00
Yuval Mintz
1cf2b1a971 qed: PF-VF resource negotiation
One of the goals of the vf's first message to the PF [acquire]
is to learn about the number of resources available to it [macs, vlans,
etc.]. This is done via negotiation - the VF requires a set of resources,
which the PF either approves or disaproves and sends a smaller set of
resources as alternative. In this later case, the VF is then expected to
either abort the probe or re-send the acquire message with less
required resources.

While this infrastructure exists since the initial submision of qed
SRIOV support, it's in fact completely inoperational - PF isn't really
looking into the resources the VF has asked for and is never going to
reply to the VF that it lacks resources.

This patch addresses this flow, fixing it and allowing the PF and VF
to actually agree on a set of resources.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:11 -07:00
Yuval Mintz
1fe614d10f qed: Relax VF firmware requirements
Current driver require an exact match between VF and PF storm firmware;
Any difference would fail the VF acquire message, causing the VF probe
to be aborted.

While there's still dependencies between the two, the recent FW submission
has relaxed the match requirement - instead of an exact match, there's now
a 'fastpath' HSI major/minor scheme, where VFs and PFs that match in their
major number can co-exist even if their minor is different.

In order to accomadate this change some changes in the vf-start init flow
had to be made, as the VF start ramrod now has to be sent only after PF
learns which fastpath HSI its VF is requiring.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:40:11 -07:00
Eric Dumazet
3bcb846ca4 net: get rid of spin_trylock() in net_tx_action()
Note: Tom Herbert posted almost same patch 3 months back, but for
different reasons.

The reasons we want to get rid of this spin_trylock() are :

1) Under high qdisc pressure, the spin_trylock() has almost no
chance to succeed.

2) We loop multiple times in softirq handler, eventually reaching
the max retry count (10), and we schedule ksoftirqd.

Since we want to adhere more strictly to ksoftirqd being waked up in
the future (https://lwn.net/Articles/687617/), better avoid spurious
wakeups.

3) calls to __netif_reschedule() dirty the cache line containing
q->next_sched, slowing down the owner of qdisc.

4) RT kernels can not use the spin_trylock() here.

With help of busylock, we get the qdisc spinlock fast enough, and
the trylock trick brings only performance penalty.

Depending on qdisc setup, I observed a gain of up to 19 % in qdisc
performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel)

("mpstat -I SCPU 1" is much happier now)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:32:03 -07:00
Wu Fengguang
fa54cc70ed rxrpc: fix ptr_ret.cocci warnings
net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: David Howells <dhowells@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:30:21 -07:00
David S. Miller
29a36611e9 Merge branch 'rds-packet-assembly-fixes'
Sowmini Varadhan says:

====================
RDS: TCP: socket locking RDS packet assembly fixes

This three part patchset fixes bugs in synchronization between
rds_tcp_accept_one() and the rds-tcp send/recv path.

Patch 1 ensures that the lock_sock() is taken appropriately
and the RDS datagram reassembly state is reset  to synchronize
with the receive path.

Patch 2 ensures that partially sent RDS datagrams will get
retransmitted after rds_tcp_accept_one() switches sockets.

Patch 3 fixes a race window which would prematurely re-enable
rds_send_xmit() before the rds_tcp_connection setup has been
completed in rds_tcp_accept_one().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:10:16 -07:00