Commit Graph

873738 Commits

Author SHA1 Message Date
Linus Torvalds
e472c64aa4 dmaengine fixes for v5.4-rc6
Few fixes on the dmaengine drivers:
  - fix in sprd driver for link list and potential memory leak
  - tegra transfer failure fix
  - imx size check fix for script_number
  - xilinx fix for 64bit AXIDMA and control reg update
  - qcom bam dma resource leak fix
  - cppi slave transfer fix when idle
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl26hEgACgkQfBQHDyUj
 g0cAjRAAweFLgnBti+Cqb3mjbHxYfSknw/4CNis5XvhXfCGUupgqQQfL1mjH1ZGv
 GPz0lOjgkaUwtioxMrIqGSc/7MCsHOK02kktfY87SSraSQrKF2Hl9Y0KRYu8FIWW
 AFvmbhXAtLqRY87r6K/S7LzQzSrYjfGHxHV5ShsA18+2Y2gtuQQ2V6nbkjZquOle
 So27cjAZXYQJP9YI2DJXpcgwe2DSO2cgfcLH58t6Prw4/piJf7P1Qhs/ng5Egfbe
 RahP8AdTSjRf2ylVHngaM5EFuDsjn/uslEfh93dL4gQH/NWqNgTbk02R0HtwrRjs
 F8EVNsDqVjWrc1D/tKWUucYVMDdGHmNovN0sGESbuVBgUKwqjrhLMrEzxs2d0cgN
 h2YRbPnAHRmO/F0x789TGFsw3BU0T/gAl+LrVFUBCKvhwtEtsG6rmX1wjSVca3lM
 zMhV9oipMUBPSVQQhyIf/yizEqw0YkgFKelQppb6e/I+OBgQGYchgaiQtbNcI4L2
 jOkQ7dsFPL8a8bmN+r38IzJMjHp5F+Govuc+1nmhuALpGxgBtmEqzgzBiP3AZgcJ
 XjkKW3IQdK+TAcqA6eDFLlm33Npe+flXQ/QvJlpYfHLDzZx2CNQExaFVo4meNwDu
 iGajKgP9DhO6l7uaep2mDxMPz7u/edhfvHy6cjWUIH9KjwPZ64w=
 =e87E
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "A few fixes to the dmaengine drivers:

   - fix in sprd driver for link list and potential memory leak

   - tegra transfer failure fix

   - imx size check fix for script_number

   - xilinx fix for 64bit AXIDMA and control reg update

   - qcom bam dma resource leak fix

   - cppi slave transfer fix when idle"

* tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
  dmaengine: qcom: bam_dma: Fix resource leak
  dmaengine: sprd: Fix the possible memory leak issue
  dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config
  dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer
  dmaengine: imx-sdma: fix size check for sdma script_number
  dmaengine: tegra210-adma: fix transfer failure
  dmaengine: sprd: Fix the link-list pointer register configuration issue
2019-10-31 07:34:09 +00:00
David S. Miller
3da0966320 Merge branch 'hv_netvsc-fix-error-handling-in-netvsc_attach-set_features'
Haiyang Zhang says:

====================
hv_netvsc: fix error handling in netvsc_attach/set_features

The error handling code path in these functions are not correct.
This patch set fixes them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:17:36 -07:00
Haiyang Zhang
719b85c336 hv_netvsc: Fix error handling in netvsc_attach()
If rndis_filter_open() fails, we need to remove the rndis device created
in earlier steps, before returning an error code. Otherwise, the retry of
netvsc_attach() from its callers will fail and hang.

Fixes: 7b2ee50c0c ("hv_netvsc: common detach logic")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:17:36 -07:00
Haiyang Zhang
c4509a5ac0 hv_netvsc: Fix error handling in netvsc_set_features()
When an error is returned by rndis_filter_set_offload_params(), we should
still assign the unaffected features to ndev->features. Otherwise, these
features will be missing.

Fixes: d6792a5a07 ("hv_netvsc: Add handler for LRO setting change")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:17:36 -07:00
Vishal Kulkarni
fc89cc358f cxgb4: fix panic when attaching to ULD fail
Release resources when attaching to ULD fail. Otherwise, data
mismatch is seen between LLD and ULD later on, which lead to
kernel panic when accessing resources that should not even
exist in the first place.

Fixes: 94cdb8bb99 ("cxgb4: Add support for dynamic allocation of resources for ULD")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:11:13 -07:00
David S. Miller
d86784fe9b Merge branch 'Control-action-percpu-counters-allocation-by-netlink-flag'
Vlad Buslov says:

====================
Control action percpu counters allocation by netlink flag

Currently, significant fraction of CPU time during TC filter allocation
is spent in percpu allocator. Moreover, percpu allocator is protected
with single global mutex which negates any potential to improve its
performance by means of recent developments in TC filter update API that
removed rtnl lock for some Qdiscs and classifiers. In order to
significantly improve filter update rate and reduce memory usage we
would like to allow users to skip percpu counters allocation for
specific action if they don't expect high traffic rate hitting the
action, which is a reasonable expectation for hardware-offloaded setup.
In that case any potential gains to software fast-path performance
gained by usage of percpu-allocated counters compared to regular integer
counters protected by spinlock are not important, but amount of
additional CPU and memory consumed by them is significant.

In order to allow configuring action counters allocation type at
runtime, implement following changes:

- Implement helper functions to update the action counters and use them
  in affected actions instead of updating counters directly. This steps
  abstracts actions implementation from counter types that are being
  used for particular action instance at runtime.

- Modify the new helpers to use percpu counters if they were allocated
  during action initialization and use regular counters otherwise.

- Extend action UAPI TCA_ACT space with TCA_ACT_FLAGS field. Add
  TCA_ACT_FLAGS_NO_PERCPU_STATS action flag and update
  hardware-offloaded actions to not allocate percpu counters when the
  flag is set.

With this changes users that prefer action update slow-path speed over
software fast-path speed can dynamically request actions to skip percpu
counters allocation without affecting other users.

Now, lets look at actual performance gains provided by this change.
Simple test is used to measure insertion rate - iproute2 TC is executed
in parallel by xargs in batch mode, its total execution time is measured
by shell builtin "time" command. The command runs 20 concurrent tc
instances, each with its own batch file with 100k rules:

$ time ls add* | xargs -n 1 -P 20 sudo tc -b

Two main rule profiles are tested. First is simple L2 flower classifier
with single gact drop action. The configuration is chosen as worst case
scenario because with single-action rules pressure on percpu allocator
is minimized. Example rule:

filter add dev ens1f0 protocol ip ingress prio 1 handle 1 flower skip_hw
    src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 action drop

Second profile is typical real-world scenario that uses flower
classifier with some L2-4 fields and two actions (tunnel_key+mirred).
Example rule:

filter add dev ens1f0_0 protocol ip ingress prio 1 handle 1 flower
    skip_hw src_mac e4:11:0:0:0:0 dst_mac e4:12:0:0:0:0 src_ip
    192.168.111.1 dst_ip 192.168.111.2 ip_proto udp dst_port 1 src_port
    1 action tunnel_key set id 1 src_ip 2.2.2.2 dst_ip 2.2.2.3 dst_port
    4789 action mirred egress redirect dev vxlan1

 Profile           |        percpu |     no_percpu | X improvement
                   | (k rules/sec) | (k rules/sec) |
-------------------+---------------+---------------+---------------
 Gact drop         |           203 |           259 |          1.28
 tunnel_key+mirred |            92 |           204 |          2.22

For simple drop action removing percpu allocation leads to ~25%
insertion rate improvement. Perf profiles highlights the bottlenecks.

Perf profile of run with percpu allocation (gact drop):

+ 89.11% 0.48% tc [kernel.vmlinux] [k] entry_SYSCALL_64
+ 88.58% 0.04% tc [kernel.vmlinux] [k] do_syscall_64
+ 87.50% 0.04% tc libc-2.29.so [.] __libc_sendmsg
+ 86.96% 0.04% tc [kernel.vmlinux] [k] __sys_sendmsg
+ 86.85% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg
+ 86.60% 0.05% tc [kernel.vmlinux] [k] sock_sendmsg
+ 86.55% 0.12% tc [kernel.vmlinux] [k] netlink_sendmsg
+ 86.04% 0.13% tc [kernel.vmlinux] [k] netlink_unicast
+ 85.42% 0.03% tc [kernel.vmlinux] [k] netlink_rcv_skb
+ 84.68% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg
+ 84.56% 0.24% tc [kernel.vmlinux] [k] tc_new_tfilter
+ 75.73% 0.65% tc [cls_flower] [k] fl_change
+ 71.30% 0.03% tc [kernel.vmlinux] [k] tcf_exts_validate
+ 71.27% 0.13% tc [kernel.vmlinux] [k] tcf_action_init
+ 71.06% 0.01% tc [kernel.vmlinux] [k] tcf_action_init_1
+ 70.41% 0.04% tc [act_gact] [k] tcf_gact_init
+ 53.59% 1.21% tc [kernel.vmlinux] [k] __mutex_lock.isra.0
+ 52.34% 0.34% tc [kernel.vmlinux] [k] tcf_idr_create
- 51.23% 2.17% tc [kernel.vmlinux] [k] pcpu_alloc
  - 49.05% pcpu_alloc
    + 39.35% __mutex_lock.isra.0 4.99% memset_erms
    + 2.16% pcpu_alloc_area
  + 2.17% __libc_sendmsg
+ 45.89% 44.33% tc [kernel.vmlinux] [k] osq_lock
+ 9.94% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc
+ 7.76% 0.00% tc [kernel.vmlinux] [k] tcf_idr_insert
+ 6.50% 0.03% tc [kernel.vmlinux] [k] tfilter_notify
+ 6.24% 6.11% tc [kernel.vmlinux] [k] mutex_spin_on_owner
+ 5.73% 5.32% tc [kernel.vmlinux] [k] memset_erms
+ 5.31% 0.18% tc [kernel.vmlinux] [k] tcf_fill_node

Here bottleneck is clearly in pcpu_alloc() function that takes more than
half CPU time, which is mostly wasted busy-waiting for internal percpu
allocator global lock.

With percpu allocation removed (gact drop):

+ 87.50% 0.51% tc [kernel.vmlinux] [k] entry_SYSCALL_64
+ 86.94% 0.07% tc [kernel.vmlinux] [k] do_syscall_64
+ 85.75% 0.04% tc libc-2.29.so [.] __libc_sendmsg
+ 85.00% 0.07% tc [kernel.vmlinux] [k] __sys_sendmsg
+ 84.84% 0.07% tc [kernel.vmlinux] [k] ___sys_sendmsg
+ 84.59% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg
+ 84.58% 0.14% tc [kernel.vmlinux] [k] netlink_sendmsg
+ 83.95% 0.12% tc [kernel.vmlinux] [k] netlink_unicast
+ 83.34% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb
+ 82.39% 0.12% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg
+ 82.16% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter
+ 75.13% 0.84% tc [cls_flower] [k] fl_change
+ 69.92% 0.05% tc [kernel.vmlinux] [k] tcf_exts_validate
+ 69.87% 0.11% tc [kernel.vmlinux] [k] tcf_action_init
+ 69.61% 0.02% tc [kernel.vmlinux] [k] tcf_action_init_1
- 68.80% 0.10% tc [act_gact] [k] tcf_gact_init
  - 68.70% tcf_gact_init
    + 36.08% tcf_idr_check_alloc
    + 31.88% tcf_idr_insert
+ 63.72% 0.58% tc [kernel.vmlinux] [k] __mutex_lock.isra.0
+ 58.80% 56.68% tc [kernel.vmlinux] [k] osq_lock
+ 36.08% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc
+ 31.88% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert

The gact actions (like all other actions types) are inserted in single
idr instance protected by global (per namespace) lock that becomes new
bottleneck with such simple rule profile and prevents achieving 2x+
performance increase that can be expected by looking at profiling data
for insertion action with percpu counter.

Perf profile of run with percpu allocation (tunnel_key+mirred):

+ 91.95% 0.21% tc [kernel.vmlinux] [k] entry_SYSCALL_64
+ 91.74% 0.06% tc [kernel.vmlinux] [k] do_syscall_64
+ 90.74% 0.01% tc libc-2.29.so [.] __libc_sendmsg
+ 90.52% 0.01% tc [kernel.vmlinux] [k] __sys_sendmsg
+ 90.50% 0.04% tc [kernel.vmlinux] [k] ___sys_sendmsg
+ 90.41% 0.02% tc [kernel.vmlinux] [k] sock_sendmsg
+ 90.38% 0.04% tc [kernel.vmlinux] [k] netlink_sendmsg
+ 90.10% 0.06% tc [kernel.vmlinux] [k] netlink_unicast
+ 89.76% 0.01% tc [kernel.vmlinux] [k] netlink_rcv_skb
+ 89.28% 0.04% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg
+ 89.15% 0.03% tc [kernel.vmlinux] [k] tc_new_tfilter
+ 83.41% 0.33% tc [cls_flower] [k] fl_change
+ 81.17% 0.04% tc [kernel.vmlinux] [k] tcf_exts_validate
+ 81.13% 0.06% tc [kernel.vmlinux] [k] tcf_action_init
+ 81.04% 0.04% tc [kernel.vmlinux] [k] tcf_action_init_1
- 73.59% 2.16% tc [kernel.vmlinux] [k] pcpu_alloc
  - 71.42% pcpu_alloc
    + 61.41% __mutex_lock.isra.0 5.02% memset_erms
    + 2.93% pcpu_alloc_area
  + 2.16% __libc_sendmsg
+ 63.58% 0.17% tc [kernel.vmlinux] [k] tcf_idr_create
+ 63.40% 0.60% tc [kernel.vmlinux] [k] __mutex_lock.isra.0
+ 57.85% 56.38% tc [kernel.vmlinux] [k] osq_lock
+ 46.27% 0.13% tc [act_tunnel_key] [k] tunnel_key_init
+ 34.26% 0.02% tc [act_mirred] [k] tcf_mirred_init
+ 10.99% 0.00% tc [kernel.vmlinux] [k] dst_cache_init
+ 5.32% 5.11% tc [kernel.vmlinux] [k] memset_erms

With two times more actions pressure on percpu allocator doubles, so now
it takes ~74% of CPU execution time.

With percpu allocation removed (tunnel_key+mirred):

+ 86.02% 0.50% tc [kernel.vmlinux] [k] entry_SYSCALL_64
+ 85.51% 0.12% tc [kernel.vmlinux] [k] do_syscall_64
+ 84.40% 0.03% tc libc-2.29.so [.] __libc_sendmsg
+ 83.84% 0.03% tc [kernel.vmlinux] [k] __sys_sendmsg
+ 83.72% 0.01% tc [kernel.vmlinux] [k] ___sys_sendmsg
+ 83.56% 0.01% tc [kernel.vmlinux] [k] sock_sendmsg
+ 83.50% 0.08% tc [kernel.vmlinux] [k] netlink_sendmsg
+ 83.02% 0.17% tc [kernel.vmlinux] [k] netlink_unicast
+ 82.48% 0.00% tc [kernel.vmlinux] [k] netlink_rcv_skb
+ 81.89% 0.11% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg
+ 81.71% 0.25% tc [kernel.vmlinux] [k] tc_new_tfilter
+ 73.99% 0.63% tc [cls_flower] [k] fl_change
+ 69.72% 0.00% tc [kernel.vmlinux] [k] tcf_exts_validate
+ 69.72% 0.09% tc [kernel.vmlinux] [k] tcf_action_init
+ 69.53% 0.05% tc [kernel.vmlinux] [k] tcf_action_init_1
+ 53.08% 0.91% tc [kernel.vmlinux] [k] __mutex_lock.isra.0
+ 45.52% 43.99% tc [kernel.vmlinux] [k] osq_lock
- 36.02% 0.21% tc [act_tunnel_key] [k] tunnel_key_init
  - 35.81% tunnel_key_init
    + 15.95% tcf_idr_check_alloc
    + 13.91% tcf_idr_insert
    - 4.70% dst_cache_init
      + 4.68% pcpu_alloc
+ 33.22% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc
+ 32.34% 0.05% tc [act_mirred] [k] tcf_mirred_init
+ 28.24% 0.01% tc [kernel.vmlinux] [k] tcf_idr_insert
+ 7.79% 0.05% tc [kernel.vmlinux] [k] idr_alloc_u32
+ 7.67% 7.35% tc [kernel.vmlinux] [k] idr_get_free
+ 6.46% 6.22% tc [kernel.vmlinux] [k] mutex_spin_on_owner
+ 5.11% 0.05% tc [kernel.vmlinux] [k] tfilter_notify

With percpu allocation removed insertion rate is increased by ~120%.
Such rule profile scales much better than simple single action because
both types of actions were competing for single lock in percpu
allocator, but not for action idr lock, which is per-action. Note that
percpu allocator is still used by dst_cache in tunnel_key actions and
consumes 4.68% CPU time. Dst_cache seems like good opportunity for
further insertion rate optimization but is not addressed by this change.

Another improvement provided by this change is significantly reduced
memory usage. The test is implemented by sampling "used memory" value
from "vmstat -s" command output. Following table includes memory usage
measurements for same two configurations that were used for measuring
insertion rate:

 Profile           | Mem per rule | Mem per rule no_percpu | Less memory used
                   |         (KB) |                   (KB) |             (KB)
-------------------+--------------+------------------------+------------------
 Gact drop         |         3.91 |                   2.51 |              1.4
 tunnel_key+mirred |         6.73 |                   3.91 |              2.8

Results indicate that memory usage of percpu allocator per action is
~1.4 KB. Note that any measurements of percpu allocator memory usage is
inherently tied to particular setup since memory usage is linear to
number of cores in system. It is to be expected that on current top of
the line servers percpu allocator memory usage will be 2-5x more than on
24 CPUs setup that was used for testing.

Setup details: 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 32GB memory

Patches applied on top of net-next branch:

commit 2203cbf2c8 (net-next) Author:
Russell King <rmk+kernel@armlinux.org.uk> Date: Tue Oct 15 11:38:39 2019
+0100

net: sfp: move fwnode parsing into sfp-bus layer

Changes V1 -> V2:

- Include memory measurements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:51 -07:00
Vlad Buslov
9ae6b78708 tc-testing: implement tests for new fast_init action flag
Add basic tests to verify action creation with new fast_init flag for all
actions that support the flag.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:51 -07:00
Vlad Buslov
e382267860 net: sched: update action implementations to support flags
Extend struct tc_action with new "tcfa_flags" field. Set the field in
tcf_idr_create() function and provide new helper
tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags
value. Update individual hardware-offloaded actions init() to pass their
"flags" argument to new helper in order to skip percpu stats allocation
when user requested it through flags.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:51 -07:00
Vlad Buslov
abbb0d3363 net: sched: extend TCA_ACT space with TCA_ACT_FLAGS
Extend TCA_ACT space with nla_bitfield32 flags. Add
TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in
tcf_action_init_1() and pass resulting value as additional argument to
a_o->init().

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
5e174d5e73 net: sched: modify stats helper functions to support regular stats
Modify stats update helper functions introduced in previous patches in this
series to fallback to regular tc_action->tcfa_{b|q}stats if cpu stats are
not allocated for the action argument. If regular non-percpu allocated
counters are in use, then obtain action tcfa_lock while modifying them.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
ef816f3c49 net: sched: don't expose action qstats to skb_tc_reinsert()
Previous commit introduced helper function for updating qstats and
refactored set of actions to use the helpers, instead of modifying qstats
directly. However, one of the affected action exposes its qstats to
skb_tc_reinsert(), which then modifies it.

Refactor skb_tc_reinsert() to return integer error code and don't increment
overlimit qstats in case of error, and use the returned error code in
tcf_mirred_act() to manually increment the overlimit counter with new
helper function.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
26b537a88c net: sched: extract qstats update code into functions
Extract common code that increments cpu_qstats counters into standalone act
API functions. Change hardware offloaded actions that use percpu counter
allocation to use the new functions instead of accessing cpu_qstats
directly.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
5e1ad95b63 net: sched: extract bstats update code into function
Extract common code that increments cpu_bstats counter into standalone act
API function. Change hardware offloaded actions that use percpu counter
allocation to use the new function instead of incrementing cpu_bstats
directly.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Vlad Buslov
c8ecebd04c net: sched: extract common action counters update code into function
Currently, all implementations of tc_action_ops->stats_update() callback
have almost exactly the same implementation of counters update
code (besides gact which also updates drop counter). In order to simplify
support for using both percpu-allocated and regular action counters
depending on run-time flag in following patches, extract action counters
update code into standalone function in act API.

This commit doesn't change functionality.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 18:07:50 -07:00
Alexei Starovoitov
af91acbc62 bpf: Fix bpf jit kallsym access
Jiri reported crash when JIT is on, but net.core.bpf_jit_kallsyms is off.
bpf_prog_kallsyms_find() was skipping addr->bpf_prog resolution
logic in oops and stack traces. That's incorrect.
It should only skip addr->name resolution for 'cat /proc/kallsyms'.
That's what bpf_jit_kallsyms and bpf_jit_harden protect.

Fixes: 3dec541b2e ("bpf: Add support for BTF pointers to x86 JIT")
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191030233019.1187404-1-ast@kernel.org
2019-10-31 02:02:29 +01:00
Christophe JAILLET
21d8bd123a net: qrtr: Simplify 'qrtr_tun_release()'
Use 'skb_queue_purge()' instead of re-implementing it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:58:23 -07:00
David S. Miller
dba7bf0348 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-10-29

This series contains updates to e1000e, igb, ixgbe and i40e drivers.

Sasha adds support for Intel client platforms Comet Lake and Tiger Lake
to the e1000e driver.  Also adds a fix for a compiler warning that was
recently introduced, when CONFIG_PM_SLEEP is not defined, so wrap the
code that requires this kernel configuration to be defined.

Alex fixes a potential race condition between network configuration and
power management for e1000e, which is similar to a past issue in the igb
driver.  Also provided a bit of code cleanup since the driver no longer
checks for __E1000_DOWN.

Josh Hunt adds UDP segmentation offload support for igb, ixgbe and i40e.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:51:25 -07:00
zhong jiang
84e93d999a wimax: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
It is more clear to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file
operation rather than DEFINE_SIMPLE_ATTRIBUTE.

It is detected with the help of coccinelle.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:43:25 -07:00
Heiner Kallweit
a2a1a13b81 net: dsa: add ethtool pause configuration support
This patch adds glue logic to make pause settings per port
configurable vie ethtool.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:42:51 -07:00
Guillaume Nault
1d7a55267f vxlan: drop "vxlan" parameter in vxlan_fdb_alloc()
This parameter has never been used.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:41:50 -07:00
Heiner Kallweit
a319fb52e4 net: phy: marvell: add downshift support for 88E1145
Add downshift support for 88E1145, it uses the same downshift
configuration registers as 88E1111.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:35:56 -07:00
Eric Dumazet
ee8d153d46 net: annotate lockless accesses to sk->sk_napi_id
We already annotated most accesses to sk->sk_napi_id

We missed sk_mark_napi_id() and sk_mark_napi_id_once()
which might be called without socket lock held in UDP stack.

KCSAN reported :
BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb

write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
 sk_mark_napi_id include/net/busy_poll.h:125 [inline]
 __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
 udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
 dst_input include/net/dst.h:442 [inline]
 ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460

write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
 sk_mark_napi_id include/net/busy_poll.h:125 [inline]
 __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
 udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
 dst_input include/net/dst.h:442 [inline]
 ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: e68b6e50fa ("udp: enable busy polling for all sockets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:34:35 -07:00
David S. Miller
29f52875ba Merge branch 'ICMP-flow-improvements'
Matteo Croce says:

====================
ICMP flow improvements

This series improves the flow inspector handling of ICMP packets:
The first two patches just add some comments in the code which would have saved
me a few minutes of time, and refactor a piece of code.
The third one adds to the flow inspector the capability to extract the
Identifier field, if present, so echo requests and replies are classified
as part of the same flow.
The fourth patch uses the function introduced earlier to the bonding driver,
so echo replies can be balanced across bonding slaves.

v1 -> v2:
 - remove unused struct members
 - add an helper to check for the Id field
 - use a local flow_dissector_key in the bonding to avoid
   changing behaviour of the flow dissector
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:21:35 -07:00
Matteo Croce
58deb77cc5 bonding: balance ICMP echoes in layer3+4 mode
The bonding uses the L4 ports to balance flows between slaves. As the ICMP
protocol has no ports, those packets are sent all to the same device:

    # tcpdump -qltnni veth0 ip |sed 's/^/0: /' &
    # tcpdump -qltnni veth1 ip |sed 's/^/1: /' &
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 315, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 315, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 316, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 316, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 317, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 317, seq 1, length 64

But some ICMP packets have an Identifier field which is
used to match packets within sessions, let's use this value in the hash
function to balance these packets between bond slaves:

    # ping -qc1 192.168.0.2
    0: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 303, seq 1, length 64
    0: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 303, seq 1, length 64
    # ping -qc1 192.168.0.2
    1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 304, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 304, seq 1, length 64

Aso, let's use a flow_dissector_key which defines FLOW_DISSECTOR_KEY_ICMP,
so we can balance pings encapsulated in a tunnel when using mode encap3+4:

    # ping -q 192.168.1.2 -c1
    0: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 585, seq 1, length 64
    0: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 585, seq 1, length 64
    # ping -q 192.168.1.2 -c1
    1: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 586, seq 1, length 64
    1: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 586, seq 1, length 64

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:21:35 -07:00
Matteo Croce
5dec597e5c flow_dissector: extract more ICMP information
The ICMP flow dissector currently parses only the Type and Code fields.
Some ICMP packets (echo, timestamp) have a 16 bit Identifier field which
is used to correlate packets.
Add such field in flow_dissector_key_icmp and replace skb_flow_get_be16()
with a more complex function which populate this field.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:21:35 -07:00
Matteo Croce
3b336d6f4e flow_dissector: skip the ICMP dissector for non ICMP packets
FLOW_DISSECTOR_KEY_ICMP is checked for every packet, not only ICMP ones.
Even if the test overhead is probably negligible, move the
ICMP dissector code under the big 'switch(ip_proto)' so it gets called
only for ICMP packets.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:21:35 -07:00
Matteo Croce
98298e6ca6 flow_dissector: add meaningful comments
Documents two piece of code which can't be understood at a glance.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 17:21:35 -07:00
Takashi Iwai
a393318673 ALSA: timer: Fix mutex deadlock at releasing card
When a card is disconnected while in use, the system waits until all
opened files are closed then releases the card.  This is done via
put_device() of the card device in each device release code.

The recently reported mutex deadlock bug happens in this code path;
snd_timer_close() for the timer device deals with the global
register_mutex and it calls put_device() there.  When this timer
device is the last one, the card gets freed and it eventually calls
snd_timer_free(), which has again the protection with the global
register_mutex -- boom.

Basically put_device() call itself is race-free, so a relative simple
workaround is to move this put_device() call out of the mutex.  For
achieving that, in this patch, snd_timer_close_locked() got a new
argument to store the card device pointer in return, and each caller
invokes put_device() with the returned object after the mutex unlock.

Reported-and-tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-10-30 22:54:56 +01:00
Jens Axboe
6873e0bd6a io_uring: ensure we clear io_kiocb->result before each issue
We use io_kiocb->result == -EAGAIN as a way to know if we need to
re-submit a polled request, as -EAGAIN reporting happens out-of-line
for IO submission failures. This field is cleared when we originally
allocate the request, but it isn't reset when we retry the submission
from async context. This can cause issues where we think something
needs a re-issue, but we're really just reading stale data.

Reset ->result whenever we re-prep a request for polled submission.

Cc: stable@vger.kernel.org
Fixes: 9e645e1105 ("io_uring: add support for sqe links")
Reported-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-30 14:45:22 -06:00
Sven Schnelle
3d252454ed parisc: fix frame pointer in ftrace_regs_caller()
The current code in ftrace_regs_caller() doesn't assign
%r3 to contain the address of the current frame. This
is hidden if the kernel is compiled with FRAME_POINTER,
but without it just crashes because it tries to dereference
an arbitrary address. Fix this by always setting %r3 to the
current stack frame.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
2019-10-30 21:24:40 +01:00
Eric Dumazet
7170a97774 net: annotate accesses to sk->sk_incoming_cpu
This socket field can be read and written by concurrent cpus.

Use READ_ONCE() and WRITE_ONCE() annotations to document this,
and avoid some compiler 'optimizations'.

KCSAN reported :

BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv

write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
 sk_incoming_cpu_update include/net/sock.h:953 [inline]
 tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
 do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
 do_softirq kernel/softirq.c:329 [inline]
 __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189

read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
 sk_incoming_cpu_update include/net/sock.h:952 [inline]
 tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
 dst_input include/net/dst.h:442 [inline]
 ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
 NF_HOOK include/linux/netfilter.h:305 [inline]
 NF_HOOK include/linux/netfilter.h:299 [inline]
 ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
 process_backlog+0x1d3/0x420 net/core/dev.c:5955
 napi_poll net/core/dev.c:6392 [inline]
 net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
 __do_softirq+0x115/0x33f kernel/softirq.c:292
 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 13:24:25 -07:00
Roman Mashak
c4917bfc3a tc-testing: fixed two failing pedit tests
Two pedit tests were failing due to incorrect operation
value in matchPattern, should be 'add' not 'val', so fix it.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:38:08 -07:00
Jon Maloy
c0bceb97db tipc: add smart nagle feature
We introduce a feature that works like a combination of TCP_NAGLE and
TCP_CORK, but without some of the weaknesses of those. In particular,
we will not observe long delivery delays because of delayed acks, since
the algorithm itself decides if and when acks are to be sent from the
receiving peer.

- The nagle property as such is determined by manipulating a new
  'maxnagle' field in struct tipc_sock. If certain conditions are met,
  'maxnagle' will define max size of the messages which can be bundled.
  If it is set to zero no messages are ever bundled, implying that the
  nagle property is disabled.
- A socket with the nagle property enabled enters nagle mode when more
  than 4 messages have been sent out without receiving any data message
  from the peer.
- A socket leaves nagle mode whenever it receives a data message from
  the peer.

In nagle mode, messages smaller than 'maxnagle' are accumulated in the
socket write queue. The last buffer in the queue is marked with a new
'ack_required' bit, which forces the receiving peer to send a CONN_ACK
message back to the sender upon reception.

The accumulated contents of the write queue is transmitted when one of
the following events or conditions occur.

- A CONN_ACK message is received from the peer.
- A data message is received from the peer.
- A SOCK_WAKEUP pseudo message is received from the link level.
- The write queue contains more than 64 1k blocks of data.
- The connection is being shut down.
- There is no CONN_ACK message to expect. I.e., there is currently
  no outstanding message where the 'ack_required' bit was set. As a
  consequence, the first message added after we enter nagle mode
  is always sent directly with this bit set.

This new feature gives a 50-100% improvement of throughput for small
(i.e., less than MTU size) messages, while it might add up to one RTT
to latency time when the socket is in nagle mode.

Acked-by: Ying Xue <ying.xue@windreiver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:16:22 -07:00
David S. Miller
6c814e8c4e Merge branch 'mlxsw-Update-firmware-version'
Ido Schimmel says:

====================
mlxsw: Update firmware version

This patch set updates the firmware version for Spectrum-1 and enforces
a firmware version for Spectrum-2.

The version adds support for querying port module type. It will be used
by a followup patch set from Jiri to make port split code more generic.

Patch #1 increases the size of an existing register in order to be
compatible with the new firmware version. In the future the firmware
will assign default values to fields not specified by the driver.

Patch #2 temporarily increases the PCI reset timeout for SN3800 systems.
Note that in normal cases the driver will need to wait no longer than 5
seconds for the device to become ready following reset command.

Patch #3 bumps the firmware version for Spectrum-1.

Patch #4 enforces a minimum firmware version for Spectrum-2.

v2:
* Added patch #2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:07:05 -07:00
Ido Schimmel
a72afb6879 mlxsw: Enforce firmware version for Spectrum-2
In a similar fashion to Spectrum-1, enforce a specific firmware version
for Spectrum-2 so that the driver and firmware are always in sync with
regards to new features.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:07:05 -07:00
Ido Schimmel
5fd2ef4689 mlxsw: Bump firmware version to 13.2000.2308
The version adds support for querying port module type. It will be used
by a followup patch set from Jiri to make port split code more generic.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:07:05 -07:00
Ido Schimmel
ff298839b6 mlxsw: pci: Increase PCI reset timeout for SN3800 systems
SN3800 Spectrum-2 based systems have gearboxes that need to be
initialized by the firmware during its initialization flow. In certain
cases, the firmware might need to flash these gearboxes, which is
currently a time-consuming process.

In newer firmware versions, the firmware will not signal to the driver
that it is ready until the gearboxes are flashed. Increase the PCI reset
timeout for these situations. In normal cases, the driver will need to
wait no longer than 5 seconds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:07:05 -07:00
Ido Schimmel
5075066a77 mlxsw: reg: Increase size of MPAR register
In new firmware versions this register is extended with a sampling rate
for Spectrum-2 and future ASICs.

Increase the size of the register to ensure the field is initialized to
0 which means every packet is mirrored.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:07:05 -07:00
Jiri Pirko
b7265a0df8 mlxsw: core: Unpublish devlink parameters during reload
The devlink parameter "acl_region_rehash_interval" is a runtime
parameter whose value is stored in a dynamically allocated memory. While
reloading the driver, this memory is freed and then allocated again. A
use-after-free might happen if during this time frame someone tries to
retrieve its value.

Since commit 070c63f20f ("net: devlink: allow to change namespaces
during reload") the use-after-free can be reliably triggered when
reloading the driver into a namespace, as after freeing the memory (via
reload_down() callback) all the parameters are notified.

Fix this by unpublishing and then re-publishing the parameters during
reload.

Fixes: 98bbf70c1c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
Fixes: 7c62cfb8c5 ("devlink: publish params only after driver init is done")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 12:02:52 -07:00
Sudarsana Reddy Kalluru
c63b096894 qed: Optimize execution time for nvm attributes configuration.
Current implementation for nvm_attr configuration instructs the management
FW to load/unload the nvm-cfg image for each user-provided attribute in
the input file. This consumes lot of cycles even for few tens of
attributes.
This patch updates the implementation to perform load/commit of the config
for every 50 attributes. After loading the nvm-image, MFW expects that
config should be committed in a predefined timer value (5 sec), hence it's
not possible to write large number of attributes in a single load/commit
window. Hence performing the commits in chunks.

Fixes: 0dabbe1bb3 ("qed: Add driver API for flashing the config attributes.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 11:57:14 -07:00
Taehee Yoo
c6761cf521 vxlan: fix unexpected failure of vxlan_changelink()
After commit 0ce1822c2a ("vxlan: add adjacent link to limit depth
level"), vxlan_changelink() could fail because of
netdev_adjacent_change_prepare().
netdev_adjacent_change_prepare() returns -EEXIST when old lower device
and new lower device are same.
(old lower device is "dst->remote_dev" and new lower device is "lowerdev")
So, before calling it, lowerdev should be NULL if these devices are same.

Test command1:
    ip link add dummy0 type dummy
    ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1
    ip link set vxlan0 type vxlan ttl 5
    RTNETLINK answers: File exists

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0ce1822c2a ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 11:52:47 -07:00
Colin Ian King
dc99da4f31 qed: fix spelling mistake "queuess" -> "queues"
There is a spelling misake in a DP_NOTICE message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-30 11:39:05 -07:00
Trond Myklebust
669996add4 SUNRPC: Destroy the back channel when we destroy the host transport
When we're destroying the host transport mechanism, we should ensure
that we do not leak memory by failing to release any back channel
slots that might still exist.

Reported-by: Neil Brown <neilb@suse.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-30 12:04:35 -04:00
Trond Myklebust
9edb455e67 SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
If there are RDMA back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.

Reported-by: Neil Brown <neilb@suse.de>
Fixes: 63cae47005 ("xprtrdma: Handle incoming backward direction RPC calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-30 12:04:35 -04:00
Trond Myklebust
875f0706ac SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
If there are TCP back channel requests being processed by the
server threads, then we should hold a reference to the transport
to ensure it doesn't get freed from underneath us.

Reported-by: Neil Brown <neilb@suse.de>
Fixes: 2ea24497a1 ("SUNRPC: RPC callbacks may be split across several..")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-30 12:04:35 -04:00
Nick Desaulniers
e8a170ff9a drm/amdgpu: enable -msse2 for GCC 7.1+ users
A final attempt at enabling sse2 for GCC users.

Orininally attempted in:
commit 1011745073 ("drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")

Reverted due to "reported instability" in:
commit 193392ed9f ("Revert "drm/amd/display: add -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines"")

Re-added just for Clang in:
commit 0f0727d971 ("drm/amd/display: readd -msse2 to prevent Clang from emitting libcalls to undefined SW FP routines")

The original report didn't have enough information to know if the GPF
was due to misalignment, but I suspect that it was. (The missing
information was the disassembly of the function at the bottom of the
trace, to see if the instruction pointer pointed to an instruction with
16B alignment memory operand requirements.  The stack trace does show
the stack was only 8B but not 16B aligned though, which makes this a
strong possibility).

Now that the stack misalignment issue has been fixed for users of GCC
7.1+, reattempt adding -msse2. This matches Clang.

It will likely never be safe to enable this for pre-GCC 7.1 AND use a
16B aligned stack in these translation units.

This is only a functional change for GCC 7.1+ users, and should be boot
tested.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=109487
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:56:20 -04:00
Nick Desaulniers
00db297106 drm/amdgpu: fix stack alignment ABI mismatch for GCC 7.1+
GCC earlier than 7.1 errors when compiling code that makes use of
`double`s and sets a stack alignment outside of the range of [2^4-2^12]:

$ cat foo.c
double foo(double x, double y) {
  return x + y;
}
$ gcc-4.9 -mpreferred-stack-boundary=3 foo.c
error: -mpreferred-stack-boundary=3 is not between 4 and 12

This is likely why the AMDGPU driver was ever compiled with a different
stack alignment (and thus different ABI) than the rest of the x86
kernel. The kernel uses 8B stack alignment, while the driver was using
16B stack alignment in a few places.

Since GCC 7.1+ doesn't error, fix the ABI mismatch for users of newer
versions of GCC.

There was discussion about whether to mark the driver broken or not for
users of GCC earlier than 7.1, but since the driver currently is
working, don't explicitly break the driver for them here.

Relying on differing stack alignment is unspecified behavior, and
brittle, and may break in the future.

This patch is no functional change for GCC users earlier than 7.1. It's
been compile tested on GCC 4.9 and 8.3 to check the correct flags. It
should be boot tested when built with GCC 7.1+.

-mincoming-stack-boundary= or -mstackrealign may help keep this code
building for pre-GCC 7.1 users.

The version check for GCC is broken into two conditionals, both because
cc-ifversion is currently GCC specific, and it simplifies a subsequent
patch.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:56:20 -04:00
Nick Desaulniers
c868868f6b drm/amdgpu: fix stack alignment ABI mismatch for Clang
The x86 kernel is compiled with an 8B stack alignment via
`-mpreferred-stack-boundary=3` for GCC since 3.6-rc1 via
commit d9b0cde91c ("x86-64, gcc: Use -mpreferred-stack-boundary=3 if supported")
or `-mstack-alignment=8` for Clang. Parts of the AMDGPU driver are
compiled with 16B stack alignment.

Generally, the stack alignment is part of the ABI. Linking together two
different translation units with differing stack alignment is dangerous,
particularly when the translation unit with the smaller stack alignment
makes calls into the translation unit with the larger stack alignment.
While 8B aligned stacks are sometimes also 16B aligned, they are not
always.

Multiple users have reported General Protection Faults (GPF) when using
the AMDGPU driver compiled with Clang. Clang is placing objects in stack
slots assuming the stack is 16B aligned, and selecting instructions that
require 16B aligned memory operands.

At runtime, syscall handlers with 8B aligned stack call into code that
assumes 16B stack alignment.  When the stack is a multiple of 8B but not
16B, these instructions result in a GPF.

Remove the code that added compatibility between the differing compiler
flags, as it will result in runtime GPFs when built with Clang. Cleanups
for GCC will be sent in later patches in the series.

Link: https://github.com/ClangBuiltLinux/linux/issues/735
Debugged-by: Yuxuan Shui <yshuiv7@gmail.com>
Reported-by: Shirish S <shirish.s@amd.com>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:56:20 -04:00
Kyle Mahlkuch
722608433c drm/radeon: Fix EEH during kexec
During kexec some adapters hit an EEH since they are not properly
shut down in the radeon_pci_shutdown() function. Adding
radeon_suspend_kms() fixes this issue.
Enabled only on PPC because this patch causes issues on some other
boards.

Signed-off-by: Kyle Mahlkuch <kmahlkuc@linux.vnet.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-10-30 11:56:20 -04:00
Alex Deucher
30ef5c7eab drm/amdgpu/gmc10: properly set BANK_SELECT and FRAGMENT_SIZE
These were not aligned for optimal performance for GPUVM.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tianci Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2019-10-30 11:56:20 -04:00