Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.
To fix this, suppress zero copy tx after a given number of
packets triggered late data copy. Re-enable periodically
to detect workload changes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to disable zerocopy when error rate
is high.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tun transmits a zero copy skb, it orphans the frags
which might need to allocate extra memory, in atomic context.
If that fails, notify ubufs callback before freeing the skb
as a hint that device should disable zerocopy mode.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Orphaning frags for zero copy skbs needs to allocate data in atomic
context so is has a chance to fail. If it does we currently discard
the skb which is safe, but we don't report anything to the caller,
so it can not recover by e.g. disabling zero copy.
Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.
Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.
This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the email address of the powernow-k8 maintainer.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJQlFWVAAoJEKhOf7ml8uNsi7MQAI5FZNla5D7fXf6JuI4n53S6
2aLFzkf9q0ievYBBeyDphLnmIWS44yfJN+OKyhvyd/BF4tlvmrZVzUjMHAFWaLe6
Eza8wRgtc8igGgQtE06xpnmaYOfNUU4B408SU+SqSurIvV9s2s1F3t8+gBaCPymk
9UxYVaE1rAHxTLzoDryhQ3H2GbWle08e44ce5kLW1GEGbC0B6l6Dfsrag+lzPedd
5QNBcoq+Fv0kvzSUJFNDwMwGrg5F7KL21EsfRJtPLXPVtMRTO1OdL1enWLDOE+jw
K0lpw/GQQx7sqyPRHrH4hTZrPHfzeY2mRAl/Q3pro/nqEy4jhfQjrH3G0s8o/xgQ
w1sKf0rDmUSA2+FhcGEMWm/eMYOE1oOyZVdxn0B1e8gN5yOqXfyho0CIvz8YMyAy
DoCjURor92teddNF83THKTDlIK248x4eCKThDIXleYE+t5th6aQdHoeQgmXQ4Y+g
eTUjFnT0glxWpyhoeTHSvhd61xgYU3aRG99OX/Y5teIIHW5rUyXhd9O1hl53UoVt
SSq7V5LRxJTU+hnW/FX19B/1/XfDe5BW6Z9+yGtibGHq4sSCb1yJeWaLOVCg3u3D
TOdbtl14ohq+zvs9MhKyTOxh710I+e10CGQHqY61MK5n+seG6YZw98GAepLQqnMW
T4VZrJBNCrXd94+4OxcU
=jN09
-----END PGP SIGNATURE-----
Merge tag 'pm-for-3.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management update from Rafael J. Wysocki:
"Change the email address of the powernow-k8 maintainer."
* tag 'pm-for-3.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / powernow-k8: Change maintainer's email address
Jeff Kirsher says:
====================
This series contains updates to igb, ixgbe and e1000.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull more scsi target fixes from Nicholas Bellinger:
"This series is a second round of target fixes for v3.7-rc4 that have
come into target-devel over the last days, and are important enough to
be applied ASAP.
All are being CC'ed to stable. The most important two are:
- target: Re-add explict zeroing of INQUIRY bounce buffer memory to
fix a regression for handling zero-length payloads, a bug that went
during v3.7-rc1, and hit >= v3.6.3 stable. (nab + paolo)
- iscsi-target: Fix a long-standing missed R2T wakeup race in TX
thread processing when using a single queue slot. (Roland)
Thanks to Roland & PureStorage team for helping to track down this
long standing race with iscsi-target single queue slot operation.
Also, the tcm_fc(FCoE) regression bug that was observed recently with
-rc2 code has also been resolved with the cancel_delayed_work() return
bugfix (commit c0158ca64d: "workqueue: cancel_delayed_work() should
return %false if work item is idle") now in -rc3. Thanks again to Yi
Zou, MDR, Robert Love @ Intel for helping to track this down."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
target: Fix incorrect usage of nested IRQ spinlocks in ABORT_TASK path
iscsi-target: Fix missed wakeup race in TX thread
target: Avoid integer overflow in se_dev_align_max_sectors()
target: Don't return success from module_init() if setup fails
target: Re-add explict zeroing of INQUIRY bounce buffer memory
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAUJP/EhOxKuMESys7AQIVrQ/8CgzkMigNI9ryQ4zURibgnK0z9b0OU6mo
0dizoDr48JrK4+DmEmx2GjvV2C5ur6qu5dYf2NiEP2zHXDT0m1BihzVAoiqOfh9R
TlfsSYTdmUH7s0/1VdB0IwVeh46wFYxArroyedPLjftWBK/iR/h6qJ4R1OIpwZR8
b4f9hDwZtdWKrJSwgtJWeMDLw/phYxGDdcD77C9s1WN2OQT532N6Eoea+2kGKSNY
C51en4uBMWajbc2iVSMLQ6dgydcEGTM3gO6gawx3uEwXjl+HoycBbGiu0b/fvnmG
PAbMu2jBgh+uWScHkWQUa+KzYSUraQOOq1J89mfTDMZ3NZ4qUAChcPyfpz9ya0bl
nsPX8oElOY21Hlsx532OQvlNGQTw4rt9czK81svrjbBZm94gd10LJNte/8T5wUC0
i9BZ5ajV9bZZ3fCM+nuQhCjWITn3PSWI6v/a3bV781IkySBLuL9OtlVZrVCO/5Ew
uG0Z64d5PNc8cZthq6PHVGbWlN2bEyx6sIB+Wr6wRPSqF5jRaS+N6oL8rUZIomC6
DuVrZ6hxzsIHpFWYqlqGPal9ViEvnnNOPLGBPRTvXn4vdefwijHfiui2GMtQpL6D
w8qRzJksuzqAzJ1topvgMVIoJ1HkUu30F2X5cQnOeVYanjNduJ3oAEXCDFh2qLuS
CCje/Adr2cM=
=6uE0
-----END PGP SIGNATURE-----
Merge tag 'frv-fixes-20121102' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-frv
Pull FRV fixes from David Howells:
"A collection of small fixes for the FRV architecture."
* tag 'frv-fixes-20121102' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-frv:
frv: fix the broken preempt
frv: switch to saner kernel_execve() semantics
FRV: Fix the new-style kernel_thread() stuff
FRV: Fix the preemption handling
FRV: gcc-4.1.2 also inlines weak functions
FRV: Don't objcopy the GNU build_id note
FRV: Add missing linux/export.h #inclusions
This hashtable implementation is using hlist buckets to provide a simple
hashtable to prevent it from getting reimplemented all over the kernel.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ Merging this now, so that subsystems can start applying Sasha's
patches that use this - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just get %icc2 into the state we would have after local_irq_disable()
and physical IRQ having happened since then. Then we can simply
use preempt_schedule_irq() and be done with the whole mess.
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The kernel_thread() changes for FRV don't work, and FRV fails to boot,
starting with:
commit 02ce496f15
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Tue Sep 18 22:18:51 2012 -0400
Subject: frv: split ret_from_fork, simplify kernel_thread() a lot
The problem is that the userspace registers are completely cleared when a
kernel thread is created and all subsequent user threads are then copied from
that. Unfortunately, however, the TBR and PSR registers are restored from the
pt_regs and the values they should be set to are clobbered by the memset.
Instead, copy across the old user registers as normal, and then merely alter
GR8 and GR9 in it if we're going to execute a kernel thread.
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the preemption handling in FRV code where the PREEMPT_ACTIVE value is
incorrectly loaded into the threadinfo flags rather than the threadinfo
preemption count.
Unfortunately, the code cannot be simply converted to use
preempt_schedule_irq() as is because FRV uses virtual interrupt disablement to
cut down on the cost of actually disabling interrupts and thus
local_irq_enable() doesn't actually enable interrupts.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Al Viro <viro@ZenIV.linux.org.uk>
gcc-4.1.2 inlines weak functions, which causes FRV to fail when the dummy
thread_info_cache_init() gets inlined into start_kernel().
Signed-off-by: David Howells <dhowells@redhat.com>
Don't let objcopy transfer the GNU build_id note into the loadable image as it
is located at address 0 and the image ends up >3G in size.
Signed-off-by: David Howells <dhowells@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQkvQbAAoJEI9vqH3mFV2slX0P/1trXh121qyahs82VY3t7iDN
z4hxO7DW+xum3NYvuJx5/egkkl3NiS1Wyq47RIEdIgI/Isr0tenvmG3f2czl3Qtj
omcNJtLKVo7WxODnl16S6VGDvyRA4OkxzVGD9eidD8Z4+jXLvYyL4WKPiO5SEzTi
jA54j4w+mk2mpFw/5w4IjmDFr7zBKI7ewL7aJ32spcacT+onzSnQzWyod/30cEen
krmMr7M+jBUfGweWYRsM4ur9vf2cNECXQSFMMssDsKd1kHKQmf8lS1qy2rC5PX15
0uK1s/AvzrbTl+PxRg4L/m/R//owdaT3w1mUXQd2LyZdgXq/qAvcPNg+DygIl+MJ
uL+Bb+CD5g7G4YGNTY15oShLUaxVGwZzKV0Xgw0jZYkKgCO8gEduRUlSP3IpVdbD
0kulkyMJoIvw93p8af/ey8YD3YfAaS/qCn4qwQ6UJG7G32ATDDsF093q+y3m6DiX
mPx6i2BY7Jmr2uSiac4eU6L7Myu+JthUwjhfIC9NCwY7nV0mlsq53fKJt8/5Tuu5
MwzorUFTfJkKXaVNhATkVA+E4uTbQhqxvFxzNe+8ojwSIeN758LR80zrRTe5fwQx
qJdyBytLZF8YKhIjoBELgqgfb7QofAXk6FPmQ+4jwgFngrgcx5hizVd7PEoJW2SW
/cJq2tpfr2/pD6BmyFoq
=X45P
-----END PGP SIGNATURE-----
Merge tag 'xtensa-next-20121101' of git://github.com/czankel/xtensa-linux
Pull Xtensa fixes from Chris Zankel:
"Some important bug fixes.
With the change to uapi, there was a bug introduced that results in an
empty syscall table (mult-inclusion bug). Switching to the generic
thread/execve allowed us to fix a bug we had in vfork()."
* tag 'xtensa-next-20121101' of git://github.com/czankel/xtensa-linux:
xtensa: switch to generic sys_execve()
xtensa: switch to generic kernel_execve()
xtensa: switch to generic kernel_thread()
xtensa: reset windowbase/windowstart when cloning the VM
xtensa: use physical addresses for bus addresses
xtensa: allow multi-inclusion for uapi/unistd.h
The following fixes build errors on sparc. Without any DT support,
of_match_ptr is NULL and the below is a no-op. However, if just
CONFIG_OF is defined then so is of_match_ptr.
All useful parts of the gpio-fan DT support rely on CONFIG_OF_GPIO
anyway, so of_match_table should too.
Signed-off-by: Jamie Lentin <jm@lentin.co.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
can be replaced by
#if IS_ENABLED(CONFIG_FOO)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)
can be replaced by
#if IS_ENABLED(CONFIG_FOO)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a driver for the FEC(MX6) that offers time
stamping and a PTP haderware clock. Because FEC\ENET(MX6)
hardware frequency adjustment is complex, we have implemented
this in software by changing the multiplication factor of the
timecounter.
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set GRP1 BIT21 ENET_CLK_SEL:
Enet tx reference clk from internal clock from anatop
(loopback through pad), this clock also sent out to external PHY
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ENET 1588 clock input pin
MX6Q_PAD_GPIO_16__ENET_ANATOP_ETHERNET_REF_OUT
and anatop PLL8 clock source for ENET
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new file fec_ptp.c will use fec_enet_private to support 1588 PTP
move such structure to common header file fec.h
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch hooks into the CPTS code and adds support for the HWTSTAMP
ioctl. The patch includes code for the CPSW version found in the dm814x
even though the background device tree support for this board is still
missing.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a way to configure the CPTS input clock scaling factors
via the device tree.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because time stamping on both external ports of the switch simultaneously
is positively useless from the application's point of view, this patch
provides a DT configuration method to choose the active port.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a driver for the CPTS that offers time
stamping and a PTP hardware clock. Because some of the
CPTS hardware variants (like the am335x) do not support
frequency adjustment, we have implemented this in software
by changing the multiplication factor of the timecounter.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the cpsw driver to operate correctly with both the
dm814x and the am335x versions of the switch hardware.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch lets the CPSW driver remember the version number in order to
support the two different variants already in the wild.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code mixes up the CPSW_SS and the CPSW_WR register naming. This patch
changes the names to conform to the published Technical Reference Manual
from TI, in order to make working on the code less confusing.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding multicast address to ALE table via netdev ops to subscribe, transmit
or receive multicast frames to and from the network
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix double sizeof when parsing IPv6 address from
user space because it breaks get/del by specific IPv6 address.
Problem noticed by David Binderman:
https://bugzilla.kernel.org/show_bug.cgi?id=49171
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.
The division by zero occur in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)
Steps to Reproduce:
1. Register tcp_illinois:
# sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
# watch -d ss -i
3. Establish new TCP conn to machine
Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.
This is only related to reading stats. The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params(). Thus, simply fix tcp_illinois_info().
Function tcp_illinois_info() / get_info() is called without
socket lock. Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable. Simply reuse info.tcpv_rttcnt,
as its already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
its called with the socket lock.
Cc: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct spelling typo in net/sctp/socket.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix off-by-one error because IFNAMSIZ == 16 and when this
code gets executed we stick a NULL byte where we should not.
How to reproduce:
with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently)
modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode;
echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/active_slave;
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Note: Sorry for the second patch but I missed this one while checking
the file. You can squash them into one patch.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix off-by-one error because IFNAMSIZ == 16 and when this
code gets executed we stick a NULL byte where we should not.
How to reproduce:
with CONFIG_CC_STACKPROTECTOR=y (otherwise it may pass by silently)
modprobe bonding; echo 1 > /sys/class/net/bond0/bonding/mode;
echo "AAAAAAAAAAAAAAAA" > /sys/class/net/bond0/bonding/primary;
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If no pinctrl available just report a warning as some architecture may not
need to do anything.
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
[nicolas.ferre@atmel.com: adapt the error path, remove unneeded headers]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the ethernet frame payload word-aligned, possibly making the
memcpy into the skb a bit faster. This will be even more important
after we eliminate the copy altogether.
Also eliminate the redundant RX_OFFSET constant -- it has the same
definition and purpose as NET_IP_ALIGN.
Signed-off-by: Havard Skinnemoen <havard@skinnemoen.net>
[nicolas.ferre@atmel.com: adapt to newer kernel]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>