Now all ctrl chunks are counted for asoc stats.octrlchunks and net
SCTP_MIB_OUTCTRLCHUNKS either after queuing up or bundling, other
than the chunk maked and bundled in sctp_packet_bundle_sack, which
caused 'outctrlchunks' not consistent with 'inctrlchunks' in peer.
This issue exists since very beginning, here to fix it by increasing
both net SCTP_MIB_OUTCTRLCHUNKS and asoc stats.octrlchunks when sack
chunk is maked and bundled in sctp_packet_bundle_sack.
Reported-by: Ja Ram Jeon <jajeon@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
r8153b_rx_agg_chg_indicate() needs to be called after enabling TX/RX and
before calling rxdy_gated_en(tp, false). Otherwise, the change of the
settings of RX aggregation wouldn't work.
Besides, adjust rtl8152_set_coalesce() for the same reason. If
rx_coalesce_usecs is changed, restart TX/RX to let the setting work.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If IPV6 was disabled, then ss command would cause a kernel warning
because the command was attempting to dump IPV6 socket information.
The fix is to just remove the warning.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202249
Fixes: 432490f9d4 ("net: ip, diag -- Add diag interface for raw sockets")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The psock_tpacket test will need to access /proc/kallsyms, this would
require the kernel config CONFIG_KALLSYMS to be enabled first.
Apart from adding CONFIG_KALLSYMS to the net/config file here, check the
file existence to determine if we can run this test will be helpful to
avoid a false-positive test result when testing it directly with the
following commad against a kernel that have CONFIG_KALLSYMS disabled:
make -C tools/testing/selftests TARGETS=net run_tests
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the rxrpc_eproto tracepoint is enabled, an oops will be cause by the
trace line that rxrpc_extract_header() tries to emit when a protocol error
occurs (typically because the packet is short) because the call argument is
NULL.
Fix this by using ?: to assume 0 as the debug_id if call is NULL.
This can then be induced by:
echo -e '\0\0\0\0\0\0\0\0' | ncat -4u --send-only <addr> 20001
where addr has the following program running on it:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
int main(void)
{
struct sockaddr_rxrpc srx;
int fd;
memset(&srx, 0, sizeof(srx));
srx.srx_family = AF_RXRPC;
srx.srx_service = 0;
srx.transport_type = AF_INET;
srx.transport_len = sizeof(srx.transport.sin);
srx.transport.sin.sin_family = AF_INET;
srx.transport.sin.sin_port = htons(0x4e21);
fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET6);
bind(fd, (struct sockaddr *)&srx, sizeof(srx));
sleep(20);
return 0;
}
It results in the following oops.
BUG: kernel NULL pointer dereference, address: 0000000000000340
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:trace_event_raw_event_rxrpc_rx_eproto+0x47/0xac
...
Call Trace:
<IRQ>
rxrpc_extract_header+0x86/0x171
? rcu_read_lock_sched_held+0x5d/0x63
? rxrpc_new_skb+0xd4/0x109
rxrpc_input_packet+0xef/0x14fc
? rxrpc_input_data+0x986/0x986
udp_queue_rcv_one_skb+0xbf/0x3d0
udp_unicast_rcv_skb.isra.8+0x64/0x71
ip_protocol_deliver_rcu+0xe4/0x1b4
ip_local_deliver+0xf0/0x154
__netif_receive_skb_one_core+0x50/0x6c
netif_receive_skb_internal+0x26b/0x2e9
napi_gro_receive+0xf8/0x1da
rtl8169_poll+0x303/0x4c4
net_rx_action+0x10e/0x333
__do_softirq+0x1a5/0x38f
irq_exit+0x54/0xc4
do_IRQ+0xda/0xf8
common_interrupt+0xf/0xf
</IRQ>
...
? cpuidle_enter_state+0x23c/0x34d
cpuidle_enter+0x2a/0x36
do_idle+0x163/0x1ea
cpu_startup_entry+0x1d/0x1f
start_secondary+0x157/0x172
secondary_startup_64+0xa4/0xb0
Fixes: a25e21f0bc ("rxrpc, afs: Use debug_ids rather than pointers in traces")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Steinmetz says:
====================
macsec: fix some bugs in the receive path
This series fixes some bugs in the receive path of macsec. The first
is a use after free when processing macsec frames with a SecTAG that
has the TCI E bit set but the C bit clear. In the 2nd bug, the driver
leaves an invalid checksumming state after decrypting the packet.
This is a combined effort of Sabrina Dubroca <sd@queasysnail.net> and me.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix use-after-free of skb when rx_handler returns RX_HANDLER_PASS.
Signed-off-by: Andreas Steinmetz <ast@domdv.de>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If sendmsg() or sendmmsg() is called on a connected socket that hasn't had
bind() called on it, then an oops will occur when the kernel tries to
connect the call because no local endpoint has been allocated.
Fix this by implicitly binding the socket if it is in the
RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state.
Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this
to prevent further attempts to bind it.
This can be tested with:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
static const unsigned char inet6_addr[16] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa
};
int main(void)
{
struct sockaddr_rxrpc srx;
struct cmsghdr *cm;
struct msghdr msg;
unsigned char control[16];
int fd;
memset(&srx, 0, sizeof(srx));
srx.srx_family = 0x21;
srx.srx_service = 0;
srx.transport_type = AF_INET;
srx.transport_len = 0x1c;
srx.transport.sin6.sin6_family = AF_INET6;
srx.transport.sin6.sin6_port = htons(0x4e22);
srx.transport.sin6.sin6_flowinfo = htons(0x4e22);
srx.transport.sin6.sin6_scope_id = htons(0xaa3b);
memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16);
cm = (struct cmsghdr *)control;
cm->cmsg_len = CMSG_LEN(sizeof(unsigned long));
cm->cmsg_level = SOL_RXRPC;
cm->cmsg_type = RXRPC_USER_CALL_ID;
*(unsigned long *)CMSG_DATA(cm) = 0;
msg.msg_name = NULL;
msg.msg_namelen = 0;
msg.msg_iov = NULL;
msg.msg_iovlen = 0;
msg.msg_control = control;
msg.msg_controllen = cm->cmsg_len;
msg.msg_flags = 0;
fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
connect(fd, (struct sockaddr *)&srx, sizeof(srx));
sendmsg(fd, &msg, 0);
return 0;
}
Leading to the following oops:
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
...
RIP: 0010:rxrpc_connect_call+0x42/0xa01
...
Call Trace:
? mark_held_locks+0x47/0x59
? __local_bh_enable_ip+0xb6/0xba
rxrpc_new_client_call+0x3b1/0x762
? rxrpc_do_sendmsg+0x3c0/0x92e
rxrpc_do_sendmsg+0x3c0/0x92e
rxrpc_sendmsg+0x16b/0x1b5
sock_sendmsg+0x2d/0x39
___sys_sendmsg+0x1a4/0x22a
? release_sock+0x19/0x9e
? reacquire_held_locks+0x136/0x160
? release_sock+0x19/0x9e
? find_held_lock+0x2b/0x6e
? __lock_acquire+0x268/0xf73
? rxrpc_connect+0xdd/0xe4
? __local_bh_enable_ip+0xb6/0xba
__sys_sendmsg+0x5e/0x94
do_syscall_64+0x7d/0x1bf
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 2341e07757 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op")
Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
net: bridge: fix possible stale skb pointers
In the bridge driver we have a couple of places which call pskb_may_pull
but we've cached skb pointers before that and use them after which can
lead to out-of-bounds/stale pointer use. I've had these in my "to fix"
list for some time and now we got a report (patch 01) so here they are.
Patches 02-04 are fixes based on code inspection. Also patch 01 was
tested by Martin Weinelt, Martin if you don't mind please add your
tested-by tag to it by replying with Tested-by: name <email>.
I've also briefly tested the set by trying to exercise those code paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't cache eth dest pointer before calling pskb_may_pull.
Fixes: cf0f02d04a ("[BRIDGE]: use llc for receiving STP packets")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We would cache ether dst pointer on input in br_handle_frame_finish but
after the neigh suppress code that could lead to a stale pointer since
both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to
always reload it after the suppress code so there's no point in having
it cached just retrieve it directly.
Fixes: 057658cb33 ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
Fixes: ed842faeb2 ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may
call pskb_may_pull afterwards and end up using a stale pointer.
So use the header directly, it's just 1 place where it's needed.
Fixes: 08b202b672 ("bridge br_multicast: IPv6 MLD support.")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Martin Weinelt <martin@linuxlounge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Rename r8153b_queue_wake() to r8153_queue_wake().
2. Correct the setting. The enable bit should be 0xd38c bit 0. Besides,
the 0xd38a bit 0 and 0xd398 bit 8 have to be cleared for both enabled
and disabled.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 86029d10af ("tls: zero the crypto information from tls_context
before freeing") added memzero_explicit() calls to clear the key material
before freeing struct tls_context, but it missed tls_device.c has its
own way of freeing this structure. Replace the missing free.
Fixes: 86029d10af ("tls: zero the crypto information from tls_context before freeing")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neither drivers nor the tls offload code currently supports TLS
version 1.3. Check the TLS version when installing connection
state. TLS 1.3 will just fallback to the kernel crypto for now.
Fixes: 130b392c6c ("net: tls: Add tls 1.3 support")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang says:
====================
idr: fix overflow cases on 32-bit CPU
idr_get_next_ul() is problematic by design, it can't handle
the following overflow case well on 32-bit CPU:
u32 id = UINT_MAX;
idr_alloc_u32(&id);
while (idr_get_next_ul(&id) != NULL)
id++;
when 'id' overflows and becomes 0 after UINT_MAX, the loop
goes infinite.
Fix this by eliminating external users of idr_get_next_ul()
and migrating them to idr_for_each_entry_continue_ul(). And
add an additional parameter for these iteration macros to detect
overflow properly.
Please merge this through networking tree, as all the users
are in networking subsystem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly, other callers of idr_get_next_ul() suffer the same
overflow bug as they don't handle it properly either.
Introduce idr_for_each_entry_continue_ul() to help these callers
iterate from a given ID.
cls_flower needs more care here because it still has overflow when
does arg->cookie++, we have to fold its nested loops into one
and remove the arg->cookie++.
Fixes: 01683a1469 ("net: sched: refactor flower walk to iterate over idr")
Fixes: 12d6066c3b ("net/mlx5: Add flow counters idr")
Reported-by: Li Shuang <shuali@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
idr_for_each_entry_ul() is buggy as it can't handle overflow
case correctly. When we have an ID == UINT_MAX, it becomes an
infinite loop. This happens when running on 32-bit CPU where
unsigned long has the same size with unsigned int.
There is no better way to fix this than casting it to a larger
integer, but we can't just 64 bit integer on 32 bit CPU. Instead
we could just use an additional integer to help us to detect this
overflow case, that is, adding a new parameter to this macro.
Fortunately tc action is its only user right now.
Fixes: 65a206c01e ("net/sched: Change act_api and act_xxx modules to use IDR")
Reported-by: Li Shuang <shuali@redhat.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Chris Mi <chrism@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella says:
====================
vsock/virtio: several fixes in the .probe() and .remove()
During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.
This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
use-after-free of 'vsock' object.
v2:
- Patch 1: use RCU to protect 'the_virtio_vsock' pointer
- Patch 2: no changes
- Patch 3: flush works only at the end of .remove()
- Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
allocated.
v1: https://patchwork.kernel.org/cover/10964733/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.
Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.
Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().
This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.
To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.
We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.
We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It allocates the extended area for outbound streams only on sendmsg
calls, if they are not yet allocated. When using the priority
stream scheduler, this initialization may imply into a subsequent
allocation, which may fail. In this case, it was aborting the stream
scheduler initialization but leaving the ->ext pointer (allocated) in
there, thus in a partially initialized state. On a subsequent call to
sendmsg, it would notice the ->ext pointer in there, and trip on
uninitialized stuff when trying to schedule the data chunk.
The fix is undo the ->ext initialization if the stream scheduler
initialization fails and avoid the partially initialized state.
Although syzkaller bisected this to commit 4ff40b8626 ("sctp: set
chunk transport correctly when it's a new asoc"), this bug was actually
introduced on the commit I marked below.
Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com
Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the skb is associated with a new sock, just assigning
it to skb->sk is not sufficient, we have to set its destructor
to free the sock properly too.
Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid the situation where an IPV6 only flag is applied to an IPv4 address:
# ip addr add 192.0.2.1/24 dev dummy0 nodad home mngtmpaddr noprefixroute
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global noprefixroute dummy0
valid_lft forever preferred_lft forever
Or worse, by sending a malicious netlink command:
# ip -4 addr show dev dummy0
2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.0.2.1/24 scope global nodad optimistic dadfailed home tentative mngtmpaddr noprefixroute stable-privacy dummy0
valid_lft forever preferred_lft forever
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix GUE_PFLAG_REMCSUM to use "U" cast to avoid shifting signed
32-bit value by 31 bits problem.
Signed-off-by: Vandana BN <bnvandana@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix DST_FEATURE_ECN_CA to use "U" cast to avoid shifting signed
32-bit value by 31 bits problem.
Signed-off-by: Vandana BN <bnvandana@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
default_ttl should be integer instead of bool
Reported-by: Ying Xu <yinxu@redhat.com>
Fixes: a59166e470 ("mpls: allow TTL propagation from IP packets to be configured")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skbs may have their checksum value populated by HW. If this is a checksum
calculated over the entire packet then the CHECKSUM_COMPLETE field is
marked. Changes to the data pointer on the skb throughout the network
stack still try to maintain this complete csum value if it is required
through functions such as skb_postpush_rcsum.
The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when
changes are made to packet data without a push or a pull. This occurs when
the ethertype of the MAC header is changed or when MPLS lse fields are
modified.
The modification is carried out using the csum_partial function to get the
csum of a buffer and add it into the larger checksum. The buffer is an
inversion of the data to be removed followed by the new data. Because the
csum is calculated over 16 bits and these values align with 16 bits, the
effect is the removal of the old value from the CHECKSUM_COMPLETE and
addition of the new value.
However, the csum fed into the function and the outcome of the
calculation are also inverted. This would only make sense if it was the
new value rather than the old that was inverted in the input buffer.
Fix the issue by removing the bit inverts in the csum_partial calculation.
The bug was verified and the fix tested by comparing the folded value of
the updated CHECKSUM_COMPLETE value with the folded value of a full
software checksum calculation (reset skb->csum to 0 and run
skb_checksum_complete(skb)). Prior to the fix the outcomes differed but
after they produce the same result.
Fixes: 25cd9ba0ab ("openvswitch: Add basic MPLS support to kernel")
Fixes: bc7cc5999f ("openvswitch: update checksum in {push,pop}_mpls")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Bug fixes.
Miscellaneous bug fix patches, including two resource handling fixes for
the RDMA driver, a PCI shutdown patch to add pci_disable_device(), a patch
to fix ethtool selftest crash, and the last one suppresses an unnecessry
error message.
Please also queue patches 1, 2, and 3 for -stable. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some firmware versions do not support this so use the silent variant
to send the message to firmware to suppress the harmless error. This
error message is unnecessarily alarming the user.
Fixes: afdc8a8484 ("bnxt_en: Add DCBNL DSCP application protocol support.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an earlier commit to improve NQ reservations on 57500 chips, we
set the resv_irqs on the 57500 VFs to the fixed value assigned by
the PF regardless of how many are actually used. The current
code assumes that resv_irqs minus the ones used by the network driver
must be the ones for the RDMA driver. This is no longer true and
we may return more MSIX vectors than requested, causing inconsistency.
Fix it by capping the value.
Fixes: 01989c6b69 ("bnxt_en: Improve NQ reservations.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic assumes that the RDMA driver uses one statistics
context adjacent to the ones used by the network driver. This
assumption is not true and the statistics context used by the
RDMA driver is tied to its MSIX base vector. This wrong assumption
can cause RDMA driver failure after changing ethtool rings on the
network side. Fix the statistics reservation logic accordingly.
Fixes: 780baad44f ("bnxt_en: Reserve 1 stat_ctx for RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After ethtool loopback packet tests, we re-open the nic for the next
IRQ test. If the open fails, we must not proceed with the IRQ test
or we will crash with NULL pointer dereference. Fix it by checking
the bnxt_open_nic() return code before proceeding.
Reported-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Fixes: 67fea463fd ("bnxt_en: Add interrupt test to ethtool -t selftest.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some chips with older firmware can continue to perform DMA read from
context memory even after the memory has been freed. In the PCI shutdown
method, we need to call pci_disable_device() to shutdown DMA to prevent
this DMA before we put the device into D3hot. DMA memory request in
D3hot state will generate PCI fatal error. Similarly, in the driver
remove method, the context memory should only be freed after DMA has
been shutdown for correctness.
Fixes: 98f04cf0f1 ("bnxt_en: Check context memory requirements from firmware.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a 1ms delay after reset deactivation. Otherwise the chip returns
bogus ID value. This is observed with 88E6390 (Peridot) chip.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bnx2x ptp worker tries to read a register with timestamp
information in case of TX packet timestamping and in case it fails,
the routine reschedules itself indefinitely. This was reported as a
kworker always at 100% of CPU usage, which was narrowed down to be
bnx2x ptp_task.
By following the ioctl handler, we could narrow down the problem to
an NTP tool (chrony) requesting HW timestamping from bnx2x NIC with
RX filter zeroed; this isn't reproducible for example with ptp4l
(from linuxptp) since this tool requests a supported RX filter.
It seems NIC FW timestamp mechanism cannot work well with
RX_FILTER_NONE - driver's PTP filter init routine skips a register
write to the adapter if there's not a supported filter request.
This patch addresses the problem of bnx2x ptp thread's everlasting
reschedule by retrying the register read 10 times; between the read
attempts the thread sleeps for an increasing amount of time starting
in 1ms to give FW some time to perform the timestamping. If it still
fails after all retries, we bail out in order to prevent an unbound
resource consumption from bnx2x.
The patch also adds an ethtool statistic for accounting the skipped
TX timestamp packets and it reduces the priority of timestamping
error messages to prevent log flooding. The code was tested using
both linuxptp and chrony.
Reported-and-tested-by: Przemyslaw Hausman <przemyslaw.hausman@canonical.com>
Suggested-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harini Katakam says:
====================
Sub ns increment fixes in Macb PTP
The subns increment register fields are not captured correctly in the
driver. Fix the same and also increase the subns incr resolution.
Sub ns resolution was increased to 24 bits in r1p06f2 version. To my
knowledge, this PTP driver, with its current BD time stamp
implementation, is only useful to that version or above. So, I have
increased the resolution unconditionally. Please let me know if there
is any IP versions incompatible with this - there is no register to
obtain this information from.
Changes from RFC:
None
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The subns increment register has 24 bits as follows:
RegBit[15:0] = Subns[23:8]; RegBit[31:24] = Subns[7:0]
Fix the same in the driver and increase sub ns resolution to the
best capable, 24 bits. This should be the case on all GEM versions
that this PTP driver supports.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scaled ppm parameter passed to _adjfine() contains a 16 bit
fraction. This just happens to be the same as SUBNSINCR_SIZE now.
Hence define this separately.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shifting signed 32-bit value by 31 bits is undefined. Changing most
significant bit to unsigned.
Changes included in v2:
- use subsystem specific subject lines
- CC required mailing lists
Signed-off-by: Jiunn Chang <c0d1n61at3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter did not expect that skb_dst_force() can cause skb to lose its
dst entry.
I got a bug report with a skb->dst NULL dereference in netfilter
output path. The backtrace contains nf_reinject(), so the dst might have
been cleared when skb got queued to userspace.
Other users were fixed via
if (skb_dst(skb)) {
skb_dst_force(skb);
if (!skb_dst(skb))
goto handle_err;
}
But I think its preferable to make the 'dst might be cleared' part
of the function explicit.
In netfilter case, skb with a null dst is expected when queueing in
prerouting hook, so drop skb for the other hooks.
v2:
v1 of this patch returned true in case skb had no dst entry.
Eric said:
Say if we have two skb_dst_force() calls for some reason
on the same skb, only the first one will return false.
This now returns false even when skb had no dst, as per Erics
suggestion, so callers might need to check skb_dst() first before
skb_dst_force().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when sctp_connect() is called with a wrong sa_family, it binds
to a port but doesn't set bp->port, then sctp_get_af_specific will
return NULL and sctp_connect() returns -EINVAL.
Then if sctp_bind() is called to bind to another port, the last
port it has bound will leak due to bp->port is NULL by then.
sctp_connect() doesn't need to bind ports, as later __sctp_connect
will do it if bp->port is NULL. So remove it from sctp_connect().
While at it, remove the unnecessary sockaddr.sa_family len check
as it's already done in sctp_inet_connect.
Fixes: 644fbdeacf ("sctp: fix the issue that flags are ignored when using kernel_connect")
Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Header Parser allows identifying various fields in the packet
headers, used for various kind of filtering and classification
steps.
This is a re-entrant process, where the offset in the packet header
depends on the previous lookup results. This offset is represented in
the SRAM results of the TCAM, as a shift to be operated.
This shift can be negative in some cases, such as in IPv6 parsing.
This commit prevents overriding the sign bit when setting the shift
value, which could cause instabilities when parsing IPv6 flows.
Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Suggested-by: Alan Winkowski <walan@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>