[ Upstream commit e6487833182a8a0187f0292aca542fc163ccd03e ]
This device shares the PCI ID with the Samsung 970 Evo Plus that
does not need or want the quirks. Move the the quirk entry to the
core table based on the model number instead.
Fixes: bc360b0b1611 ("nvme-pci: add quirks for Samsung X5 SSDs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5a6254d55e2a9f7919ead8580d7aa0c7a382b26a ]
This particular Kioxia device times out and aborts I/O during any load,
but it's more easily observable with discards (fstrim).
The device gets to a state that is also not possible to use
"nvme set-feature" to disable APST.
Booting with nvme_core.default_ps_max_latency=0 solves the issue.
We had a dozen or so of these devices behaving this same way in
customer environments.
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af7fae857ea22e9c2aef812e1321d9c5c206edde ]
Except for pci, all the nvme transport drivers allocate a command within
the driver's pdu. Align pci with everyone else by allocating the nvme
command within pci's pdu and replace the .queue_rq() stack variable with
this.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c03fd85de293a4f65fcb94a795bf4c12a432bb6c ]
nvme_clear_request() has a check for flag REQ_DONTPREP and it is called
from nvme_init_request() and nvme_setuo_cmd().
The function nvme_init_request() is called from nvme_alloc_request()
and nvme_alloc_request_qid(). From these two callers new request is
allocated everytime. For newly allocated request RQF_DONTPREP is never
set. Since after getting a tag, block layer sets the req->rq_flags == 0
and never sets the REQ_DONTPREP when returning the request :-
nvme_alloc_request()
blk_mq_alloc_request()
blk_mq_rq_ctx_init()
rq->rq_flags = 0 <----
nvme_alloc_request_qid()
blk_mq_alloc_request_hctx()
blk_mq_rq_ctx_init()
rq->rq_flags = 0 <----
The block layer does set req->rq_flags but REQ_DONTPREP is not one of
them and that is set by the driver.
That means we can unconditinally set the REQ_DONTPREP value to the
rq->rq_flags when nvme_init_request()->nvme_clear_request() is called
from above two callers.
Move the check for REQ_DONTPREP from nvme_clear_nvme_request() into
nvme_setup_cmd().
This is needed since nvme_alloc_request() now gets called from fast
path when NVMeOF target is configured with passthru backend to avoid
unnecessary checks in the fast path.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a36604668b9b1f84126ef0342144ba5b07e518f ]
Since nvmet_setup_passthru() function falls in fast path when called
from the NVMeOF passthru backend, make it inline.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39dfe84451b4526a8054cc5a127337bca980dfa3 ]
Right now nvme_alloc_request() allocates a request from block layer
based on the value of the qid. When qid set to NVME_QID_ANY it used
blk_mq_alloc_request() else blk_mq_alloc_request_hctx().
The function nvme_alloc_request() is called from different context, The
only place where it uses non NVME_QID_ANY value is for fabrics connect
commands :-
nvme_submit_sync_cmd() NVME_QID_ANY
nvme_features() NVME_QID_ANY
nvme_sec_submit() NVME_QID_ANY
nvmf_reg_read32() NVME_QID_ANY
nvmf_reg_read64() NVME_QID_ANY
nvmf_reg_write32() NVME_QID_ANY
nvmf_connect_admin_queue() NVME_QID_ANY
nvme_submit_user_cmd() NVME_QID_ANY
nvme_alloc_request()
nvme_keep_alive() NVME_QID_ANY
nvme_alloc_request()
nvme_timeout() NVME_QID_ANY
nvme_alloc_request()
nvme_delete_queue() NVME_QID_ANY
nvme_alloc_request()
nvmet_passthru_execute_cmd() NVME_QID_ANY
nvme_alloc_request()
nvmf_connect_io_queue() QID
__nvme_submit_sync_cmd()
nvme_alloc_request()
With passthru nvme_alloc_request() now falls into the I/O fast path such
that blk_mq_alloc_request_hctx() is never gets called and that adds
additional branch check in fast path.
Split the nvme_alloc_request() into nvme_alloc_request() and
nvme_alloc_request_qid().
Replace each call of the nvme_alloc_request() with NVME_QID_ANY param
with a call to newly added nvme_alloc_request() without NVME_QID_ANY.
Replace a call to nvme_alloc_request() with QID param with a call to
newly added nvme_alloc_request() and nvme_alloc_request_qid()
based on the qid value set in the __nvme_submit_sync_cmd().
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d2e7c840b178bf9a47bd0de89d8f9182fa71d86 ]
The function nvme_alloc_request() is called from different context
(I/O and Admin queue) where callers do not consider the I/O timeout when
called from I/O queue context.
Update nvme_alloc_request() to set the default I/O and Admin timeout
value based on whether the queuedata is set or not.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1b205d948fbb06a7613d87dcea0ff5fd8a08ed91 ]
This reverts commit 69135c572d1f84261a6de2a1268513a7e71753e2.
This commit was just papering over the issue, ULP should not
get ->update() called with its own sk_prot. Each ULP would
need to add this check.
Fixes: 69135c572d1f ("net/tls: fix tls_sk_proto_close executed repeatedly")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8af52fe9fd3bf5e7478da99193c0632276e1dfce ]
The following sequence currently causes a driver bug warning
when using virtio_net:
# ip link set eth0 up
# echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem)
<resume>
# ip link set eth0 down
Missing register, driver bug
WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60
Call trace:
xdp_rxq_info_unreg+0x58/0x60
virtnet_close+0x58/0xac
__dev_close_many+0xac/0x140
__dev_change_flags+0xd8/0x210
dev_change_flags+0x24/0x64
do_setlink+0x230/0xdd0
...
This happens because virtnet_freeze() frees the receive_queue
completely (including struct xdp_rxq_info) but does not call
xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the
receive_queue again but does not call xdp_rxq_info_reg().
Actually, parts of virtnet_freeze_down() and virtnet_restore_up()
are almost identical to virtnet_close() and virtnet_open(): only
the calls to xdp_rxq_info_(un)reg() are missing. This means that
we can fix this easily and avoid such problems in the future by
just calling virtnet_close()/open() from the freeze/restore handlers.
Aside from adding the missing xdp_rxq_info calls the only difference
is that the refill work is only cancelled if netif_running(). However,
this should not make any functional difference since the refill work
should only be active if the network interface is actually up.
Fixes: 754b8a21a9 ("virtio_net: setup xdp_rxq_info")
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4e0effd9007ea0be31f7488611eb3824b4541554 ]
Intel I210 on some Intel Alder Lake platforms can only achieve ~750Mbps
Tx speed via iperf. The RR2DCDELAY shows around 0x2xxx DMA delay, which
will be significantly lower when 1) ASPM is disabled or 2) SoC package
c-state stays above PC3. When the RR2DCDELAY is around 0x1xxx the Tx
speed can reach to ~950Mbps.
According to the I210 datasheet "8.26.1 PCIe Misc. Register - PCIEMISC",
"DMA Idle Indication" doesn't seem to tie to DMA coalesce anymore, so
set it to 1b for "DMA is considered idle when there is no Rx or Tx AND
when there are no TLPs indicating that CPU is active detected on the
PCIe link (such as the host executes CSR or Configuration register read
or write operation)" and performing Tx should also fall under "active
CPU on PCIe link" case.
In addition to that, commit b6e0c419f0 ("igb: Move DMA Coalescing init
code to separate function.") seems to wrongly changed from enabling
E1000_PCIEMISC_LX_DECISION to disabling it, also fix that.
Fixes: b6e0c419f0 ("igb: Move DMA Coalescing init code to separate function.")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220621221056.604304-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 485037ae9a095491beb7f893c909a76cc4f9d1e7 ]
When enabling a type_in_mask irq, the type_buf contents must be
AND'd with the mask of the IRQ we're enabling to avoid enabling
other IRQs by accident, which can happen if several type_in_mask
irqs share a mask register.
Fixes: bc998a7303 ("regmap: irq: handle HW using separate rising/falling edge interrupts")
Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com>
Link: https://lore.kernel.org/r/20220620200644.1961936-2-aidanmacdonald.0x0@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c3d184c83ff4b80167e34edfc3d21df424bf27ff ]
In current implementation ice_update_phy_type enables all link modes
for selected speed. This approach doesn't work for 1000M speeds,
because both copper (1000baseT) and optical (1000baseX) standards
cannot be enabled at once.
Fix this, by adding the function `ice_set_phy_type_from_speed()`
for 1000M speeds.
Fixes: 48cb27f2fd ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb78d1b5efffe4cf97e16766329dd7358aed3deb ]
The recent patch to make afs_getattr consult the server didn't account
for the pseudo-inodes employed by the dynamic root-type afs superblock
not having a volume or a server to access, and thus an oops occurs if
such a directory is stat'd.
Fix this by checking to see if the vnode->volume pointer actually points
anywhere before following it in afs_getattr().
This can be tested by stat'ing a directory in /afs. It may be
sufficient just to do "ls /afs" and the oops looks something like:
BUG: kernel NULL pointer dereference, address: 0000000000000020
...
RIP: 0010:afs_getattr+0x8b/0x14b
...
Call Trace:
<TASK>
vfs_statx+0x79/0xf5
vfs_fstatat+0x49/0x62
Fixes: 2aeb8c86d499 ("afs: Fix afs_getattr() to refetch file status if callback break occurred")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165408450783.1031787.7941404776393751186.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c81aba8fde2aee4f5778ebab3a1d51bd2ef48e4c ]
commit 979934da9e ("[PATCH] mips: update IRQ handling for vr41xx") added
a function irq_dispatch, and it'll increase irq_err_count when the get_irq
callback returns a negative value, but increase irq_err_count in get_irq
was not removed.
And also, modpost complains once gpio-vr41xx drivers become modules.
ERROR: modpost: "irq_err_count" [drivers/gpio/gpio-vr41xx.ko] undefined!
So it would be a good idea to remove repetitive increase irq_err_count in
get_irq callback.
Fixes: 27fdd325da ("MIPS: Update VR41xx GPIO driver to use gpiolib")
Fixes: 979934da9e ("[PATCH] mips: update IRQ handling for vr41xx")
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: huhai <huhai@kylinos.cn>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d79d8af8dec58bf709b3124d09d9572edd9c617 ]
Before change:
make -C netfilter
TEST: performance
net,port [SKIP]
perf not supported
port,net [SKIP]
perf not supported
net6,port [SKIP]
perf not supported
port,proto [SKIP]
perf not supported
net6,port,mac [SKIP]
perf not supported
net6,port,mac,proto [SKIP]
perf not supported
net,mac [SKIP]
perf not supported
After change:
net,mac [ OK ]
baseline (drop from netdev hook): 2061098pps
baseline hash (non-ranged entries): 1606741pps
baseline rbtree (match on first field only): 1191607pps
set with 1000 full, ranged entries: 1639119pps
ok 8 selftests: netfilter: nft_concat_range.sh
Fixes: 611973c1e0 ("selftests: netfilter: Introduce tests for sets with range concatenation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 05b252cccb2e5c3f56119d25de684b4f810ba40a ]
Check vm_fault->pgoff before using it. When we removed the warning, we
also removed the check.
Fixes: 7b26e4e211 ("udmabuf: drop WARN_ON() check.")
Reported-by: zdi-disclosures@trendmicro.com
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 69135c572d1f84261a6de2a1268513a7e71753e2 ]
After setting the sock ktls, update ctx->sk_proto to sock->sk_prot by
tls_update(), so now ctx->sk_proto->close is tls_sk_proto_close(). When
close the sock, tls_sk_proto_close() is called for sock->sk_prot->close
is tls_sk_proto_close(). But ctx->sk_proto->close() will be executed later
in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
occurred. That will trigger the following bug.
=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
<TASK>
tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
Updating a proto which is same with sock->sk_prot is incorrect. Add proto
and sock->sk_prot equality check at the head of tls_update() to fix it.
Fixes: 95fa145479 ("bpf: sockmap/tls, close can race with map free")
Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c58eb1b54feefc3a47fab78addd14083bc941c44 ]
Some usb type-c dongle use irq_hpd request to perform device connection
and disconnection. This patch add handling of both connection and
disconnection are based on the state of hpd_state and sink_count.
Changes in V2:
-- add dp_display_handle_port_ststus_changed()
-- fix kernel test robot complaint
Changes in V3:
-- add encoder_mode_set into struct dp_display_private
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 26b8d66a399e ("drm/msm/dp: promote irq_hpd handle to handle link training correctly")
Tested-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 26b8d66a399e625f3aa2c02ccbab1bff2e00040c ]
Some dongles require link training done at irq_hpd request instead
of plugin request. This patch promote irq_hpd handler to handle link
training and setup hpd_state correctly.
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 231a04fcc6cb5b0e5f72c015d36462a17355f925 ]
DP compo phy have to be enable to start link training. When
link training failed phy need to be disabled so that next
link traning can be proceed smoothly at next plug in. This
patch de-initialize mainlink to disable phy if link training
failed. This prevent system crash due to
disp_cc_mdss_dp_link_intf_clk stuck at "off" state. This patch
also perform checking power_on flag at dp_display_enable() and
dp_display_disable() to avoid crashing when unplug cable while
display is off.
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62671d2ef24bca1e2e1709a59a5bfb5c423cdc8e ]
Connection state is not set correctly happen when either failure of link
train due to cable unplugged in the middle of aux channel reading or
cable plugged in while in suspended state. This patch fixes these problems.
This patch also replace ST_SUSPEND_PENDING with ST_DISPLAY_OFF.
Changes in V2:
-- Add more information to commit message.
Changes in V3:
-- change base
Changes in V4:
-- add Fixes tag
Signed-off-by: Kuogee Hsieh <khsieh@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b9cc4598607cb7f7eae5c75fc1e3209cd52ff5e0 ]
of_graph_get_remote_node() returns remote device node pointer with
refcount incremented, we should use of_node_put() on it
when not need anymore.
Add missing of_node_put() to avoid refcount leak.
Fixes: 86418f90a4 ("drm: convert drivers to use of_graph_get_remote_node")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/488473/
Link: https://lore.kernel.org/r/20220607110841.53889-1-linmq006@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2b1a5d40bd12b44322c2ccd40bb0ec1699708b6 ]
As reported by Yuming, currently tc always show a latency of UINT_MAX
for netem Qdisc's on 32-bit platforms:
$ tc qdisc add dev dummy0 root netem latency 100ms
$ tc qdisc show dev dummy0
qdisc netem 8001: root refcnt 2 limit 1000 delay 275s 275s
^^^^^^^^^^^^^^^^
Let us take a closer look at netem_dump():
qopt.latency = min_t(psched_tdiff_t, PSCHED_NS2TICKS(q->latency,
UINT_MAX);
qopt.latency is __u32, psched_tdiff_t is signed long,
(psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
qopt.latency is always UINT_MAX.
Fix it by using psched_time_t (u64) instead.
Note: confusingly, users have two ways to specify 'latency':
1. normally, via '__u32 latency' in struct tc_netem_qopt;
2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
For the second case, theoretically 'latency' could be negative. This
patch ignores that corner case, since it is broken (i.e. assigning a
negative s64 to __u32) anyways, and should be handled separately.
Thanks Ted Lin for the analysis [1] .
[1] https://github.com/raspberrypi/linux/issues/3512
Reported-by: Yuming Chen <chenyuming.junnan@bytedance.com>
Fixes: 112f9cb656 ("netem: convert to qdisc_watchdog_schedule_ns")
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://lore.kernel.org/r/20220616234336.2443-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a9214f3d88cfdb099f3896e102a306b316d8707 ]
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: b0929915e0 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/9400.1655407960@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 911600bf5a5e84bfda4d33ee32acc75ecf6159f0 ]
syzbot found the following issue on:
==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
net/tipc/name_distr.c:413
Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
5.18.0-rc4-syzkaller-00878-g17d49e6e8012 #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: events tipc_net_finalize_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0xeb/0x495
mm/kasan/report.c:313
print_report mm/kasan/report.c:429 [inline]
kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
</TASK>
[...]
==================================================================
In the commit
d966ddcc38 ("tipc: fix a deadlock when flushing scheduled work"),
the cancel_work_sync() function just to make sure ONLY the work
tipc_net_finalize_work() is executing/pending on any CPU completed before
tipc namespace is destroyed through tipc_exit_net(). But this function
is not guaranteed the work is the last queued. So, the destroyed instance
may be accessed in the work which will try to enqueue later.
In order to completely fix, we re-order the calling of cancel_work_sync()
to make sure the work tipc_net_finalize_work() was last queued and it
must be completed by calling cancel_work_sync().
Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
Fixes: d966ddcc38 ("tipc: fix a deadlock when flushing scheduled work")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit be07f056396d6bb40963c45a02951c566ddeef8e ]
This patch is to use "struct work_struct" for the finalize work queue
instead of "struct tipc_net_work", as it can get the "net" and "addr"
from tipc_net's other members and there is no need to add extra net
and addr in tipc_net by defining "struct tipc_net_work".
Note that it's safe to get net from tn->bcl as bcl is always released
after the finalize work queue is done.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b7fd1670a94a57d974795acebde843a5c1a354e ]
Even when the eth port is resticted to work with speeds not higher than 1G,
and so the eth driver is requesting the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising for 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Cc: Ondrej Spacek <ondrej.spacek@nxp.com>
Fixes: 09c4c57f7b ("net: phy: aquantia: add support for auto-negotiation configuration")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff672c67ee7635ca1e28fb13729e8ef0d1f08ce5 ]
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
int entry(struct __sk_buff * skb):
0: (b4) w2 = 0
1: (73) *(u8 *)(r10 -1) = r2
2: (85) call pc+1#bpf_prog_ce2f79bb5f3e06dd_F
3: (95) exit
int entry(struct __sk_buff * skb):
0xffffffffa0201788: nop DWORD PTR [rax+rax*1+0x0]
0xffffffffa020178d: xor eax,eax
0xffffffffa020178f: push rbp
0xffffffffa0201790: mov rbp,rsp
0xffffffffa0201793: sub rsp,0x8
0xffffffffa020179a: push rax
0xffffffffa020179b: xor esi,esi
0xffffffffa020179d: mov BYTE PTR [rbp-0x1],sil
0xffffffffa02017a1: mov rax,QWORD PTR [rbp-0x9] !!! tail call count
0xffffffffa02017a8: call 0xffffffffa02017d8 !!! is at rbp-0x10
0xffffffffa02017ad: leave
0xffffffffa02017ae: ret
Fix it by rounding up the BPF stack depth to a multiple of 8, when
calculating the tail call count offset on stack.
Fixes: ebf7d1f508 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220616162037.535469-2-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1342b5b23da9559a1578978eaff7f797d8a87d91 ]
If the component driver fails to bind, or is unbound, the driver data
for the top-level platform device points to a freed drm_device. If the
system is then suspended, the driver passes this dangling pointer to
drm_mode_config_helper_suspend(), which crashes.
Fix this by only setting the driver data while the platform driver holds
a reference to the drm_device.
Fixes: 624b4b48d9 ("drm: sun4i: Add support for suspending the display driver")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220615054254.16352-1-samuel@sholland.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3046a827316c0e55fc563b4fb78c93b9ca5c7c37 ]
A customer reported a request_socket leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup with takes a refcnt on
the socket and that it was finding the request_socket but returning the parent
LISTEN socket via sk_to_full_sk() without decrementing the child request socket
1st, resulting in request_sock slab object leak. This patch retains the
existing behaviour of returning full socks to the caller but it also decrements
the child request_socket if one is present before doing so to prevent the leak.
Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
thanks to Antoine Tenart for the reproducer and patch input.
v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
validate RCU flags on the listen socket so that it balances with bpf_sk_release()
and update comments as per Martin KaFai Lau's suggestion. One small change to
Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
instruction.
Fixes: f7355a6c04 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
Fixes: edbf8c01de ("bpf: add skc_lookup_tcp helper")
Co-developed-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net
Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 62b5e322fb6cc5a5a91fdeba0e4e57e75d9f4387 ]
The dma_map_sgtable() call (used to invalidate cache) overwrites sgt->nents
with 1, so msm_iommu_pagetable_map maps only the first physical segment.
To fix this problem use for_each_sgtable_sg(), which uses orig_nents.
Fixes: b145c6e65e ("drm/msm: Add support to create a local pagetable")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Link: https://lore.kernel.org/r/20220613221019.11399-1-jonathan@marek.ca
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 566d3c57eb526f32951af15866086e236ce1fc8a ]
When a write command to a sequential write required or sequential write
preferred zone result in the zone write pointer reaching the end of the
zone, the zone condition must be set to full AND the number of implicitly
or explicitly open zones updated to have a correct accounting for zone
resources. However, the function zbc_inc_wp() only sets the zone condition
to full without updating the open zone counters, resulting in a zone state
machine breakage.
Introduce the helper function zbc_set_zone_full() and use it in
zbc_inc_wp() to correctly transition zones to the full condition.
Link: https://lore.kernel.org/r/20220608011302.92061-1-damien.lemoal@opensource.wdc.com
Fixes: f0d1cf9378 ("scsi: scsi_debug: Add ZBC zone commands")
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1fd94e704571f98b21027340eecf821b2bdffba ]
bh might occur while updating per-cpu rnd_state from user context,
ie. local_out path.
BUG: using smp_processor_id() in preemptible [00000000] code: nginx/2725
caller is nft_ng_random_eval+0x24/0x54 [nft_numgen]
Call Trace:
check_preemption_disabled+0xde/0xe0
nft_ng_random_eval+0x24/0x54 [nft_numgen]
Use the random driver instead, this also avoids need for local prandom
state. Moreover, prandom now uses the random driver since d4150779e60f
("random32: use real rng for non-deterministic randomness").
Based on earlier patch from Pablo Neira.
Fixes: 6b2faee0ca ("netfilter: nft_meta: place prandom handling in a helper")
Fixes: 978d8f9055 ("netfilter: nft_numgen: add map lookups for numgen random operations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 345023b0db315648ccc3c1a36aee88304a8b4d91 ]
This new function combines the netlink register attribute parser
and the store validation function.
This update requires to replace:
enum nft_registers dreg:8;
in many of the expression private areas otherwise compiler complains
with:
error: cannot take address of bit-field ‘dreg’
when passing the register field as reference.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4f16d25c68ec844299a4df6ecbb0234eaf88a935 ]
This new function combines the netlink register attribute parser
and the load validation function.
This update requires to replace:
enum nft_registers sreg:8;
in many of the expression private areas otherwise compiler complains
with:
error: cannot take address of bit-field ‘sreg’
when passing the register field as reference.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce0db505bc0c51ef5e9ba446c660de7e26f78f29 ]
Following commit 17e822f759 ("drm/msm: fix unbalanced
pm_runtime_enable in adreno_gpu_{init, cleanup}"), any call to
adreno_unbind() will disable runtime PM twice, as indicated by the call
trees below:
adreno_unbind()
-> pm_runtime_force_suspend()
-> pm_runtime_disable()
adreno_unbind()
-> gpu->funcs->destroy() [= aNxx_destroy()]
-> adreno_gpu_cleanup()
-> pm_runtime_disable()
Note that pm_runtime_force_suspend() is called right before
gpu->funcs->destroy() and both functions are called unconditionally.
With recent addition of the eDP AUX bus code, this problem manifests
itself when the eDP panel cannot be found yet and probing is deferred.
On the first probe attempt, we disable runtime PM twice as described
above. This then causes any later probe attempt to fail with
[drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13
preventing the driver from loading.
As there seem to be scenarios where the aNxx_destroy() functions are not
called from adreno_unbind(), simply removing pm_runtime_disable() from
inside adreno_unbind() does not seem to be the proper fix. This is what
commit 17e822f759 ("drm/msm: fix unbalanced pm_runtime_enable in
adreno_gpu_{init, cleanup}") intended to fix. Therefore, instead check
whether runtime PM is still enabled, and only disable it in that case.
Fixes: 17e822f759 ("drm/msm: fix unbalanced pm_runtime_enable in adreno_gpu_{init, cleanup}")
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Link: https://lore.kernel.org/r/20220606211305.189585-1-luzmaximilian@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 90736eb3232d208ee048493f371075e4272e0944 upstream.
Commit 85e123c27d5c ("dm mirror log: round up region bitmap size to
BITS_PER_LONG") introduced a regression on 64-bit architectures in the
lvm testsuite tests: lvcreate-mirror, mirror-names and vgsplit-operation.
If the device is shrunk, we need to clear log bits beyond the end of the
device. The code clears bits up to a 32-bit boundary and then calculates
lc->sync_count by summing set bits up to a 64-bit boundary (the commit
changed that; previously, this boundary was 32-bit too). So, it was using
some non-zeroed bits in the calculation and this caused misbehavior.
Fix this regression by clearing bits up to BITS_PER_LONG boundary.
Fixes: 85e123c27d5c ("dm mirror log: round up region bitmap size to BITS_PER_LONG")
Cc: stable@vger.kernel.org
Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ae6e8b1c9bbf6874163d1243e393137313762b7 upstream.
During postsuspend dm-era does the following:
1. Archives the current era
2. Commits the metadata, as part of the RPC call for archiving the
current era
3. Stops the worker
Until the worker stops, it might write to the metadata again. Moreover,
these writes are not flushed to disk immediately, but are cached by the
dm-bufio client, which writes them back asynchronously.
As a result, the committed metadata of a suspended dm-era device might
not be consistent with the in-core metadata.
In some cases, this can result in the corruption of the on-disk
metadata. Suppose the following sequence of events:
1. Load a new table, e.g. a snapshot-origin table, to a device with a
dm-era table
2. Suspend the device
3. dm-era commits its metadata, but the worker does a few more metadata
writes until it stops, as part of digesting an archived writeset
4. These writes are cached by the dm-bufio client
5. Load the dm-era table to another device.
6. The new instance of the dm-era target loads the committed, on-disk
metadata, which don't include the extra writes done by the worker
after the metadata commit.
7. Resume the new device
8. The new dm-era target instance starts using the metadata
9. Resume the original device
10. The destructor of the old dm-era target instance is called and
destroys the dm-bufio client, which results in flushing the cached
writes to disk
11. These writes might overwrite the writes done by the new dm-era
instance, hence corrupting its metadata.
Fix this by committing the metadata after the worker stops running.
stop_worker uses flush_workqueue to flush the current work. However, the
work item may re-queue itself and flush_workqueue doesn't wait for
re-queued works to finish.
This could result in the worker changing the metadata after they have
been committed, or writing to the metadata concurrently with the commit
in the postsuspend thread.
Use drain_workqueue instead, which waits until the work and all
re-queued works finish.
Fixes: eec40579d8 ("dm: add era target")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 540a92bfe6dab7310b9df2e488ba247d784d0163 upstream.
Add flags value to check the result of ata completion
Fixes: 255c03d15a ("libata: Add tracepoints")
Cc: stable@vger.kernel.org
Signed-off-by: Edward Wu <edwardwu@realtek.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06781a5026350cde699d2d10c9914a25c1524f45 upstream.
The DEVICE_BUSY_TIMEOUT value is described in the Reference Manual as:
| Timeout waiting for NAND Ready/Busy or ATA IRQ. Used in WAIT_FOR_READY
| mode. This value is the number of GPMI_CLK cycles multiplied by 4096.
So instead of multiplying the value in cycles with 4096, we have to
divide it by that value. Use DIV_ROUND_UP to make sure we are on the
safe side, especially when the calculated value in cycles is smaller
than 4096 as typically the case.
This bug likely never triggered because any timeout != 0 usually will
do. In my case the busy timeout in cycles was originally calculated as
2408, which multiplied with 4096 is 0x968000. The lower 16 bits were
taken for the 16 bit wide register field, so the register value was
0x8000. With 2970bf5a32f0 ("mtd: rawnand: gpmi: fix controller timings
setting") however the value in cycles became 2384, which multiplied
with 4096 is 0x950000. The lower 16 bit are 0x0 now resulting in an
intermediate timeout when reading from NAND.
Fixes: b120612206 ("mtd: rawnand: gpmi: use core timings instead of an empirical derivation")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220614083138.3455683-1-s.hauer@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3a4167c880cf889f66887a152799df4d609dd21 upstream.
Almost none of the errors stemming from a valid mount option but wrong
value prints a descriptive message which would help to identify why
mount failed. Like in the linked report:
$ uname -r
v4.19
$ mount -o compress=zstd /dev/sdb /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/sdb, missing codepage or helper program, or other error.
$ dmesg
...
BTRFS error (device sdb): open_ctree failed
Errors caused by memory allocation failures are left out as it's not a
user error so reporting that would be confusing.
Link: https://lore.kernel.org/linux-btrfs/9c3fec36-fc61-3a33-4977-a7e207c3fa4e@gmx.de/
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>