Go to file
Brian King b45a47e9ad scsi: ibmvfc: Set default timeout to avoid crash during migration
[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]

While testing live partition mobility, we have observed occasional crashes
of the Linux partition. What we've seen is that during the live migration,
for specific configurations with large amounts of memory, slow network
links, and workloads that are changing memory a lot, the partition can end
up being suspended for 30 seconds or longer. This resulted in the following
scenario:

CPU 0                          CPU 1
-------------------------------  ----------------------------------
scsi_queue_rq                    migration_store
 -> blk_mq_start_request          -> rtas_ibm_suspend_me
  -> blk_add_timer                 -> on_each_cpu(rtas_percpu_suspend_me
              _______________________________________V
             |
             V
    -> IPI from CPU 1
     -> rtas_percpu_suspend_me
                                     -> __rtas_suspend_last_cpu

-- Linux partition suspended for > 30 seconds --
                                      -> for_each_online_cpu(cpu)
                                           plpar_hcall_norets(H_PROD
 -> scsi_dispatch_cmd
                                      -> scsi_times_out
                                       -> scsi_abort_command
                                        -> queue_delayed_work
  -> ibmvfc_queuecommand_lck
   -> ibmvfc_send_event
    -> ibmvfc_send_crq
     - returns H_CLOSED
   <- returns SCSI_MLQUEUE_HOST_BUSY
-> __blk_mq_requeue_request

                                      -> scmd_eh_abort_handler
                                       -> scsi_try_to_abort_cmd
                                         - returns SUCCESS
                                       -> scsi_queue_insert

Normally, the SCMD_STATE_COMPLETE bit would protect against the command
completion and the timeout, but that doesn't work here, since we don't
check that at all in the SCSI_MLQUEUE_HOST_BUSY path.

In this case we end up calling scsi_queue_insert on a request that has
already been queued, or possibly even freed, and we crash.

The patch below simply increases the default I/O timeout to avoid this race
condition. This is also the timeout value that nearly all IBM SAN storage
recommends setting as the default value.

Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vnet.ibm.com
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-07 15:37:15 +01:00
arch x86: __always_inline __{rd,wr}msr() 2021-02-07 15:37:14 +01:00
block blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue 2021-02-03 23:28:44 +01:00
certs
crypto crypto: xor - Fix divide error in do_xor_speed() 2021-01-27 11:54:52 +01:00
Documentation KVM: Documentation: Fix spec for KVM_CAP_ENABLE_CAP_VM 2021-02-03 23:28:43 +01:00
drivers scsi: ibmvfc: Set default timeout to avoid crash during migration 2021-02-07 15:37:15 +01:00
fs pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn 2021-02-03 23:28:47 +01:00
include tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN 2021-02-03 23:28:52 +01:00
init rcu-tasks: Move RCU-tasks initialization to before early_initcall() 2021-01-19 18:27:28 +01:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP 2021-02-07 15:37:14 +01:00
lib iov_iter: fix the uaccess area in copy_compat_iovec_from_user 2021-01-27 11:55:09 +01:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: fix a race on nr_swap_pages 2021-01-30 13:55:19 +01:00
net mac80211: fix encryption key selection for 802.3 xmit 2021-02-07 15:37:15 +01:00
samples samples/bpf: Fix possible hang in xdpsock with multiple threads 2020-12-30 11:53:49 +01:00
scripts Revert "kconfig: remove 'kvmconfig' and 'xenconfig' shorthands" 2021-01-23 16:03:57 +01:00
security dump_common_audit_data(): fix racy accesses to ->d_name 2021-01-19 18:27:29 +01:00
sound ALSA: hda: Add AlderLake-P PCI ID and HDMI codec vid 2021-02-07 15:37:14 +01:00
tools objtool: Don't add empty symbols to the rbtree 2021-02-07 15:37:14 +01:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: Forbid the use of tagged userspace addresses for memslots 2021-02-03 23:28:41 +01:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore .gitignore: docs: ignore sphinx_*/ directories 2020-09-10 10:44:31 -06:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild
Kconfig
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-12-10 15:30:13 -08:00
Makefile Linux 5.10.13 2021-02-03 23:28:52 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.