Commit Graph

5083 Commits

Author SHA1 Message Date
Asai Thambi SP
abb0ccd185 mtip32xx: Implement timeout handler
Added timeout handler. Replaced blk_mq_end_request() with
blk_mq_complete_request() to avoid double completion of a request.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:44 -07:00
Asai Thambi SP
aae4a03386 mtip32xx: Handle FTL rebuild failure state during device initialization
Allow device initialization to finish gracefully when it is in
FTL rebuild failure state. Also, recover device out of this state
after successfully secure erasing it.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
51c6570eb9 mtip32xx: Handle safe removal during IO
Flush inflight IOs using fsync_bdev() when the device is safely
removed. Also, block further IOs in device open function.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
59cf70e236 mtip32xx: Fix for rmmod crash when drive is in FTL rebuild
When FTL rebuild is in progress, alloc_disk() initializes the disk
but device node will be created by add_disk() only after successful
completion of FTL rebuild. So, skip deletion of device node in
removal path when FTL rebuild is in progress.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
d8a18d2d8f mtip32xx: Avoid issuing standby immediate cmd during FTL rebuild
Prevent standby immediate command from being issued in remove,
suspend and shutdown paths, while drive is in FTL rebuild process.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Vignesh Gunasekaran <vgunasekaran@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
5b7e0a8ac8 mtip32xx: Print exact time when an internal command is interrupted
Print exact time when an internal command is interrupted.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
e35b94738a mtip32xx: Remove unwanted code from taskfile error handler
Remove setting and clearing MTIP_PF_EH_ACTIVE_BIT flag in
mtip_handle_tfe() as they are redundant. Also avoid waking
up service thread from mtip_handle_tfe() because it is
already woken up in case of taskfile error.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Rajesh Kumar Sambandam <rsambandam@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Asai Thambi SP
cfc05bd313 mtip32xx: Fix broken service thread handling
Service thread does not detect the need for taskfile error handling. Fixed the
flag condition to process taskfile error.

Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-03 09:08:43 -07:00
Markus Pargmann
37091fdd83 nbd: Create size change events for userspace
The userspace needs to know when nbd devices are ready for use.
Currently no events are created for the userspace which doesn't work for
systemd.

See the discussion here: https://github.com/systemd/systemd/pull/358

This patch uses a central point to set up the nbd-internal sizes. An ioctl
to set a size does not lead to a visible size change. The size of the
block device will be kept at 0 until nbd is connected. As soon as it
connects, the size will be changed to the real value and a uevent is
created. When disconnecting, the blockdevice is set to 0 size and
another uevent is generated.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-15 10:35:47 +01:00
Dan Streetman
da6ccaaa79 nbd: ratelimit error msgs after socket close
Make the "Attempted send on closed socket" error messages generated in
nbd_request_handler() ratelimited.

When the nbd socket is shutdown, the nbd_request_handler() function emits
an error message for every request remaining in its queue.  If the queue
is large, this will spam a large amount of messages to the log.  There's
no need for a separate error message for each request, so this patch
ratelimits it.

In the specific case this was found, the system was virtual and the error
messages were logged to the serial port, which overwhelmed it.

Fixes: 4d48a542b4 ("nbd: fix I/O hang on disconnected nbds")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05 08:55:15 +01:00
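
The ratelimiting idea in the nbd commit above can be sketched in userspace C. This is a minimal sketch, not the driver's code: every `sketch_` name is hypothetical, time is passed in as a plain tick count, and the window-plus-burst policy is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct sketch_ratelimit {
    long interval; /* window length, in caller-defined ticks */
    int burst;     /* messages allowed per window */
    long begin;    /* tick at which the current window started */
    int printed;   /* messages emitted in the current window */
    int missed;    /* messages suppressed so far */
};

/* Returns true if a message may be emitted at tick `now`. */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
    assert(rs->interval > 0 && rs->burst > 0);
    if (now - rs->begin >= rs->interval) {
        /* The old window has expired: start a new one. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}

/*
 * Fails every queued request, but logs only while the limiter allows it.
 * Returns the number of log lines actually written.
 */
int sketch_fail_queued_requests(struct sketch_ratelimit *rs, int queued, long now)
{
    int emitted = 0;
    for (int i = 0; i < queued; i++) {
        if (sketch_ratelimit_ok(rs, now)) {
            fprintf(stderr, "Attempted send on closed socket\n");
            emitted++;
        }
    }
    return emitted;
}
```

With a burst of 10, a queue of 1000 failed requests produces 10 log lines instead of 1000, which is the effect the commit message describes for a slow serial console.
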
Markus Pargmann
d02cf53107 nbd: Move flag parsing to a function
nbd changes properties of the blockdevice depending on flags that were
received. This patch moves this flag parsing into a separate function
nbd_parse_flags().

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05 08:52:33 +01:00
Markus Pargmann
0e4f0f6f63 nbd: Cleanup reset of nbd and bdev after a disconnect
Group all variables that are reset after a disconnect into reset
functions. This patch adds two of these functions, nbd_reset() and
nbd_bdev_reset().

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05 08:52:32 +01:00
Markus Pargmann
1f7b5cf1be nbd: Timeouts are not user requested disconnects
It may be useful to know in the client that a connection timed out. The
current code returns success for a timeout.

This patch reports the error code -ETIMEDOUT for a timeout.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-05 08:52:31 +01:00
Markus Pargmann
23272a6754 nbd: Remove signal usage
As discussed on the mailing list, the usage of signals for timeout
handling has a lot of potential issues. The nbd driver has used signals
for timeouts for some time. These signals were able to get the threads
out of the blocking socket operations.

This patch removes all signal usage and uses a socket shutdown instead.
The socket descriptor itself is cleared later when the whole nbd device
is closed.

The tasks_lock is removed as we do not depend on this anymore. Instead
a new lock for the socket is introduced so we can safely work with the
socket in the timeout handler outside of the two main threads.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-02-05 08:52:25 +01:00
Markus Pargmann
27ea43fe2a nbd: Fix debugfs error handling
Static checker complains about the implemented error handling. It is
indeed wrong. We don't care about the return values of created debugfs
files.

We only have to check the return values of created dirs for NULL
pointer. If we use a null pointer as parent directory for files, this
may lead to debugfs files in wrong places.

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
2016-02-03 11:02:56 +01:00
Linus Torvalds
00e3f5cc30 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "The two main changes are aio support in CephFS, and a series that
  fixes several issues in the authentication key timeout/renewal code.

  On top of that are a variety of cleanups and minor bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: remove outdated comment
  libceph: kill off ceph_x_ticket_handler::validity
  libceph: invalidate AUTH in addition to a service ticket
  libceph: fix authorizer invalidation, take 2
  libceph: clear messenger auth_retry flag if we fault
  libceph: fix ceph_msg_revoke()
  libceph: use list_for_each_entry_safe
  ceph: use i_size_{read,write} to get/set i_size
  ceph: re-send AIO write request when getting -EOLDSNAP error
  ceph: Asynchronous IO support
  ceph: Avoid to propagate the invalid page point
  ceph: fix double page_unlock() in page_mkwrite()
  rbd: delete an unnecessary check before rbd_dev_destroy()
  libceph: use list_next_entry instead of list_entry_next
  ceph: ceph_frag_contains_value can be boolean
  ceph: remove unused functions in ceph_frag.h
2016-01-24 12:34:13 -08:00
Linus Torvalds
cc673757e2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull final vfs updates from Al Viro:

 - The ->i_mutex wrappers (with small prereq in lustre)

 - a fix for too early freeing of symlink bodies on shmem (they need to
   be RCU-delayed) (-stable fodder)

 - followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: abort dedupe loop if fatal signals are pending
  make sure that freeing shmem fast symlinks is RCU-delayed
  wrappers for ->i_mutex access
  lustre: remove unused declaration
2016-01-23 12:24:56 -08:00
Tetsuo Handa
1d5cfdb076 tree wide: use kvfree() than conditional kfree()/vfree()
There are many locations that do

  if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
  else
    kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr().  Unless callers have special reasons, we can
replace this branch with kvfree().  Please check and reply if you find
problems.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
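
The dispatch that the kvfree() commit above relies on can be imitated in userspace. This is a rough sketch under stated assumptions: a static array stands in for the vmalloc address range, malloc()/free() stand in for kmalloc()/kfree(), and all `sketch_` names are hypothetical rather than kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's vmalloc address range. */
#define SKETCH_VMALLOC_SIZE 4096
unsigned char sketch_vmalloc_area[SKETCH_VMALLOC_SIZE];
size_t sketch_vmalloc_used;
int sketch_vfree_calls; /* counts frees routed to the "vmalloc" side */
int sketch_kfree_calls; /* counts frees routed to the "kmalloc" side */

/* True if p points into the fake vmalloc area. */
bool sketch_is_vmalloc_addr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t start = (uintptr_t)sketch_vmalloc_area;
    return a >= start && a < start + SKETCH_VMALLOC_SIZE;
}

/* Bump allocator over the fake area; returns NULL when it is exhausted. */
void *sketch_vmalloc(size_t size)
{
    if (size > SKETCH_VMALLOC_SIZE - sketch_vmalloc_used)
        return NULL;
    void *p = &sketch_vmalloc_area[sketch_vmalloc_used];
    sketch_vmalloc_used += size;
    return p;
}

/* The bump allocator never reuses memory, so this only records the call. */
void sketch_vfree(void *p)
{
    assert(sketch_is_vmalloc_addr(p));
    sketch_vfree_calls++;
}

void *sketch_kmalloc(size_t size) { return malloc(size); }

void sketch_kfree(void *p)
{
    free(p);
    sketch_kfree_calls++;
}

/* One free routine for both kinds of memory: the address decides the path. */
void sketch_kvfree(void *p)
{
    if (sketch_is_vmalloc_addr(p))
        sketch_vfree(p);
    else
        sketch_kfree(p);
}
```

Callers no longer need to remember which allocator produced a pointer; they pass it to `sketch_kvfree()` and the address range check picks the matching free path.
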
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
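
The wrapper pattern from the ->i_mutex commit above can be sketched as follows. This is an illustrative userspace model, not kernel code: the toy spinning mutex built on C11 atomics and every `sketch_` name are assumptions. The point it shows is that callers go through `sketch_inode_*()` and never touch the lock field directly, so the lock type can later change in one place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy mutex (hypothetical); a real kernel mutex sleeps instead of spinning. */
struct sketch_mutex {
    atomic_bool locked;
};

/* Returns true if the lock was acquired, false if it was already held. */
bool sketch_mutex_trylock(struct sketch_mutex *m)
{
    return !atomic_exchange(&m->locked, true);
}

void sketch_mutex_lock(struct sketch_mutex *m)
{
    while (!sketch_mutex_trylock(m))
        ; /* spin until the holder releases the lock */
}

void sketch_mutex_unlock(struct sketch_mutex *m)
{
    assert(atomic_load(&m->locked));
    atomic_store(&m->locked, false);
}

bool sketch_mutex_is_locked(struct sketch_mutex *m)
{
    return atomic_load(&m->locked);
}

struct sketch_inode {
    struct sketch_mutex i_mutex;
};

void sketch_inode_init(struct sketch_inode *inode)
{
    atomic_init(&inode->i_mutex.locked, false);
}

/* Wrappers: the only code that knows which lock type the inode uses. */
void sketch_inode_lock(struct sketch_inode *inode)
{
    sketch_mutex_lock(&inode->i_mutex);
}

void sketch_inode_unlock(struct sketch_inode *inode)
{
    sketch_mutex_unlock(&inode->i_mutex);
}

bool sketch_inode_trylock(struct sketch_inode *inode)
{
    return sketch_mutex_trylock(&inode->i_mutex);
}

bool sketch_inode_is_locked(struct sketch_inode *inode)
{
    return sketch_mutex_is_locked(&inode->i_mutex);
}
```
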
Linus Torvalds
0a13daedf7 Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block
Pull lightnvm fixes and updates from Jens Axboe:
 "This should have been part of the drivers branch, but it arrived a bit
  late and wasn't based on the official core block driver branch.  So
  they got a small scolding, but got a pass since it's still new.  Hence
  it's in a separate branch.

  This is mostly pure fixes, contained to lightnvm/, and minor feature
  additions"

* 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits)
  lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
  lightnvm: introduce factory reset
  lightnvm: use system block for mm initialization
  lightnvm: introduce ioctl to initialize device
  lightnvm: core on-disk initialization
  lightnvm: introduce mlc lower page table mappings
  lightnvm: add mccap support
  lightnvm: manage open and closed blocks separately
  lightnvm: fix missing grown bad block type
  lightnvm: reference rrpc lun in rrpc block
  lightnvm: introduce nvm_submit_ppa
  lightnvm: move rq->error to nvm_rq->error
  lightnvm: support multiple ppas in nvm_erase_ppa
  lightnvm: move the pages per block check out of the loop
  lightnvm: sectors first in ppa list
  lightnvm: fix locking and mempool in rrpc_lun_gc
  lightnvm: put block back to gc list on its reclaim fail
  lightnvm: check bi_error in gc
  lightnvm: return the get_bb_tbl return value
  lightnvm: refactor end_io functions for sync
  ...
2016-01-21 19:01:55 -08:00
Linus Torvalds
641203549a Merge branch 'for-4.5/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This is the block driver pull request for 4.5, with the exception of
  NVMe, which is in a separate branch and will be posted after this one.

  This pull request contains:

   - A set of bcache stability fixes, which have been acked by Kent.
     These have been used and tested for more than a year by the
     community, so it's about time that they got in.

   - A set of drbd updates from the drbd team (Andreas, Lars, Philipp)
     and Markus Elfring, Oleg Drokin.

   - A set of fixes for xen blkback/front from the usual suspects, (Bob,
     Konrad) as well as community based fixes from Kiri, Julien, and
     Peng.

   - A 2038 time fix for sx8 from Shraddha, with a fix from me.

   - A small mtip32xx cleanup from Zhu Yanjun.

   - A null_blk division fix from Arnd"

* 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits)
  null_blk: use sector_div instead of do_div
  mtip32xx: restrict variables visible in current code module
  xen/blkfront: Fix crash if backend doesn't follow the right states.
  xen/blkback: Fix two memory leaks.
  xen/blkback: make st_ statistics per ring
  xen/blkfront: Handle non-indirect grant with 64KB pages
  xen-blkfront: Introduce blkif_ring_get_request
  xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
  xen/blkback: Free resources if connect_ring failed.
  xen/blocks: Return -EXX instead of -1
  xen/blkback: make pool of persistent grants and free pages per-queue
  xen/blkback: get the number of hardware queues/rings from blkfront
  xen/blkback: pseudo support for multi hardware queues/rings
  xen/blkback: separate ring information out of struct xen_blkif
  xen/blkfront: correct setting for xen_blkif_max_ring_order
  xen/blkfront: make persistent grants pool per-queue
  xen/blkfront: Remove duplicate setting of ->xbdev.
  xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
  xen/blkfront: negotiate number of queues/rings to be used with backend
  xen/blkfront: split per device io_lock
  ...
2016-01-21 18:19:38 -08:00
Markus Elfring
1761b22966 rbd: delete an unnecessary check before rbd_dev_destroy()
The rbd_dev_destroy() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-01-21 19:36:07 +01:00
Linus Torvalds
7c24d9f3b2 Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "We don't have a lot of core changes this time around, it's mostly in
  drivers, which will come in a subsequent pull.

  The core changes include:

   - blk-mq
        - Prep patch from Christoph, changing blk_mq_alloc_request() to
          take flags instead of just using gfp_t for sleep/nosleep.
        - Doc patch from me, clarifying the difference between legacy
          and blk-mq for timer usage.
        - Fixes from Raghavendra for memory-less numa nodes, and a reuse
          of CPU masks.

   - Cleanup from Geliang Tang, using offset_in_page() instead of open
     coding it.

   - From Ilya, rename request_queue slab so it reflects what it holds,
     and a fix for proper use of bdgrab/put.

   - A real fix for the split across stripe boundaries from Keith.  We
     yanked a broken version of this from 4.4-rc final, this one works.

   - From Mike Krinkin, emit a trace message when we split.

   - From Wei Tang, two small cleanups, not explicitly clearing memory
     that is already cleared"

* 'for-4.5/core' of git://git.kernel.dk/linux-block:
  block: use bd{grab,put}() instead of open-coding
  block: split bios to max possible length
  block: add call to split trace point
  blk-mq: Avoid memoryless numa node encoded in hctx numa_node
  blk-mq: Reuse hardware context cpumask for tags
  blk-mq: add a flags parameter to blk_mq_alloc_request
  Revert "blk-flush: Queue through IO scheduler when flush not required"
  block: clarify blk_add_timer() use case for blk-mq
  bio: use offset_in_page macro
  block: do not initialise statics to 0 or NULL
  block: do not initialise globals to 0 or NULL
  block: rename request_queue slab cache
2016-01-19 15:03:34 -08:00
Dan Williams
34c0fd540e mm, dax, pmem: introduce pfn_t
For the purpose of communicating the optional presence of a 'struct
page' for the pfn returned from ->direct_access(), introduce a type that
encapsulates a page-frame-number plus flags.  These flags contain the
historical "page_link" encoding for a scatterlist entry, but can also
denote "device memory".  Where "device memory" is a set of pfns that are
not part of the kernel's linear mapping by default, but are accessed via
the same memory controller as ram.

The motivation for this new type is large capacity persistent memory
that needs struct page entries in the 'memmap' to support 3rd party DMA
(i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
we also need it in support of maintaining a list of mapped inodes which
need to be unmapped at driver teardown or freeze_bdev() time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave@sr71.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
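
The idea in the pfn_t commit above, a page-frame-number carried together with flag bits in a single typed value, might look roughly like this. The bit positions, the 12-bit flag width, and all `sketch_` names are assumptions for illustration and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the top 12 bits hold flags, the rest hold the pfn. */
#define SKETCH_FLAG_BITS  12
#define SKETCH_FLAGS_MASK (~UINT64_C(0) << (64 - SKETCH_FLAG_BITS))
#define SKETCH_PFN_DEV    (UINT64_C(1) << 61) /* pfn refers to device memory */
#define SKETCH_PFN_MAP    (UINT64_C(1) << 60) /* device pfn has a struct page */

/* Wrapping the value in a struct keeps it from mixing with plain integers. */
typedef struct {
    uint64_t val;
} sketch_pfn_t;

sketch_pfn_t sketch_pfn_make(uint64_t pfn, uint64_t flags)
{
    assert((pfn & SKETCH_FLAGS_MASK) == 0);    /* pfn must not spill into flags */
    assert((flags & ~SKETCH_FLAGS_MASK) == 0); /* flags must stay in the top bits */
    return (sketch_pfn_t){ .val = pfn | flags };
}

/* Strips the flag bits and returns the bare page frame number. */
uint64_t sketch_pfn_to_raw(sketch_pfn_t p)
{
    return p.val & ~SKETCH_FLAGS_MASK;
}

/*
 * In this sketch ordinary memory always has a struct page, while device
 * memory has one only when the MAP flag is set.
 */
bool sketch_pfn_has_page(sketch_pfn_t p)
{
    if (!(p.val & SKETCH_PFN_DEV))
        return true;
    return (p.val & SKETCH_PFN_MAP) != 0;
}
```

A consumer that needs a struct page checks `sketch_pfn_has_page()` first instead of assuming that every pfn returned from a direct-access call is backed by one.
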
Jerome Marchand
17ec4cd985 zram: don't call idr_remove() from zram_remove()
The use of idr_remove() is forbidden in the callback functions of
idr_for_each().  It is therefore unsafe to call idr_remove in
zram_remove().

This patch moves the call to idr_remove() from zram_remove() to
hot_remove_store().  In the destroy_devices() path, idrs are removed by
idr_destroy().  This solves a use-after-free detected by KASan.

[akpm@linux-foundation.org: fix coding style, per Sergey]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Linus Torvalds
875fc4f5dd Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - A few hotfixes which missed 4.4 because I was asleep.  cc'ed to
   -stable

 - A few misc fixes

 - OCFS2 updates

 - Part of MM.  Including pretty large changes to page-flags handling
   and to thp management which have been buffered up for 2-3 cycles now.

  I have a lot of MM material this time.

[ It turns out the THP part wasn't quite ready, so that got dropped from
  this series  - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  zsmalloc: reorganize struct size_class to pack 4 bytes hole
  mm/zbud.c: use list_last_entry() instead of list_tail_entry()
  zram/zcomp: do not zero out zcomp private pages
  zram: pass gfp from zcomp frontend to backend
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  mm: add tracepoint for scanning pages
  drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
  mm/page_isolation: use macro to judge the alignment
  mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
  mm: rework virtual memory accounting
  include/linux/memblock.h: fix ordering of 'flags' argument in comments
  mm: move lru_to_page to mm_inline.h
  Documentation/filesystems: describe the shared memory usage/accounting
  memory-hotplug: don't BUG() in register_memory_resource()
  hugetlb: make mm and fs code explicitly non-modular
  mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
  mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
  mm: make sure isolate_lru_page() is never called for tail page
  vmstat: make vmstat_updater deferrable again and shut down on idle
  ...
2016-01-15 11:41:44 -08:00
Sergey Senozhatsky
e02d238c98 zram/zcomp: do not zero out zcomp private pages
Do not __GFP_ZERO allocated zcomp ->private pages.  We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to compression algorithm as a scratch buffer only
once -- the first time we use that stream.  For the rest of IO requests
served by this stream ->private usually contains some temporary data
from the previous requests.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 11:40:52 -08:00
Minchan Kim
75d8947a36 zram: pass gfp from zcomp frontend to backend
Each zcomp backend uses its own gfp flags, but that is pointless because
the context in which the backends are called is determined by the upper
layer (i.e., the zcomp frontend), and the frontend may call them in
different contexts.  In one context (i.e., the zram init part), allocation
should be made as likely to succeed as possible; in another (i.e., further
stream allocation to accelerate I/O), allocation is merely optional.  So
let's pass gfp down from the driver (i.e., the zcomp frontend), following
the normal MM convention.

[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 11:40:51 -08:00
Kyeongdon Kim
d913897aba zram: try vmalloc() after kmalloc()
When using LZ4 multi compression streams for zram swap, we saw page
allocation failure messages while running system tests.  This happened
not just once, but a few times (2 - 5 times per test).  Some of the
failures kept recurring on order-3 allocation attempts.

To create the private data for parallel compression, we have to call
kzalloc() with order 2/3 at runtime (lzo/lz4).  If no order 2/3 sized
memory is available at that time, the page allocation fails.  This
patch uses vmalloc() as a fallback for kmalloc(), which prevents the
page allocation failure warning.

With this patch applied, we no longer saw the warning message while
running the tests, and process startup latency dropped by about
60-120ms in each case.

For reference a call trace :

    Binder_1: page allocation failure: order:3, mode:0x10c0d0
    CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
    Call trace:
      dump_backtrace+0x0/0x270
      show_stack+0x10/0x1c
      dump_stack+0x1c/0x28
      warn_alloc_failed+0xfc/0x11c
      __alloc_pages_nodemask+0x724/0x7f0
      __get_free_pages+0x14/0x5c
      kmalloc_order_trace+0x38/0xd8
      zcomp_lz4_create+0x2c/0x38
      zcomp_strm_alloc+0x34/0x78
      zcomp_strm_multi_find+0x124/0x1ec
      zcomp_strm_find+0xc/0x18
      zram_bvec_rw+0x2fc/0x780
      zram_make_request+0x25c/0x2d4
      generic_make_request+0x80/0xbc
      submit_bio+0xa4/0x15c
      __swap_writepage+0x218/0x230
      swap_writepage+0x3c/0x4c
      shrink_page_list+0x51c/0x8d0
      shrink_inactive_list+0x3f8/0x60c
      shrink_lruvec+0x33c/0x4cc
      shrink_zone+0x3c/0x100
      try_to_free_pages+0x2b8/0x54c
      __alloc_pages_nodemask+0x514/0x7f0
      __get_free_pages+0x14/0x5c
      proc_info_read+0x50/0xe4
      vfs_read+0xa0/0x12c
      SyS_read+0x44/0x74
    DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
         0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB

[minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
[sergey.senozhatsky@gmail.com: tweak comments and styles]
Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 11:40:51 -08:00
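
The fallback strategy from the "try vmalloc() after kmalloc()" commit above can be modelled in userspace. This is a sketch under stated assumptions: a size limit simulates the failure of a physically contiguous allocation, calloc() stands in for both kernel allocators, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: "contiguous" requests above this size are treated as failing. */
#define SKETCH_CONTIG_LIMIT 4096

enum sketch_source { SKETCH_NONE, SKETCH_CONTIG, SKETCH_FALLBACK };

struct sketch_buf {
    void *ptr;
    enum sketch_source source; /* which allocator produced ptr */
};

/* Stand-in for a contiguous allocator that cannot satisfy large requests. */
void *sketch_contig_alloc(size_t size)
{
    return size > SKETCH_CONTIG_LIMIT ? NULL : calloc(1, size);
}

/* Stand-in for an allocator that does not need contiguous memory. */
void *sketch_fallback_alloc(size_t size)
{
    return calloc(1, size);
}

/* Tries the contiguous allocator first and falls back only when it fails. */
struct sketch_buf sketch_alloc_private(size_t size)
{
    assert(size > 0);
    struct sketch_buf b = { sketch_contig_alloc(size), SKETCH_CONTIG };
    if (!b.ptr) {
        b.ptr = sketch_fallback_alloc(size);
        b.source = b.ptr ? SKETCH_FALLBACK : SKETCH_NONE;
    }
    return b;
}

void sketch_free_private(struct sketch_buf *b)
{
    free(b->ptr);
    b->ptr = NULL;
    b->source = SKETCH_NONE;
}
```

The fast path is unchanged for small requests; only a request the first allocator cannot serve pays for the fallback, so the caller sees an allocation failure only when both attempts fail.
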
Sergey Senozhatsky
3d5fe03a3e zram/zcomp: use GFP_NOIO to allocate streams
We can end up allocating a new compression stream with GFP_KERNEL from
within the IO path, which may result in nested (recursive) IO
operations.  That can introduce problems if the IO path in question is a
reclaimer, holding some locks that will deadlock nested IOs.

Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.

An example:

  inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
  git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
  {IN-RECLAIM_FS-W} state was registered at:
     __lock_acquire+0x8da/0x117b
     lock_acquire+0x10c/0x1a7
     start_this_handle+0x52d/0x555
     jbd2__journal_start+0xb4/0x237
     __ext4_journal_start_sb+0x108/0x17e
     ext4_dirty_inode+0x32/0x61
     __mark_inode_dirty+0x16b/0x60c
     iput+0x11e/0x274
     __dentry_kill+0x148/0x1b8
     shrink_dentry_list+0x274/0x44a
     prune_dcache_sb+0x4a/0x55
     super_cache_scan+0xfc/0x176
     shrink_slab.part.14.constprop.25+0x2a2/0x4d3
     shrink_zone+0x74/0x140
     kswapd+0x6b7/0x930
     kthread+0x107/0x10f
     ret_from_fork+0x3f/0x70
  irq event stamp: 138297
  hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
  hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
  softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
  softirqs last disabled at (137813):  irq_exit+0x41/0x95

               other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(jbd2_handle);
    <Interrupt>
      lock(jbd2_handle);

                *** DEADLOCK ***
  5 locks held by git/20158:
   #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
   #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
   #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
   #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
   #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555

               stack backtrace:
  CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
  Call Trace:
    dump_stack+0x4c/0x6e
    mark_lock+0x384/0x56d
    mark_held_locks+0x5f/0x76
    lockdep_trace_alloc+0xb2/0xb5
    kmem_cache_alloc_trace+0x32/0x1e2
    zcomp_strm_alloc+0x25/0x73 [zram]
    zcomp_strm_multi_find+0xe7/0x173 [zram]
    zcomp_strm_find+0xc/0xe [zram]
    zram_bvec_rw+0x2ca/0x7e0 [zram]
    zram_make_request+0x1fa/0x301 [zram]
    generic_make_request+0x9c/0xdb
    submit_bio+0xf7/0x120
    ext4_io_submit+0x2e/0x43
    ext4_bio_write_page+0x1b7/0x300
    mpage_submit_page+0x60/0x77
    mpage_map_and_submit_buffers+0x10f/0x21d
    ext4_writepages+0xc8c/0xe1b
    do_writepages+0x23/0x2c
    __filemap_fdatawrite_range+0x84/0x8b
    filemap_flush+0x1c/0x1e
    ext4_alloc_da_blocks+0xb8/0x117
    ext4_rename+0x132/0x6dc
    ? mark_held_locks+0x5f/0x76
    ext4_rename2+0x29/0x2b
    vfs_rename+0x540/0x636
    SyS_renameat2+0x359/0x44d
    SyS_rename+0x1e/0x20
    entry_SYSCALL_64_fastpath+0x12/0x6f

[minchan@kernel.org: add stable mark]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 11:40:51 -08:00
Linus Torvalds
7d1fc01afc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  floppy: make local variable non-static
  exynos: fixes an incorrect header guard
  dt-bindings: fixes some incorrect header guards
  cpufreq-dt: correct dead link in documentation
  cpufreq: ARM big LITTLE: correct dead link in documentation
  treewide: Fix typos in printk
  Documentation: filesystem: Fix typo in fs/eventfd.c
  fs/super.c: use && instead of & for warn_on condition
  Documentation: fix sysfs-ptp
  lib: scatterlist: fix Kconfig description
2016-01-14 17:04:19 -08:00
Linus Torvalds
1289ace5b4 SCSI misc on 20160113
This pull includes driver updates from the usual suspects (bfa, arcmsr,
 scsi_dh_alua, lpfc, storvsc, cxlflash).  The major change is the addition of
 the hisi_sas driver, which is an ARM platform device for SAS.  The other
 change of note is an enormous style transformation to the atp870u driver
 (which is our worst written SCSI driver).
 
 Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWluqqAAoJEDeqqVYsXL0McuwH/1oqvFOagsvoDcwDyNAUR/eW
 VAH454ndIJ0eSXORNfA7ko3ZQKa53x1WN9eKr+RHI7lpGCjwBz2MjnvQsnKISvXp
 K0owkJTcAAF+Wdq7rdNlm1VlQHuLvG8TMTnno+NY3CtxCR2yiRWlctkNkjr0rWUv
 leXJkXZSThkkiY/rEDZZXee8Ajwac87QT+ELEqCT2HueGZD+J8s59JpsOtZdt6Bj
 n94ydOuct8hF3Xt3pdu1oDRpWpoJIyjHtYhdrvzSiKKBHTWtuq1oN0Cwnp0qtEDD
 X3K1Mr0yBmAjTOsK+y+bZnJ1y7qJLLt5ZHmVixkzFWujXPNbrIsyYkV5eI432XA=
 =ggNi
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull first round of SCSI updates from James Bottomley:
 "This includes driver updates from the usual suspects (bfa, arcmsr,
  scsi_dh_alua, lpfc, storvsc, cxlflash).

  The major change is the addition of the hisi_sas driver, which is an
  ARM platform device for SAS.  The other change of note is an enormous
  style transformation to the atp870u driver (which is our worst written
  SCSI driver)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (169 commits)
  cxlflash: Enable device id for future IBM CXL adapter
  cxlflash: Resolve oops in wait_port_offline
  cxlflash: Fix to resolve cmd leak after host reset
  cxlflash: Removed driver date print
  cxlflash: Fix to avoid virtual LUN failover failure
  cxlflash: Fix to escalate LINK_RESET also on port 1
  storvsc: Tighten up the interrupt path
  storvsc: Refactor the code in storvsc_channel_init()
  storvsc: Properly support Fibre Channel devices
  storvsc: Fix a bug in the layout of the hv_fc_wwn_packet
  mvsas: Add SGPIO support to Marvell 94xx
  mpt3sas: A correction in unmap_resources
  hpsa: Add box and bay information for enclosure devices
  hpsa: Change SAS transport devices to bus 0.
  hpsa: fix path_info_show
  cciss: print max outstanding commands as a hex value
  scsi_debug: Increase the reported optimal transfer length
  lpfc: Update version to 11.0.0.10 for upstream patch set
  lpfc: Use kzalloc instead of kmalloc
  lpfc: Delete unnecessary checks before the function call "mempool_destroy"
  ...
2016-01-13 19:37:36 -08:00
Arnd Bergmann
e93d12ae3b null_blk: use sector_div instead of do_div
Dividing a sector_t number should be done using sector_div rather than do_div
to optimize the 32-bit sector_t case, and with the latest do_div optimizations,
we now get a compile-time warning for this:

arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int *}' but argument is of type 'sector_t * {aka long unsigned int *}'
drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast

This changes the newly added code to use sector_div. It is a simplified version
of the original patch, as Linus Torvalds pointed out that we should not be using
an expensive division function in the first place.

This version was suggested by Matias Bjorling.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matias Bjorling <m@bjorling.me>
Fixes: b2b7e00148 ("null_blk: register as a LightNVM device")
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-13 15:10:34 -07:00
Jens Axboe
038a75afc5 Merge branch 'stable/for-jens-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-4.5/drivers
Konrad writes:

The pull is based on converting the backend driver into a multiqueue
driver and exposing more than one queue to the frontend. As such we had
to modify the frontend and also fix a bunch of bugs around this.

The original work is based on Arianna Avanzini's work as an OPW intern.
Bob took over the work and had been massaging it for quite some time.

Also included are features for 64KB page support on ARM and various
bug-fixes.
2016-01-13 08:20:36 -07:00
Matias Bjørling
91276162de lightnvm: refactor end_io functions for sync
To implement sync I/O support within the LightNVM core, the end_io
functions are refactored to take an end_io function pointer instead of
testing for initialized media manager, followed by calling its end_io
function.

Sync I/O can then be implemented using a callback that signals I/O
completion. This is similar to the logic found in blk_to_execute_io().
By implementing it this way, the underlying device I/O submission logic
is abstracted away from core, targets, and media managers.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:16 -07:00
Al Viro
263a3df18f nbd: use ->compat_ioctl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08 21:20:32 -05:00
Al Viro
6108209c4a Merge branch 'for-linus' into work.misc 2016-01-08 21:20:11 -05:00
Zhu Yanjun
9e35fdcb9c mtip32xx: restrict variables visible in current code module
The modified variables are only used in the file mtip32xx.c.
As such, the static keyword is added so that these objects are
visible only to the current code module during compilation.

Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-08 11:47:53 -07:00
James Bottomley
abaee091a1 Merge branch 'jejb-scsi' into misc 2016-01-07 15:51:13 -08:00
Al Viro
820351f05b rsxx: don't open-code memdup_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:25:24 -05:00
Al Viro
8ed6010d50 mtip32xx: don't open-code memdup_user()
[folded a fix by Dan Carpenter]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 08:24:39 -05:00
Colin Ian King
a8036dfba9 cciss: print max outstanding commands as a hex value
The max outstanding commands is being printed with a 0x prefix to
suggest it is a hex value, when in fact the integer decimal %d format
specifier is being used and this is a bit confusing. Use %x instead to
match the preceding 0x prefix.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-01-04 19:45:01 -05:00
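
The mismatch described in a8036dfba9 can be reproduced in user space. The sketch below is not the cciss driver code; format_max_cmds_old() and format_max_cmds_new() are hypothetical helpers that only contrast the two format strings.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the misleading form, a "0x" prefix followed by
 * decimal digits produced by %d. */
int format_max_cmds_old(char *buf, size_t len, int max_cmds)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%d", max_cmds);
}

/* Hypothetical helper: the corrected form, where %x makes the digits
 * agree with the "0x" prefix. */
int format_max_cmds_new(char *buf, size_t len, int max_cmds)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%x", (unsigned int)max_cmds);
}
```

For a value of 16, the old form yields the string "0x16" while the corrected form yields "0x10".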
Konrad Rzeszutek Wilk
c31ecf6c12 xen/blkfront: Fix crash if backend doesn't follow the right states.
We have split the setting up of all the resources in two steps:
1) talk_to_blkback  - which figures out the num_ring_pages (from
   the default value of zero), sets up shadow and so on
2) blkfront_connect - does the real part of filling out the
   internal structures.

The problem is if we bypass the 1) step and go straight to 2)
and call blkfront_setup_indirect where we use the macro
BLK_RING_SIZE - which returns a negative value (because
sz is zero  - since num_ring_pages is zero - since it has never
been set).

We can fix this by making sure that we always have called
talk_to_blkback before going to blkfront_connect.

Or we could set in blkfront_probe info->nr_ring_pages = 1
to have a default value. But that looks odd - as we haven't
actually negotiated any ring size.

This patch changes XenbusStateConnected state to detect if
we haven't done the initial handshake - and if so continue
on as if we were in XenbusStateInitWait state.

We also roll the error recovery (freeing the structure) into
talk_to_blkback error path - which is safe since that function
is only called from blkback_changed.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:26 -05:00
Bob Liu
93bb277f97 xen/blkback: Fix two memory leaks.
This patch fixes two memleaks:
  backtrace:
    [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
    [<ffffffff81205e3b>] kmem_cache_alloc+0xbb/0x1d0
    [<ffffffff81534028>] xen_blkbk_probe+0x58/0x230
    [<ffffffff8146adb6>] xenbus_dev_probe+0x76/0x130
    [<ffffffff81511716>] driver_probe_device+0x166/0x2c0
    [<ffffffff815119bc>] __device_attach_driver+0xac/0xb0
    [<ffffffff8150fa57>] bus_for_each_drv+0x67/0x90
    [<ffffffff81511ab7>] __device_attach+0xc7/0x120
    [<ffffffff81511b23>] device_initial_probe+0x13/0x20
    [<ffffffff8151059a>] bus_probe_device+0x9a/0xb0
    [<ffffffff8150f0a1>] device_add+0x3b1/0x5c0
    [<ffffffff8150f47e>] device_register+0x1e/0x30
    [<ffffffff8146a9e8>] xenbus_probe_node+0x158/0x170
    [<ffffffff8146abaf>] xenbus_dev_changed+0x1af/0x1c0
    [<ffffffff8146b1bb>] backend_changed+0x1b/0x20
    [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
unreferenced object 0xffff880007ba8ef8 (size 224):

  backtrace:
    [<ffffffff817ba5e8>] kmemleak_alloc+0x28/0x50
    [<ffffffff81205c73>] __kmalloc+0xd3/0x1e0
    [<ffffffff81534d87>] frontend_changed+0x2c7/0x580
    [<ffffffff8146af12>] xenbus_otherend_changed+0xa2/0xb0
    [<ffffffff8146b2c0>] frontend_changed+0x10/0x20
    [<ffffffff81468ca6>] xenwatch_thread+0xb6/0x160
    [<ffffffff810d3e97>] kthread+0xd7/0xf0
    [<ffffffff817c4a9f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800048dcd38 (size 224):

The first leak is caused by never doing a put() on the be->blkif reference
which we had gotten in xen_blkif_alloc(), while the second is
us not freeing blkif->rings in the right place.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:26 -05:00
Bob Liu
db6fbc1067 xen/blkback: make st_ statistics per ring
Make st_* statistics per ring and the VBD sysfs would iterate over all the
rings.

Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings
are torn down, so it's safe.

Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
v2: Aligned the variables on the same column.
2016-01-04 12:21:25 -05:00
Julien Grall
6cc5683390 xen/blkfront: Handle non-indirect grant with 64KB pages
The minimal size of request in the block framework is always PAGE_SIZE.
It means that when a 64KB guest is supported, the request will be at least
64KB.

However, if the backend doesn't support indirect descriptors (such as QDISK
in QEMU), a ring request is only able to accommodate 11 segments of 4KB
(i.e. 44KB).

The current frontend is assuming that an I/O request will always fit in
a ring request. This is not true any more when using 64KB page
granularity and will therefore crash during boot.

On ARM64, the ABI is completely neutral to the page granularity used by
the domU. The guest has the choice between the different page granularities
supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB).
This can't be enforced by the hypervisor and therefore it's possible to
run guests using different page granularities.

So we can't mandate the block backend to support indirect descriptors
when the frontend is using 64KB page granularity and have to fix it
properly in the frontend.

The solution described below is based on directly modifying the frontend
guest rather than asking the block framework to support smaller sizes
(i.e. < PAGE_SIZE). This is because the changes in the block framework are
not trivial, as everything seems to rely on a struct *page (see [1]).
However, someone may succeed in doing this in the future, and we would
then be able to use it.

Given that a block request may not fit in a single ring request, a
second request is introduced for the data that cannot fit in the first
one. This means that the second ring request should never be used on
Linux if the page size is smaller than 44KB.

To achieve the support of the extra ring request, the block queue size
is divided by two. Therefore, the ring will always contain enough space
to accommodate 2 ring requests. While this will reduce the overall
performance, it will make the implementation more contained. The way
forward to get better performance is to implement in the backend either
indirect descriptors or multiple grants ring.

Note that the parameters of the blk_queue_max_* helpers haven't been updated.
The block code will set the minimum size supported and we may be able
to support directly any change in the block framework that lowers
the minimal size of a request.

[1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:25 -05:00
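
The arithmetic behind 6cc5683390 can be checked with a short user-space sketch. It takes only the figures quoted in the message (4KB segments, 11 segments per non-indirect ring request); the macro names and ring_requests_needed() are hypothetical and do not appear in the blkfront source.

```c
#include <assert.h>

#define SEGMENT_SIZE_KB   4	/* segment size quoted in the message */
#define SEGS_PER_RING_REQ 11	/* segments per non-indirect ring request */

/* Hypothetical helper: number of ring requests needed to carry one
 * block request of request_kb kilobytes (the minimum request size is
 * one guest page). Both divisions round up. */
unsigned int ring_requests_needed(unsigned int request_kb)
{
	unsigned int segs;

	assert(request_kb > 0);
	segs = (request_kb + SEGMENT_SIZE_KB - 1) / SEGMENT_SIZE_KB;
	return (segs + SEGS_PER_RING_REQ - 1) / SEGS_PER_RING_REQ;
}
```

Under these assumptions a 64KB request needs 16 segments and therefore 2 ring requests, while anything up to 44KB still fits in one.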
Julien Grall
2e073969d5 xen-blkfront: Introduce blkif_ring_get_request
The code to get a request is always the same. Therefore we can factor
it out into a single function.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:24 -05:00
Jiri Kosina
a6e7af1288 xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of
every attempt to purge the LRU. This operation can't ever succeed though,
as the kthread hasn't marked itself as freezable.

Before (hopefully eventually) kthread freezing gets converted to filesystem
freezing, we'd rather mark xen_blkif_schedule() freezable (as it can
generate I/O during suspend).

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:24 -05:00
Konrad Rzeszutek Wilk
2d0382fac1 xen/blkback: Free resources if connect_ring failed.
With the multi-queue support we could fail at setting up
some of the rings and fail the connection. That meant that
all resources tied to rings[0..n-1] (where n is the ring
that failed to be set up) stayed allocated. Eventually the frontend
will switch states and we will call xen_blkif_disconnect.

However we do not want to be at the mercy of the frontend
deciding when to change states. This allows us to do the
cleanup right away and free resources.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:07 -05:00
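
The cleanup idea in 2d0382fac1 (release rings[0..n-1] as soon as ring n fails, rather than waiting for a later disconnect) can be sketched in user space as follows. This is not the blkback code: struct ring, setup_rings() and the fail_at parameter are hypothetical, and fail_at exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-ring resource. */
struct ring {
	void *buf;
};

/* Set up nr rings. fail_at simulates a failure at that index
 * (pass -1 for no simulated failure). On failure, every ring set up
 * so far is released before returning a negative errno. */
int setup_rings(struct ring *rings, int nr, int fail_at)
{
	int i;

	assert(rings != NULL && nr > 0);

	for (i = 0; i < nr; i++) {
		rings[i].buf = (i == fail_at) ? NULL : malloc(64);
		if (rings[i].buf == NULL)
			goto unwind;
	}
	return 0;

unwind:
	/* Ring i itself holds nothing; free rings[0..i-1]. */
	while (--i >= 0) {
		free(rings[i].buf);
		rings[i].buf = NULL;
	}
	return -ENOMEM;
}
```

The caller gets back either 0 with all rings populated, or -ENOMEM with nothing left to free.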
Konrad Rzeszutek Wilk
bde21f73b9 xen/blocks: Return -EXX instead of -1
Let's return sensible values instead of -1.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04 12:21:07 -05:00