Commit Graph

687884 Commits

Author SHA1 Message Date
Linus Torvalds
37949075ab OpenRISC fixes for 4.13
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZX+NnAAoJEMOzHC1eZifk1g4P/1TN07Vov7Yvtwnbsbu2oPNc
 6TvvZd3pZjW7FkunpuN2JlkWwJPyjXQgrCSOA/Qt+6E02PIW37vnCWZYEHN+tsw5
 guT7PtB1LzuvJ0VyuArnTNfeeJ/uNbgmVMUm5kg6vlc5p5qP7rd002KhZG3WJHCu
 zpd7hGtvB/oLln0t0WWR9tO2Sycb35QBy1J/m4HleEHQ1G8OMs3uHuKDT8unyX8M
 PGu/3Vim3vh+/esyEbYGksazvOa8Z2+b8jaS0QmB3BPYlcxRp+xFD31mg/Z8gg3J
 5KjfPoOoB6G0cKmx/DF3I1TtlD5aaTot4o4YfYaNUWAC8Un4hENJVPTlpHH7Mhyd
 q6xlTC3W2VHebFfbkmJ2x6SlvAresD8eCIYl5XuOJTIBEBGTtCx/BwZu57UFxlBL
 awo/MiGVspL/PwvMESwsWNImq5vn6p1NYFC9MnvbMk9uWiXYZ50iRKG8QQ4t5Dsc
 qsRtlrPOkPCDNzADfxw8fTdfabJxP5WUQtmNDBPGGnygWQvT12CiGfA7Jak3+5KN
 EcTFeihGbbxC8lku+82Ofjj5/6Ro91exb/1/tYblOX2NtP058TsceZQQgosMDcHO
 KGhHT8HaSha5huGNIh704SqARJZ7lf0CC5E6CAKqS7/BOrN/rFbXh0wFh9i/a+qW
 5cGIdNX+CK6+VwZKITkx
 =aihT
 -----END PGP SIGNATURE-----

Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux

Pull OpenRISC updates from Stafford Horne:
 "Openrisc fixes for this 4.13 merge window, there is not really much
  here:

   - include cleanups, one with should reduce build time slightly

   - switch to new toolchain to new (>2 year old) toolchain prefix"

* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
  openrisc: defconfig: Cleanup from old Kconfig options
  openrisc: explicitly include linux/bug.h in asm/fixmap.h
  openrisc: Switch to use export.h instead of module.h
  openrisc: Change toolchain from or32- to or1k-
2017-07-07 13:58:49 -07:00
Linus Torvalds
d691b7e7d1 powerpc updates for 4.13
Highlights include:
 
  - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
 
  - Platform support for FSP2 (476fpe) board
 
  - Enable ZONE_DEVICE on 64-bit server CPUs.
 
  - Generic & powerpc spin loop primitives to optimise busy waiting
 
  - Convert VDSO update function to use new update_vsyscall() interface
 
  - Optimisations to hypercall/syscall/context-switch paths
 
  - Improvements to the CPU idle code on Power8 and Power9.
 
 As well as many other fixes and improvements.
 
 Thanks to:
   Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
   Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
   Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
   Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
   Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
   Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
   Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
   Thiago Jung Bauermann, Yang Li.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
 KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
 DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
 MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
 BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
 ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
 eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
 Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
 KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
 B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
 Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
 bBDvrRnvupusz8qCWrxe
 =w8rj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.

   - Platform support for FSP2 (476fpe) board

   - Enable ZONE_DEVICE on 64-bit server CPUs.

   - Generic & powerpc spin loop primitives to optimise busy waiting

   - Convert VDSO update function to use new update_vsyscall() interface

   - Optimisations to hypercall/syscall/context-switch paths

   - Improvements to the CPU idle code on Power8 and Power9.

  As well as many other fixes and improvements.

  Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
  Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
  Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
  Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
  Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
  Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
  Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
  Bauermann, Yang Li"

* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
  powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/lib/code-patching: Use alternate map for patch_instruction()
  powerpc/xmon: Add patch_instruction() support for xmon
  powerpc/kprobes/optprobes: Use patch_instruction()
  powerpc/kprobes: Move kprobes over to patch_instruction()
  powerpc/mm/radix: Fix execute permissions for interrupt_vectors
  powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
  powerpc/64s: Blacklist rtas entry/exit from kprobes
  powerpc/64s: Blacklist functions invoked on a trap
  powerpc/64s: Un-blacklist system_call() from kprobes
  powerpc/64s: Move system_call() symbol to just after setting MSR_EE
  powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
  powerpc/64s: Convert .L__replay_interrupt_return to a local label
  powerpc64/elfv1: Only dereference function descriptor for non-text symbols
  cxl: Export library to support IBM XSL
  powerpc/dts: Use #include "..." to include local DT
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
  ...
2017-07-07 13:55:45 -07:00
Linus Torvalds
b59eea554f vfs: fix flock compat thinko
Michael Ellerman reported that commit 8c6657cb50 ("Switch flock
copyin/copyout primitives to copy_{from,to}_user()") broke his
networking on a bunch of PPC machines (64-bit kernel, 32-bit userspace).

The reason is a brown-paper bug by that commit, which had the arguments
to "copy_flock_fields()" in the wrong order, breaking the compat
handling for file locking.  Apparently very few people run 32-bit user
space on x86 any more, so the PPC people got the honor of noticing this
"feature".

Michael also sent a minimal diff that just changed the order of the
arguments in that macro.

This is not that minimal diff.

This not only changes the order of the arguments in the macro, it also
changes them to be pointers (to be consistent with all the other uses of
those pointers), and makes the functions that do all of this also have
the proper "const" attribution on the source pointers in order to make
issues like that (using the source as a destination) be really obvious.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-07 13:48:18 -07:00
Linus Torvalds
6481352082 USB fixes for 4.13-rc1
Here are some remaining USB fixes for 4.13-rc1.  They were originally
 scheduled for 4.12-final, but I didn't send them to you in time.
 Because of that, they were in a separate branch from the larger USB set
 of patches, so here they are in a separate pull request.
 
 Nothing major here a all, just 3 small patches:
 	- some usb-serial new device ids
 	- xhci bugfix for some crazy AMD hardware
 
 All of these have been in linux-next for a long time with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWV9FRg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylHxgCfTT31v4FaxH2k76qA1ZnHdXWHqyQAn3oyOE4T
 uJhpp1FbFR1VNXM/jRox
 =PCPm
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some remaining USB fixes for 4.13-rc1. They were originally
  scheduled for 4.12-final, but I didn't send them to you in time.
  Because of that, they were in a separate branch from the larger USB
  set of patches, so here they are in a separate pull request.

  Nothing major here a all, just three small patches:

   - some usb-serial new device ids
   - xhci bugfix for some crazy AMD hardware

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'usb-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  xhci: Limit USB2 port wake support for AMD Promontory hosts
  USB: serial: qcserial: new Sierra Wireless EM7305 device ID
  USB: serial: option: add two Longcheer device ids
2017-07-07 13:42:04 -07:00
Linus Torvalds
df7cb187ed - Core Frameworks
- Report correct error status to user
 
  - Fix-ups
    - Move Backlight headers out of I2C; adp8860, adp8870
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJZXgAWAAoJEFGvii+H/HdhlrQP/j+PXYo6jgYg+uoNvlKpgBWD
 bkx2JM9fXA6ZFaEhcvU1LWCt+CqnBeL6ujDb7GBJTRrazws1vZ2Snjrbl/go+6Y2
 h4thCecO9AJG1YaDXsWdDjLu9Flx+ZqKS7r+fyp/0O+s+G78SiQZHfoPKNplfrD3
 TQpw4MIMRPqeYILgIzOXkBg+HEa2L5oQD1zyr6wIUyqNQIQIR7cUHdVutd241BTD
 JitNk39e8CW8jw9zpYJKb7KxdwLtgm7iU6Lf+DWBffVtzh4CY7vJC0WZla7W6lhL
 uzBH5mgvmfpRg8/FL37dgQrmal1MHzkXJ5EHQk+08D+Yk6b2vqw6PBDSQ2ZZJmr5
 SLyLZK1Y1s2+wX+X7fit71QrcUWDUSYOhh+9GKDwcVfNqacWzro1i2p2ovOTntW4
 JbGx8u+UU9pcz1sY1qljjYKxi6G1EAvHbwwobe3syleqh7I09RsnSKqi6ZEXbLAE
 Nfxe4K+djnoirtjFL8vpIy0rXFGQKccIAisid5rFnXjL9rVB/7FiQWGovav+9YuG
 rpr+dQJe5FlLaeTx9CDZeFeefwVyaDOBVJsVC0yghJoz3o2tz/et5zhBEAtLtMQP
 DkS+mEFYLv3ZkXy2Wu4jU7y4jPoeRwTQrGaaUnsh/H+/dAmY6LH6Ccl2My+H0RBi
 wAFUl1yW8G+qgr8mhT7y
 =QD8S
 -----END PGP SIGNATURE-----

Merge tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
 "Core Framework:
   - Report correct error status to user

  Fix-ups:
   - Move Backlight headers out of I2C (adp8860, adp8870)"

* tag 'backlight-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  video: adp8870: move header file out of I2C realm
  backlight: adp8860: Move header file out of I2C realm
  backlight: Report error on failure
2017-07-07 13:38:26 -07:00
Linus Torvalds
6972b007ca Merge (most of) tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
 "New Drivers:
   - Intel Cherry Trail Whiskey Cove PMIC
   - TI LP87565 PMIC

  New Device Support:
   - Add support for Cannonlake to intel-lpss-pci
   - Add support for Simatic IOT2000 to intel_quark_i2c_gpio

  New Functionality:
   - Add Regulator support (axp20x)

  Fix-ups:
   - Rework IRQ handling (intel_soc_pmic_bxtwc, rtsx_pcr, cros_ec)
   - Remove unused/unwelcome code (ipaq-micro, wm831x-core, da9062-core)
   - Provide deregistration on unbind (rn5t618)
   - Rework DT code/documentation (arizona)
   - Constify things (fsl-imx25-tsadc)
   - MAINTAINERS updates (DA9062/61)
   - Kconfig configuration adaptions (INTEL_SOC_PMIC, MFD_AXP20X_I2C)
   - Switch to DMI matching (intel_quark_i2c_gpio)
   - Provide an appropriate level of error checking (wm831x-{i2c,spi},
     twl4030-irq, tc6393xb)
   - Make use of devm_* (resource handling) calls (intel_soc_pmic_bxtwc,
     stm32-timers, atmel-flexcom, cros_ec, fsl-imx25-tsadc,
     exynos-lpass, palmas, qcom-spmi-pmic, smsc-ece1099,
     motorola-cpcap)"

[ Skipped the last commit in that series that added eight thousand
  lines of pointless repeated register definitions.  - Linus ]

* tag 'mfd-next-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (38 commits)
  mfd: Add LP87565 PMIC support
  mfd: cros_ec: Free IRQ on exit
  dt-bindings: vendor-prefixes: Add arctic to vendor prefix
  mfd: da9061: Fix to remove BBAT_CONT register from chip model
  mfd: da9061: Fix to remove BBAT_CONT register from chip model
  mfd: axp20x-i2c: Document that this must be builtin on x86
  mfd: Add Cherry Trail Whiskey Cove PMIC driver
  mfd: tc6393xb: Handle return value of clk_prepare_enable
  mfd: intel_quark_i2c_gpio: Add support for SIMATIC IOT2000 platform
  mfd: intel_quark_i2c_gpio: Use dmi_system_id table for retrieving frequency
  mfd: motorola-cpcap: Use devm_of_platform_populate()
  mfd: smsc-ece: Use devm_of_platform_populate()
  mfd: qcom-spmi-pmic: Use devm_of_platform_populate()
  mfd: palmas: Use devm_of_platform_populate()
  mfd: exynos: Use devm_of_platform_populate()
  mfd: fsl-imx25: Use devm_of_platform_populate()
  mfd: cros_ec: Use devm_of_platform_populate()
  mfd: atmel: Use devm_of_platform_populate()
  mfd: stm32-timers: Use devm_of_platform_populate()
  mfd: intel_soc_pmic: Select designware i2c-bus driver
  ...
2017-07-07 13:30:05 -07:00
Linus Torvalds
c7d28eca1d This is the bulk of GPIO changes for the v4.13 series:
Core:
 - Export add/remove for lookup tables so that modules can export GPIO
   descriptor tables.
 - Handle GPIO sleep states: it is now possible to flag that a GPIO line
   may loose its state during suspend/resume of the system to save
   power. This is used in the Wolfson Micro Arizona driver.
 - ACPI-based GPIO was tightened up a lot around the edges.
 - Use bitmap_fill() to speed up a loop.
 
 New drivers:
 - Exar XRA1403 SPI-based GPIO.
 - MVEBU driver now supports Armada 7K and 8K.
 - LP87565 PMIC GPIO.
 - Renesas R-CAR R8A7743 (RZ/G1M).
 - The new IOT2040 8250 serial/GPIO also comes in through this
   changeset.
 
 Substantial driver changes:
 - Seriously fix the Exar 8250 GPIO portions to work.
 - The MCP23S08 was moved out to a pin control driver.
 - Convert MEVEBU to use regmap for register access.
 - Drop Vulcan support from the Broadcom driver.
 - Serious cleanup and improvement of the mockup driver, giving us a
   better test coverage.
 
 Misc:
 - Lots of janitorial clean up.
 - A bunch of documentation fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZX1MjAAoJEEEQszewGV1zEYUQALFsjJH7D2mRN4TSSEeVAcYr
 Uz52uupsou8tgW0IupRb/khO+V6zgd7j+kHDJLMxX+rCTw3pTq5+XGyi5+iNpxof
 TIIT1XBx4eq7Q/n4nWdGodHbHN9BXw7cGsNmTb1TS/G/6h1wOKxfzjvUNhDAC+2v
 idPy6B5G+WrDsYpBtTWlKHKQKVqbUlhLFyJYoglzqIeM5L9Ry/UoZ6sGleho3hKn
 Vlg/hMtkCexnVO9zopBe5CuEfseLrkcCgCvtQ713egzVXApryp4hqm3Xti20Ntgy
 OxnKhmVyloqd0kU0qLSpvDAf7B1invbHHbeZsag6wluTMrxgUYJONuonrqGeGiwB
 FBDtw9SGn2GlEXcs7sg8ANmAyr2XxxezKXD9XLBL5jadNB2KCY5yKMv1IK3VnYdq
 gEpFAiZ5cmlpZxIXqlyeZP6LKHNTci4amb33x1I/ghH2BTkGQ/3E3anXEbPNWF8G
 DDE6nrSgU0oQcNqRHyZaWNZpUIz4aFUgJtOEO4lYYP4+VzYSKTdrHseTiiJ91J7E
 WBz9p5JvSnB22+60RhyTAPjVjXgWa30nidf7WGCK0UHiIYffihCxGZRTlrhoEEUB
 fXgveJpqxLopYvxpUxi1OqlPYYo7zKRF5BzHsjKMpdVYXfdMdvs7eq2g/X889i1D
 WpbE9LyAH9FY5BM8YjFX
 =TpW1
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.13 series.

  Some administrativa:

  I have a slew of 8250 serial patches and the new IOT2040 serial+GPIO
  driver coming in through this tree, along with a whole bunch of Exar
  8250 fixes. These are ACKed by Greg and also hit drivers/platform/*
  where they are ACKed by Andy Shevchenko.

  Speaking about drivers/platform/* there is also a bunch of ACPI stuff
  coming through that route, again ACKed by Andy.

  The MCP23S08 changes are coming in here as well. You already have the
  commits in your tree, so this is just a result of sharing an immutable
  branch between pin control and GPIO.

  Core:
   - Export add/remove for lookup tables so that modules can export GPIO
     descriptor tables.
   - Handle GPIO sleep states: it is now possible to flag that a GPIO
     line may loose its state during suspend/resume of the system to
     save power. This is used in the Wolfson Micro Arizona driver.
   - ACPI-based GPIO was tightened up a lot around the edges.
   - Use bitmap_fill() to speed up a loop.

  New drivers:
   - Exar XRA1403 SPI-based GPIO.
   - MVEBU driver now supports Armada 7K and 8K.
   - LP87565 PMIC GPIO.
   - Renesas R-CAR R8A7743 (RZ/G1M).
   - The new IOT2040 8250 serial/GPIO also comes in through this
     changeset.

  Substantial driver changes:
   - Seriously fix the Exar 8250 GPIO portions to work.
   - The MCP23S08 was moved out to a pin control driver.
   - Convert MEVEBU to use regmap for register access.
   - Drop Vulcan support from the Broadcom driver.
   - Serious cleanup and improvement of the mockup driver, giving us a
     better test coverage.

  Misc:
   - Lots of janitorial clean up.
   - A bunch of documentation fixes"

* tag 'gpio-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (70 commits)
  serial: exar: Add support for IOT2040 device
  gpio-exar/8250-exar: Make set of exported GPIOs configurable
  platform: Accept const properties
  serial: exar: Factor out platform hooks
  gpio-exar/8250-exar: Rearrange gpiochip parenthood
  gpio: exar: Fix iomap request
  gpio-exar/8250-exar: Do not even instantiate a GPIO device for Commtech cards
  serial: uapi: Add support for bus termination
  gpio: rcar: Add R8A7743 (RZ/G1M) support
  gpio: gpio-wcove: Fix GPIO control register offset calculation
  gpio: lp87565: Add support for GPIO
  gpio: dwapb: fix missing first irq for edgeboth irq type
  MAINTAINERS: Take maintainership for GPIO ACPI support
  gpio: exar: Fix reading of directions and values
  gpio: exar: Allocate resources on behalf of the platform device
  gpio-exar/8250-exar: Fix passing in of parent PCI device
  gpio: mockup: use devm_kcalloc() where applicable
  gpio: mockup: add myself as author
  gpio: mockup: improve the error message
  gpio: mockup: don't return magic numbers from probe()
  ...
2017-07-07 12:40:27 -07:00
Krzysztof Kozlowski
2e08a0ef76 openrisc: defconfig: Cleanup from old Kconfig options
Remove old, dead Kconfig option INET_LRO. It is gone since
commit 7bbf3cae65 ("ipv4: Remove inet_lro library").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-07-08 04:35:30 +09:00
Tobias Klauser
e687448ca8 openrisc: explicitly include linux/bug.h in asm/fixmap.h
openrisc's asm/fixmap.h uses the BUG() and BUG_ON() macros but relies on
implict inclusion of linux/bug.h which means that changes in other
headers could break the build. Thus, add an explicit include.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Stafford Horne <shorne@gmail.com>
2017-07-08 04:35:17 +09:00
Linus Torvalds
dddd564dbb This time we've got one core change to introduce a bulk clk_get API,
some new clk drivers and updates for old ones. The diff is pretty
 spread out across a handful of different SoC clk drivers for Broadcom, TI,
 Qualcomm, Renesas, Rockchip, Samsung, and Allwinner, mostly due to the
 introduction of new drivers.
 
 Core:
  - New clk bulk get APIs
  - Clk divider APIs gained the ability to consider a different parent than
    the current one
 
 New Drivers:
  - Renesas r8a779{0,1,2,4} CPG/MSSR
  - TI Keystone SCI firmware controlled clks and OMAP4 clkctrl
  - Qualcomm IPQ8074 SoCs
  - Cortina Systems Gemini (SL3516/CS3516)
  - Rockchip rk3128 SoCs
  - Allwinner A83T clk control units
  - Broadcom Stingray SoCs
  - CPU clks for Mediatek MT8173/MT2701/MT7623 SoCs
 
 Removed Drivers:
  - Old non-DT version of the Realview clk driver
 
 Updates:
  - Renesas Kconfig/Makefile cleanups
  - Amlogic CEC EE clk support
  - Improved Armada 7K/8K cp110 clk support
  - Rockchip clk id exposing, critical clk markings
  - Samsung converted to clk_hw registration APIs
  - Fixes for Samsung exynos5420 audio clks
  - USB2 clks for Hisilicon hi3798cv200 SoC and video/camera clks for hi3660
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJZXujtAAoJEK0CiJfG5JUl8vIQAKbcH3rX+CS4jrg7Hs2Ghnhn
 ZbTf7vZYa6K7iuL7JHITEScAQ8+l0Bl7eWSfJZRt4oUW3Jt4F+AIs8qBofZAWn4M
 m+kDHs/IfAUITZp/unM/ogFfVcboZObjAK/A2yyRVyMxRkIyyUb6r7SDVpCpGyxU
 1YDAdis2M3F5J9CGV/tpmobnksMUlCnJlI0OGtMUnvY6mDkf8Re89sayMnQ/1Mgp
 CL1YwnqZ0L6rT664IMo74bB7UNjXdMZsuCeITkU+hMVq4NMXErKCcn8lHvP9P+uP
 AoZ8bf9WaQ/CglGFeeFrNQGUf+tiTlYxlVvvNFXR5+rmhu/yKxNI67APaupeERVl
 jMISKAC/A+C1j6JVMCqjM3d75F47SzuZQuQY0ZD0DWoqP9PBzV6IyThHIqWrN5O4
 IceLmD8BrwW+h8bs2SIubIygOGMMqGhVi2XaAAWpmRke7JzmSFOOyE3YGPisaBAq
 EcIF2i2jJ6Ja4rClgfQKOsx25MOILsIp/sMU6iC7U1h4NDj8yP5A13n60U6DuZhu
 ttjN+bXugR81R+bWyzC6Zl/KXF83Ka3ZSJs+XblunPRGKt2q6Kj12HBspkWL1QjY
 aLEEg3fpI/ovQoTMXHj7/G1MD60rxoHCuOjBwSWEQBzA1MiHol+ab/mZKfPsy50C
 116G1XJgtgrLxE00iZ6K
 =Yar+
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
 "This time we've got one core change to introduce a bulk clk_get API,
  some new clk drivers and updates for old ones. The diff is pretty
  spread out across a handful of different SoC clk drivers for Broadcom,
  TI, Qualcomm, Renesas, Rockchip, Samsung, and Allwinner, mostly due to
  the introduction of new drivers.

  Core:
   - New clk bulk get APIs
   - Clk divider APIs gained the ability to consider a different parent
     than the current one

  New Drivers:
   - Renesas r8a779{0,1,2,4} CPG/MSSR
   - TI Keystone SCI firmware controlled clks and OMAP4 clkctrl
   - Qualcomm IPQ8074 SoCs
   - Cortina Systems Gemini (SL3516/CS3516)
   - Rockchip rk3128 SoCs
   - Allwinner A83T clk control units
   - Broadcom Stingray SoCs
   - CPU clks for Mediatek MT8173/MT2701/MT7623 SoCs

  Removed Drivers:
   - Old non-DT version of the Realview clk driver

  Updates:
   - Renesas Kconfig/Makefile cleanups
   - Amlogic CEC EE clk support
   - Improved Armada 7K/8K cp110 clk support
   - Rockchip clk id exposing, critical clk markings
   - Samsung converted to clk_hw registration APIs
   - Fixes for Samsung exynos5420 audio clks
   - USB2 clks for Hisilicon hi3798cv200 SoC and video/camera clks for
     hi3660"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (147 commits)
  clk: gemini: Read status before using the value
  clk: scpi: error when clock fails to register
  clk: at91: Add sama5d2 suspend/resume
  gpio: dt-bindings: Add documentation for gpio controllers on Armada 7K/8K
  clk: keystone: TI_SCI_PROTOCOL is needed for clk driver
  clk: samsung: audss: Fix silent hang on Exynos4412 due to disabled EPLL
  clk: uniphier: provide NAND controller clock rate
  clk: hisilicon: add usb2 clocks for hi3798cv200 SoC
  clk: Add Gemini SoC clock controller
  clk: iproc: Remove __init marking on iproc_pll_clk_setup()
  clk: bcm: Add clocks for Stingray SOC
  dt-bindings: clk: Extend binding doc for Stingray SOC
  clk: mediatek: export cpu multiplexer clock for MT8173 SoCs
  clk: mediatek: export cpu multiplexer clock for MT2701/MT7623 SoCs
  clk: mediatek: add missing cpu mux causing Mediatek cpufreq can't work
  clk: renesas: cpg-mssr: Use of_device_get_match_data() helper
  clk: hi6220: add acpu clock
  clk: zx296718: export I2S mux clocks
  clk: imx7d: create clocks behind rawnand clock gate
  clk: hi3660: Set PPLL2 to 2880M
  ...
2017-07-07 12:26:13 -07:00
Linus Torvalds
dd6ec12f3b DeviceTree for 4.13:
- vsprintf format specifier %pOF for device_node's. This will enable us
   to stop storing the full node names. Conversion of users will happen
   next cycle.
 
 - Update documentation to point to DT specification instead of ePAPR.
 
 - Split out graph and property functions to a separate file.
 
 - New of-graph functions for ALSA
 
 - Add vendor prefixes for RISC-V, Linksys, iWave Systems, Roofull,
   Itead, and BananaPi.
 
 - Improve dtx_diff utility filename printing.
 -----BEGIN PGP SIGNATURE-----
 
 iQItBAABCAAXBQJZXpNsEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcO2gg//
 VxhXDs6+oTkBCUzVtEHue/yv44q8Sa7M3jY3/VqVSLa3Eopp/4dmDgBAtWYYX2ou
 KfUl0+yD4cSKhw6oxycwsaS61zf8JkM4sbXYQTphty/5lwxq0/i3OGj98Uk9w9JH
 kM+b1Wi7Z6GBzqh1GuS4E+ADSktMadxd0LugXZvDEMVQZusv/nzWxzq/bdMUqW19
 0nvBL9ABRAPirhBuMSWpYlEEkwQn7JF3LO3i8IBDhhFzMsvbfR7cTp+ydt6I2pk8
 h8DxlsaPIOWH5KePNEmzsd1VlV/HcNl7/vZb0ev0Eb94TLHJRJ7V0ZMQxc5vxHgN
 x6aMlBLHGzG6LI5CV30pWAD/qrrtXNbqmlj1Qjd+FXen6NuQSngSfo5aXzXrM6X5
 ZUD7ou9KzYObraOarU6w2qSICok85bGQHOiBQDVTmE4E/4AVscnc1VQi/rTHrt2O
 Yt3AV8iwaum8q2PVOVKdy8tu7x/7BzBdSObYtjjMIuWcrInnlIyUkmehtCl38kqV
 fd6OIVEOhTJTr0CYDiXEbKtG81j7JhoREdVZvzcEhWFGt/98Rjc9tkTihhFzky4m
 D6lpzpf8mvemrBiMegyQbhVcfHyo0fJe+6giV7cssf2Xhe1QkC15UXywbccO7xFJ
 nf3yqCl8YVEPG0l1MrR+YEHHcnr4ZIEZpejOv+SzZeg=
 =DNwf
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

 - vsprintf format specifier %pOF for device_node's. This will enable us
   to stop storing the full node names. Conversion of users will happen
   next cycle.

 - Update documentation to point to DT specification instead of ePAPR.

 - Split out graph and property functions to a separate file.

 - New of-graph functions for ALSA

 - Add vendor prefixes for RISC-V, Linksys, iWave Systems, Roofull,
   Itead, and BananaPi.

 - Improve dtx_diff utility filename printing.

* tag 'devicetree-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (32 commits)
  of: document /sys/firmware/fdt
  dt-bindings: Add RISC-V vendor prefix
  vsprintf: Add %p extension "%pOF" for device tree
  of: find_node_by_full_name rewrite to compare each level
  of: use kbasename instead of open coding
  dt-bindings: thermal: add file extension to brcm,ns-thermal
  of: update ePAPR references to point to Devicetree Specification
  scripts/dtc: dtx_diff - Show real file names in diff header
  of: detect invalid phandle in overlay
  of: be consistent in form of file mode
  of: make __of_attach_node() static
  of: address.c header comment typo
  of: fdt.c header comment typo
  of: make of_fdt_is_compatible() static
  dt-bindings: display-timing.txt convert non-ascii characters to ascii
  Documentation: remove overlay-notes reference to non-existent file
  dt-bindings: usb: exynos-usb: Add missing required VDD properties
  dt-bindings: Add vendor prefix for Linksys
  MAINTAINERS: add device tree ABI documentation file
  of: Add vendor prefix for iWave Systems Technologies Pvt. Ltd
  ...
2017-07-07 10:37:54 -07:00
Linus Torvalds
21c19bc7ee Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:

 - Minor improvement : avoid requiring unnecessary startup/shutdown
   callback that many drivers seem to not need

 - New controller driver for Qualcomm's APCS IPC

* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
  mailbox: Introduce Qualcomm APCS IPC driver
  dt-bindings: mailbox: Introduce Qualcomm APCS global binding
  mailbox: Make startup and shutdown ops optional
2017-07-07 10:24:07 -07:00
Linus Torvalds
b6ffe9ba46 libnvdimm for 4.13
* Introduce the _flushcache() family of memory copy helpers and use them
   for persistent memory write operations on x86. The _flushcache()
   semantic indicates that the cache is either bypassed for the copy
   operation (movnt) or any lines dirtied by the copy operation are
   written back (clwb, clflushopt, or clflush).
 
 * Extend dax_operations with ->copy_from_iter() and ->flush()
   operations. These operations and other infrastructure updates allow
   all persistent memory specific dax functionality to be pushed into
   libnvdimm and the pmem driver directly. It also allows dax-specific
   sysfs attributes to be linked to a host device, for example:
       /sys/block/pmem0/dax/write_cache
 
 * Add support for the new NVDIMM platform/firmware mechanisms introduced
   in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace
   label format, extensions to the address-range-scrub command set, new
   error injection commands, and a new BTT (block-translation-table)
   layout. These updates support inter-OS and pre-OS compatibility.
 
 * Fix a longstanding memory corruption bug in nfit_test.
 
 * Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
   capable.
 
 * Miscellaneous fixes and small updates across libnvdimm and the nfit
   driver.
 
 Acknowledgements that came after the branch was pushed:
 
 commit 6aa734a2f3 "libnvdimm, region, pmem: fix 'badblocks'
   sysfs_get_dirent() reference lifetime"
 Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZXsUtAAoJEB7SkWpmfYgCOXcP/06bncqTEvtgrOF2b7O8w+8e
 mTySD51RUn6UpkFd37SMRch+rmbojuqj465TAE7XIXgyLgIOJixKaTlHYUoEnP3X
 rC4Q/g5mN0nittMDwL+vQaa1lQWd2kbjOlrqCgnLHVEEJpHmiQussunjvir4G1U7
 5ROooP8W+qMK5y5XPLJAg/gyGhYkjpRSlDg3Eo5meZZ0IdURbI7+WCLKrPcQUERT
 WmDc9gLhJdSQVxBV/0m2gdAER4ADmFjcrlm8kjXRBhdlUmEFjM0zpvlHJutHTkks
 rNZWCmCJs0Sas+DmRKszFmvVFHRHqUVA3dWK4P6PJEX+tl7BwlPcxpbfacHTG2EZ
 btArFc584DZ+EIrim1cXXRvLFlxnKOFBtBeteFs7l2kZjEcN6S4I5OZgTyeDpe/i
 2WDpHWLQWibkcIzH9y1EuMBkYnQjTJl1pecHzJoTaC+jAQ+opLiY7EecjLmCmQS6
 MBYUeQZNufLGfT5b8KXfpKeiXhpFkYrAGp+ErfoH/6RKy2zqTdagN1yVhos2y+a7
 JJu/Weetpn8qv+KTGUShO8TGyWv3wU46YkG2rKWl0FL1+C+6LMMw1/L0A97lwVlg
 BpypVVyaNu1D22ifZ8O5wbqPIYghoZ5akA0CiduhX19cpl5rTeTd8EvLjvcYhZEZ
 pMHuMAqIcIyLhIe2/sRF
 =xKQB
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with ->copy_from_iter() and ->flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...
2017-07-07 09:44:06 -07:00
Linus Torvalds
9f45efb928 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few hotfixes

 - various misc updates

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
  mm, memory_hotplug: move movable_node to the hotplug proper
  mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
  mm, memory_hotplug: drop artificial restriction on online/offline
  mm: memcontrol: account slab stats per lruvec
  mm: memcontrol: per-lruvec stats infrastructure
  mm: memcontrol: use generic mod_memcg_page_state for kmem pages
  mm: memcontrol: use the node-native slab memory counters
  mm: vmstat: move slab statistics from zone to node counters
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
  mm/zswap.c: improve a size determination in zswap_frontswap_init()
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
  mm/swapfile.c: sort swap entries before free
  mm/oom_kill: count global and memory cgroup oom kills
  mm: per-cgroup memory reclaim stats
  mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
  mm: kmemleak: factor object reference updating out of scan_block()
  mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
  mm, mempolicy: don't check cpuset seqlock where it doesn't matter
  mm, cpuset: always use seqlock when changing task's nodemask
  mm, mempolicy: simplify rebinding mempolicies when updating cpusets
  ...
2017-07-06 22:27:08 -07:00
Linus Torvalds
dc502142b6 Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull user access str* updates from Al Viro:
 "uaccess str...() dead code removal"

* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  s390 keyboard.c: don't open-code strndup_user()
  mips: get rid of unused __strnlen_user()
  get rid of unused __strncpy_from_user() instances
  kill strlen_user()
2017-07-06 22:07:44 -07:00
Linus Torvalds
90880b532a Merge branch 'work.probe_kernel_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull probe_kernel_read() uses from Al Viro:
 "Several open-coded probe_kernel_read()..."

* 'work.probe_kernel_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  dio: use probe_kernel_read()
  hp_sdc: use probe_kernel_read()
  hpfb: use probe_kernel_read()
2017-07-06 22:06:11 -07:00
Linus Torvalds
1c91d2c691 Merge branch 'misc.alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull alpha user access updates from Al Viro:
 "Several alpha osf_sys.c uaccess cleanups - getdomainname() had insane
  byte-by-byte copying of string to userland (instead of strnlen +
  copy_to_user) plus yet another compat variant of timeval/itimerval
  with associated copyin/copyout primitives"

* 'misc.alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  osf_sigstack(): switch to put_user()
  osf_sys.c: switch handling of timeval32/itimerval32 to copy_{to,from}_user()
  osf_getdomainname(): use copy_to_user()
2017-07-06 22:04:27 -07:00
Linus Torvalds
c856863988 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc compat stuff updates from Al Viro:
 "This part is basically untangling various compat stuff. Compat
  syscalls moved to their native counterparts, getting rid of quite a
  bit of double-copying and/or set_fs() uses. A lot of field-by-field
  copyin/copyout killed off.

   - kernel/compat.c is much closer to containing just the
     copyin/copyout of compat structs. Not all compat syscalls are gone
     from it yet, but it's getting there.

   - ipc/compat_mq.c killed off completely.

   - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
     drivers/block/floppy.c where they belong. Yes, there are several
     drivers that implement some of the same ioctls. Some are m68k and
     one is 32bit-only pmac. drivers/block/floppy.c is the only one in
     that bunch that can be built on biarch"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: move compat syscalls to native ones
  usbdevfs: get rid of field-by-field copyin
  compat_hdio_ioctl: get rid of set_fs()
  take floppy compat ioctls to sodding floppy.c
  ipmi: get rid of field-by-field __get_user()
  ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
  rt_sigtimedwait(): move compat to native
  select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
  put_compat_rusage(): switch to copy_to_user()
  sigpending(): move compat to native
  getrlimit()/setrlimit(): move compat to native
  times(2): move compat to native
  compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
  fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
  do_sigaltstack(): lift copying to/from userland into callers
  take compat_sys_old_getrlimit() to native syscall
  trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06 20:57:13 -07:00
Linus Torvalds
771d3feb4b Merge branch 'work.drm' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull DRM compat ioctl handling updates from Al Viro:
 "This kills the double-copies in there and tons of field-by-field
  copyin/copyout.

  Several dead ioctls put to rest, while we are at it - the native
  counterparts had been gone for a decade, so we can bloody well fail
  early on the compat side. No point rearranging the 32bit structure
  into 64bit one (and back) only to be told "piss off, I don't know that
  ioctl" by the native code..."

* 'work.drm' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (29 commits)
  Fix trivial misannotations
  mga: switch compat ioctls to drm_ioctl_kernel()
  radeon: take out dead compat ioctls
  drm compat: ia64 is not biarch
  drm_compat_ioctl(): tidy up a bit
  switch compat_drm_mapbufs() to drm_ioctl_kernel()
  switch compat_drm_rmmap() to drm_ioctl_kernel()
  switch compat_drm_mode_addfb2() to drm_ioctl_kernel()
  switch compat_drm_wait_vblank() to drm_ioctl_kernel()
  switch compat_drm_update_draw()
  compat_drm: switch sg ioctls
  compat_drm: switch AGP compat ioctls to drm_ioctl_kernel()
  switch compat_drm_dma() to drm_ioctl_kernel()
  switch compat_drm_resctx() to drm_ioctl_kernel()
  switch compat_drm_getsareactx() to drm_ioctl_kernel()
  switch compat_drm_setsareactx() to drm_ioctl_kernel()
  switch compat_drm_freebufs() to drm_ioctl_kernel()
  switch compat_drm_markbufs() to drm_ioctl_kernel()
  switch compat_drm_addmap() to drm_ioctl_kernel()
  switch compat_drm_getstats() to drm_ioctl_kernel()
  ...
2017-07-06 20:32:13 -07:00
Linus Torvalds
2074006dac The new features of this release:
- Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.
 
   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.
 
   - Recording of tgid (Thread Group ID). This is similar to how
     task COMMs are recorded (cached at sched_switch), where it is
     in a table and used on output of the trace and trace_pipe files.
 
   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.
 
   - Some random clean ups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJZXjYuFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 fsgIAKUvhpn2igoYCR9tWqu+DovEmwxCIumbCzmCFQcRKlLttRte94yY5+W9hnV0
 JPzd9T9zBDVqq1fI7iIop1SuTwEfKW6lJom0usZ8AFpK+YKm6FHnQ28POlvHzre2
 lzO41tpRWiehLQsITZ47eByhsvEfhx86mYT/oM1JSR6Pii1OpjyNYmDMw6BaMNBT
 kSCQFgIhzAhVuHjwAnB/S++E/ou7M5bCwCb5CNh7MubKubV5upHpoJcgYGO+WWa6
 56H/iEhff4EECTGJVefd8e78MtJPL8EsuM0nAcMPlnl8AaiOpP7XCdlgTwdefLvP
 b3o+nP15voSHkARGXC6eM6gH0po=
 =rvGB
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The new features of this release:

   - Added TRACE_DEFINE_SIZEOF() which allows trace events that use
     sizeof() it the TP_printk() to be converted to the actual size such
     that trace-cmd and perf can parse them correctly.

   - Some rework of the TRACE_DEFINE_ENUM() such that the above
     TRACE_DEFINE_SIZEOF() could reuse the same code.

   - Recording of tgid (Thread Group ID). This is similar to how task
     COMMs are recorded (cached at sched_switch), where it is in a table
     and used on output of the trace and trace_pipe files.

   - Have ":mod:<module>" be cached when written into set_ftrace_filter.
     Then the functions of the module will be traced at module load.

   - Some random clean ups and small fixes"

* tag 'trace-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (26 commits)
  ftrace: Test for NULL iter->tr in regex for stack_trace_filter changes
  ftrace: Decrement count for dyn_ftrace_total_info for init functions
  ftrace: Unlock hash mutex on failed allocation in process_mod_list()
  tracing: Add support for display of tgid in trace output
  tracing: Add support for recording tgid of tasks
  ftrace: Decrement count for dyn_ftrace_total_info file
  ftrace: Remove unused function ftrace_arch_read_dyn_info()
  sh/ftrace: Remove only user of ftrace_arch_read_dyn_info()
  ftrace: Have cached module filters be an active filter
  ftrace: Implement cached modules tracing on module load
  ftrace: Have the cached module list show in set_ftrace_filter
  ftrace: Add :mod: caching infrastructure to trace_array
  tracing: Show address when function names are not found
  ftrace: Add missing comment for FTRACE_OPS_FL_RCU
  tracing: Rename update the enum_map file
  tracing: Add TRACE_DEFINE_SIZEOF() macros
  tracing: define TRACE_DEFINE_SIZEOF() macro to map sizeof's to their values
  tracing: Rename enum_replace to eval_replace
  trace: rename enum_map functions
  trace: rename trace.c enum functions
  ...
2017-07-06 19:45:45 -07:00
Linus Torvalds
f72e24a124 This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
 code related to dma-mapping, and will further consolidate arch code
 into common helpers.
 
 This pull request contains:
 
  - removal of the DMA_ERROR_CODE macro, replacing it with calls
    to ->mapping_error so that the dma_map_ops instances are
    more self contained and can be shared across architectures (me)
  - removal of the ->set_dma_mask method, which duplicates the
    ->dma_capable one in terms of functionality, but requires more
    duplicate code.
  - various updates for the coherent dma pool and related arm code
    (Vladimir)
  - various smaller cleanups (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
 oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
 VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
 YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
 LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
 GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
 PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
 rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
 R4o=
 =0Fso
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Linus Torvalds
2c669275dc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Martin Schwidefsky:

 - The fixup for the blk-mq clash with the scm driver

 - An improvement for the dasd driver in regard to raw I/O

 - Bug fixes and cleanup

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  Update my email address
  s390/syscalls: Fix out of bounds arguments access
  s390/vfio_ccw: remove unused variable
  s390/dasd: remove unneeded code
  s390/crash: Remove unused KEXEC_NOTE_BYTES
  s390/zcrypt: Fix missing newlines at some debug feature messages.
  s390/dasd: Make raw I/O usable without prefix support
  s390/dasd: Rename dasd_raw_build_cp()
  s390/dasd: Refactor prefix_LRE() and related functions
  s390: fix up for "blk-mq: switch ->queue_rq return value to blk_status_t"
2017-07-06 19:15:23 -07:00
Linus Torvalds
6e6c5b9606 xen: features and fixes for 4.13-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJZXdVXAAoJELDendYovxMvVA0IAITmvH21SDTFiilKCOrxhCv0
 W3q3cOhZA4D+UtTqqIm/os/et08n72864s0mUFoY4PxETaUsb1jBav7z7Tod2c6B
 wh26UgIAhVO3ZewFSmpdPYoW0l3elC5JUMkVMfwSvHkROaU+YDEYUsLWGuIHZiiy
 V/kIskcKe08HLObU//BMjfFusmMHmQSg+TruyqRWodlWj4Rwm7q5fNZ/xaap1UCM
 O7GcHyq1k699w5YYTlIEkLWsX/pGM+auGSlT1xdjJEc2bpjH8ps0xbvAn6dsAKsE
 yoDyxQWtX2wBUXCqF0hXYAB2r1iFx2aFfLQjwc7p+V6BvxpWwSsC7Ur4QIDnm3E=
 =OLb7
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Other than fixes and cleanups it contains:

   - support > 32 VCPUs at domain restore

   - support for new sysfs nodes related to Xen

   - some performance tuning for Linux running as Xen guest"

* tag 'for-linus-4.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: allow userspace access during hypercalls
  x86: xen: remove unnecessary variable in xen_foreach_remap_area()
  xen: allocate page for shared info page from low memory
  xen: avoid deadlock in xenbus driver
  xen: add sysfs node for hypervisor build id
  xen: sync include/xen/interface/version.h
  xen: add sysfs node for guest type
  doc,xen: document hypervisor sysfs nodes for xen
  xen/vcpu: Handle xen_vcpu_setup() failure at boot
  xen/vcpu: Handle xen_vcpu_setup() failure in hotplug
  xen/pv: Fix OOPS on restore for a PV, !SMP domain
  xen/pvh*: Support > 32 VCPUs at domain restore
  xen/vcpu: Simplify xen_vcpu related code
  xen-evtchn: Bind dyn evtchn:qemu-dm interrupt to next online VCPU
  xen: avoid type warning in xchg_xen_ulong
  xen: fix HYPERVISOR_dm_op() prototype
  xen: don't print error message in case of missing Xenstore entry
  arm/xen: Adjust one function call together with a variable assignment
  arm/xen: Delete an error message for a failed memory allocation in __set_phys_to_machine_multi()
  arm/xen: Improve a size determination in __set_phys_to_machine_multi()
2017-07-06 19:11:24 -07:00
Linus Torvalds
c136b84393 PPC:
- Better machine check handling for HV KVM
 - Ability to support guests with threads=2, 4 or 8 on POWER9
 - Fix for a race that could cause delayed recognition of signals
 - Fix for a bug where POWER9 guests could sleep with interrupts pending.
 
 ARM:
 - VCPU request overhaul
 - allow timer and PMU to have their interrupt number selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 
 s390:
 - initial machine check forwarding
 - migration support for the CMMA page hinting information
 - cleanups and fixes
 
 x86:
 - nested VMX bugfixes and improvements
 - more reliable NMI window detection on AMD
 - APIC timer optimizations
 
 Generic:
 - VCPU request overhaul + documentation of common code patterns
 - kvm_stat improvements
 
 There is a small conflict in arch/s390 due to an arch-wide field rename.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
 BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
 A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
 dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
 g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
 NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
 =OD7H
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
2017-07-06 18:38:31 -07:00
Michal Hocko
4932381ee2 mm, memory_hotplug: move movable_node to the hotplug proper
movable_node_is_enabled is defined in memblock proper while it is
initialized from the memory hotplug proper.  This is quite messy and it
makes a dependency between the two so move movable_node along with the
helper functions to memory_hotplug.

To make it more entertaining the kernel parameter is ignored unless
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node
information for each memblock otherwise.  So let's warn when the option
is disabled.

Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Michal Hocko
f70029bbaa mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
Commit 20b2f52b73 ("numa: add CONFIG_MOVABLE_NODE for
movable-dedicated node") has introduced CONFIG_MOVABLE_NODE without a
good explanation on why it is actually useful.

It makes a lot of sense to make movable node semantic opt in but we
already have that because the feature has to be explicitly enabled on
the kernel command line.  A config option on top only makes the
configuration space larger without a good reason.  It also adds an
additional ifdefery that pollutes the code.

Just drop the config option and make it de-facto always enabled.  This
shouldn't introduce any change to the semantic.

Link: http://lkml.kernel.org/r/20170529114141.536-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Michal Hocko
57c0a17238 mm, memory_hotplug: drop artificial restriction on online/offline
Patch series "remove CONFIG_MOVABLE_NODE".

I am continuing to clean up the memory hotplug code and
CONFIG_MOVABLE_NODE seems dubious at best.  The following two patches
simply removes the flag and make it de-facto always enabled.

The current semantic of the config option is twofold 1) it automatically
binds hotplugable nodes to have memory in zone_movable by default when
movable_node is enabled 2) forbids memory hotplug to online all the
memory as movable when !CONFIG_MOVABLE_NODE.

The later restriction is quite dubious because there is no clear cut of
how much normal memory do we need for a reasonable system operation.  A
single memory block which is sufficient to allow further movable onlines
is far from sufficient (e.g a node with >2GB and memblocks 128MB will
fill up this zone with struct pages leaving nothing for other
allocations).  Removing the config option will not only reduce the
configuration space it also removes quite some code.

The semantic of the movable_node command line parameter is preserved.

The first patch removes the restriction mentioned above and the second
one simply removes all the CONFIG_MOVABLE_NODE related stuff.  The last
patch moves movable_node flag handling to memory_hotplug proper where it
belongs.

[1] http://lkml.kernel.org/r/20170524122411.25212-1-mhocko@kernel.org

This patch (of 3):

Commit 74d42d8fe1 ("memory_hotplug: ensure every online node has
NORMAL memory") has introduced a restriction that every numa node has to
have at least some memory in !movable zones before a first movable
memory can be onlined if !CONFIG_MOVABLE_NODE.

Likewise can_offline_normal checks the amount of normal memory in
!movable zones and it disallows to offline memory if there is no normal
memory left with a justification that "memory-management acts bad when
we have nodes which is online but don't have any normal memory".

While it is true that not having _any_ memory for kernel allocations on
a NUMA node is far from great and such a node would be quite subotimal
because all kernel allocations will have to fallback to another NUMA
node but there is no reason to disallow such a configuration in
principle.

Besides that there is not really a big difference to have one memblock
for ZONE_NORMAL available or none.  With 128MB size memblocks the system
might trash on the kernel allocations requests anyway.  It is really
hard to draw a line on how much normal memory is really sufficient so we
have to rely on administrator to configure system sanely therefore drop
the artificial restriction and remove can_offline_normal and
can_online_high_movable altogether.

Link: http://lkml.kernel.org/r/20170529114141.536-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Johannes Weiner
7779f21236 mm: memcontrol: account slab stats per lruvec
Josef's redesign of the balancing between slab caches and the page cache
requires slab cache statistics at the lruvec level.

Link: http://lkml.kernel.org/r/20170530181724.27197-7-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Johannes Weiner
00f3ca2c2d mm: memcontrol: per-lruvec stats infrastructure
lruvecs are at the intersection of the NUMA node and memcg, which is the
scope for most paging activity.

Introduce a convenient accounting infrastructure that maintains
statistics per node, per memcg, and the lruvec itself.

Then convert over accounting sites for statistics that are already
tracked in both nodes and memcgs and can be easily switched.

[hannes@cmpxchg.org: fix crash in the new cgroup stat keeping code]
  Link: http://lkml.kernel.org/r/20170531171450.GA10481@cmpxchg.org
[hannes@cmpxchg.org: don't track uncharged pages at all
  Link: http://lkml.kernel.org/r/20170605175254.GA8547@cmpxchg.org
[hannes@cmpxchg.org: add missing free_percpu()]
  Link: http://lkml.kernel.org/r/20170605175354.GB8547@cmpxchg.org
[linux@roeck-us.net: hexagon: fix build error caused by include file order]
  Link: http://lkml.kernel.org/r/20170617153721.GA4382@roeck-us.net
Link: http://lkml.kernel.org/r/20170530181724.27197-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Johannes Weiner
ed52be7bfd mm: memcontrol: use generic mod_memcg_page_state for kmem pages
The kmem-specific functions do the same thing.  Switch and drop.

Link: http://lkml.kernel.org/r/20170530181724.27197-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Johannes Weiner
320492961c mm: memcontrol: use the node-native slab memory counters
Now that the slab counters are moved from the zone to the node level we
can drop the private memcg node stats and use the official ones.

Link: http://lkml.kernel.org/r/20170530181724.27197-4-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Johannes Weiner
385386cff4 mm: vmstat: move slab statistics from zone to node counters
Patch series "mm: per-lruvec slab stats"

Josef is working on a new approach to balancing slab caches and the page
cache.  For this to work, he needs slab cache statistics on the lruvec
level.  These patches implement that by adding infrastructure that
allows updating and reading generic VM stat items per lruvec, then
switches some existing VM accounting sites, including the slab
accounting ones, to this new cgroup-aware API.

I'll follow up with more patches on this, because there is actually
substantial simplification that can be done to the memory controller
when we replace private memcg accounting with making the existing VM
accounting sites cgroup-aware.  But this is enough for Josef to base his
slab reclaim work on, so here goes.

This patch (of 5):

To re-implement slab cache vs.  page cache balancing, we'll need the
slab counters at the lruvec level, which, ever since lru reclaim was
moved from the zone to the node, is the intersection of the node, not
the zone, and the memcg.

We could retain the per-zone counters for when the page allocator dumps
its memory information on failures, and have counters on both levels -
which on all but NUMA node 0 is usually redundant.  But let's keep it
simple for now and just move them.  If anybody complains we can restore
the per-zone counters.

[hannes@cmpxchg.org: fix oops]
  Link: http://lkml.kernel.org/r/20170605183511.GA8915@cmpxchg.org
Link: http://lkml.kernel.org/r/20170530181724.27197-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Markus Elfring
2b2695f5fd mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Link: http://lkml.kernel.org/r/bae25b04-2ce2-7137-a71c-50d7b4f06431@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Markus Elfring
9cd1f701ce mm/zswap.c: improve a size determination in zswap_frontswap_init()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding
size determination a bit safer according to the Linux coding style
convention.

Link: http://lkml.kernel.org/r/19f9da22-092b-f867-bdf6-f4dbad7ccf1f@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Markus Elfring
f4ae0ce0fd mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Link: http://lkml.kernel.org/r/2345aabc-ae98-1d31-afba-40a02c5baf3d@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Huang Ying
155b5f88e7 mm/swapfile.c: sort swap entries before free
To reduce the lock contention of swap_info_struct->lock when freeing
swap entry.  The freed swap entries will be collected in a per-CPU
buffer firstly, and be really freed later in batch.  During the batch
freeing, if the consecutive swap entries in the per-CPU buffer belongs
to same swap device, the swap_info_struct->lock needs to be
acquired/released only once, so that the lock contention could be
reduced greatly.  But if there are multiple swap devices, it is possible
that the lock may be unnecessarily released/acquired because the swap
entries belong to the same swap device are non-consecutive in the
per-CPU buffer.

To solve the issue, the per-CPU buffer is sorted according to the swap
device before freeing the swap entries.

With the patch, the memory (some swapped out) free time reduced 11.6%
(from 2.65s to 2.35s) in the vm-scalability swap-w-rand test case with
16 processes.  The test is done on a Xeon E5 v3 system.  The swap device
used is a RAM simulated PMEM (persistent memory) device.  To test
swapping, the test case creates 16 processes, which allocate and write
to the anonymous pages until the RAM and part of the swap device is used
up, finally the memory (some swapped out) is freed before exit.

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20170525005916.25249-1-ying.huang@intel.com
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Tim Chen <tim.c.chen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Konstantin Khlebnikov
8e675f7af5 mm/oom_kill: count global and memory cgroup oom kills
Show count of oom killer invocations in /proc/vmstat and count of
processes killed in memory cgroup in knob "memory.events" (in
memory.oom_control for v1 cgroup).

Also describe difference between "oom" and "oom_kill" in memory cgroup
documentation.  Currently oom in memory cgroup kills tasks iff shortage
has happened inside page fault.

These counters helps in monitoring oom kills - for now the only way is
grepping for magic words in kernel log.

[akpm@linux-foundation.org: fix for mem_cgroup_count_vm_event() rename]
[akpm@linux-foundation.org: fix comment, per Konstantin]
Link: http://lkml.kernel.org/r/149570810989.203600.9492483715840752937.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Roman Guschin <guroan@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Roman Gushchin
2262185c5b mm: per-cgroup memory reclaim stats
Track the following reclaim counters for every memory cgroup: PGREFILL,
PGSCAN, PGSTEAL, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE and PGLAZYFREED.

These values are exposed using the memory.stats interface of cgroup v2.

The meaning of each value is the same as for global counters, available
using /proc/vmstat.

Also, for consistency, rename mem_cgroup_count_vm_event() to
count_memcg_event_mm().

Link: http://lkml.kernel.org/r/1494530183-30808-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Catalin Marinas
94f4a1618b mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
Kmemleak requires that vmalloc'ed objects have a minimum reference count
of 2: one in the corresponding vm_struct object and the other owned by
the vmalloc() caller.  There are cases, however, where the original
vmalloc() returned pointer is lost and, instead, a pointer to vm_struct
is stored (see free_thread_stack()).  Kmemleak currently reports such
objects as leaks.

This patch adds support for treating any surplus references to an object
as additional references to a specified object.  It introduces the
kmemleak_vmalloc() API function which takes a vm_struct pointer and sets
its surplus reference passing to the actual vmalloc() returned pointer.
The __vmalloc_node_range() calling site has been modified accordingly.

Link: http://lkml.kernel.org/r/1495726937-23557-4-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Catalin Marinas
04f70d13ca mm: kmemleak: factor object reference updating out of scan_block()
scan_block() updates the number of references (pointers) to objects,
adding them to the gray_list when object->min_count is reached.  The
patch factors out this functionality into a separate update_refs()
function.

Link: http://lkml.kernel.org/r/1495726937-23557-3-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Catalin Marinas
f66abf09e0 mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
Change the kmemleak_object.flags type to unsigned int and moves the
early_log.min_count (int) near early_log.op_type (int) to slightly
reduce the size of these structures on 64-bit architectures.

Link: http://lkml.kernel.org/r/1495726937-23557-2-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
e0dd7d53a6 mm, mempolicy: don't check cpuset seqlock where it doesn't matter
Two wrappers of __alloc_pages_nodemask() are checking
task->mems_allowed_seq themselves to retry allocation that has raced
with a cpuset update.

This has been shown to be ineffective in preventing premature OOM's
which can happen in __alloc_pages_slowpath() long before it returns back
to the wrappers to detect the race at that level.

Previous patches have made __alloc_pages_slowpath() more robust, so we
can now simply remove the seqlock checking in the wrappers to prevent
further wrong impression that it can actually help.

Link: http://lkml.kernel.org/r/20170517081140.30654-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
5f155f27cb mm, cpuset: always use seqlock when changing task's nodemask
When updating task's mems_allowed and rebinding its mempolicy due to
cpuset's mems being changed, we currently only take the seqlock for
writing when either the task has a mempolicy, or the new mems has no
intersection with the old mems.

This should be enough to prevent a parallel allocation seeing no
available nodes, but the optimization is IMHO unnecessary (cpuset
updates should not be frequent), and we still potentially risk issues if
the intersection of new and old nodes has limited amount of
free/reclaimable memory.

Let's just use the seqlock for all tasks.

Link: http://lkml.kernel.org/r/20170517081140.30654-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
213980c0f2 mm, mempolicy: simplify rebinding mempolicies when updating cpusets
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") has introduced a two-step protocol when
rebinding task's mempolicy due to cpuset update, in order to avoid a
parallel allocation seeing an empty effective nodemask and failing.

Later, commit cc9a6c8776 ("cpuset: mm: reduce large amounts of memory
barrier related damage v3") introduced a seqlock protection and removed
the synchronization point between the two update steps.  At that point
(or perhaps later), the two-step rebinding became unnecessary.

Currently it only makes sure that the update first adds new nodes in
step 1 and then removes nodes in step 2.  Without memory barriers the
effects are questionable, and even then this cannot prevent a parallel
zonelist iteration checking the nodemask at each step to observe all
nodes as unusable for allocation.  We now fully rely on the seqlock to
prevent premature OOMs and allocation failures.

We can thus remove the two-step update parts and simplify the code.

Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
04ec6264f2 mm, page_alloc: pass preferred nid instead of zonelist to allocator
The main allocator function __alloc_pages_nodemask() takes a zonelist
pointer as one of its parameters.  All of its callers directly or
indirectly obtain the zonelist via node_zonelist() using a preferred
node id and gfp_mask.  We can make the code a bit simpler by doing the
zonelist lookup in __alloc_pages_nodemask(), passing it a preferred node
id instead (gfp_mask is already another parameter).

There are some code size benefits thanks to removal of inlined
node_zonelist():

  bloat-o-meter add/remove: 2/2 grow/shrink: 4/36 up/down: 399/-1351 (-952)

This will also make things simpler if we proceed with converting cpusets
to zonelists.

Link: http://lkml.kernel.org/r/20170517081140.30654-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
45816682b2 mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()
The task->il_next variable stores the next allocation node id for task's
MPOL_INTERLEAVE policy.  mpol_rebind_nodemask() updates interleave and
bind mempolicies due to changing cpuset mems.  Currently it also tries
to make sure that current->il_next is valid within the updated nodemask.
This is bogus, because 1) we are updating potentially any task's
mempolicy, not just current, and 2) we might be updating a per-vma
mempolicy, not task one.

The interleave_nodes() function that uses il_next can cope fine with the
value not being within the currently allowed nodes, so this hasn't
manifested as an actual issue.

We can remove the need for updating il_next completely by changing it to
il_prev and store the node id of the previous interleave allocation
instead of the next id.  Then interleave_nodes() can calculate the next
id using the current nodemask and also store it as il_prev, except when
querying the next node via do_get_mempolicy().

Link: http://lkml.kernel.org/r/20170517081140.30654-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Vlastimil Babka
902b62810a mm, page_alloc: fix more premature OOM due to race with cpuset update
I would like to stress that this patchset aims to fix issues and cleanup
the code *within the existing documented semantics*, i.e.  patch 1
ignores mempolicy restrictions if the set of allowed nodes has no
intersection with set of nodes allowed by cpuset.  I believe discussing
potential changes of the semantics can be better done once we have a
baseline with no known bugs of the current semantics.

I've recently summarized the cpuset/mempolicy issues in a LSF/MM
proposal [1] and the discussion itself [2].  I've been trying to rewrite
the handling as proposed, with the idea that changing semantics to make
all mempolicies static wrt cpuset updates (and discarding the relative
and default modes) can be tried on top, as there's a high risk of being
rejected/reverted because somebody might still care about the removed
modes.

However I haven't yet figured out how to properly:

1) make mempolicies swappable instead of rebinding in place. I thought
   mbind() already works that way and uses refcounting to avoid
   use-after-free of the old policy by a parallel allocation, but turns
   out true refcounting is only done for shared (shmem) mempolicies, and
   the actual protection for mbind() comes from mmap_sem. Extending the
   refcounting means more overhead in allocator hot path. Also swapping
   whole mempolicies means that we have to allocate the new ones, which
   can fail, and reverting of the partially done work also means
   allocating (note that mbind() doesn't care and will just leave part
   of the range updated and part not updated when returning -ENOMEM...).

2) make cpuset's task->mems_allowed also swappable (after converting it
   from nodemask to zonelist, which is the easy part) for mostly the
   same reasons.

The good news is that while trying to do the above, I've at least
figured out how to hopefully close the remaining premature OOM's, and do
a buch of cleanups on top, removing quite some of the code that was also
supposed to prevent the cpuset update races, but doesn't work anymore
nowadays.  This should fix the most pressing concerns with this topic
and give us a better baseline before either proceeding with the original
proposal, or pushing a change of semantics that removes the problem 1)
above.  I'd be then fine with trying to change the semantic first and
rewrite later.

Patchset has been tested with the LTP cpuset01 stress test.

[1] https://lkml.kernel.org/r/4c44a589-5fd8-08d0-892c-e893bb525b71@suse.cz
[2] https://lwn.net/Articles/717797/
[3] https://marc.info/?l=linux-mm&m=149191957922828&w=2

This patch (of 6):

Commit e47483bca2 ("mm, page_alloc: fix premature OOM when racing with
cpuset mems update") has fixed known recent regressions found by LTP's
cpuset01 testcase.  I have however found that by modifying the testcase
to use per-vma mempolicies via bind(2) instead of per-task mempolicies
via set_mempolicy(2), the premature OOM still happens and the issue is
much older.

The root of the problem is that the cpuset's mems_allowed and
mempolicy's nodemask can temporarily have no intersection, thus
get_page_from_freelist() cannot find any usable zone.  The current
semantic for empty intersection is to ignore mempolicy's nodemask and
honour cpuset restrictions.  This is checked in node_zonelist(), but the
racy update can happen after we already passed the check.  Such races
should be protected by the seqlock task->mems_allowed_seq, but it
doesn't work here, because 1) mpol_rebind_mm() does not happen under
seqlock for write, and doing so would lead to deadlock, as it takes
mmap_sem for write, while the allocation can have mmap_sem for read when
it's taking the seqlock for read.  And 2) the seqlock cookie of callers
of node_zonelist() (alloc_pages_vma() and alloc_pages_current()) is
different than the one of __alloc_pages_slowpath(), so there's still a
potential race window.

This patch fixes the issue by having __alloc_pages_slowpath() check for
empty intersection of cpuset and ac->nodemask before OOM or allocation
failure.  If it's indeed empty, the nodemask is ignored and allocation
retried, which mimics node_zonelist().  This works fine, because almost
all callers of __alloc_pages_nodemask are obtaining the nodemask via
node_zonelist().  The only exception is new_node_page() from hotplug,
where the potential violation of nodemask isn't an issue, as there's
already a fallback allocation attempt without any nodemask.  If there's
a future caller that needs to have its specific nodemask honoured over
task's cpuset restrictions, we'll have to e.g.  add a gfp flag for that.

Link: http://lkml.kernel.org/r/20170517081140.30654-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Punit Agrawal
5fd27b8e7d mm: rmap: use correct helper when poisoning hugepages
Using set_pte_at() does not do the right thing when putting down
HWPOISON swap entries for hugepages on architectures that support
contiguous ptes.

Fix this problem by using set_huge_swap_pte_at() which was introduced to
fix exactly this problem.

Link: http://lkml.kernel.org/r/20170522133604.11392-7-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Punit Agrawal
e5251fd430 mm/hugetlb: introduce set_huge_swap_pte_at() helper
set_huge_pte_at(), an architecture callback to populate hugepage ptes,
does not provide the range of virtual memory that is targeted.  This
leads to ambiguity when dealing with swap entries on architectures that
support hugepages consisting of contiguous ptes.

Fix the problem by introducing an overridable helper that is called when
populating the page tables with swap entries.  The size of the targeted
region is provided to the helper to help determine the number of entries
to be updated.

Provide a default implementation that maintains the current behaviour.

[punit.agrawal@arm.com: v4]
  Link: http://lkml.kernel.org/r/20170524115409.31309-8-punit.agrawal@arm.com
[punit.agrawal@arm.com: add an empty definition for set_huge_swap_pte_at()]
  Link: http://lkml.kernel.org/r/20170525171331.31469-1-punit.agrawal@arm.com
Link: http://lkml.kernel.org/r/20170522133604.11392-6-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00
Punit Agrawal
9386fac34c mm/hugetlb: allow architectures to override huge_pte_clear()
When unmapping a hugepage range, huge_pte_clear() is used to clear the
page table entries that are marked as not present.  huge_pte_clear()
internally just ends up calling pte_clear() which does not correctly
deal with hugepages consisting of contiguous page table entries.

Add a size argument to address this issue and allow architectures to
override huge_pte_clear() by wrapping it in a #ifndef block.

Update s390 implementation with the size parameter as well.

Note that the change only affects huge_pte_clear() - the other generic
hugetlb functions don't need any change.

Link: http://lkml.kernel.org/r/20170522162555.4313-1-punit.agrawal@arm.com
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:34 -07:00