Commit Graph

839617 Commits

Author SHA1 Message Date
Thomas Huth
12e9612cae KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c
The check for entry->index == 0 is done twice. One time should
be sufficient.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:10 +02:00
Wanpeng Li
16ba3ab4e1 KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
Expose per-vCPU timer_advance_ns to userspace, so it is able to
query the auto-adjusted value.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:09 +02:00
Wanpeng Li
0e6edceb8f KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of
timer advancement), '-1' enables adaptive tuning starting from default
advancment of 1000ns. However, we should expose an int instead of an overflow
uint module parameter.

Before patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295

After patch:

/sys/module/kvm/parameters/lapic_timer_advance_ns:-1

Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:09 +02:00
Yi Wang
4d25996565 kvm: vmx: Fix -Wmissing-prototypes warnings
We get a warning when build kernel W=1:
arch/x86/kvm/vmx/vmx.c:6365:6: warning: no previous prototype for ‘vmx_update_host_rsp’ [-Wmissing-prototypes]
 void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)

Add the missing declaration to fix this.

Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:08 +02:00
Wanpeng Li
541e886f79 KVM: nVMX: Fix using __this_cpu_read() in preemptible context
BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590
  caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
  CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G           OE     5.1.0-rc4+ #1
  Call Trace:
   dump_stack+0x67/0x95
   __this_cpu_preempt_check+0xd2/0xe0
   nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel]
   nested_vmx_run+0xda/0x2b0 [kvm_intel]
   handle_vmlaunch+0x13/0x20 [kvm_intel]
   vmx_handle_exit+0xbd/0x660 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm]
   kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm]
   do_vfs_ioctl+0xa5/0x6e0
   ksys_ioctl+0x6d/0x80
   __x64_sys_ioctl+0x1a/0x20
   do_syscall_64+0x6f/0x6c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Accessing per-cpu variable should disable preemption, this patch extends the
preemption disable region for __this_cpu_read().

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 52017608da ("KVM: nVMX: add option to perform early consistency checks via H/W")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:08 +02:00
Paolo Bonzini
d30b214d1d kvm: fix compilation on s390
s390 does not have memremap, even though in this particular case it
would be useful.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:07 +02:00
Jim Mattson
382409b4c4 kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID
Kvm now supports extended CPUID functions through 0x8000001f.  CPUID
leaf 0x8000001e is AMD's Processor Topology Information leaf. This
contains similar information to CPUID leaf 0xb (Intel's Extended
Topology Enumeration leaf), and should be included in the output of
KVM_GET_SUPPORTED_CPUID, even though userspace is likely to override
some of this information based upon the configuration of the
particular VM.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Fixes: 8765d75329 ("KVM: X86: Extend CPUID range to include new leaf")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:06 +02:00
Jim Mattson
32a243df82 kvm: x86: Include multiple indices with CPUID leaf 0x8000001d
Per the APM, "CPUID Fn8000_001D_E[D,C,B,A]X reports cache topology
information for the cache enumerated by the value passed to the
instruction in ECX, referred to as Cache n in the following
description. To gather information for all cache levels, software must
repeatedly execute CPUID with 8000_001Dh in EAX and ECX set to
increasing values beginning with 0 until a value of 00h is returned in
the field CacheType (EAX[4:0]) indicating no more cache descriptions
are available for this processor."

The termination condition is the same as leaf 4, so we can reuse that
code block for leaf 0x8000001d.

Fixes: 8765d75329 ("KVM: X86: Extend CPUID range to include new leaf")
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Marc Orr <marcorr@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:06 +02:00
Thomas Huth
319f6f97e3 KVM: selftests: Compile code with warnings enabled
So far the KVM selftests are compiled without any compiler warnings
enabled. That's quite bad, since we miss a lot of possible bugs this
way. Let's enable at least "-Wall" and some other useful warning flags
now, and fix at least the trivial problems in the code (like unused
variables).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:05 +02:00
Paolo Bonzini
3b339e2527 kvm: selftests: avoid type punning
Avoid warnings from -Wstrict-aliasing by using memcpy.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:05 +02:00
Dan Carpenter
be7fcf1d17 KVM: selftests: Fix a condition in test_hv_cpuid()
The code is trying to check that all the padding is zeroed out and it
does this:

    entry->padding[0] == entry->padding[1] == entry->padding[2] == 0

Assume everything is zeroed correctly, then the first comparison is
true, the next comparison is false and false is equal to zero so the
overall condition is true.  This bug doesn't affect run time very
badly, but the code should instead just check that all three paddings
are zero individually.

Also the error message was copy and pasted from an earlier error and it
wasn't correct.

Fixes: 7edcb73433 ("KVM: selftests: Add hyperv_cpuid test")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:04 +02:00
Wanpeng Li
2eb06c306a KVM: Fix spinlock taken warning during host resume
WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm]
  CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G           OE     5.1.0-rc4+ #1
  RIP: 0010:kvm_resume+0x3c/0x40 [kvm]
  Call Trace:
   syscore_resume+0x63/0x2d0
   suspend_devices_and_enter+0x9d1/0xa40
   pm_suspend+0x33a/0x3b0
   state_store+0x82/0xf0
   kobj_attr_store+0x12/0x20
   sysfs_kf_write+0x4b/0x60
   kernfs_fop_write+0x120/0x1a0
   __vfs_write+0x1b/0x40
   vfs_write+0xcd/0x1d0
   ksys_write+0x5f/0xe0
   __x64_sys_write+0x1a/0x20
   do_syscall_64+0x6f/0x6c0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Commit ca84d1a24 (KVM: x86: Add clock sync request to hardware enable) mentioned
that "we always hold kvm_lock when hardware_enable is called.  The one place that
doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock
won't be taken." However, commit 6706dae9 (virt/kvm: Replace spin_is_locked() with
lockdep) introduces a bug, it asserts when the lock is not held which is contrary
to the original goal.

This patch fixes it by WARN_ON when the lock is held.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 6706dae9 ("virt/kvm: Replace spin_is_locked() with lockdep")
[Wrap with #ifdef CONFIG_LOCKDEP - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:03 +02:00
Sean Christopherson
21be4ca1ea KVM: nVMX: Clear nested_run_pending if setting nested state fails
VMX's nested_run_pending flag is subtly consumed when stuffing state to
enter guest mode, i.e. needs to be set according before KVM knows if
setting guest state is successful.  If setting guest state fails, clear
the flag as a nested run is obviously not pending.

Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:03 +02:00
Paolo Bonzini
db80927ea1 KVM: nVMX: really fix the size checks on KVM_SET_NESTED_STATE
The offset for reading the shadow VMCS is sizeof(*kvm_state)+VMCS12_SIZE,
so the correct size must be that plus sizeof(*vmcs12).  This could lead
to KVM reading garbage data from userspace and not reporting an error,
but is otherwise not sensitive.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24 21:27:02 +02:00
Paolo Bonzini
6bff2a3dc9 KVM/arm updates for 5.2-rc2
- Correctly annotate HYP-callable code to be non-traceable
 - Remove Christoffer from the MAINTAINERS file as his request
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzn+B4VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpD4aQQALuikHA7JTN2LUedPM8Klf7Hm0qO
 cqCdI1VmMxMJfzV2l/73JIJaB7wRPYugfsoGtrJMgpvs6nD8YMq8bpvDjwDIA2Wy
 SGMR0dNL+bhe+tWsYU9pR/9pBI9t7gSkWZY1Qv7SU4yNFOy1MV0dGtkCW5J6qhLh
 xoAUOmWdAj/KkaTRh4J9PtQeToiAtgFTlrnx/qC4iZGYT/gDOaXfPZTHerPeAEt6
 MMKU74YoV5mX6Y55qw+dspDpSrxH44IkO0IF0ml/DvoQ+PaWjn6PFsx/3ZItdRg4
 iiZefBm1dsvDGTIq1hzoEPJWahVmPad9cgsKSKhJPjHH79pWEsvRcExBmbku0Llj
 n1mmqMaKw64FLE5x/Vbbd8vHUoIdCpdBz+qWmeldHsXYHGge+c5HsoLbgHJqOFQD
 RidO+S+C9imaNYHnbehrAudjvav2bY/wbQCb+SKU2ZOXiuxXmdzlMNgiA61ylicK
 jqOvuNWISjhmVMJVK3GpkIs6pRwYeFwalNux809LqJTT9U9XnZARys/IdE16KlX/
 5FMM60r+B1aX6ge+w8MfyPw28xm8xwYnX4Mor7UfmDLCPB70w7MKCAIsHF12pCUf
 ygSnruajTgVic/2ARzum1FmY0boSdFcBRESN+eQpLriaJK3x7mHxS3qBrvd5Yj9j
 w1xfuKSCN+zGbHzq
 =v3an
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for 5.2-rc2

- Correctly annotate HYP-callable code to be non-traceable
- Remove Christoffer from the MAINTAINERS file as his request
2019-05-24 21:27:00 +02:00
James Morse
623e1528d4 KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation
KVM has helpers to handle the condition codes of trapped aarch32
instructions. These are marked __hyp_text and used from HYP, but they
aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
and KCOV instrumentation.

Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
an aarch32 guest on a host built with the ASAN/KCOV debug options.

Fixes: 021234ef37 ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2")
Fixes: 8cebe750c4 ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-24 14:53:20 +01:00
James Morse
b7c50fab66 KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation
KVM's pmu.c contains the __hyp_text needed to switch the pmu registers
between host and guest. Because this isn't covered by the 'hyp' Makefile,
it can be built with kasan and friends when these are enabled in Kconfig.

When starting a guest, this results in:
| Kernel panic - not syncing: HYP panic:
| PS:a00003c9 PC:000083000028ada0 ESR:86000007
| FAR:000083000028ada0 HPFAR:0000000029df5300 PAR:0000000000000000
| VCPU:000000004e10b7d6
| CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat
| Call trace:
|  dump_backtrace+0x0/0x200
|  show_stack+0x20/0x30
|  dump_stack+0xec/0x158
|  panic+0x1ec/0x420
|  panic+0x0/0x420
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x002,25006082
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:

This is caused by functions in pmu.c calling the instrumented
code, which isn't mapped to hyp. From objdump -r:
| RELOCATION RECORDS FOR [.hyp.text]:
| OFFSET           TYPE              VALUE
| 0000000000000010 R_AARCH64_CALL26  __sanitizer_cov_trace_pc
| 0000000000000018 R_AARCH64_CALL26  __asan_load4_noabort
| 0000000000000024 R_AARCH64_CALL26  __asan_load4_noabort

Move the affected code to a new file under 'hyp's Makefile.

Fixes: 3d91befbb3 ("arm64: KVM: Enable !VHE support for :G/:H perf event modifiers")
Cc: Andrew Murray <Andrew.Murray@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-24 14:53:20 +01:00
Christoffer Dall
493fcbc843 MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer
I no longer have time to actively review patches and manage the tree and
it's time to make that official.

Huge thanks to the incredible Linux community and all the contributors
who have put up with me over the past years.

I also take this opportunity to remove the website link to the Columbia
web page, as that information is no longer up to date and I don't know
who manages that anymore.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-05-24 14:53:20 +01:00
Paolo Bonzini
94830f188a KVM: s390: Fixes for s390
- Fix typo in module paramter description
 - Change default poll timer to improve cpu consumption
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJc4oBDAAoJEBF7vIC1phx8j7EP/0SiT9SNsmVX6Wl7iIU8geMp
 A1mzJTTdFgS+NunEZ5FXGTnLRI5JH2iAunFlE+RHfCy3ROmScZhROfRVnHmIcJMC
 IY+LW9JGl/b5T6KIgmudMmbV9xSBT3JdWP4JOns70mJCnCwjelJL2TYDZbP4Scdh
 FJ+qBHsb4xOOqhZ8yRgHsyV/hiPnFL4KhSSLTZhP8xr/YvIE0y+8P3XR1u69JadY
 KxSjL18XatoxZhqsJPFhQLwOwV2sbPlzpL5TUhVFg1bnNO1Kn0CBh3+u1AUMn7le
 op9crEWtYvx+N3RcjxrQE1yaVQNP3hDAG4lRqY+g3KM6ENaQ/gTsv9g/QzZflSpp
 IlzomnyXi2wwSM6OvVb7Y9qNkBrTMzHsBdpmnn9U8MVkisrXmXqVMzZDRXyLYwGi
 V7CpS4gxVJ2ZsrB2ikdeJu7AOVxlsUg+cBXsgzgI8QgoBSE0iO6leWjx+IvTxYky
 1BZ3pgl1kBUdqbJbidc1tbo6iCcJ3aEYeELK+lNOHTX2B/OFIwLlKvYP9BL5oZHx
 LRoXFv7lj3LaY8Rzj+QX5UXbuZQ/Mth06BEZOfUzISfM73Jiivp7o+8criAqvbbr
 aRu93X+6anWxYbgRiJqytXcN5fBxCJeLBqMJKNB4BtEE9witMqpq74g0Dj7qye0s
 qpwj1ay4KyqFCQRSU+Yw
 =uGH/
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-master-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes for s390

- Fix typo in module parameter description
- Change default poll timer to improve cpu consumption
2019-05-20 13:41:58 +02:00
Christian Borntraeger
6e9b622d1c KVM: s390: change default halt poll time to 50us
Recent measurements indicate that using 50us results in a reduced CPU
consumption, while still providing the benefit of halt polling. Let's
use 50us instead.

Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-05-20 09:40:39 +02:00
Wei Yongjun
b41fb528dd KVM: s390: fix typo in parameter description
Fix typo in parameter description.

Fixes: 8b905d28ee ("KVM: s390: provide kvm_arch_no_poll function")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Message-Id: <20190504065145.53665-1-weiyongjun1@huawei.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2019-05-20 09:40:38 +02:00
Linus Torvalds
a188339ca5 Linux 5.2-rc1 2019-05-19 15:47:09 -07:00
Linus Torvalds
2e2c122001 This pull request contains the following fixes for UBIFS:
- Build errors wrt. xattrs
 - Mismerge which lead to a wrong Kconfig ifdef
 - Missing endianness conversion
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlzhsxcWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbuJEACe3ko0SOhNd0cP9jmHKguIuE10
 FceRVwzhcpJVAJvmoMcBlI36N9NhA+n/qSV6s03a58sX1ECFfeydAyT2pKPXMWiI
 h4gtNkAedPvLMeTRmP6gUFtvtMCKYWVeeqNZZoVjS/caIsb/v3fQZORflQ2uybxv
 +zdrZlrAKl9pP4j2C20mQD9z71jlTEiTys9zppVGcpeK1ByXRpDOTUcK9evVntyy
 k72VkWAnscYupfHehrBCLU04UZ4rI3zs9sk/Vyl6sa3ZiuwFDsJUjQc+Tq2D7ZKL
 XtenVr+qQ/4BMPQGFyXUk/YQCgqlTkdRLxIhMSyn2M2qOsVad57xKxTdB1HB2USp
 pfO4Ki0EsYcPuz+okMVTDfzPARUBA1RFJT0bOLqpVGP15ou52EOsqTwenxsyZCty
 AXPXXMtZO9qO00GdD0AcolgiMJGSQYsivkeQU7qgP7/jYDdKZ1NIVLtUUUBXm36g
 RvdfYJ+gTlaKWVTIimG9BouXnkxCDFmBcBJBXCfWSZXUD7dtlDOImgMGNmBiv+07
 l7mpTarJv2TB01AHaz1obv4iS5+5py5EM0HDPAOC2htslc1AsZnnrRQRN/Pohkvy
 36JSxxfiKZTZn0s9Syqg/i/cw90wuu2E8dWqJGDtVQD77U5rMaEuNxoqVqFd7ozR
 VV2Ry7SjX6Q0NqUnTQ==
 =0wS1
 -----END PGP SIGNATURE-----

Merge tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBIFS fixes from Richard Weinberger:

 - build errors wrt xattrs

 - mismerge which lead to a wrong Kconfig ifdef

 - missing endianness conversion

* tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: Convert xattr inum to host order
  ubifs: Use correct config name for encryption
  ubifs: Fix build error without CONFIG_UBIFS_FS_XATTR
2019-05-19 15:22:03 -07:00
Linus Torvalds
cb6f8739fb Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
 "A few final bits:

   - large changes to vmalloc, yielding large performance benefits

   - tweak the console-flush-on-panic code

   - a few fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  panic: add an option to replay all the printk message in buffer
  initramfs: don't free a non-existent initrd
  fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
  mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
  mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro
  mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro
  mm/vmalloc.c: keep track of free blocks for vmap allocation
2019-05-19 12:15:32 -07:00
Linus Torvalds
ff8583d6e4 Kbuild updates for v5.2 (2nd)
- remove unneeded use of cc-option, cc-disable-warning, cc-ldoption
 
  - exclude tracked files from .gitignore
 
  - re-enable -Wint-in-bool-context warning
 
  - refactor samples/Makefile
 
  - stop building immediately if syncconfig fails
 
  - do not sprinkle error messages when $(CC) does not exist
 
  - move arch/alpha/defconfig to the configs subdirectory
 
  - remove crappy header search path manipulation
 
  - add comment lines to .config to clarify the end of menu blocks
 
  - check uniqueness of module names (adding new warnings intentionally)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJc4KbfAAoJED2LAQed4NsGQW4P/3Iu+hM91EXqzdLqit99ML0x
 9hI3Ap0Xd5HvPy+j0xYmAeXyBYolGOVcwGNLv2WMPxOkGFxcBTfVmBgYvugboX30
 dwLMSXnL6MWL3xFVERTmRPl1STNwqHTnGal7Pru4HGUNnYUZJ9f9BGxoh0Qz55DJ
 j5ZQ9RhnYS+DmEo6OmKJepoiM77Pf3W+prCegoZ+5ZtWnYd622p9d/rjiimR94RU
 e5SFvGpSWbaUdaigjUqpA/GYW/iLy0T36xIUe20FQKe/8wstGOq61qKuPfgvxktA
 y1ajRfQbZ/8D3lJyIU7AOZ9xgoY05lmbTEPb6eP8WygG7chKRakX/vy1+AHmDp9D
 1K0IuwVdvFN2Qh5eLYkdtjX0tEUxX+m+CRv2NS75mNY/O3sTa7Yu6ZyZfCFRvGft
 ZOU6Axskr/L0f048WLs0TGpDD8wyi9d2VZywKH+7Kr9KImeqGU0kRD/JzgvCFmcy
 6lAfkptIMIL+VgSeN1igbJz6RjlGfTFvXPymvPzAcrTsVaxGHtjia/SC32h09Nw6
 gS2vD2lv/sO0EaSUjwlwrHC0HVj3lEX3cIA0RGabbfbht1D7mnDjOyi7HWnbs80D
 RHT4LCUuuPTufqCe+MQiQb08mcOSKcd58JdjZ1nfI/z/ME1zVkioMh63TFUPOfSX
 T8zcEjd0wxdeLnyNj6fK
 =oDeB
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - remove unneeded use of cc-option, cc-disable-warning, cc-ldoption

 - exclude tracked files from .gitignore

 - re-enable -Wint-in-bool-context warning

 - refactor samples/Makefile

 - stop building immediately if syncconfig fails

 - do not sprinkle error messages when $(CC) does not exist

 - move arch/alpha/defconfig to the configs subdirectory

 - remove crappy header search path manipulation

 - add comment lines to .config to clarify the end of menu blocks

 - check uniqueness of module names (adding new warnings intentionally)

* tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits)
  kconfig: use 'else ifneq' for Makefile to improve readability
  kbuild: check uniqueness of module names
  kconfig: Terminate menu blocks with a comment in the generated config
  kbuild: add LICENSES to KBUILD_ALLDIRS
  kbuild: remove 'addtree' and 'flags' magic for header search paths
  treewide: prefix header search paths with $(srctree)/
  media: prefix header search paths with $(srctree)/
  media: remove unneeded header search paths
  alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig
  kbuild: terminate Kconfig when $(CC) or $(LD) is missing
  kbuild: turn auto.conf.cmd into a mandatory include file
  .gitignore: exclude .get_maintainer.ignore and .gitattributes
  kbuild: add all Clang-specific flags unconditionally
  kbuild: Don't try to add '-fcatch-undefined-behavior' flag
  kbuild: add some extra warning flags unconditionally
  kbuild: add -Wvla flag unconditionally
  arch: remove dangling asm-generic wrappers
  samples: guard sub-directories with CONFIG options
  kbuild: re-enable int-in-bool-context warning
  MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux*
  ...
2019-05-19 11:53:58 -07:00
Linus Torvalds
f23d8719e7 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "Some I2C core API additions which are kind of simple but enhance error
  checking for users a lot, especially by returning errno now.

  There are wrappers to still support the old API but it will be removed
  once all users are converted"

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: core: add device-managed version of i2c_new_dummy
  i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
2019-05-19 11:47:03 -07:00
Linus Torvalds
c4d36b63b2 Some bug fixes, and an update to the URL's for the final version of
Unicode 12.1.0.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlzg2qIACgkQ8vlZVpUN
 gaPYfgf+IadtbeBYbKcHQzw8OZ9UxyGv7+Havs523XzhdceiflpOCe3lpSFYuX6w
 0CtRIWEOqFzQLRHUlX/sRywYLvrvg3irDuYJGTzny9MfPEI+tVCkYi877PPAOOR3
 4xay7dF63mcCpmhRxgw1fLx5I/hYpzrDnw5dGe7jjgU6SrR5FpE2GmEmbfBW087W
 zAuRDkuTv03J791Kgpq4TAMNla9F2/oedq/2B0v43+TnKfYe1SEIDm3VusUvSfoI
 ++8CfRrRJ3D7vCmtFSuieFFb91HtHMqdv0gaad2cY1DfvJLVm7gP1uPL7QahA5Dq
 VfhvFfqshqN9Jizr6BqP5GPOj2J9cw==
 =QK+R
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Some bug fixes, and an update to the URL's for the final version of
  Unicode 12.1.0"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: avoid panic during forced reboot due to aborted journal
  ext4: fix block validity checks for journal inodes using indirect blocks
  unicode: update to Unicode 12.1.0 final
  unicode: add missing check for an error return from utf8lookup()
  ext4: fix miscellaneous sparse warnings
  ext4: unsigned int compared against zero
  ext4: fix use-after-free in dx_release()
  ext4: fix data corruption caused by overlapping unaligned and aligned IO
  jbd2: fix potential double free
  ext4: zero out the unused memory region in the extent tree block
2019-05-19 11:43:16 -07:00
Linus Torvalds
d8848eefc1 minor cleanup and fixes, one for stable, also adds SEEK_HOLE support
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlzfKpcACgkQiiy9cAdy
 T1G35AwAh/uXfq+F8Zk9EqqE8akRo6X0HDLzps5ICriQdYI2r5I+hUDLoRwNn2AX
 qeife0CBfcOyib2jlGV7rYjIPJ6SNAIUPLzMvupgXfMQABt+aOeFiQ6KQz/cYgd4
 bQEaLhOiCt7REc8vmhL6hKXjbypiO4wwLPl29kEjHr/ZNQUFCg8MBCpKY5uCpCQe
 paLfVusSzKTjuxRinSvUWSApASQ2GU9ys1hid3PN6/MUt1vD+IXnchWRUXBZ4sNi
 ztZgUNyH7AvdwoY+sM0hdT+jEehl4eTtDP1ZaeGz5Dhq4Fpe86OrfXBsySscUCgq
 wpjioh67ihSg9If8ZETfYJ2srn1VhyME01l30yuJB0HgRLebWXFXe4QDCbPYnUFh
 9wWZavRUlQuiKx9x2mszHBL5L3r6mZXdOHEumm2DV8TJFTjlxDt6jhdxsZA3HAzB
 6Rz0F24F/Dez4ONdYS0eO8am1CScW1tTJU9+6xyW2/XP3YZunx8B6SxU7lu3nzNN
 jqZ7pk7o
 =JhcA
 -----END PGP SIGNATURE-----

Merge tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Minor cleanup and fixes, one for stable, four rdma (smbdirect)
  related. Also adds SEEK_HOLE support"

* tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: add support for SEEK_DATA and SEEK_HOLE
  Fixed https://bugzilla.kernel.org/show_bug.cgi?id=202935 allow write on the same file
  cifs: Allocate memory for all iovs in smb2_ioctl
  cifs: Don't match port on SMBDirect transport
  cifs:smbd Use the correct DMA direction when sending data
  cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called
  cifs: use the right include for signal_pending()
  smb3: trivial cleanup to smb2ops.c
  cifs: cleanup smb2ops.c and normalize strings
  smb3: display session id in debug data
2019-05-19 11:38:18 -07:00
Linus Torvalds
1ba3b5dc14 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling updates from Ingo Molnar:
 "perf.data:

   - Streaming compression of perf ring buffer into
     PERF_RECORD_COMPRESSED user space records, resulting in ~3-5x
     perf.data file size reduction on variety of tested workloads what
     saves storage space on larger server systems where perf.data size
     can easily reach several tens or even hundreds of GiBs, especially
     when profiling with DWARF-based stacks and tracing of context
     switches.

  perf record:

   - Improve -user-regs/intr-regs suggestions to overcome errors

  perf annotate:

   - Remove hist__account_cycles() from callback, speeding up branch
     processing (perf record -b)

  perf stat:

   - Add a 'percore' event qualifier, e.g.: -e
     cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for
     both hardware threads in a core.

     We can already do this with --per-core, but it's often useful to do
     this together with other metrics that are collected per hardware
     thread.

     I.e. now its possible to do this per-event, and have it mixed with
     other events not aggregated by core.

  arm64:

   - Map Brahma-B53 CPUID to cortex-a53 events.

   - Add Cortex-A57 and Cortex-A72 events.

  csky:

   - Add DWARF register mappings for libdw, allowing --call-graph=dwarf
     to work on the C-SKY arch.

  x86:

   - Add support for recording and printing XMM registers, available,
     for instance, on Icelake.

   - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON
     support. UPI replaced the Intel QuickPath Interconnect (QPI) in
     Xeon Skylake-SP.

  Intel PT:

   - Fix instructions sampling rate.

   - Timestamp fixes.

   - Improve exported-sql-viewer GUI, allowing, for instance, to
     copy'n'paste the trees, useful for e-mailing"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
  perf stat: Support 'percore' event qualifier
  perf stat: Factor out aggregate counts printing
  perf tools: Add a 'percore' event qualifier
  perf docs: Add description for stderr
  perf intel-pt: Fix sample timestamp wrt non-taken branches
  perf intel-pt: Fix improved sample timestamp
  perf intel-pt: Fix instructions sampling rate
  perf regs x86: Add X86 specific arch__intr_reg_mask()
  perf parse-regs: Add generic support for arch__intr/user_reg_mask()
  perf parse-regs: Split parse_regs
  perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events
  perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events
  perf vendor events arm64: Remove [[:xdigit:]] wildcard
  perf jevents: Remove unused variable
  perf test zstd: Fixup verbose mode output
  perf tests: Implement Zstd comp/decomp integration test
  perf inject: Enable COMPRESSED record decompression
  perf report: Implement perf.data record decompression
  perf record: Implement -z,--compression_level[=<n>] option
  perf report: Add stub processing of compressed events for -D
  ...
2019-05-19 11:20:22 -07:00
Linus Torvalds
a13f950ef1 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource updates from Ingo Molnar:
 "Misc clocksource/clockevent driver updates that came in a bit late but
  are ready for v5.2"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  misc: atmel_tclib: Do not probe already used TCBs
  clocksource/drivers/timer-atmel-tcb: Convert tc_clksrc_suspend|resume() to static
  clocksource/drivers/tcb_clksrc: Rename the file for consistency
  clocksource/drivers/timer-atmel-pit: Rework Kconfig option
  clocksource/drivers/tcb_clksrc: Move Kconfig option
  ARM: at91: Implement clocksource selection
  clocksource/drivers/tcb_clksrc: Use tcb as sched_clock
  clocksource/drivers/tcb_clksrc: Stop depending on atmel_tclib
  ARM: at91: move SoC specific definitions to SoC folder
  clocksource/drivers/timer-milbeaut: Cleanup common register accesses
  clocksource/drivers/timer-milbeaut: Add shutdown function
  clocksource/drivers/timer-milbeaut: Fix to enable one-shot timer
  clocksource/drivers/tegra: Rework for compensation of suspend time
  clocksource/drivers/sp804: Add COMPILE_TEST to CONFIG_ARM_TIMER_SP804
  clocksource/drivers/sun4i: Add a compatible for suniv
  dt-bindings: timer: Add Allwinner suniv timer
2019-05-19 11:11:20 -07:00
Linus Torvalds
d9351ea14d Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull IRQ chip updates from Ingo Molnar:
 "A late irqchips update:

   - New TI INTR/INTA set of drivers

   - Rewrite of the stm32mp1-exti driver as a platform driver

   - Update the IOMMU MSI mapping API to be RT friendly

   - A number of cleanups and other low impact fixes"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  iommu/dma-iommu: Remove iommu_dma_map_msi_msg()
  irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg()
  irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg()
  irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg()
  irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg()
  iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts
  genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
  arm64: arch_k3: Enable interrupt controller drivers
  irqchip/ti-sci-inta: Add msi domain support
  soc: ti: Add MSI domain bus support for Interrupt Aggregator
  irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver
  dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings
  irqchip/ti-sci-intr: Add support for Interrupt Router driver
  dt-bindings: irqchip: Introduce TISCI Interrupt router bindings
  gpio: thunderx: Use the default parent apis for {request,release}_resources
  genirq: Introduce irq_chip_{request,release}_resource_parent() apis
  firmware: ti_sci: Add helper apis to manage resources
  firmware: ti_sci: Add RM mapping table for am654
  firmware: ti_sci: Add support for IRQ management
  firmware: ti_sci: Add support for RM core ops
  ...
2019-05-19 10:58:45 -07:00
Linus Torvalds
39feaa3ff4 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ingo Molnar:
 "Fix an EFI-fb regression that affects certain x86 systems"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
2019-05-19 10:33:26 -07:00
Linus Torvalds
1335d9a1fb Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
 "This fixes a particularly thorny munmap() bug with MPX, plus fixes a
  host build environment assumption in objtool"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Allow AR to be overridden with HOSTAR
  x86/mpx, mm/core: Fix recursive munmap() corruption
2019-05-19 10:23:24 -07:00
Linus Torvalds
4c4a5c99af ARM: SoC: late updates
This is some material that we picked up into our tree late. Most of it
 are smaller fixes and additions, some defconfig updates due to recent
 development, etc.
 
 Code-wise the largest portion is a series of PM updates for the at91
 platform, and those have been in linux-next a while through the at91
 tree before we picked them up.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAlzgimIPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3ExIP/R6c8wZtvvpf7tuyJudQbTbN/C4P4ZJqGNOc
 aJ1jzax5PGQdaA60wHR9SoO3GURRRvg4FcqUsbn7m13Lo6qjbVEOHJSRou9gTUSf
 Fl/6b/WqsF8AoGCM108GhiZyq2HGW1iG6ypFhZdb5UGLLnMZIxPkE0aJ31uR0NQi
 AbYF4YB033PIWqsaCIe+uIyYwaw2Q4LJuxvBvr11BRrp4WfPqyfkJWqAS+5KmMNN
 EuEoEA5GfJaBOE0HyIlbs9O9vLdAatroTHC8x1IuyN9+tWzx/tnbNovcFu7S1cl3
 rV5inGF6uUxMGLtptWjQ3Z5rZFG1XcmQKDNF5UgVgdVutHCU/igYSsrEP3QVMwEf
 OFWRT/fwPUMSreMQ49li0ruFaR3q6kWl7SWGzBGUhBQZ29oK8BmNdsZA9tjPLYBW
 5L7MCOSAyH6J2Nk0XsqtOfZJHnevBS2NVSzLSN1LtK+qsMy9Hrnr43tfnYq6/c8h
 tYONs9I7HyXVi6e+0NILaoSan49Mxi3Trp5YySdl41xmtswhx8OVrjQVySBrH4pA
 OYY3jyExERvOgtvnXjqdOOfoB8iS05gbuwS/UukvWdygekBjyCNQ+kz8hEKajPED
 WiSMUjS68KGINx1YPTS+AwLljaASlY8BSzrjrnoOvwOirX/QelpTm7qpvbFERV0M
 hEbtbQyt
 =XCCp
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC late updates from Olof Johansson:
 "This is some material that we picked up into our tree late. Most of it
  are smaller fixes and additions, some defconfig updates due to recent
  development, etc.

  Code-wise the largest portion is a series of PM updates for the at91
  platform, and those have been in linux-next a while through the at91
  tree before we picked them up"

* tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
  arm64: dts: sprd: Add clock properties for serial devices
  Opt out of scripts/get_maintainer.pl
  ARM: ixp4xx: Remove duplicated include from common.c
  soc: ixp4xx: qmgr: Fix an NULL vs IS_ERR() check in probe
  arm64: tegra: Disable XUSB support on Jetson TX2
  arm64: tegra: Enable SMMU translation for PCI on Tegra186
  arm64: tegra: Fix insecure SMMU users for Tegra186
  arm64: tegra: Select ARM_GIC_PM
  amba: tegra-ahb: Mark PM functions as __maybe_unused
  ARM: dts: logicpd-som-lv: Fix MMC1 card detect
  ARM: mvebu: drop return from void function
  ARM: mvebu: prefix coprocessor operand with p
  ARM: mvebu: drop unnecessary label
  ARM: mvebu: fix a leaked reference by adding missing of_node_put
  ARM: socfpga_defconfig: enable LTC2497
  ARM: mvebu: kirkwood: remove error message when retrieving mac address
  ARM: at91: sama5: make ov2640 as a module
  ARM: OMAP1: ams-delta: fix early boot crash when LED support is disabled
  ARM: at91: remove HAVE_FB_ATMEL for sama5 SoC as they use DRM
  soc/fsl/qe: Fix an error code in qe_pin_request()
  ...
2019-05-19 10:16:39 -07:00
Linus Torvalds
86a78a8b8d powerpc fixes for 5.2 #2
One fix going back to stable, for a bug on 32-bit introduced when we added
 support for THREAD_INFO_IN_TASK.
 
 A fix for a typo in a recent rework of our hugetlb code that leads to crashes on
 64-bit when using hugetlbfs with a 4K PAGE_SIZE.
 
 Two fixes for our recent rework of the address layout on 64-bit hash CPUs, both
 only triggered when userspace tries to access addresses outside the user or
 kernel address ranges.
 
 Finally a fix for a recently introduced double free in an error path in our
 cacheinfo code.
 
 Thanks to:
   Aneesh Kumar K.V, Christophe Leroy, Sachin Sant, Tobin C. Harding.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc3+WRAAoJEFHr6jzI4aWAlN4QAIP7UnQUApYzQX8YrMJEiip7
 1Jtez373pgDEX21McmaznyYRy04gtLWVRV0D5vRsTCG5RHBOVt2KPXe0PrzyjJws
 /AxS9aRuMgM+VkLkD9c6DzWsuBLP8kJMfuTbmY7+C0tByQQT9Xfp+K7/gC5kGlK4
 igyTGZvALlEVilsjTQO/FhADWisYIHM+y2/e0ocxLlK5F+TxdJKpESyUzjMOT78i
 SmAn1qufZedV7ZH91VmvLm8f8MSqg+t1NTwpAnXuS58z2eSNisRhUcP43K4XdAez
 QTGODTgUGGzLsbnIq8KJB/X4YVLwtTR2UM3p6ubTwlDuVkBfrFPUHowywDmxoPHZ
 Kh6KWlRFAL49sGmYMJpfkaEiyN+J+VLN2HMdNLW2HgnROzDSOXCXcjuAuBE9nAKe
 kvMV7Zwb4mQ0j0QUxHuugbBRzUpAcDg+PNZKKb9P/jIMWlMhGyb9fjGxC1yJfDtB
 Kq03zJ+0lzvDCr1XbcDhSv4Fu6Dxxfi7GX9BXlzHTzsGFwZKICj9sLXDXXjFh4h8
 CDNAeWtol1WE2xj315LL6WTlONUMQJ3wDAzd5VJw7SewadDzpK57vuwba88IN9hd
 2PmH7FF8VyrksKmqt19ZsF0X7n7JL3+xbOcV9OT0r0DjkkM1fC6zo+JT6YYIER6A
 FE+pFQRQJndKRzHqvyBJ
 =2265
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix going back to stable, for a bug on 32-bit introduced when we
  added support for THREAD_INFO_IN_TASK.

  A fix for a typo in a recent rework of our hugetlb code that leads to
  crashes on 64-bit when using hugetlbfs with a 4K PAGE_SIZE.

  Two fixes for our recent rework of the address layout on 64-bit hash
  CPUs, both only triggered when userspace tries to access addresses
  outside the user or kernel address ranges.

  Finally a fix for a recently introduced double free in an error path
  in our cacheinfo code.

  Thanks to: Aneesh Kumar K.V, Christophe Leroy, Sachin Sant, Tobin C.
  Harding"

* tag 'powerpc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/cacheinfo: Remove double free
  powerpc/mm/hash: Fix get_region_id() for invalid addresses
  powerpc/mm: Drop VM_BUG_ON in get_region_id()
  powerpc/mm: Fix crashes with hugepages & 4K pages
  powerpc/32s: fix flush_hash_pages() on SMP
2019-05-19 10:10:15 -07:00
Linus Torvalds
bcd1739788 A few more MIPS changes for 5.2:
- A build fix for BMIPS5000 configurations with CONFIG_HW_PERF_EVENTS=y,
   which also neatly removes some #ifdefery.
 
 - A fix to report supported ISAs correctly on older Ingenic SoCs which
   incorrectly indicate MIPSr2 support in their cop0 Config register.
 
 - Some PCI modernization for SGI IP27 systems as part of ongoing work to
   support some other SGI systems.
 
 - A fix allowing use of appended DTB files with generic kernels.
 
 - DMA mask fixes for SGI IP22 & Alchemy systems.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXN+icRUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN27UQD/b6+4PE/htS7dWoWwO+qYpMK2tGNG
 q1WJEkxxpLSZX9YBAPNFuDuIPPq1+T+1LGxVhUmIwq5LM5jpx2MkVWdUJ6AJ
 =4zY2
 -----END PGP SIGNATURE-----

Merge tag 'mips_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull a few more MIPS updates from Paul Burton:
 "Some SGI IP27 specific PCI rework and a batch of fixes:

   - A build fix for BMIPS5000 configurations with
     CONFIG_HW_PERF_EVENTS=y, which also neatly removes some #ifdefery.

   - A fix to report supported ISAs correctly on older Ingenic SoCs
     which incorrectly indicate MIPSr2 support in their cop0 Config
     register.

   - Some PCI modernization for SGI IP27 systems as part of ongoing work
     to support some other SGI systems.

   - A fix allowing use of appended DTB files with generic kernels.

   - DMA mask fixes for SGI IP22 & Alchemy systems"

* tag 'mips_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Alchemy: add DMA masks for on-chip ethernet
  MIPS: SGI-IP22: provide missing dma_mask/coherent_dma_mask
  generic: fix appended dtb support
  MIPS: SGI-IP27: abstract chipset irq from bridge
  MIPS: SGI-IP27: use generic PCI driver
  MIPS: Fix Ingenic SoCs sometimes reporting wrong ISA
  MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled
2019-05-19 10:05:28 -07:00
Linus Torvalds
b0bb1269b9 RISC-V Patches for the 5.2 Merge Window, Part 1 v3
This patch set contains an assortment of RISC-V related patches that I'd
 like to target for the 5.2 merge window.  Most of the patches are
 cleanups, but there are a handful of user-visible changes:
 
 * The nosmp and nr_cpus command-line arguments are now supported, which
   work like normal.
 * The SBI console no longer installs itself as a preferred console, we
   rely on standard mechanisms (/chosen, command-line, hueristics)
   instead.
 * sfence_remove_sfence_vma{,_asid} now pass their arguments along to the
   SBI call.
 * Modules now support BUG().
 * A missing sfence.vma during boot has been added.  This bug only
   manifests during boot.
 * The arch/riscv support for SiFive's L2 cache controller has been
   merged, which should un-block the EDAC framework work.
 
 I've only tested this on QEMU again, as I didn't have time to get things
 running on the Unleashed.  The latest master from this morning merges in
 cleanly and passes the tests as well.
 
 This patch set rebased my "5.2 MW, Part 1" patch set which includes an
 erronous empty file.  It's also a rebase of my "5.2 MW, Part 2" patch
 set, in which I managed to create another file while attempting to
 remove the empty file.
 
 Sorry for all the noise!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlzeLhUTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQXV/D/9nz8KYxNKOVIXft27mw93Qnx5joblg
 fibA7nGDuxCszSC3tfyaROJZuGKe1G24vP4RG7aVs+iwRmmFhtVdPwm7ZvIr+DfU
 a5mzwWkxhMZP8lgxMAIn7iM/NWrBm7rWdGTU0BYjHlGkQ5z3WA67rU/r/vrowhUN
 zK1U/ATLvFWDJv5rdDj8/T2rDJzWtAsuy2qlmQN30CCJoOXXgIdAj+fVG4IYoxO9
 2+NFJU4Y0a+YczWW3qaGFjTaYYt/sNr/uA8AoBNqV1NvsopK1UO3txbcfJwvZZC3
 JFU9WBjC7xuF2ihMWecIZ7XljZeqhlsP7lZDizatQ/mdL9k7+6elk1sdcNLC23dN
 VWJakudE42dISCwSh49fAbeNSl/3R5VWSlZmVO18gsmslkGa4FwuoKjklnxx7hYx
 fQfvaqMIEXy3YmKtmFneUXLdcGoWOjV0FfDh5Ye582tAmB2TzvgEJHPJI7suUA/a
 RkZHcmVJTSRBMe2fS0qkYxy/wdIDtRW2yjypssl9G6zQPPCVW+maD70m/9oVdsgm
 IL8MpoDxW0uAYsV8Ctt1/+Ux+BObMADIml/1HPQyBRA0qhorQQWk0TcbjEXeIShs
 OOG8byAQUJx98z62zrKQ53+Pxdevcja6uKxu3f0yEHxl19dBJdT2BM6rjs3sO1hi
 c3tX/U8o39H0Kg==
 =mZwx
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.2-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains an assortment of RISC-V related patches that I'd like to
  target for the 5.2 merge window. Most of the patches are cleanups, but
  there are a handful of user-visible changes:

   - The nosmp and nr_cpus command-line arguments are now supported,
     which work like normal.

   - The SBI console no longer installs itself as a preferred console,
     we rely on standard mechanisms (/chosen, command-line, hueristics)
     instead.

   - sfence_remove_sfence_vma{,_asid} now pass their arguments along to
     the SBI call.

   - Modules now support BUG().

   - A missing sfence.vma during boot has been added. This bug only
     manifests during boot.

   - The arch/riscv support for SiFive's L2 cache controller has been
     merged, which should un-block the EDAC framework work.

  I've only tested this on QEMU again, as I didn't have time to get
  things running on the Unleashed. The latest master from this morning
  merges in cleanly and passes the tests as well"

* tag 'riscv-for-linus-5.2-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (31 commits)
  riscv: fix locking violation in page fault handler
  RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs
  RISC-V: Add DT documentation for SiFive L2 Cache Controller
  RISC-V: Avoid using invalid intermediate translations
  riscv: Support BUG() in kernel module
  riscv: Add the support for c.ebreak check in is_valid_bugaddr()
  riscv: support trap-based WARN()
  riscv: fix sbi_remote_sfence_vma{,_asid}.
  riscv: move switch_mm to its own file
  riscv: move flush_icache_{all,mm} to cacheflush.c
  tty: Don't force RISCV SBI console as preferred console
  RISC-V: Access CSRs using CSR numbers
  RISC-V: Add interrupt related SCAUSE defines in asm/csr.h
  RISC-V: Use tabs to align macro values in asm/csr.h
  RISC-V: Fix minor checkpatch issues.
  RISC-V: Support nr_cpus command line option.
  RISC-V: Implement nosmp commandline option.
  RISC-V: Add RISC-V specific arch_match_cpu_phys_id
  riscv: vdso: drop unnecessary cc-ldoption
  riscv: call pm_power_off from machine_halt / machine_power_off
  ...
2019-05-19 09:56:36 -07:00
Masahiro Yamada
fc2694ec1a kconfig: use 'else ifneq' for Makefile to improve readability
'ifeq ... else ifneq ... endif' notation is supported by GNU Make 3.81
or later, which is the requirement for building the kernel since
commit 37d69ee308 ("docs: bump minimal GNU Make version to 3.81").

Use it to improve the readability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-19 09:34:35 +09:00
Feng Tang
de6da1e8bc panic: add an option to replay all the printk message in buffer
Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
  Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
  Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Steven Price
5d59aa8f9c initramfs: don't free a non-existent initrd
Since commit 54c7a8916a ("initramfs: free initrd memory if opening
/initrd.image fails"), the kernel has unconditionally attempted to free
the initrd even if it doesn't exist.

In the non-existent case this causes a boot-time splat if
CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a
NULL address.

Instead we should check that the initrd actually exists and only attempt
to free it if it does.

Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com
Fixes: 54c7a8916a ("initramfs: free initrd memory if opening /initrd.image fails")
Signed-off-by: Steven Price <steven.price@arm.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Jiufei Xue
ec084de929 fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu().  Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.

  VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
    evict+0xb3/0x180
    iput+0x1b0/0x230
    inode_switch_wbs_work_fn+0x3c0/0x6a0
    worker_thread+0x4e/0x490
    ? process_one_work+0x410/0x410
    kthread+0xe6/0x100
    ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish.  And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.

Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Mel Gorman
60fce36afa mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock
syzbot reported the following error from a tree with a head commit of
baf76f0c58 ("slip: make slhc_free() silently accept an error pointer")

  BUG: unable to handle kernel paging request at ffffea0003348000
  #PF error: [normal kernel read fault]
  PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline]
  RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline]
  RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579
  Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff
  ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49
  c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff
  RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246
  RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000
  RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001
  RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000
  R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000
  FS:  00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0
  Call Trace:
   fast_isolate_around mm/compaction.c:1243 [inline]
   fast_isolate_freepages mm/compaction.c:1418 [inline]
   isolate_freepages mm/compaction.c:1438 [inline]
   compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550

There is no reproducer and it is difficult to hit -- 1 crash every few
days.  The issue is very similar to the fix in commit 6b0868c820
("mm/compaction.c: correct zone boundary handling when resetting pageblock
skip hints").  When isolating free pages around a target pageblock, the
boundary handling is off by one and can stray into the next pageblock.
Triggering the syzbot error requires that the end of pageblock is section
or zone aligned, and that the next section is unpopulated.

A more subtle consequence of the bug is that pageblocks were being
improperly used as migration targets which potentially hurts fragmentation
avoidance in the long-term one page at a time.

A debugging patch revealed that it's definitely possible to stray outside
of a pageblock which is not intended.  While syzbot cannot be used to
verify this patch, it was confirmed that the debugging warning no longer
triggers with this patch applied.  It has also been confirmed that the THP
allocation stress tests are not degraded by this patch.

Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net
Fixes: e332f741a8 ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Uladzislau Rezki (Sony)
a6cf4e0fe3 mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro
This macro adds some debug code to check that vmap allocations are
happened in ascending order.

By default this option is set to 0 and not active.  It requires
recompilation of the kernel to activate it.  Set to 1, compile the
kernel.

[urezki@gmail.com: v4]
  Link: http://lkml.kernel.org/r/20190406183508.25273-4-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-4-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Uladzislau Rezki (Sony)
bb850f4dae mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro
This macro adds some debug code to check that the augment tree is
maintained correctly, meaning that every node contains valid
subtree_max_size value.

By default this option is set to 0 and not active.  It requires
recompilation of the kernel to activate it.  Set to 1, compile the
kernel.

[urezki@gmail.com: v4]
  Link: http://lkml.kernel.org/r/20190406183508.25273-3-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190402162531.10888-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Uladzislau Rezki (Sony)
68ad4a3304 mm/vmalloc.c: keep track of free blocks for vmap allocation
Patch series "improve vmap allocation", v3.

Objective
---------

Please have a look for the description at:

  https://lkml.org/lkml/2018/10/19/786

but let me also summarize it a bit here as well.

The current implementation has O(N) complexity. Requests with different
permissive parameters can lead to long allocation time. When i say
"long" i mean milliseconds.

Description
-----------

This approach organizes the KVA memory layout into free areas of the
1-ULONG_MAX range, i.e.  an allocation is done over free areas lookups,
instead of finding a hole between two busy blocks.  It allows to have
lower number of objects which represent the free space, therefore to have
less fragmented memory allocator.  Because free blocks are always as large
as possible.

It uses the augment tree where all free areas are sorted in ascending
order of va->va_start address in pair with linked list that provides
O(1) access to prev/next elements.

Since the tree is augment, we also maintain the "subtree_max_size" of VA
that reflects a maximum available free block in its left or right
sub-tree.  Knowing that, we can easily traversal toward the lowest (left
most path) free area.

Allocation: ~O(log(N)) complexity.  It is sequential allocation method
therefore tends to maximize locality.  The search is done until a first
suitable block is large enough to encompass the requested parameters.
Bigger areas are split.

I copy paste here the description of how the area is split, since i
described it in https://lkml.org/lkml/2018/10/19/786

<snip>

A free block can be split by three different ways.  Their names are
FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e.  they
correspond to how requested size and alignment fit to a free block.

FL_FIT_TYPE - in this case a free block is just removed from the free
list/tree because it fully fits.  Comparing with current design there is
an extra work with rb-tree updating.

LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit.  In this case what we do
is just cutting a free block.  It is as fast as a current design.  Most of
the vmalloc allocations just end up with this case, because the edge is
always aligned to 1.

NE_FIT_TYPE - Is much less common case.  Basically it happens when
requested size and alignment does not fit left nor right edges, i.e.  it
is between them.  In this case during splitting we have to build a
remaining left free area and place it back to the free list/tree.

Comparing with current design there are two extra steps.  First one is we
have to allocate a new vmap_area structure.  Second one we have to insert
that remaining free block to the address sorted list/tree.

In order to optimize a first case there is a cache with free_vmap objects.
Instead of allocating from slab we just take an object from the cache and
reuse it.

Second one is pretty optimized.  Since we know a start point in the tree
we do not do a search from the top.  Instead a traversal begins from a
rb-tree node we split.
<snip>

De-allocation.  ~O(log(N)) complexity.  An area is not inserted straight
away to the tree/list, instead we identify the spot first, checking if it
can be merged around neighbors.  The list provides O(1) access to
prev/next, so it is pretty fast to check it.  Summarizing.  If merged then
large coalesced areas are created, if not the area is just linked making
more fragments.

There is one more thing that i should mention here.  After modification of
VA node, its subtree_max_size is updated if it was/is the biggest area in
its left or right sub-tree.  Apart of that it can also be populated back
to upper levels to fix the tree.  For more details please have a look at
the __augment_tree_propagate_from() function and the description.

Tests and stressing
-------------------

I use the "test_vmalloc.sh" test driver available under
"tools/testing/selftests/vm/" since 5.1-rc1 kernel.  Just trigger "sudo
./test_vmalloc.sh" to find out how to deal with it.

Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA.
Regarding last one, i do not have any physical access to NUMA system,
therefore i emulated it.  The time of stressing is days.

If you run the test driver in "stress mode", you also need the patch that
is in Andrew's tree but not in Linux 5.1-rc1.  So, please apply it:

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c

After massive testing, i have not identified any problems like memory
leaks, crashes or kernel panics.  I find it stable, but more testing would
be good.

Performance analysis
--------------------

I have used two systems to test.  One is i5-3320M CPU @ 2.60GHz and
another is HiKey960(arm64) board.  i5-3320M runs on 4.20 kernel, whereas
Hikey960 uses 4.15 kernel.  I have both system which could run on 5.1-rc1
as well, but the results have not been ready by time i an writing this.

Currently it consist of 8 tests.  There are three of them which correspond
to different types of splitting(to compare with default).  We have 3
ones(see above).  Another 5 do allocations in different conditions.

a) sudo ./test_vmalloc.sh performance

When the test driver is run in "performance" mode, it runs all available
tests pinned to first online CPU with sequential execution test order.  We
do it in order to get stable and repeatable results.  Take a look at time
difference in "long_busy_list_alloc_test".  It is not surprising because
the worst case is O(N).

# i5-3320M
How many cycles all tests took:
CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt

# Hikey960 8x CPUs
How many cycles all tests took:
CPU0=3478683207 cycles vs CPU0=463767978 cycles

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt

b) time sudo ./test_vmalloc.sh test_repeat_count=1

With this configuration, all tests are run on all available online CPUs.
Before running each CPU shuffles its tests execution order.  It gives
random allocation behaviour.  So it is rough comparison, but it puts in
the picture for sure.

# i5-3320M
<default>            vs            <patched>
real    101m22.813s                real    0m56.805s
user    0m0.011s                   user    0m0.015s
sys     0m5.076s                   sys     0m0.023s

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt

# Hikey960 8x CPUs
<default>            vs            <patched>
real    unknown                    real    4m25.214s
user    unknown                    user    0m0.011s
sys     unknown                    sys     0m0.670s

I did not manage to complete this test on "default Hikey960" kernel
version.  After 24 hours it was still running, therefore i had to cancel
it.  That is why real/user/sys are "unknown".

This patch (of 3):

Currently an allocation of the new vmap area is done over busy list
iteration(complexity O(n)) until a suitable hole is found between two busy
areas.  Therefore each new allocation causes the list being grown.  Due to
over fragmented list and different permissive parameters an allocation can
take a long time.  For example on embedded devices it is milliseconds.

This patch organizes the KVA memory layout into free areas of the
1-ULONG_MAX range.  It uses an augment red-black tree that keeps blocks
sorted by their offsets in pair with linked list keeping the free space in
order of increasing addresses.

Nodes are augmented with the size of the maximum available free block in
its left or right sub-tree.  Thus, that allows to take a decision and
traversal toward the block that will fit and will have the lowest start
address, i.e.  it is sequential allocation.

Allocation: to allocate a new block a search is done over the tree until a
suitable lowest(left most) block is large enough to encompass: the
requested size, alignment and vstart point.  If the block is bigger than
requested size - it is split.

De-allocation: when a busy vmap area is freed it can either be merged or
inserted to the tree.  Red-black tree allows efficiently find a spot
whereas a linked list provides a constant-time access to previous and next
blocks to check if merging can be done.  In case of merging of
de-allocated memory chunk a large coalesced area is created.

Complexity: ~O(log(N))

[urezki@gmail.com: v3]
  Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com
[urezki@gmail.com: v4]
  Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18 15:52:26 -07:00
Ingo Molnar
62e1c09418 perf/core improvements and fixes:
perf.data:
 
   Alexey Budankov:
 
   - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED
     user space records, resulting in ~3-5x perf.data file size reduction
     on variety of tested workloads what saves storage space on larger
     server systems where perf.data size can easily reach several tens or
     even hundreds of GiBs, especially when profiling with DWARF-based
     stacks and tracing of context switches.
 
 perf record:
 
   Arnaldo Carvalho de Melo
 
   - Improve -user-regs/intr-regs suggestions to overcome errors.
 
 perf annotate:
 
   Jin Yao:
 
   - Remove hist__account_cycles() from callback, speeding up branch processing
     (perf record -b).
 
 perf stat:
 
   - Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/,
     that sums up the event counts for both hardware threads in a core.
 
     We can already do this with --per-core, but it's often useful to do
     this together with other metrics that are collected per hardware thread.
 
     I.e. now its possible to do this per-event, and have it mixed with other
     events not aggregated by core.
 
 core libraries:
 
   Donald Yandt:
 
   - Check for errors when doing fgets(/proc/version).
 
   Jiri Olsa:
 
   - Speed up report for perf compiled with linbunwind.
 
 tools headers:
 
   Arnaldo Carvalho de Melo
 
   - Update memcpy_64.S, x86's kvm.h and pt_regs.h.
 
 arm64:
 
   Florian Fainelli:
 
   - Map Brahma-B53 CPUID to cortex-a53 events.
 
   - Add Cortex-A57 and Cortex-A72 events.
 
 csky:
 
   Mao Han:
 
   - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work
     on the C-SKY arch.
 
 x86:
 
   Andi Kleen/Kan Liang:
 
   - Add support for recording and printing XMM registers, available, for
     instance, on Icelake.
 
   Kan Liang:
 
   - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support.
     UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP.
 
 Intel PT:
 
   Adrian Hunter
 
   . Fix instructions sampling rate.
 
   . Timestamp fixes.
 
   . Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste
     the trees, useful for e-mailing.
 
 Documentation:
 
   Thomas Richter:
 
   - Add description for 'perf --debug stderr=1', which redirects stderr to stdout.
 
 libtraceevent:
 
   Tzvetomir Stoyanov:
 
   - Add man pages for the various APIs.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXN8KEwAKCRCyPKLppCJ+
 JxbAAPsE4bDD7UyTJgVWVXFyqMkaFURv9eGtmhbH/EhnZ/ADfwEAoUG6C+G2DLre
 IMtgTXSlRShQucQbum5Rcp9lcXGL+gY=
 =mGtw
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.2-20190517' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf.data:

  Alexey Budankov:

  - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED
    user space records, resulting in ~3-5x perf.data file size reduction
    on variety of tested workloads what saves storage space on larger
    server systems where perf.data size can easily reach several tens or
    even hundreds of GiBs, especially when profiling with DWARF-based
    stacks and tracing of context switches.

perf record:

  Arnaldo Carvalho de Melo

  - Improve -user-regs/intr-regs suggestions to overcome errors.

perf annotate:

  Jin Yao:

  - Remove hist__account_cycles() from callback, speeding up branch processing
    (perf record -b).

perf stat:

  - Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/,
    that sums up the event counts for both hardware threads in a core.

    We can already do this with --per-core, but it's often useful to do
    this together with other metrics that are collected per hardware thread.

    I.e. now its possible to do this per-event, and have it mixed with other
    events not aggregated by core.

core libraries:

  Donald Yandt:

  - Check for errors when doing fgets(/proc/version).

  Jiri Olsa:

  - Speed up report for perf compiled with linbunwind.

tools headers:

  Arnaldo Carvalho de Melo

  - Update memcpy_64.S, x86's kvm.h and pt_regs.h.

arm64:

  Florian Fainelli:

  - Map Brahma-B53 CPUID to cortex-a53 events.

  - Add Cortex-A57 and Cortex-A72 events.

csky:

  Mao Han:

  - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work
    on the C-SKY arch.

x86:

  Andi Kleen/Kan Liang:

  - Add support for recording and printing XMM registers, available, for
    instance, on Icelake.

  Kan Liang:

  - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support.
    UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP.

Intel PT:

  Adrian Hunter

  . Fix instructions sampling rate.

  . Timestamp fixes.

  . Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste
    the trees, useful for e-mailing.

Documentation:

  Thomas Richter:

  - Add description for 'perf --debug stderr=1', which redirects stderr to stdout.

libtraceevent:

  Tzvetomir Stoyanov:

  - Add man pages for the various APIs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-18 10:24:43 +02:00
Masahiro Yamada
3a48a91901 kbuild: check uniqueness of module names
In the recent build test of linux-next, Stephen saw a build error
caused by a broken .tmp_versions/*.mod file:

  https://lkml.org/lkml/2019/5/13/991

drivers/net/phy/asix.ko and drivers/net/usb/asix.ko have the same
basename, and there is a race in generating .tmp_versions/asix.mod

Kbuild has not checked this before, and it suddenly shows up with
obscure error messages when this kind of race occurs.

Non-unique module names cause various sort of problems, but it is
not trivial to catch them by eyes.

Hence, this script.

It checks not only real modules, but also built-in modules (i.e.
controlled by tristate CONFIG option, but currently compiled with =y).
Non-unique names for built-in modules also cause problems because
/sys/modules/ would fall over.

For the latest kernel, I tested "make allmodconfig all" (or more
quickly "make allyesconfig modules"), and it detected the following:

warning: same basename if the following are built as modules:
  drivers/regulator/88pm800.ko
  drivers/mfd/88pm800.ko
warning: same basename if the following are built as modules:
  drivers/gpu/drm/bridge/adv7511/adv7511.ko
  drivers/media/i2c/adv7511.ko
warning: same basename if the following are built as modules:
  drivers/net/phy/asix.ko
  drivers/net/usb/asix.ko
warning: same basename if the following are built as modules:
  fs/coda/coda.ko
  drivers/media/platform/coda/coda.ko
warning: same basename if the following are built as modules:
  drivers/net/phy/realtek.ko
  drivers/net/dsa/realtek.ko

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
2019-05-18 15:35:02 +09:00
Alexander Popov
aff11cd983 kconfig: Terminate menu blocks with a comment in the generated config
Currently menu blocks start with a pretty header but end with nothing in
the generated config. So next config options stick together with the
options from the menu block.

Let's terminate menu blocks in the generated config with a comment and
a newline if needed. Example:

...
CONFIG_BPF_STREAM_PARSER=y
CONFIG_NET_FLOW_LIMIT=y

#
# Network testing
#
CONFIG_NET_PKTGEN=y
CONFIG_NET_DROP_MONITOR=y
# end of Network testing
# end of Networking options

CONFIG_HAMRADIO=y
...

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18 15:31:24 +09:00
Masahiro Yamada
233c741dcb kbuild: add LICENSES to KBUILD_ALLDIRS
For *-pkg targets, the LICENSES directory should be included in the
source tarball.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18 11:49:57 +09:00
Masahiro Yamada
cdd750bfb1 kbuild: remove 'addtree' and 'flags' magic for header search paths
The 'addtree' and 'flags' in scripts/Kbuild.include are so compilecated
and ugly.

As I mentioned in [1], Kbuild should stop automatic prefixing of header
search path options.

I fixed up (almost) all Makefiles in the kernel. Now 'addtree' and
'flags' have been removed.

Kbuild still caters to add $(srctree)/$(src) and $(objtree)/$(obj)
to the header search path for O= building, but never touches extra
compiler options from ccflags-y etc.

[1]: https://patchwork.kernel.org/patch/9632347/

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18 11:49:57 +09:00