kernel_optimize_test

Author	SHA1	Message	Date
Liu Shixin	10f32b8c9e	mm/page_alloc: fix counting of managed_pages [ Upstream commit f7ec104458e00d27a190348ac3a513f3df3699a4 ] commit `f63661566f` ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty") clears out zone->lowmem_reserve[] if zone is empty. But when zone is not empty and sysctl_lowmem_reserve_ratio[i] is set to zero, zone_managed_pages(zone) is not counted in the managed_pages either. This is inconsistent with the description of lowmem_reserve, so fix it. Link: https://lkml.kernel.org/r/20210527125707.3760259-1-liushixin2@huawei.com Fixes: `f63661566f` ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reported-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:14 +02:00
Lorenzo Stoakes	d7deea31ed	mm: page_alloc: refactor setup_per_zone_lowmem_reserve() [ Upstream commit 470c61d70299b1826f56ff5fede10786798e3c14 ] setup_per_zone_lowmem_reserve() iterates through each zone setting zone->lowmem_reserve[j] = 0 (where j is the zone's index) then iterates backwards through all preceding zones, setting lower_zone->lowmem_reserve[j] = sum(managed pages of higher zones) / lowmem_reserve_ratio[idx] for each (where idx is the lower zone's index). If the lower zone has no managed pages or its ratio is 0 then all of its lowmem_reserve[] entries are effectively zeroed. As these arrays are only assigned here and all lowmem_reserve[] entries for index < this zone's index are implicitly assumed to be 0 (as these are specifically output in show_free_areas() and zoneinfo_show_print() for example) there is no need to additionally zero index == this zone's index too. This patch avoids zeroing unnecessarily. Rather than iterating through zones and setting lowmem_reserve[j] for each lower zone this patch reverse the process and populates each zone's lowmem_reserve[] values in ascending order. This clarifies what is going on especially in the case of zero managed pages or ratio which is now explicitly shown to clear these values. Link: https://lkml.kernel.org/r/20201129162758.115907-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:14 +02:00
Waiman Long	5458985533	mm: memcg/slab: properly set up gfp flags for objcg pointer array [ Upstream commit 41eb5df1cbc9b302fc263ad7c9f38cfc38b4df61 ] Patch series "mm: memcg/slab: Fix objcg pointer array handling problem", v4. Since the merging of the new slab memory controller in v5.9, the page structure stores a pointer to objcg pointer array for slab pages. When the slab has no used objects, it can be freed in free_slab() which will call kfree() to free the objcg pointer array in memcg_alloc_page_obj_cgroups(). If it happens that the objcg pointer array is the last used object in its slab, that slab may then be freed which may caused kfree() to be called again. With the right workload, the slab cache may be set up in a way that allows the recursive kfree() calling loop to nest deep enough to cause a kernel stack overflow and panic the system. In fact, we have a reproducer that can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256 and kmalloc-rcl-128 slabs with the following kfree() loop recursively called 74 times: [ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740] [<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741] [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742] [<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I also found an issue on the allocation side. If the objcg pointer array happen to come from the same slab or a circular dependency linkage is formed with multiple slabs, those affected slabs can never be freed again. This patch series addresses these two issues by introducing a new set of kmalloc-cg-<n> caches split from kmalloc-<n> caches. The new set will only contain non-reclaimable and non-dma objects that are accounted in memory cgroups whereas the old set are now for unaccounted objects only. By making this split, all the objcg pointer arrays will come from the kmalloc-<n> caches, but those caches will never hold any objcg pointer array. As a result, deeply nested kfree() call and the unfreeable slab problems are now gone. This patch (of 4): Since the merging of the new slab memory controller in v5.9, the page structure may store a pointer to obj_cgroup pointer array for slab pages. Currently, only the __GFP_ACCOUNT bit is masked off. However, the array is not readily reclaimable and doesn't need to come from the DMA buffer. So those GFP bits should be masked off as well. Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure that it is consistently applied no matter where it is called. Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com Fixes: `286e04b8ed` ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:14 +02:00
Miaohe Lin	8e4af3917b	mm/shmem: fix shmem_swapin() race with swapoff [ Upstream commit 2efa33fc7f6ec94a3a538c1a264273c889be2b36 ] When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- shmem_swapin swap_cluster_readahead if (likely(si->flags & (SWP_BLKDEV \| SWP_FS_OPS))) { swapoff .. si->swap_file = NULL; .. struct inode *inode = si->swap_file->f_mapping->host;[oops!] Close this race window by using get/put_swap_device() to guard against concurrent swapoff. Link: https://lkml.kernel.org/r/20210426123316.806267-5-linmiaohe@huawei.com Fixes: `8fd2e0b505` ("mm: swap: check if swap backing device is congested or not") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alexs@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:14 +02:00
Miaohe Lin	a5dcdfe4cb	swap: fix do_swap_page() race with swapoff [ Upstream commit 2799e77529c2a25492a4395db93996e3dacd762d ] When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- do_swap_page if (data_race(si->flags & SWP_SYNCHRONOUS_IO) swap_readpage if (data_race(sis->flags & SWP_FS_OPS)) { swapoff .. p->swap_file = NULL; .. struct file swap_file = sis->swap_file; struct address_space mapping = swap_file->f_mapping;[oops!] Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked. Fix this race by using get/put_swap_device() to guard against concurrent swapoff. Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com Fixes: `0bcac06f27` ("mm,swap: skip swapcache for swapin of synchronous device") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alex Shi <alexs@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Anshuman Khandual	29ae2c9c9c	mm/debug_vm_pgtable: ensure THP availability via has_transparent_hugepage() [ Upstream commit 65ac1a60a57e2c55f2ac37f27095f6b012295e81 ] On certain platforms, THP support could not just be validated via the build option CONFIG_TRANSPARENT_HUGEPAGE. Instead has_transparent_hugepage() also needs to be called upon to verify THP runtime support. Otherwise the debug test will just run into unusable THP helpers like in the case of a 4K hash config on powerpc platform [1]. This just moves all pfn_pmd() and pfn_pud() after THP runtime validation with has_transparent_hugepage() which prevents the mentioned problem. [1] https://bugzilla.kernel.org/show_bug.cgi?id=213069 Link: https://lkml.kernel.org/r/1621397588-19211-1-git-send-email-anshuman.khandual@arm.com Fixes: `787d563b86` ("mm/debug_vm_pgtable: fix kernel crash by checking for THP support") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Anshuman Khandual	7abf6e5763	mm/debug_vm_pgtable/basic: iterate over entire protection_map[] [ Upstream commit 2e326c07bbe1eabeece4047ab5972ef34b15679b ] Currently the basic tests just validate various page table transformations after starting with vm_get_page_prot(VM_READ\|VM_WRITE\|VM_EXEC) protection. Instead scan over the entire protection_map[] for better coverage. It also makes sure that all these basic page table tranformations checks hold true irrespective of the starting protection value for the page table entry. There is also a slight change in the debug print format for basic tests to capture the protection value it is being tested with. The modified output looks something like [pte_basic_tests ]: Validating PTE basic () [pte_basic_tests ]: Validating PTE basic (read) [pte_basic_tests ]: Validating PTE basic (write) [pte_basic_tests ]: Validating PTE basic (read\|write) [pte_basic_tests ]: Validating PTE basic (exec) [pte_basic_tests ]: Validating PTE basic (read\|exec) [pte_basic_tests ]: Validating PTE basic (write\|exec) [pte_basic_tests ]: Validating PTE basic (read\|write\|exec) [pte_basic_tests ]: Validating PTE basic (shared) [pte_basic_tests ]: Validating PTE basic (read\|shared) [pte_basic_tests ]: Validating PTE basic (write\|shared) [pte_basic_tests ]: Validating PTE basic (read\|write\|shared) [pte_basic_tests ]: Validating PTE basic (exec\|shared) [pte_basic_tests ]: Validating PTE basic (read\|exec\|shared) [pte_basic_tests ]: Validating PTE basic (write\|exec\|shared) [pte_basic_tests ]: Validating PTE basic (read\|write\|exec\|shared) This adds a missing argument 'struct mm_struct *' in pud_basic_tests() test . This never got exposed before as PUD based THP is available only on X86 platform where mm_pmd_folded(mm) call gets macro replaced without requiring the mm_struct i.e __is_defined(__PAGETABLE_PMD_FOLDED). Link: https://lkml.kernel.org/r/1611137241-26220-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Reviewed-by: Steven Price <steven.price@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Anshuman Khandual	27634d63ca	mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect [ Upstream commit bb5c47ced46797409f4791d0380db3116d93134c ] Patch series "mm/debug_vm_pgtable: Some minor updates", v3. This series contains some cleanups and new test suggestions from Catalin from an earlier discussion. https://lore.kernel.org/linux-mm/20201123142237.GF17833@gaia/ This patch (of 2): This adds validation tests for dirtiness after write protect conversion for each page table level. There are two new separate test types involved here. The first test ensures that a given page table entry does not become dirty after pxx_wrprotect(). This is important for platforms like arm64 which transfers and drops the hardware dirty bit (!PTE_RDONLY) to the software dirty bit while making it an write protected one. This test ensures that no fresh page table entry could be created with hardware dirty bit set. The second test ensures that a given page table entry always preserve the dirty information across pxx_wrprotect(). This adds two previously missing PUD level basic tests and while here fixes pxx_wrprotect() related typos in the documentation file. Link: https://lkml.kernel.org/r/1611137241-26220-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1611137241-26220-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Jan Kara	c872674da7	dax: fix ENOMEM handling in grab_mapping_entry() [ Upstream commit 1a14e3779dd58c16b30e56558146e5cc850ba8b0 ] grab_mapping_entry() has a bug in handling of ENOMEM condition. Suppose we have a PMD entry at index i which we are downgrading to a PTE entry. grab_mapping_entry() will set pmd_downgrade to true, lock the entry, clear the entry in xarray, and decrement mapping->nrpages. The it will call: entry = dax_make_entry(pfn_to_pfn_t(0), flags); dax_lock_entry(xas, entry); which inserts new PTE entry into xarray. However this may fail allocating the new node. We handle this by: if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM)) goto retry; however pmd_downgrade stays set to true even though 'entry' returned from get_unlocked_entry() will be NULL now. And we will go again through the downgrade branch. This is mostly harmless except that mapping->nrpages is decremented again and we temporarily have an invalid entry stored in xarray. Fix the problem by setting pmd_downgrade to false each time we lookup the entry we work with so that it matches the entry we found. Link: https://lkml.kernel.org/r/20210622160015.18004-1-jack@suse.cz Fixes: `b15cd80068` ("dax: Convert page fault handlers to XArray") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Dan Carpenter	c015295b28	ocfs2: fix snprintf() checking [ Upstream commit 54e948c60cc843b6e84dc44496edc91f51d2a28e ] The snprintf() function returns the number of bytes which would have been printed if the buffer was large enough. In other words it can return ">= remain" but this code assumes it returns "== remain". The run time impact of this bug is not very severe. The next iteration through the loop would trigger a WARN() when we pass a negative limit to snprintf(). We would then return success instead of -E2BIG. The kernel implementation of snprintf() will never return negatives so there is no need to check and I have deleted that dead code. Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam Fixes: `a860f6eb4c` ("ocfs2: sysfile interfaces for online file check") Fixes: `74ae4e104d` ("ocfs2: Create stack glue sysfs files.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Ming Lei	512106ae23	blk-mq: update hctx->dispatch_busy in case of real scheduler [ Upstream commit cb9516be7708a2a18ec0a19fe3a225b5b3bc92c7 ] Commit `6e6fcbc27e` ("blk-mq: support batching dispatch in case of io") starts to support io batching submission by using hctx->dispatch_busy. However, blk_mq_update_dispatch_busy() isn't changed to update hctx->dispatch_busy in that commit, so fix the issue by updating hctx->dispatch_busy in case of real scheduler. Reported-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Fixes: `6e6fcbc27e` ("blk-mq: support batching dispatch in case of io") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Rafael J. Wysocki	3e33b1329c	cpufreq: Make cpufreq_online() call driver->offline() on errors [ Upstream commit 3b7180573c250eb6e2a7eec54ae91f27472332ea ] In the CPU removal path the ->offline() callback provided by the driver is always invoked before ->exit(), but in the cpufreq_online() error path it is not, so ->exit() is expected to somehow know the context in which it has been called and act accordingly. That is less than straightforward, so make cpufreq_online() invoke the driver's ->offline() callback, if present, on errors before ->exit() too. This only potentially affects intel_pstate. Fixes: `91a12e91dc` ("cpufreq: Allow light-weight tear down and bring up of CPUs") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:13 +02:00
Nathan Chancellor	cc0b1776fd	ACPI: bgrt: Fix CFI violation [ Upstream commit f37ccf8fce155d08ae2a4fb3db677911ced0c21a ] clang's Control Flow Integrity requires that every indirect call has a valid target, which is based on the type of the function pointer. The _show() functions in this file are written as if they will be called from dev_attr_show(); however, they will be called from sysfs_kf_seq_show() because the files were created by sysfs_create_group() and the sysfs ops are based on kobj_sysfs_ops because of kobject_add_and_create(). Because the _show() functions do not match the type of the show() member in struct kobj_attribute, there is a CFI violation. $ cat /sys/firmware/acpi/bgrt/{status,type,version,{x,y}offset}} 1 0 1 522 307 $ dmesg \| grep "CFI failure" [ 267.761825] CFI failure (target: type_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8): [ 267.762246] CFI failure (target: xoffset_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8): [ 267.762584] CFI failure (target: status_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8): [ 267.762973] CFI failure (target: yoffset_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8): [ 267.763330] CFI failure (target: version_show.d5e1ad21498a5fd14edbc5c320906598.cfi_jt+0x0/0x8): Convert these functions to the type of the show() member in struct kobj_attribute so that there is no more CFI violation. Because these functions are all so similar, combine them into a macro. Fixes: `d1ff4b1cdb` ("ACPI: Add support for exposing BGRT data") Link: https://github.com/ClangBuiltLinux/linux/issues/1406 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Dwaipayan Ray	3cbe01ac28	ACPI: Use DEVICE_ATTR_<RW\|RO\|WO> macros [ Upstream commit 0f39ee8324e75c9d370e84a61323ceb194641a18 ] Instead of open coding DEVICE_ATTR(), use the DEVICE_ATTR_RW(), DEVICE_ATTR_RO() and DEVICE_ATTR_WO() macros wherever possible. This required a few functions to be renamed but the functionality itself is unchanged. Signed-off-by: Dwaipayan Ray <dwaipayanray1@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Zhang Yi	d3dd2fe274	blk-wbt: make sure throttle is enabled properly [ Upstream commit 76a8040817b4b9c69b53f9b326987fa891b4082a ] After commit `a79050434b` ("blk-rq-qos: refactor out common elements of blk-wbt"), if throttle was disabled by wbt_disable_default(), we could not enable again, fix this by set enable_state back to WBT_STATE_ON_DEFAULT. Fixes: `a79050434b` ("blk-rq-qos: refactor out common elements of blk-wbt") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210619093700.920393-3-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Zhang Yi	1c2f21a8a0	blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() [ Upstream commit 1d0903d61e9645c6330b94247b96dd873dfc11c8 ] Now that we disable wbt by simply zero out rwb->wb_normal in wbt_disable_default() when switch elevator to bfq, but it's not safe because it will become false positive if we change queue depth. If it become false positive between wbt_wait() and wbt_track() when submit write request, it will lead to drop rqw->inflight to -1 in wbt_done(), which will end up trigger IO hung. Fix this issue by introduce a new state which mean the wbt was disabled. Fixes: `a79050434b` ("blk-rq-qos: refactor out common elements of blk-wbt") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Xiaofei Tan	e0afab5181	ACPI: APEI: fix synchronous external aborts in user-mode [ Upstream commit ccb5ecdc2ddeaff744ee075b54cdff8a689e8fa7 ] Before commit `8fcc4ae6fa` ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work"), do_sea() would unconditionally signal the affected task from the arch code. Since that change, the GHES driver sends the signals. This exposes a problem as errors the GHES driver doesn't understand or doesn't handle effectively are silently ignored. It will cause the errors get taken again, and circulate endlessly. User-space task get stuck in this loop. Existing firmware on Kunpeng9xx systems reports cache errors with the 'ARM Processor Error' CPER records. Do memory failure handling for ARM Processor Error Section just like for Memory Error Section. Fixes: `8fcc4ae6fa` ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work") Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Matti Vaittinen	f626452df8	extcon: extcon-max8997: Fix IRQ freeing at error path [ Upstream commit 610bdc04830a864115e6928fc944f1171dfff6f3 ] If reading MAX8997_MUIC_REG_STATUS1 fails at probe the driver exits without freeing the requested IRQs. Free the IRQs prior returning if reading the status fails. Fixes: `3e34c81989` ("extcon: max8997: Avoid forcing UART path on drive probe") Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Link: https://lore.kernel.org/r/27ee4a48ee775c3f8c9d90459c18b6f2b15edc76.1623146580.git.matti.vaittinen@fi.rohmeurope.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Tony Lindgren	45b399e309	clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG [ Upstream commit 9517c577f9f722270584cfb1a7b4e1354e408658 ] As we are using cpu_pm to save and restore context, we must also save and restore the timer sysconfig register TIOCP_CFG. This is needed because we are not calling PM runtime functions at all with cpu_pm. Fixes: `b34677b099` ("clocksource/drivers/timer-ti-dm: Implement cpu_pm notifier for context save and restore") Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Adam Ford <aford173@gmail.com> Cc: Andreas Kemnade <andreas@kemnade.info> Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210415085506.56828-1-tony@atomide.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Christoph Hellwig	0317b728d8	mark pstore-blk as broken [ Upstream commit d07f3b081ee632268786601f55e1334d1f68b997 ] pstore-blk just pokes directly into the pagecache for the block device without going through the file operations for that by faking up it's own file operations that do not match the block device ones. As this breaks the control of the block layer of it's page cache, and even now just works by accident only the best thing is to just disable this driver. Fixes: `17639f67c1` ("pstore/blk: Introduce backend for block devices") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:12 +02:00
Krzysztof Wilczyński	296fbe2608	ACPI: sysfs: Fix a buffer overrun problem with description_show() [ Upstream commit 888be6067b97132c3992866bbcf647572253ab3f ] Currently, a device description can be obtained using ACPI, if the _STR method exists for a particular device, and then exposed to the userspace via a sysfs object as a string value. If the _STR method is available for a given device then the data (usually a Unicode string) is read and stored in a buffer (of the ACPI_TYPE_BUFFER type) with a pointer to said buffer cached in the struct acpi_device_pnp for later access. The description_show() function is responsible for exposing the device description to the userspace via a corresponding sysfs object and internally calls the utf16s_to_utf8s() function with a pointer to the buffer that contains the Unicode string so that it can be converted from UTF16 encoding to UTF8 and thus allowing for the value to be safely stored and later displayed. When invoking the utf16s_to_utf8s() function, the description_show() function also sets a limit of the data that can be saved into a provided buffer as a result of the character conversion to be a total of PAGE_SIZE, and upon completion, the utf16s_to_utf8s() function returns an integer value denoting the number of bytes that have been written into the provided buffer. Following the execution of the utf16s_to_utf8s() a newline character will be added at the end of the resulting buffer so that when the value is read in the userspace through the sysfs object then it would include newline making it more accessible when working with the sysfs file system in the shell, etc. Normally, this wouldn't be a problem, but if the function utf16s_to_utf8s() happens to return the number of bytes written to be precisely PAGE_SIZE, then we would overrun the buffer and write the newline character outside the allotted space which can have undefined consequences or result in a failure. To fix this buffer overrun, ensure that there always is enough space left for the newline character to be safely appended. Fixes: `d1efe3c324` ("ACPI: Add new sysfs interface to export device description") Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Mario Limonciello	ce47ae8961	nvme-pci: look for StorageD3Enable on companion ACPI device instead [ Upstream commit e21e0243e7b0f1c2a21d21f4d115f7b37175772a ] The documentation around the StorageD3Enable property hints that it should be made on the PCI device. This is where newer AMD systems set the property and it's required for S0i3 support. So rather than look for nodes of the root port only present on Intel systems, switch to the companion ACPI device for all systems. David Box from Intel indicated this should work on Intel as well. Link: https://lore.kernel.org/linux-nvme/YK6gmAWqaRmvpJXb@google.com/T/#m900552229fa455867ee29c33b854845fce80ba70 Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Fixes: `df4f9bc4fb` ("nvme-pci: add support for ACPI StorageD3Enable property") Suggested-by: Liang Prike <Prike.Liang@amd.com> Acked-by: Raul E Rangel <rrangel@chromium.org> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Ming Lei	3ffe41f25f	block: avoid double io accounting for flush request [ Upstream commit 84da7acc3ba53af26f15c4b0ada446127b7a7836 ] For flush request, rq->end_io() may be called two times, one is from timeout handling(blk_mq_check_expired()), another is from normal completion(__blk_mq_end_request()). Move blk_account_io_flush() after flush_rq->ref drops to zero, so io accounting can be done just once for flush request. Fixes: `b686631865` ("block: add iostat counters for flush requests") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210511152236.763464-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Rafael J. Wysocki	17e77feadd	ACPI: PM / fan: Put fan device IDs into separate header file [ Upstream commit b9370dceabb7841c5e65ce4ee4405b9db5231fc4 ] The ACPI fan device IDs are shared between the fan driver and the device power management code. The former is modular, so it needs to include the table of device IDs for module autoloading and the latter needs that list to avoid attaching the generic ACPI PM domain to fan devices (which doesn't make sense) possibly before the fan driver module is loaded. Unfortunately, that requires the list of fan device IDs to be updated in two places which is prone to mistakes, so put it into a symbol definition in a separate header file so there is only one copy of it in case it needs to be updated again in the future. Fixes: `b9ea0bae26` ("ACPI: PM: Avoid attaching ACPI PM domain to certain devices") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
YueHaibing	4dcb59d6a2	PM / devfreq: Add missing error code in devfreq_add_device() [ Upstream commit 18b380ed61f892ed06838d1f1a5124d966292ed3 ] Set err code in the error path before jumping to the end of the function. Fixes: `4dc3bab868` ("PM / devfreq: Add support delayed timer for polling mode") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Philipp Zabel	a61f8a2e45	media: video-mux: Skip dangling endpoints [ Upstream commit 95778c2d0979618e3349b1d2324ec282a5a6adbf ] i.MX6 device tree include files contain dangling endpoints for the board device tree writers' convenience. These are still included in many existing device trees. Treat dangling endpoints as non-existent to support them. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: `612b385efb` ("media: video-mux: Create media links in bound notifier") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Ezequiel Garcia	62c666805a	media: v4l2-async: Clean v4l2_async_notifier_add_fwnode_remote_subdev [ Upstream commit c1cf3d896d124e3e00794f9bfbde49f0fc279e3f ] Change v4l2_async_notifier_add_fwnode_remote_subdev semantics so it allocates the struct v4l2_async_subdev pointer. This makes the API consistent: the v4l2-async subdevice addition functions have now a unified usage model. This model is simpler, as it makes v4l2-async responsible for the allocation and release of the subdevice descriptor, and no longer something the driver has to worry about. On the user side, the change makes the API simpler for the drivers to use and less error-prone. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Helen Koike <helen.koike@collabora.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:11 +02:00
Zhaoyang Huang	6bfcb61789	psi: Fix race between psi_trigger_create/destroy [ Upstream commit 8f91efd870ea5d8bc10b0fcc9740db51cd4c0c83 ] Race detected between psi_trigger_destroy/create as shown below, which cause panic by accessing invalid psi_system->poll_wait->wait_queue_entry and psi_system->poll_timer->entry->next. Under this modification, the race window is removed by initialising poll_wait and poll_timer in group_init which are executed only once at beginning. psi_trigger_destroy() psi_trigger_create() mutex_lock(trigger_lock); rcu_assign_pointer(poll_task, NULL); mutex_unlock(trigger_lock); mutex_lock(trigger_lock); if (!rcu_access_pointer(group->poll_task)) { timer_setup(poll_timer, poll_timer_fn, 0); rcu_assign_pointer(poll_task, task); } mutex_unlock(trigger_lock); synchronize_rcu(); del_timer_sync(poll_timer); <-- poll_timer has been reinitialized by psi_trigger_create() So, trigger_lock/RCU correctly protects destruction of group->poll_task but misses this race affecting poll_timer and poll_wait. Fixes: `461daba06b` ("psi: eliminate kthread_worker from psi trigger scheduling mechanism") Co-developed-by: ziwei.dai <ziwei.dai@unisoc.com> Signed-off-by: ziwei.dai <ziwei.dai@unisoc.com> Co-developed-by: ke.wang <ke.wang@unisoc.com> Signed-off-by: ke.wang <ke.wang@unisoc.com> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/1623371374-15664-1-git-send-email-huangzhaoyang@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Herbert Xu	8d7debe744	crypto: nx - Fix RCU warning in nx842_OF_upd_status [ Upstream commit 2a96726bd0ccde4f12b9b9a9f61f7b1ac5af7e10 ] The function nx842_OF_upd_status triggers a sparse RCU warning when it directly dereferences the RCU-protected devdata. This appears to be an accident as there was another variable of the same name that was passed in from the caller. After it was removed (because the main purpose of using it, to update the status member was itself removed) the global variable unintenionally stood in as its replacement. This patch restores the devdata parameter. Fixes: `90fd73f912` ("crypto: nx - remove pSeries NX 'status' field") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Mirko Vogt	c43082d284	spi: spi-sun6i: Fix chipselect/clock bug [ Upstream commit 0d7993b234c9fad8cb6bec6adfaa74694ba85ecb ] The current sun6i SPI implementation initializes the transfer too early, resulting in SCK going high before the transfer. When using an additional (gpio) chipselect with sun6i, the chipselect is asserted at a time when clock is high, making the SPI transfer fail. This is due to SUN6I_GBL_CTL_BUS_ENABLE being written into SUN6I_GBL_CTL_REG at an early stage. Moving that to the transfer function, hence, right before the transfer starts, mitigates that problem. Fixes: `3558fe900e` (spi: sunxi: Add Allwinner A31 SPI controller driver) Signed-off-by: Mirko Vogt <mirko-dev\|linux@nanl.de> Signed-off-by: Ralf Schlatterbeck <rsc@runtux.com> Link: https://lore.kernel.org/r/20210614144507.y3udezjfbko7eavv@runtux.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Peter Zijlstra	f18f7a2276	lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING [ Upstream commit c0c2c0dad6a06e0c05e9a52d65f932bd54364c97 ] When PROVE_RAW_LOCK_NESTING=y many of the selftests FAILED because HARDIRQ context is out-of-bounds for spinlocks. Instead make the default hardware context the threaded hardirq context, which preserves the old locking rules. The wait-type specific locking selftests will have a non-threaded HARDIRQ variant. Fixes: `de8f5e4f2d` ("lockdep: Introduce wait-type checks") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20210617190313.322096283@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Peter Zijlstra	fca9e784a3	lockdep: Fix wait-type for empty stack [ Upstream commit f8b298cc39f0619544c607eaef09fd0b2afd10f3 ] Even the very first lock can violate the wait-context check, consider the various IRQ contexts. Fixes: `de8f5e4f2d` ("lockdep: Introduce wait-type checks") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/r/20210617190313.256987481@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Qais Yousef	ca47a4fa89	sched/uclamp: Fix uclamp_tg_restrict() [ Upstream commit 0213b7083e81f4acd69db32cb72eb4e5f220329a ] Now cpu.uclamp.min acts as a protection, we need to make sure that the uclamp request of the task is within the allowed range of the cgroup, that is it is clamp()'ed correctly by tg->uclamp[UCLAMP_MIN] and tg->uclamp[UCLAMP_MAX]. As reported by Xuewen [1] we can have some corner cases where there's inversion between uclamp requested by task (p) and the uclamp values of the taskgroup it's attached to (tg). Following table demonstrates 2 corner cases: \| p \| tg \| effective -----------+-----+------+----------- CASE 1 -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 60% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- CASE 2 -----------+-----+------+----------- uclamp_min \| 0% \| 30% \| 30% -----------+-----+------+----------- uclamp_max \| 20% \| 50% \| 20% -----------+-----+------+----------- With this fix we get: \| p \| tg \| effective -----------+-----+------+----------- CASE 1 -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 50% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- CASE 2 -----------+-----+------+----------- uclamp_min \| 0% \| 30% \| 30% -----------+-----+------+----------- uclamp_max \| 20% \| 50% \| 30% -----------+-----+------+----------- Additionally uclamp_update_active_tasks() must now unconditionally update both UCLAMP_MIN/MAX because changing the tg's UCLAMP_MAX for instance could have an impact on the effective UCLAMP_MIN of the tasks. \| p \| tg \| effective -----------+-----+------+----------- old -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 50% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- new -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 60% -----------+-----+------+----------- uclamp_max \| 80% \|70% \| 70% -----------+-----+------+----------- [1] https://lore.kernel.org/lkml/CAB8ipk_a6VFNjiEnHRHkUMBKbA+qzPQvhtNjJ_YNzQhqV_o8Zw@mail.gmail.com/ Fixes: 0c18f2ecfcc2 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min") Reported-by: Xuewen Yan <xuewen.yan94@gmail.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210617165155.3774110-1-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:10 +02:00
Vincent Donnefort	aea030cefc	sched/rt: Fix Deadline utilization tracking during policy change [ Upstream commit d7d607096ae6d378b4e92d49946d22739c047d4c ] DL keeps track of the utilization on a per-rq basis with the structure avg_dl. This utilization is updated during task_tick_dl(), put_prev_task_dl() and set_next_task_dl(). However, when the current running task changes its policy, set_next_task_dl() which would usually take care of updating the utilization when the rq starts running DL tasks, will not see a such change, leaving the avg_dl structure outdated. When that very same task will be dequeued later, put_prev_task_dl() will then update the utilization, based on a wrong last_update_time, leading to a huge spike in the DL utilization signal. The signal would eventually recover from this issue after few ms. Even if no DL tasks are run, avg_dl is also updated in __update_blocked_others(). But as the CPU capacity depends partly on the avg_dl, this issue has nonetheless a significant impact on the scheduler. Fix this issue by ensuring a load update when a running task changes its policy to DL. Fixes: `3727e0e` ("sched/dl: Add dl_rq utilization tracking") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/1624271872-211872-3-git-send-email-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
Vincent Donnefort	c576472a05	sched/rt: Fix RT utilization tracking during policy change [ Upstream commit fecfcbc288e9f4923f40fd23ca78a6acdc7fdf6c ] RT keeps track of the utilization on a per-rq basis with the structure avg_rt. This utilization is updated during task_tick_rt(), put_prev_task_rt() and set_next_task_rt(). However, when the current running task changes its policy, set_next_task_rt() which would usually take care of updating the utilization when the rq starts running RT tasks, will not see a such change, leaving the avg_rt structure outdated. When that very same task will be dequeued later, put_prev_task_rt() will then update the utilization, based on a wrong last_update_time, leading to a huge spike in the RT utilization signal. The signal would eventually recover from this issue after few ms. Even if no RT tasks are run, avg_rt is also updated in __update_blocked_others(). But as the CPU capacity depends partly on the avg_rt, this issue has nonetheless a significant impact on the scheduler. Fix this issue by ensuring a load update when a running task changes its policy to RT. Fixes: `371bf427` ("sched/rt: Add rt_rq utilization tracking") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/1624271872-211872-2-git-send-email-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
Joerg Roedel	67f66d48bd	x86/sev: Split up runtime #VC handler for correct state tracking [ Upstream commit be1a5408868af341f61f93c191b5e346ee88c82a ] Split up the #VC handler code into a from-user and a from-kernel part. This allows clean and correct state tracking, as the #VC handler needs to enter NMI-state when raised from kernel mode and plain IRQ state when raised from user-mode. Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210618115409.22735-3-joro@8bytes.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
Joerg Roedel	2e1003f3ee	x86/sev: Make sure IRQs are disabled while GHCB is active [ Upstream commit d187f217335dba2b49fc9002aab2004e04acddee ] The #VC handler only cares about IRQs being disabled while the GHCB is active, as it must not be interrupted by something which could cause another #VC while it holds the GHCB (NMI is the exception for which the backup GHCB exits). Make sure nothing interrupts the code path while the GHCB is active by making sure that callers of __sev_{get,put}_ghcb() have disabled interrupts upfront. [ bp: Massage commit message. ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210618115409.22735-2-joro@8bytes.org Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
David Sterba	eefebcda89	btrfs: clear log tree recovering status if starting transaction fails [ Upstream commit 1aeb6b563aea18cd55c73cf666d1d3245a00f08c ] When a log recovery is in progress, lots of operations have to take that into account, so we keep this status per tree during the operation. Long time ago error handling revamp patch `79787eaab4` ("btrfs: replace many BUG_ONs with proper error handling") removed clearing of the status in an error branch. Add it back as was intended in `e02119d5a7` ("Btrfs: Add a write ahead tree log to optimize synchronous operations"). There are probably no visible effects, log replay is done only during mount and if it fails all structures are cleared so the stale status won't be kept. Fixes: `79787eaab4` ("btrfs: replace many BUG_ONs with proper error handling") Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
Axel Lin	aec3a574c6	regulator: hi655x: Fix pass wrong pointer to config.driver_data [ Upstream commit 61eb1b24f9e4f4e0725aa5f8164a932c933f3339 ] Current code sets config.driver_data to a zero initialized regulator which is obviously wrong. Fix it. Fixes: `4618119b9b` ("regulator: hi655x: enable regulator for hi655x PMIC") Signed-off-by: Axel Lin <axel.lin@ingics.com> Link: https://lore.kernel.org/r/20210620132715.60215-1-axel.lin@ingics.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:09 +02:00
Alexandru Elisei	96275c8f6c	KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set [ Upstream commit 2a71fabf6a1bc9162a84e18d6ab991230ca4d588 ] According to ARM DDI 0487G.a, page D13-3895, setting the PMCR_EL0.P bit to 1 has the following effect: "Reset all event counters accessible in the current Exception level, not including PMCCNTR_EL0, to zero." Similar behaviour is described for AArch32 on page G8-7022. Make it so. Fixes: `c01d6a1802` ("KVM: arm64: pmu: Only handle supported event counters") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210618105139.83795-1-alexandru.elisei@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Tuan Phan	e5154bf217	perf/arm-cmn: Fix invalid pointer when access dtc object sharing the same IRQ number [ Upstream commit 4e16f283edc289820e9b2d6f617ed8e514ee8396 ] When multiple dtcs share the same IRQ number, the irq_friend which used to refer to dtc object gets calculated incorrect which leads to invalid pointer. Fixes: `0ba64770a2` ("perf: Add Arm CMN-600 PMU driver") Signed-off-by: Tuan Phan <tuanphan@os.amperecomputing.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/1623946129-3290-1-git-send-email-tuanphan@os.amperecomputing.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Kai Huang	31dcfec19d	KVM: x86/mmu: Fix return value in tdp_mmu_map_handle_target_level() [ Upstream commit 57a3e96d6d17ae5ac9861ef34af024a627f1c3bb ] Currently tdp_mmu_map_handle_target_level() returns 0, which is RET_PF_RETRY, when page fault is actually fixed. This makes kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of RET_PF_FIXED. Fix by initializing ret to RET_PF_FIXED. Note that kvm_mmu_page_fault() resumes guest on both RET_PF_RETRY and RET_PF_FIXED, which means in practice returning the two won't make difference, so this fix alone won't be necessary for stable tree. Fixes: `bb18842e21` ("kvm: x86/mmu: Add TDP MMU PF handler") Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <f9e8956223a586cd28c090879a8ff40f5eb6d609.1623717884.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Sean Christopherson	64d31137b1	KVM: nVMX: Don't clobber nested MMU's A/D status on EPTP switch [ Upstream commit 272b0a998d084e7667284bdd2d0c675c6a2d11de ] Drop bogus logic that incorrectly clobbers the accessed/dirty enabling status of the nested MMU on an EPTP switch. When nested EPT is enabled, walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2. This is likely a benign bug, as mmu->ept_ad is never consumed (since the MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will never propagate into future shadow pages since the nested MMU isn't used to map anything, just to walk L2's page tables. Note, KVM also does a full MMU reload, i.e. the guest_mmu will be recreated using the new EPTP, and thus any change in A/D enabling will be properly recognized in the relevant MMU. Fixes: `41ab937274` ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Sean Christopherson	bac38bd7c4	KVM: nVMX: Ensure 64-bit shift when checking VMFUNC bitmap [ Upstream commit 0e75225dfa4c5d5d51291f54a3d2d5895bad38da ] Use BIT_ULL() instead of an open-coded shift to check whether or not a function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM supports only bit 0, and will fail VM-Enter if any other bits are set, i.e. bits 63:32 are guaranteed to be zero. Note, "function" is bounded by hardware as VMFUNC will #UD before taking a VM-Exit if the function is greater than 63. Before: if ((vmcs12->vm_function_control & (1 << function)) == 0) 0x000000000001a916 <+118>: mov $0x1,%eax 0x000000000001a91b <+123>: shl %cl,%eax 0x000000000001a91d <+125>: cltq 0x000000000001a91f <+127>: and 0x128(%rbx),%rax After: if (!(vmcs12->vm_function_control & BIT_ULL(function & 63))) 0x000000000001a955 <+117>: mov 0x128(%rbx),%rdx 0x000000000001a95c <+124>: bt %rax,%rdx Fixes: `27c42a1bb8` ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Sean Christopherson	b2c5af71ce	KVM: nVMX: Sync all PGDs on nested transition with shadow paging [ Upstream commit 07ffaf343e34b555c9e7ea39a9c81c439a706f13 ] Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1. Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush. Note, there is no change the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2. Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which requested is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush(). Fixes: `41fab65e7c` ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Guenter Roeck	5ac406b81c	hwmon: (max31790) Fix fan speed reporting for fan7..12 [ Upstream commit cbbf244f0515af3472084f22b6213121b4a63835 ] Fans 7..12 do not have their own set of configuration registers. So far the code ignored that and read beyond the end of the configuration register range to get the tachometer period. This resulted in more or less random fan speed values for those fans. The datasheet is quite vague when it comes to defining the tachometer period for fans 7..12. Experiments confirm that the period is the same for both fans associated with a given set of configuration registers. Fixes: `54187ff9d7` ("hwmon: (max31790) Convert to use new hwmon registration API") Fixes: `195a4b4298` ("hwmon: Driver for Maxim MAX31790") Cc: Jan Kundrát <jan.kundrat@cesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Cc: Václav Kubernát <kubernat@cesnet.cz> Reviewed-by: Jan Kundrát <jan.kundrat@cesnet.cz> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20210526154022.3223012-2-linux@roeck-us.net Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:08 +02:00
Guenter Roeck	e02d52b7e9	hwmon: (max31722) Remove non-standard ACPI device IDs [ Upstream commit 97387c2f06bcfd79d04a848d35517b32ee6dca7c ] Valid Maxim Integrated ACPI device IDs would start with MXIM, not with MAX1. On top of that, ACPI device IDs reflecting chip names are almost always invalid. Remove the invalid ACPI IDs. Fixes: `04e1e70afe` ("hwmon: (max31722) Add support for MAX31722/MAX31723 temperature sensors") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:07 +02:00
Guenter Roeck	5c00e99497	hwmon: (lm70) Revert "hwmon: (lm70) Add support for ACPI" [ Upstream commit ac61c8aae446b9c0fe18981fe721d4a43e283ad6 ] This reverts commit `b58bd4c6df`. None of the ACPI IDs introduced with the reverted patch is a valid ACPI device ID. Any ACPI users of this driver are advised to use PRP0001 and a devicetree-compatible device identification. Fixes: `b58bd4c6df` ("hwmon: (lm70) Add support for ACPI") Cc: Andrej Picej <andpicej@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:07 +02:00
Stephen Boyd	5cfc66b454	hwmon: (lm70) Use device_get_match_data() [ Upstream commit 6e09d75513d2670b7ab91ab3584fc5bcf2675a75 ] Use the more modern API to get the match data out of the of match table. This saves some code, lines, and nicely avoids referencing the match table when it is undefined with configurations where CONFIG_OF=n. Signed-off-by: Stephen Boyd <swboyd@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: <linux-hwmon@vger.kernel.org> [robh: rework to use device_get_match_data()] Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:07 +02:00
Dillon Min	c9f8416e43	media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx [ Upstream commit 5d11e6aad1811ea293ee2996cec9124f7fccb661 ] The m2m_ctx resources was allocated by v4l2_m2m_ctx_init() in g2d_open() should be freed from g2d_release() when it's not used. Fix it Fixes: `918847341a` ("[media] v4l: add G2D driver for s5p device family") Signed-off-by: Dillon Min <dillon.minfei@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-07-14 16:56:07 +02:00

1 2 3 4 5 ...

975281 Commits