kernel_optimize_test

Christoph Lameter 7cc36bbddd vmstat: on-demand vmstat workers V8

vmstat workers are used for folding counter differentials into the zone, per node and global counters at certain time intervals. They currently run at defined intervals on all processors, which will cause some holdoff for processors that need minimal intrusion by the OS.

The current vmstat_update mechanism depends on a deferrable timer firing every other second by default, which registers a work queue item that runs on the local CPU, with the result that we have 1 interrupt and one additional schedulable task on each CPU every 2 seconds. If a workload indeed causes VM activity or multiple tasks are running on a CPU, then there are probably bigger issues to deal with.

However, some workloads dedicate a CPU for a single CPU bound task. This is done in high performance computing, in high frequency financial applications, in networking (Intel DPDK, EZchip NPS), and with the advent of systems with more and more CPUs over time, this may become more and more common to do, since when one has enough CPUs one cares less about efficiently sharing a CPU with other tasks and more about efficiently monopolizing a CPU per task.

The difference of having this timer firing and workqueue kernel thread scheduled per second can be enormous. An artificial test measuring the worst case time to do a simple "i++" in an endless loop on a bare metal system and under Linux on an isolated CPU with dynticks, with and without this patch, has Linux match the bare metal performance (~700 cycles) with this patch and lose by a couple of orders of magnitude (~200k cycles) without it[*]. The loss occurs for something that just calculates statistics. For networking applications, for example, this could be the difference between dropping packets or sustaining line rate.

Statistics are important and useful, but it would be great if there were a way to keep statistics gathering from producing a huge performance difference. This patch does just that.

This patch creates a vmstat shepherd worker that monitors the per cpu differentials on all processors. If there are differentials on a processor, then a vmstat worker local to the processor with the differentials is created. That worker will then start folding the diffs in regular intervals. Should the worker find that there is no work to be done, then it will make the shepherd worker monitor the differentials again.

With this patch it is then possible to have periods longer than 2 seconds without any OS event on a "cpu" (hardware thread).

The patch shows a very minor increase in system performance.

hackbench -s 512 -l 2000 -g 15 -f 25 -P

Results before the patch:

Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.992
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.971
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 5.063

Hackbench after the patch:

Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.973
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.990
Running in process mode with 15 groups using 50 file descriptors each (== 750 tasks)
Each sender will pass 2000 messages of 512 bytes
Time: 4.993

[fengguang.wu@intel.com: cpu_stat_off can be static]
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Max Krasnyansky <maxk@qti.qualcomm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2014-10-09 22:26:02 -04:00
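
The shepherd scheme described in the commit message above can be illustrated with a small user-space simulation. This is only a sketch of the idea, not the code in vmstat.c: every name here (`sketch_state`, `sketch_cpu`, `sketch_shepherd_pass`, `sketch_worker_tick`, `SKETCH_NR_CPUS`) is hypothetical, a single integer stands in for the per-cpu differentials, and the timers, workqueues and locking that the real patch relies on are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4  /* hypothetical CPU count for the simulation */

/* Hypothetical per-CPU state; a stand-in for the kernel's per-cpu data. */
struct sketch_cpu {
    int diff;            /* pending counter differential, not yet folded */
    bool worker_active;  /* true while this CPU's local worker is running */
};

struct sketch_state {
    struct sketch_cpu cpu[SKETCH_NR_CPUS];
    long global;         /* global counter the differentials are folded into */
};

/*
 * Shepherd pass: look at every CPU that has no active local worker and
 * start one wherever a differential is pending. CPUs without pending
 * differentials are left alone, so they see no periodic work.
 * Returns the number of workers started.
 */
int sketch_shepherd_pass(struct sketch_state *s)
{
    int started = 0;

    for (size_t c = 0; c < SKETCH_NR_CPUS; c++) {
        struct sketch_cpu *pc = &s->cpu[c];

        if (!pc->worker_active && pc->diff != 0) {
            pc->worker_active = true;
            started++;
        }
    }
    return started;
}

/*
 * One tick of the local worker on CPU c: fold the pending differential
 * into the global counter. If nothing was pending, the worker stops and
 * the CPU goes back under the shepherd's watch.
 * Returns true if the worker stays active.
 */
bool sketch_worker_tick(struct sketch_state *s, size_t c)
{
    struct sketch_cpu *pc;

    assert(c < SKETCH_NR_CPUS);
    pc = &s->cpu[c];

    if (!pc->worker_active)
        return false;

    if (pc->diff == 0) {
        pc->worker_active = false;  /* no work: hand back to the shepherd */
        return false;
    }

    s->global += pc->diff;
    pc->diff = 0;
    return true;
}
```

In this model, a CPU whose `diff` stays at zero is never touched by a worker, which corresponds to the commit message's goal of leaving quiet CPUs undisturbed; only the shepherd pass inspects them.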
..
backing-dev.c	mm: clean up zone flags	2014-10-09 22:25:57 -04:00
balloon_compaction.c	mm/balloon_compaction: add vmstat counters and kpageflags bit	2014-10-09 22:26:01 -04:00
bootmem.c	mm/bootmem.c: use include/linux/ headers	2014-10-09 22:26:00 -04:00
cleancache.c
cma.c	mm: cma: adjust address limit to avoid hitting low/high memory boundary	2014-10-09 22:25:53 -04:00
compaction.c	mm/balloon_compaction: redesign ballooned pages management	2014-10-09 22:26:01 -04:00
debug-pagealloc.c
debug.c	mm/debug.c: use pr_emerg()	2014-10-09 22:25:59 -04:00
dmapool.c	mm/dmapool.c: fixed a brace coding style issue	2014-10-09 22:26:00 -04:00
early_ioremap.c	mm: create generic early_ioremap() support	2014-04-07 16:36:15 -07:00
fadvise.c
failslab.c
filemap_xip.c
filemap.c	mm/filemap.c: remove trailing whitespace	2014-10-09 22:26:00 -04:00
fremap.c	mm: mark remap_file_pages() syscall as deprecated	2014-06-06 16:08:17 -07:00
frontswap.c	swap: change swap_list_head to plist, add swap_avail_head	2014-06-04 16:54:07 -07:00
gup.c	mm: introduce a general RCU get_user_pages_fast()	2014-10-09 22:26:00 -04:00
highmem.c	mm/highmem: make kmap cache coloring aware	2014-08-06 18:01:22 -07:00
huge_memory.c	mm: use VM_BUG_ON_MM where possible	2014-10-09 22:25:58 -04:00
hugetlb_cgroup.c	hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked	2014-08-29 16:28:16 -07:00
hugetlb.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
hwpoison-inject.c	mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive	2014-08-06 18:01:19 -07:00
init-mm.c
internal.h	mm, compaction: pass gfp mask to compact_control	2014-10-09 22:25:55 -04:00
interval_tree.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
iov_iter.c	fuse: honour max_read and max_write in direct_io mode	2014-09-26 21:16:51 -04:00
Kconfig	mm/balloon_compaction: add vmstat counters and kpageflags bit	2014-10-09 22:26:01 -04:00
Kconfig.debug
kmemcheck.c	mm/slab_common: move kmem_cache definition to internal header	2014-10-09 22:25:50 -04:00
kmemleak-test.c	mm/kmemleak-test.c: use pr_fmt for logging	2014-06-06 16:08:18 -07:00
kmemleak.c	mm: introduce kmemleak_update_trace()	2014-06-06 16:08:17 -07:00
ksm.c	mm: ksm use pr_err instead of printk	2014-10-09 22:26:00 -04:00
list_lru.c
maccess.c
madvise.c	mm: update the description for madvise_remove	2014-08-06 18:01:18 -07:00
Makefile	mm/balloon_compaction: add vmstat counters and kpageflags bit	2014-10-09 22:26:01 -04:00
memblock.c	mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()	2014-09-10 15:42:12 -07:00
memcontrol.c	memcg: zap memcg_can_account_kmem	2014-10-09 22:26:00 -04:00
memory_hotplug.c	memory-hotplug: add sysfs valid_zones attribute	2014-10-09 22:25:52 -04:00
memory-failure.c	hwpoison: fix race with changing page during offlining	2014-08-06 18:01:19 -07:00
memory.c	mm: softdirty: keep bit when zapping file pte	2014-09-26 08:10:35 -07:00
mempolicy.c	mm: mempolicy: skip inaccessible VMAs when setting MPOL_MF_LAZY	2014-10-09 22:26:02 -04:00
mempool.c	mm/mempool.c: update the kmemleak stack trace for mempool allocations	2014-06-06 16:08:17 -07:00
migrate.c	mm/balloon_compaction: redesign ballooned pages management	2014-10-09 22:26:01 -04:00
mincore.c
mlock.c	mm: use VM_BUG_ON_MM where possible	2014-10-09 22:25:58 -04:00
mm_init.c
mmap.c	mm: use VM_BUG_ON_MM where possible	2014-10-09 22:25:58 -04:00
mmu_context.c
mmu_notifier.c	kvm: Fix page ageing bugs	2014-09-24 14:07:58 +02:00
mmzone.c
mprotect.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
mremap.c	mm/mremap.c: use linux headers	2014-10-09 22:26:00 -04:00
msync.c	msync: fix incorrect fstart calculation	2014-07-03 09:21:53 -07:00
nobootmem.c	mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()	2014-09-10 15:42:12 -07:00
nommu.c	arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area	2014-08-08 15:57:27 -07:00
oom_kill.c	mm: clean up zone flags	2014-10-09 22:25:57 -04:00
page_alloc.c	mm: move debug code out of page_alloc.c	2014-10-09 22:25:58 -04:00
page_cgroup.c
page_io.c	fix __swap_writepage() compile failure on old gcc versions	2014-06-14 19:30:48 -05:00
page_isolation.c
page-writeback.c	mm/page-writeback.c: use min3/max3 macros to avoid shadow warnings	2014-10-09 22:25:57 -04:00
pagewalk.c	mm: use VM_BUG_ON_MM where possible	2014-10-09 22:25:58 -04:00
percpu-km.c
percpu-vm.c	percpu: perform tlb flush after pcpu_map_pages() failure	2014-08-15 16:06:10 -04:00
percpu.c	percpu: free percpu allocation info for uniprocessor system	2014-08-16 08:59:02 -04:00
pgtable-generic.c	mm: actually clear pmd_numa before invalidating	2014-08-29 16:28:15 -07:00
process_vm_access.c	start adding the tag to iov_iter	2014-05-06 17:32:49 -04:00
quicklist.c
readahead.c	mm/readahead.c: remove unused file_ra_state from count_history_pages	2014-08-06 18:01:15 -07:00
rmap.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
shmem.c	include/linux/migrate.h: remove migrate_page #define	2014-10-09 22:25:56 -04:00
slab_common.c	memcg: move memcg_update_cache_size() to slab_common.c	2014-10-09 22:25:59 -04:00
slab.c	mm/slab.c: use __seq_open_private() instead of seq_open()	2014-10-09 22:25:57 -04:00
slab.h	mm/slab: use percpu allocator for cpu cache	2014-10-09 22:25:51 -04:00
slob.c	mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller()	2014-10-09 22:25:50 -04:00
slub.c	mm/slab_common: commonize slab merge logic	2014-10-09 22:25:51 -04:00
sparse-vmemmap.c
sparse.c	mm: use macros from compiler.h instead of __attribute__((...))	2014-04-07 16:35:54 -07:00
swap_state.c	mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache	2014-10-09 22:25:59 -04:00
swap.c	mm: memcontrol: do not kill uncharge batching in free_pages_and_swap_cache	2014-10-09 22:25:59 -04:00
swapfile.c	mm: memcontrol: rewrite uncharge API	2014-08-08 15:57:17 -07:00
truncate.c	mm: memcontrol: rewrite uncharge API	2014-08-08 15:57:17 -07:00
util.c	proc/maps: make vm_is_stack() logic namespace-friendly	2014-10-09 22:25:50 -04:00
vmacache.c	mm,vmacache: optimize overflow system-wide flushing	2014-06-04 16:53:57 -07:00
vmalloc.c	mm/vmalloc.c: use seq_open_private() instead of seq_open()	2014-10-09 22:25:56 -04:00
vmpressure.c
vmscan.c	mm: memcontrol: fix transparent huge page allocations under pressure	2014-10-09 22:25:59 -04:00
vmstat.c	vmstat: on-demand vmstat workers V8	2014-10-09 22:26:02 -04:00
workingset.c
zbud.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zpool.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zsmalloc.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zswap.c	mm/zswap.c: add __init to zswap_entry_cache_destroy()	2014-08-08 15:57:18 -07:00