kernel_optimize_test/include/trace/events
Michal Hocko d379f01de0 oom, trace: add oom detection tracepoints
should_reclaim_retry is the central decision point for declaring the
OOM.  It is useful to expose the data used for this decision when
debugging unexpected OOM situations.

Say we have an OOM report:
[   52.264001] mem_eater invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=0, order=0, oom_score_adj=0
[   52.267549] CPU: 3 PID: 3148 Comm: mem_eater Tainted: G        W       4.8.0-oomtrace3-00006-gb21338b386d2 #1024

Now we can check the tracepoint data to see how we have ended up in this
situation:
       mem_eater-3148  [003] ....    52.432801: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11134 min_wmark=11084 no_progress_loops=1 wmark_check=1
       mem_eater-3148  [003] ....    52.433269: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11103 min_wmark=11084 no_progress_loops=1 wmark_check=1
       mem_eater-3148  [003] ....    52.433712: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11100 min_wmark=11084 no_progress_loops=2 wmark_check=1
       mem_eater-3148  [003] ....    52.434067: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11097 min_wmark=11084 no_progress_loops=3 wmark_check=1
       mem_eater-3148  [003] ....    52.434414: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11094 min_wmark=11084 no_progress_loops=4 wmark_check=1
       mem_eater-3148  [003] ....    52.434761: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11091 min_wmark=11084 no_progress_loops=5 wmark_check=1
       mem_eater-3148  [003] ....    52.435108: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11087 min_wmark=11084 no_progress_loops=6 wmark_check=1
       mem_eater-3148  [003] ....    52.435478: reclaim_retry_zone: node=0 zone=DMA32 order=0 reclaimable=51 available=11084 min_wmark=11084 no_progress_loops=7 wmark_check=0
       mem_eater-3148  [003] ....    52.435478: reclaim_retry_zone: node=0 zone=DMA order=0 reclaimable=0 available=1126 min_wmark=179 no_progress_loops=7 wmark_check=0

The above shows that reclaim stopped making any progress
(no_progress_loops increases from round to round) and that, while there
were still 51 reclaimable pages, they couldn't be dropped for some
reason (the vmscan tracepoints would tell us more about that part).
available represents reclaimable + free_pages scaled down by the
no_progress_loops factor.  This is essentially an optimistic estimate of
how much memory we would have after reclaiming everything.  It can be
compared to min_wmark to get a rough idea, but wmark_check reports the
result of the actual watermark check, which is more precise (it includes
lowmem reserves, considers the order etc.).  As we can see, no zone is
eligible in the end, and that is why we have triggered the oom in this
situation.
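
As a rough illustration of how the available value could be derived, here
is a minimal userspace sketch.  It assumes that only the reclaimable part
is discounted in proportion to no_progress_loops and that the retry limit
is 16; both are assumptions not stated in this commit message, and the
names sketch_available and SKETCH_MAX_RECLAIM_RETRIES are hypothetical.

```c
#include <assert.h>

/* Assumed retry limit; not taken from this commit message. */
#define SKETCH_MAX_RECLAIM_RETRIES 16

/* Integer division rounding up. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/*
 * Optimistic estimate of the pages a zone could offer: the reclaimable
 * pages, discounted by the share of retries that already made no
 * progress, plus the pages that are free right now.
 */
static unsigned long sketch_available(unsigned long reclaimable,
				      unsigned long free_pages,
				      int no_progress_loops)
{
	unsigned long available = reclaimable;

	assert(no_progress_loops >= 0 &&
	       no_progress_loops <= SKETCH_MAX_RECLAIM_RETRIES);

	/* The more fruitless rounds, the less we trust reclaimable. */
	available -= div_round_up((unsigned long)no_progress_loops * available,
				  SKETCH_MAX_RECLAIM_RETRIES);
	available += free_pages;
	return available;
}
```

Under these assumptions the discount grows with every fruitless round, so
a zone whose reclaimable pages never actually get freed gradually stops
counting them, and once the limit is reached only free pages remain in
the estimate.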

Please note that higher-order requests might fail the wmark_check even
when there is much more memory available than min_wmark, e.g. when the
memory is fragmented.  A follow-up tracepoint will help to debug those
situations.
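
To make the fragmentation case concrete, the following is a deliberately
simplified sketch of an order-aware watermark check.  It is not the
kernel's implementation: it ignores lowmem reserves, migrate types and
the order-based adjustment of the free page count, and struct sketch_zone,
sketch_wmark_ok and SKETCH_MAX_ORDER are hypothetical names introduced
only for this example.

```c
#include <stdbool.h>

/* Assumed number of buddy allocator orders (0..10). */
#define SKETCH_MAX_ORDER 11

/* Hypothetical, stripped-down view of a zone. */
struct sketch_zone {
	unsigned long free_pages;                /* total free pages */
	unsigned long min_wmark;                 /* min watermark in pages */
	unsigned long nr_free[SKETCH_MAX_ORDER]; /* free blocks per order */
};

/*
 * Return true if a request of the given order could be satisfied:
 * the zone must be above its min watermark and, for order > 0, at
 * least one free block of that order or larger must exist.
 */
static bool sketch_wmark_ok(const struct sketch_zone *z, unsigned int order)
{
	if (z->free_pages <= z->min_wmark)
		return false;

	/* Order-0 requests only need enough free pages overall. */
	if (order == 0)
		return true;

	/* Higher orders need a sufficiently large contiguous block. */
	for (unsigned int o = order; o < SKETCH_MAX_ORDER; o++)
		if (z->nr_free[o])
			return true;

	return false;
}
```

With plenty of free pages that all sit on the order-0 free list, this
check passes for order 0 but fails for any higher order, which is the
situation described above.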

Link: http://lkml.kernel.org/r/20161220130135.15719-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:27 -08:00
9p.h net/9p/tracing: Export enums in tracepoints to userspace 2015-04-08 09:39:59 -04:00
afs.h afs: Refcount the afs_call struct 2017-01-09 11:10:02 +00:00
alarmtimer.h ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
asoc.h ASoC: trace: fix printing jack name 2016-02-26 10:52:48 +09:00
bcache.h block: better op and flags encoding 2016-10-28 08:48:16 -06:00
block.h block: cleanup tracing 2017-01-27 15:08:35 -07:00
bpf.h bpf: add initial bpf tracepoints 2017-01-25 13:17:47 -05:00
btrfs.h btrfs: make tracepoint format strings more compact 2017-01-09 11:27:07 +01:00
cgroup.h kernfs: handle null pointers while printing node name and path 2017-02-10 16:02:26 +01:00
clk.h clk: Add tracepoints for hardware operations 2015-03-12 12:18:51 -07:00
cma.h mm: cma: add trace events for CMA allocations and freeings 2015-04-15 16:35:19 -07:00
compaction.h mm, trace: extract COMPACTION_STATUS and ZONE_TYPE to a common header 2017-02-22 16:41:27 -08:00
context_tracking.h
cpuhp.h cpu/hotplug: Add multi instance support 2016-09-02 20:05:05 +02:00
devlink.h devlink: fix trace format string 2016-07-14 22:16:05 -07:00
dma_fence.h dma-buf: Rename struct fence to dma_fence 2016-10-25 14:40:39 +02:00
ext4.h don't bother with ->d_inode->i_sb - it's always equal to ->d_sb 2016-04-10 17:11:51 -04:00
f2fs.h for-f2fs-4.10 2016-12-14 09:07:36 -08:00
fib6.h ipv6, trace: fix tos reporting on fib6_table_lookup 2016-03-20 13:44:34 -04:00
fib.h net: Make table id type u32 2015-09-01 14:32:44 -07:00
filelock.h locks: sprinkle some tracepoints around the file locking code 2016-01-08 11:38:13 -05:00
filemap.h tracing, mm: Record pfn instead of pointer to struct page 2015-04-13 11:44:52 -03:00
fs_dax.h mm, dax: change pmd_fault() to take only vmf parameter 2017-02-22 16:41:26 -08:00
gpio.h tracing: gpio: Add Kconfig option for enabling/disabling trace events 2015-10-20 21:56:10 -04:00
host1x.h gpu: host1x: Use struct host1x_bo pointers in traces 2014-11-13 16:11:32 +01:00
hswadsp.h
huge_memory.h mm, thp: convert from optimistic swapin collapsing to conservative 2016-07-26 16:19:19 -07:00
i2c.h tracing: Have the reg function allow to fail 2016-12-09 09:13:30 -05:00
intel_ish.h HID: intel-ish-hid: ipc layer 2016-08-17 11:13:07 +02:00
intel-sst.h tracing: Add TRACE_SYSTEM_VAR to intel-sst 2015-04-07 12:31:12 -04:00
iommu.h iommu: Change trace unmap api to report unmapped size 2015-01-19 15:19:31 +01:00
ipi.h tracepoint: add generic tracepoint definitions for IPI tracing 2014-08-07 20:40:40 -04:00
irq.h irq: Fix typo in tracepoint.xml 2016-09-29 10:03:38 +02:00
jbd2.h
kmem.h Nothing major this round. Mostly small clean ups and fixes. 2016-03-24 10:52:25 -07:00
kvm.h KVM: x86: add KVM_CAP_X2APIC_API 2016-07-14 09:03:57 +02:00
libata.h ata: Handle ATA NCQ NO-DATA commands correctly 2016-07-15 08:08:13 -04:00
lock.h
mce.h x86/mce/AMD: Save MCA_IPID in MCE struct on SMCA systems 2016-09-13 15:23:12 +02:00
mdio.h net/phy: add trace events for mdio accesses 2016-11-24 11:55:43 -05:00
migrate.h mm: tracing: Export enums in tracepoints to user space 2015-04-08 09:40:01 -04:00
mmc.h mmc: core: Provide tracepoints for request processing 2016-05-02 10:33:11 +02:00
mmflags.h mm, trace: extract COMPACTION_STATUS and ZONE_TYPE to a common header 2017-02-22 16:41:27 -08:00
module.h tracing: %pF is only for function pointers 2015-03-25 08:57:22 -04:00
napi.h net: fixup for tracepoint napi:napi_poll 2016-07-15 15:55:01 -07:00
net.h net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
nilfs2.h nilfs2: add tracepoints for analyzing reading and writing metadata files 2015-11-06 17:50:42 -08:00
nmi.h
oom.h oom, trace: add oom detection tracepoints 2017-02-22 16:41:27 -08:00
page_isolation.h mm/page_isolation: fix tracepoint to mirror check function behavior 2016-04-01 17:03:37 -05:00
page_ref.h mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
pagemap.h mm: pagemap: avoid unnecessary overhead when tracepoints are deactivated 2014-08-06 18:01:20 -07:00
power_cpu_migrate.h
power.h cpufreq: intel_pstate: Add io_boost trace 2016-09-16 23:55:30 +02:00
printk.h printk, tracing: Avoiding unneeded blank lines 2016-07-15 15:52:41 -04:00
random.h tracing: %pF is only for function pointers 2015-03-25 08:57:22 -04:00
rcu.h rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead 2017-01-23 11:44:18 -08:00
regulator.h
rpm.h
rxrpc.h rxrpc: Add some more tracing 2017-01-05 11:39:12 +00:00
sched.h sched/core: Fix trace_sched_switch() 2015-10-06 17:08:15 +02:00
scsi.h scsi-trace: define ZBC_IN and ZBC_OUT 2016-04-11 16:57:09 -04:00
signal.h
skb.h
sock.h
spi.h
spmi.h spmi: add command tracepoints for SPMI 2015-08-05 12:27:09 -07:00
sunrpc.h SUNRPC: Add tracepoints for dropped and deferred requests 2016-07-13 15:53:43 -04:00
sunvnet.h sunvnet: Add support for perf LDC event tracing 2016-02-07 14:13:05 -05:00
swiotlb.h swiotlb: Add swiotlb=noforce debug option 2016-12-19 09:05:20 -05:00
syscalls.h
target.h target: Minimize SCSI header #include directives 2015-06-02 08:03:25 -07:00
task.h tracing: Don't make assumptions about length of string on task rename 2015-08-31 10:47:14 -04:00
thermal_power_allocator.h thermal: consistently use int for temperatures 2015-08-03 23:15:50 +08:00
thermal.h thermal: trace: migrating thermal traces to use TRACE_DEFINE_ENUM() macros 2016-03-15 07:51:40 +08:00
thp.h powerpc/thp: Add tracepoints to track hugepage invalidate 2014-08-13 18:20:42 +10:00
timer.h timers/itimer: Convert internal cputime_t units to nsec 2017-02-01 09:13:55 +01:00
tlb.h tracing: Remove duplicate checks for online CPUs 2016-03-08 11:19:28 -05:00
udp.h
ufs.h scsi: ufs: add trace event for ufs commands 2017-01-05 18:10:04 -05:00
v4l2.h [media] media: videobuf2: Move timestamp to vb2_buffer 2015-12-18 13:53:31 -02:00
vb2.h [media] media: videobuf2: Move timestamp to vb2_buffer 2015-12-18 13:53:31 -02:00
vmscan.h mm, vmscan: add classzone information to tracepoints 2016-07-28 16:07:41 -07:00
vsock_virtio_transport_common.h VSOCK: Introduce virtio_vsock_common.ko 2016-08-02 02:57:29 +03:00
wbt.h blk-wbt: add general throttling mechanism 2016-11-10 13:53:32 -07:00
workqueue.h
writeback.h mm: move vmscan writes and file write accounting to the node 2016-07-28 16:07:41 -07:00
xdp.h bpf: add initial bpf tracepoints 2017-01-25 13:17:47 -05:00
xen.h x86: expose number of page table levels on Kconfig level 2015-04-14 16:49:02 -07:00