From 3af229f2071f5b5cb31664be6109561fbe19c861 Mon Sep 17 00:00:00 2001 From: Nishanth Aravamudan Date: Tue, 10 Mar 2015 16:50:59 -0700 Subject: [PATCH] powerpc/numa: Reset node_possible_map to only node_online_map Raghu noticed an issue with excessive memory allocation on power with a simple cgroup test, specifically, in mem_cgroup_css_alloc -> for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup directories). The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes possible), which defines node_possible_map, which in turn defines the value of nr_node_ids in setup_nr_node_ids and the iteration of for_each_node. In practice, we never see a system with 256 NUMA nodes, and in fact, we do not support node hotplug on power in the first place, so the nodes that are online when we come up are the nodes that will be present for the lifetime of this kernel. So let's, at least, drop the NUMA possible map down to the online map at runtime. This is similar to what x86 does in its initialization routines. mem_cgroup_css_alloc should also be fixed to only iterate over memory-populated nodes and handle hotplug, but that is a separate change. Signed-off-by: Nishanth Aravamudan Cc: Tejun Heo Cc: David Rientjes Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Anton Blanchard Cc: Raghavendra K T Signed-off-by: Michael Ellerman --- arch/powerpc/mm/numa.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index c68471c33731..5e80621d9324 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -958,6 +958,13 @@ void __init initmem_init(void) memblock_dump_all(); + /* + * Reduce the possible NUMA nodes to the online NUMA nodes, + * since we do not support node hotplug. This ensures that we + * lower the maximum NUMA node ID to what is actually present. + */ + nodes_and(node_possible_map, node_possible_map, node_online_map); + for_each_online_node(nid) { unsigned long start_pfn, end_pfn;