From ad2c8144418c6a81cefe65379fd47bbe8344cef2 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Thu, 9 Oct 2014 15:26:13 -0700 Subject: [PATCH] topology: add support for node_to_mem_node() to determine the fallback node Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that on ppc LPARs with memoryless nodes, a large amount of memory was consumed by slabs and was marked unreclaimable. He tracked it down to slab deactivations in the SLUB core when we allocate remotely, leading to poor efficiency always when memoryless nodes are present. After much discussion, Joonsoo provided a few patches that help significantly. They don't resolve the problem altogether: - memory hotplug still needs testing, that is when a memoryless node becomes memory-ful, we want to dtrt - there are other reasons for going off-node than memoryless nodes, e.g., fully exhausted local nodes Neither case is resolved with this series, but I don't think that should block their acceptance, as they can be explored/resolved with follow-on patches. The series consists of: [1/3] topology: add support for node_to_mem_node() to determine the fallback node [2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node - Joonsoo's patches to cache the nearest node with memory for each NUMA node [3/3] Partial revert of 81c98869faa5 (""kthread: ensure locality of task_struct allocations") - At Tejun's request, keep the knowledge of memoryless node fallback to the allocator core. This patch (of 3): We need to determine the fallback node in slub allocator if the allocation target node is memoryless node. Without it, the SLUB wrongly select the node which has no memory and can't use a partial slab, because of node mismatch. Introduced function, node_to_mem_node(X), will return a node Y with memory that has the nearest distance. If X is memoryless node, it will return nearest distance node, but, if X is normal node, it will return itself. We will use this function in following patch to determine the fallback node. Signed-off-by: Joonsoo Kim Signed-off-by: Nishanth Aravamudan Cc: David Rientjes Cc: Han Pingtian Cc: Pekka Enberg Cc: Paul Mackerras Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: Anton Blanchard Cc: Christoph Lameter Cc: Wanpeng Li Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/topology.h | 17 +++++++++++++++++ mm/page_alloc.c | 1 + 2 files changed, 18 insertions(+) diff --git a/include/linux/topology.h b/include/linux/topology.h index dda6ee521e74..909b6e43b694 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -119,11 +119,20 @@ static inline int numa_node_id(void) * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem(). */ DECLARE_PER_CPU(int, _numa_mem_); +extern int _node_numa_mem_[MAX_NUMNODES]; #ifndef set_numa_mem static inline void set_numa_mem(int node) { this_cpu_write(_numa_mem_, node); + _node_numa_mem_[numa_node_id()] = node; +} +#endif + +#ifndef node_to_mem_node +static inline int node_to_mem_node(int node) +{ + return _node_numa_mem_[node]; } #endif @@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu) static inline void set_cpu_numa_mem(int cpu, int node) { per_cpu(_numa_mem_, cpu) = node; + _node_numa_mem_[cpu_to_node(cpu)] = node; } #endif @@ -159,6 +169,13 @@ static inline int numa_mem_id(void) } #endif +#ifndef node_to_mem_node +static inline int node_to_mem_node(int node) +{ + return node; +} +#endif + #ifndef cpu_to_mem static inline int cpu_to_mem(int cpu) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index eee961958021..f3bc59f2ed52 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node); */ DEFINE_PER_CPU(int, _numa_mem_); /* Kernel "local memory" node */ EXPORT_PER_CPU_SYMBOL(_numa_mem_); +int _node_numa_mem_[MAX_NUMNODES]; #endif /* -- 2.30.2