x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
author Rik van Riel <riel@surriel.com>
Mon, 16 Jul 2018 19:03:37 +0000 (15:03 -0400)
committer Ingo Molnar <mingo@kernel.org>
Tue, 17 Jul 2018 07:35:34 +0000 (09:35 +0200)
Song Liu noticed switch_mm_irqs_off() taking a lot of CPU time in recent
kernels, using 1.8% of a 48-CPU system during a netperf-to-localhost run.
Digging into the profile, we noticed that cpumask_clear_cpu() and
cpumask_set_cpu() together take about half of the CPU time spent in
switch_mm_irqs_off().

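However, the CPUs running netperf end up switching back and forth
between netperf and the idle task, which does not require changes
to the mm_cpumask, because init_mm never receives TLB flushing IPIs.
Furthermore, the init_mm cpumask ends up being the most heavily
contended one in the system, since every CPU that switches to or from
a kernel thread updates its bit in that same bitmap.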

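Simply skipping changes to mm_cpumask(&init_mm) removes those bitmap
updates, and the cache line contention they cause, from the switch path,
which reduces overhead.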

Reported-and-tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: kernel-team@fb.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/20180716190337.26133-8-riel@surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch/x86/mm/tlb.c

index 493559cae2d5a325d1ac15d49f52eb6f841de2e4..f086195f644cec8cde68a8af7054dd1c22d40da3 100644
@@ -310,15 +310,22 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
                        sync_current_stack_to_mm(next);
                }
 
-               /* Stop remote flushes for the previous mm */
-               VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu, mm_cpumask(real_prev)) &&
-                               real_prev != &init_mm);
-               cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+               /*
+                * Stop remote flushes for the previous mm.
+                * Skip kernel threads; we never send init_mm TLB flushing IPIs,
+                * but the bitmap manipulation can cause cache line contention.
+                */
+               if (real_prev != &init_mm) {
+                       VM_WARN_ON_ONCE(!cpumask_test_cpu(cpu,
+                                               mm_cpumask(real_prev)));
+                       cpumask_clear_cpu(cpu, mm_cpumask(real_prev));
+               }
 
                /*
                 * Start remote flushes and then read tlb_gen.
                 */
-               cpumask_set_cpu(cpu, mm_cpumask(next));
+               if (next != &init_mm)
+                       cpumask_set_cpu(cpu, mm_cpumask(next));
                next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 
                choose_new_asid(next, next_tlb_gen, &new_asid, &need_flush);