powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
Currently we have 2 segments that are bolted for the kernel linear
mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
stacks. Anything accessed outside of these regions may need to be
faulted in. (In practice machines with TM always have 1T segments)
If a machine has < 2TB of memory we never fault on the kernel linear
mapping as these two segments cover all physical memory. If a machine
has > 2TB of memory, there may be structures outside of these two
segments that need to be faulted in. This faulting can occur when
running as a guest as the hypervisor may remove any SLB that's not
bolted.
When we treclaim and trecheckpoint we have a window where we need to
run with the userspace GPRs. This means that we no longer have a valid
stack pointer in r1. For this window we therefore clear MSR RI to
indicate that any exceptions taken at this point won't be able to be
handled. This means that we can't take segment misses in this RI=0
window.
In this RI=0 region, we currently access the thread_struct for the
process being context switched to or from. This thread_struct access
may cause a segment fault since it's not guaranteed to be covered by
the two bolted segment entries described above.
We've seen this with a crash when running as a guest with > 2TB of
memory on PowerVM:
Unrecoverable exception 4100 at
c00000000004f138
Oops: Unrecoverable exception, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1
task:
c000189001df4210 ti:
c000189001d5c000 task.ti:
c000189001d5c000
NIP:
c00000000004f138 LR:
0000000010003a24 CTR:
0000000010001b20
REGS:
c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default)
MSR:
8000000100001031 <SF,ME,IR,DR,LE> CR:
24000048 XER:
00000000
CFAR:
c00000000004ed18 SOFTE: 0
GPR00:
ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
GPR04:
0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
GPR08:
000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
GPR12:
00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
GPR16:
0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
GPR20:
0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
GPR24:
0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
GPR28:
00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
NIP [
c00000000004f138] restore_gprs+0xd0/0x16c
LR [
0000000010003a24] 0x10003a24
Call Trace:
[
c000189001d5f9b0] [
c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
[
c000189001d5fb90] [
c00000000001583c] tm_recheckpoint+0x6c/0xa0
[
c000189001d5fbd0] [
c000000000015c40] __switch_to+0x2c0/0x350
[
c000189001d5fc30] [
c0000000007e647c] __schedule+0x32c/0x9c0
[
c000189001d5fcb0] [
c0000000007e6b58] schedule+0x48/0xc0
[
c000189001d5fce0] [
c0000000000deabc] worker_thread+0x22c/0x5b0
[
c000189001d5fd80] [
c0000000000e7000] kthread+0x110/0x130
[
c000189001d5fe30] [
c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Instruction dump:
7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
38a00006 78c60022 7cc62838 0b060000 <
e8c701a0>
7ccff120 e8270078 e8a70098
---[ end trace
602126d0a1dedd54 ]---
This fixes this by copying the required data from the thread_struct to
the stack before we clear MSR RI. Then once we clear RI, we only access
the stack, guaranteeing there's no segment miss.
We also tighten the region over which we set RI=0 on the treclaim()
path. This may have a slight performance impact since we're adding an
mtmsr instruction.
Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>