Christophe Leroy [Mon, 6 Aug 2018 15:09:11 +0000 (15:09 +0000)]
powerpc/cpm1: fix compilation error with CONFIG_PPC_EARLY_DEBUG_CPM
commit
e8cb7a55eb8dc ("powerpc: remove superflous inclusions of
asm/fixmap.h") removed inclusion of asm/fixmap.h from files not
including objects from that file.
However, asm/mmu-8xx.h includes call to __fix_to_virt(). The proper
way would be to include asm/fixmap.h in asm/mmu-8xx.h but it creates
an inclusion loop.
So we have to leave asm/fixmap.h in sysdep/cpm_common.c for
CONFIG_PPC_EARLY_DEBUG_CPM
CC arch/powerpc/sysdev/cpm_common.o
In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
from ./arch/powerpc/include/asm/reg_8xx.h:8,
from ./arch/powerpc/include/asm/reg.h:29,
from ./arch/powerpc/include/asm/processor.h:13,
from ./arch/powerpc/include/asm/thread_info.h:28,
from ./include/linux/thread_info.h:38,
from ./arch/powerpc/include/asm/ptrace.h:159,
from ./arch/powerpc/include/asm/hw_irq.h:12,
from ./arch/powerpc/include/asm/irqflags.h:12,
from ./include/linux/irqflags.h:16,
from ./include/asm-generic/cmpxchg-local.h:6,
from ./arch/powerpc/include/asm/cmpxchg.h:537,
from ./arch/powerpc/include/asm/atomic.h:11,
from ./include/linux/atomic.h:5,
from ./include/linux/mutex.h:18,
from ./include/linux/kernfs.h:13,
from ./include/linux/sysfs.h:16,
from ./include/linux/kobject.h:20,
from ./include/linux/device.h:16,
from ./include/linux/node.h:18,
from ./include/linux/cpu.h:17,
from ./include/linux/of_device.h:5,
from arch/powerpc/sysdev/cpm_common.c:21:
arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared (first use in this function)
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier is reported only once for each function it appears in
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1
Fixes: e8cb7a55eb8dc ("powerpc: remove superflous inclusions of asm/fixmap.h")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Dan Carpenter [Wed, 8 Aug 2018 11:57:24 +0000 (14:57 +0300)]
powerpc: Fix size calculation using resource_size()
The problem is the the calculation should be "end - start + 1" but the
plus one is missing in this calculation.
Fixes: 8626816e905e ("powerpc: add support for MPIC message register API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rashmica Gupta [Fri, 3 Aug 2018 06:06:01 +0000 (16:06 +1000)]
Documentation: Update documentation on ppc-memtrace
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rashmica Gupta [Fri, 3 Aug 2018 06:06:00 +0000 (16:06 +1000)]
powerpc/powernv: Allow memory that has been hot-removed to be hot-added
This patch allows the memory removed by memtrace to be readded to the
kernel. So now you don't have to reboot your system to add the memory
back to the kernel or to have a different amount of memory removed.
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Breno Leitao [Tue, 7 Aug 2018 14:15:39 +0000 (11:15 -0300)]
selftests/powerpc: Kill child processes on SIGINT
There are some powerpc selftests, as tm/tm-unavailable, that run for a long
period (>120 seconds), and if it is interrupted, as pressing CRTL-C
(SIGINT), the foreground process (harness) dies but the child process and
threads continue to execute (with PPID = 1 now) in background.
In this case, you'd think the whole test exited, but there are remaining
threads and processes being executed in background. Sometimes these
zombies processes are doing annoying things, as consuming the whole CPU or
dumping things to STDOUT.
This patch fixes this problem by attaching an empty signal handler to
SIGINT in the harness process. This handler will interrupt (EINTR) the
parent process waitpid() call, letting the code to follow through the
normal flow, which will kill all the processes in the child process group.
This patch also fixes a typo.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Tue, 10 Apr 2018 07:16:10 +0000 (17:16 +1000)]
powerpc/powernv/opal: Use standard interrupts property when available
For (bad) historical reasons, OPAL used to create a non-standard pair
of properties "opal-interrupts" and "opal-interrupts-names" for
representing the list of interrupts it wants Linux to request on its
behalf.
Among other issues, the opal-interrupts doesn't have a way to carry
the type of interrupts, and they were assumed to be all level
sensitive.
This is wrong on some recent systems where some of them are edge
sensitive causing warnings in the XIVE code and possible misbehaviours
if they need to be retriggered (typically the NPU2 TCE error
interrupts).
This makes Linux switch to using the standard "interrupts" and
"interrupt-names" properties instead when they are available, using
standard of_irq helpers, which can carry all the desired type
information.
Newer versions of OPAL will generate those properties in addition to
the legacy ones.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
of start = 0 in opal_event_shutdown() to avoid double free warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 7 Jun 2018 10:10:22 +0000 (10:10 +0000)]
powerpc: Allow CPU selection of e300core variants
GCC supports -mcpu=e300c2 and -mcpu=e300c3
This patch gives the opportunity to tune kernel to one of
those two types.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 7 Jun 2018 10:10:20 +0000 (10:10 +0000)]
powerpc: Allow CPU selection also on PPC32
This patch extends to PPC32 the capability to select the exact
CPU type.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 7 Jun 2018 10:10:18 +0000 (10:10 +0000)]
powerpc: Make CPU selection logic generic in Makefile
At the time being, when adding a new CPU for selection, both
Kconfig.cputype and Makefile have to be modified.
This patch moves into Kconfig.cputype the name of the CPU to me
passed to the -mcpu= argument.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the option to TARGET_CPU to echo the gcc documentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rodrigo R. Galvao [Mon, 6 Aug 2018 16:42:03 +0000 (13:42 -0300)]
powerpc/Makefiles: Convert ifeq to ifdef where possible
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y'
we can instead just use ifdef. The latter reads easily, so convert to
it where possible.
Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Fri, 3 Aug 2018 10:13:06 +0000 (20:13 +1000)]
powerpc/64: Copy as much as possible in __copy_tofrom_user
In __copy_tofrom_user, if we encounter an exception on a store, we
stop copying and return the number of bytes not copied. However,
if the store is wider than one byte and is to an unaligned address,
it is possible that the store operand overlaps a page boundary
and the exception occurred on the latter part of the store operand,
meaning that it would be possible to copy a few more bytes. Since
copy_to_user is generally expected to copy as much as possible,
it would be better to copy those extra few bytes. This adds code
to do that. Since this edge case is not performance-critical,
the code has been written to be compact rather than as fast as
possible.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 3 Aug 2018 10:13:05 +0000 (20:13 +1000)]
selftests/powerpc/64: Test exception cases in copy_tofrom_user
This adds a set of test cases to test the behaviour of
copy_tofrom_user when exceptions are encountered accessing the
source or destination. Currently, copy_tofrom_user does not always
copy as many bytes as possible when an exception occurs on a store
to the destination, and that is reflected in failures in these tests.
Based on a test program from Anton Blanchard.
[paulus@ozlabs.org - test all three paths, wrote commit description,
made EX_TABLE create an exception table.]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Fri, 3 Aug 2018 10:13:04 +0000 (20:13 +1000)]
selftests/powerpc/64: Test all paths through copy routines
The hand-coded assembler 64-bit copy routines include feature sections
that select one code path or another depending on which CPU we are
executing on. The self-tests for these copy routines end up testing
just one path. This adds a mechanism for selecting any desired code
path at compile time, and makes 2 or 3 versions of each test, each
using a different code path, so as to cover all the possible paths.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Add -mcpu=power4 to CFLAGS for older compilers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Paul Mackerras [Fri, 3 Aug 2018 10:13:03 +0000 (20:13 +1000)]
powerpc/64: Make exception table clearer in __copy_tofrom_user_base
This aims to make the generation of exception table entries for the
loads and stores in __copy_tofrom_user_base clearer and easier to
verify. Instead of having a series of local labels on the loads and
stores, with a series of corresponding labels later for the exception
handlers, we now use macros to generate exception table entries at the
point of each load and store that could potentially trap. We do this
with the macros lex (load exception) and stex (store exception).
These macros are used right before the load or store to which they
apply.
Some complexity is introduced by the fact that we have some more work
to do after hitting an exception, because we need to calculate and
return the number of bytes not copied. The code uses r3 as the
current pointer into the destination buffer, that is, the address of
the first byte of the destination that has not been modified.
However, at various points in the copy loops, r3 can be 4, 8, 16 or 24
bytes behind that point.
To express this offset in an understandable way, we define a symbol
r3_offset which is updated at various points so that it equal to the
difference between the address of the first unmodified byte of the
destination and the value in r3. (In fact it only needs to be
accurate at the point of each lex or stex macro invocation.)
The rules for updating r3_offset are as follows:
* It starts out at 0
* An addi r3,r3,N instruction decreases r3_offset by N
* A store instruction (stb, sth, stw, std) to N(r3)
increases r3_offset by the width of the store (1, 2, 4, 8)
* A store with update instruction (stbu, sthu, stwu, stdu) to N(r3)
sets r3_offset to the width of the store.
There is some trickiness to the way that the lex and stex macros and
the associated exception handlers work. I would have liked to use
the current value of r3_offset in the name of the symbol used as
the exception handler, as in ".Lld_exc_$(r3_offset)" and then
have symbols .Lld_exc_0, .Lld_exc_8, .Lld_exc_16 etc. corresponding
to the offsets that needed to be added to r3. However, I couldn't
see a way to do that with gas.
Instead, the exception handler address is .Lld_exc - r3_offset or
.Lst_exc - r3_offset, that is, the distance ahead of .Lld_exc/.Lst_exc
that we start executing is equal to the amount that we need to add to
r3. This works because r3_offset is always a small multiple of 4,
and our instructions are 4 bytes long. This means that before
.Lld_exc and .Lst_exc, we have a sequence of instructions that
increments r3 by 4, 8, 16 or 24 depending on where we start. The
sequence increments r3 by 4 per instruction (on average).
We also replace the exception table for the 4k copy loop by a
macro per load or store. These loads and stores all use exactly
the same exception handler, which simply resets the argument registers
r3, r4 and r5 to there original values and re-does the whole copy
using the slower loop.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
zhong jiang [Sat, 4 Aug 2018 14:25:00 +0000 (22:25 +0800)]
powerpc/powermac: of_node_put() is not needed after iterator
for_each_node_by_name() iterators only exit normally when the loop
cursor is NULL, So there is no need to call of_node_put().
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Haren Myneni [Wed, 13 Jun 2018 07:32:40 +0000 (00:32 -0700)]
crypto/nx: Initialize 842 high and normal RxFIFO control registers
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.
VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Haren Myneni [Wed, 13 Jun 2018 07:28:57 +0000 (00:28 -0700)]
powerpc/powernv: Export opal_check_token symbol
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Randy Dunlap [Sun, 15 Jul 2018 17:34:46 +0000 (10:34 -0700)]
powerpc/platforms/85xx: fix t1042rdb_diu.c build errors & warning
Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration
and
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Anju T Sudhakar [Fri, 18 May 2018 07:35:25 +0000 (13:05 +0530)]
powerpc/perf: Remove sched_task function defined for thread-imc
Call trace observed while running perf-fuzzer:
CPU: 43 PID: 9088 Comm: perf_fuzzer Not tainted 4.13.0-32-generic #35~lp1746225
task:
c000003f776ac900 task.stack:
c000003f77728000
NIP:
c000000000299b70 LR:
c0000000002a4534 CTR:
c00000000029bb80
REGS:
c000003f7772b760 TRAP: 0700 Not tainted (4.13.0-32-generic)
MSR:
900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR:
24008822 XER:
00000000
CFAR:
c000000000299a70 SOFTE: 0
GPR00:
c0000000002a4534 c000003f7772b9e0 c000000001606200 c000003fef858908
GPR04:
c000003f776ac900 0000000000000001 ffffffffffffffff 0000003fee730000
GPR08:
0000000000000000 0000000000000000 c0000000011220d8 0000000000000002
GPR12:
c00000000029bb80 c000000007a3d900 0000000000000000 0000000000000000
GPR16:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20:
0000000000000000 0000000000000000 c000003f776ad090 c000000000c71354
GPR24:
c000003fef716780 0000003fee730000 c000003fe69d4200 c000003f776ad330
GPR28:
c0000000011220d8 0000000000000001 c0000000014c6108 c000003fef858900
NIP [
c000000000299b70] perf_pmu_sched_task+0x170/0x180
LR [
c0000000002a4534] __perf_event_task_sched_in+0xc4/0x230
Call Trace:
perf_iterate_sb+0x158/0x2a0 (unreliable)
__perf_event_task_sched_in+0xc4/0x230
finish_task_switch+0x21c/0x310
__schedule+0x304/0xb80
schedule+0x40/0xc0
do_wait+0x254/0x2e0
kernel_wait4+0xa0/0x1a0
SyS_wait4+0x64/0xc0
system_call+0x58/0x6c
Instruction dump:
3beafea0 7faa4800 409eff18 e8010060 eb610028 ebc10040 7c0803a6 38210050
eb81ffe0 eba1ffe8 ebe1fff8 4e800020 <
0fe00000>
4bffffbc 60000000 60420000
---[ end trace
8c46856d314c1811 ]---
The context switch call-backs for thread-imc are defined in sched_task function.
So when thread-imc events are grouped with software pmu events,
perf_pmu_sched_task hits the WARN_ON_ONCE condition, since software PMUs are
assumed not to have a sched_task defined.
Patch to move the thread_imc enable/disable opal call back from sched_task to
event_[add/del] function
Fixes: f74c89bd80fb ("powerpc/perf: Add thread IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Fri, 27 Jul 2018 11:48:17 +0000 (21:48 +1000)]
powerpc/64s: Fix page table fragment refcount race vs speculative references
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
Cc: stable@vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Parth Y Shah [Fri, 3 Aug 2018 10:20:38 +0000 (15:50 +0530)]
misc: cxl: changed asterisk position
Resolved <"foo* bar" should be "foo *bar"> error
Signed-off-by: Parth Y Shah <sparth1292@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Darren Stevens [Fri, 3 Aug 2018 11:15:10 +0000 (21:15 +1000)]
powerpc/pasemi: Use pr_err/pr_warn... for kernel messages
Pasemi code still uses printk(KERN_ERR/KERN_WARN ... change these to
pr_err(, pr_warn(... to match other powerpc arch code.
No functional changes.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Unsplit some strings while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:20 +0000 (18:33 -0300)]
powerpc/traps: Show instructions on exceptions
Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump
instructions at faulty location, useful to debugging.
Before this patch, an unhandled signal message looked like:
pandafault[10524]: segfault (11) at
100007d0 nip
1000061c lr
7fffbd295100 code 2 in pandafault[
10000000+10000]
After this patch, it looks like:
pandafault[10524]: segfault (11) at
100007d0 nip
1000061c lr
7fffbd295100 code 2 in pandafault[
10000000+10000]
pandafault[10524]: code:
4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10524]: code:
392988d0 f93f0020 e93f0020 39400048 <
99490000>
39200000 7d234b78 383f0040
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:19 +0000 (18:33 -0300)]
powerpc: Add show_user_instructions()
show_user_instructions() is a slightly modified version of
show_instructions() that allows userspace instruction dump.
This will be useful within show_signal_msg() to dump userspace
instructions of the faulty location.
Here is a sample of what show_user_instructions() outputs:
pandafault[10850]: code:
4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10850]: code:
392988d0 f93f0020 e93f0020 39400048 <
99490000>
39200000 7d234b78 383f0040
The current->comm and current->pid printed can serve as a glue that
links the instructions dump to its originator, allowing messages to be
interleaved in the logs.
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:18 +0000 (18:33 -0300)]
powerpc/traps: Print VMA for unhandled signals
This adds VMA address in the message printed for unhandled signals,
similarly to what other architectures, like x86, print.
Before this patch, a page fault looked like:
pandafault[61470]: unhandled signal 11 at
100007d0 nip
1000061c lr
7fff8d185100 code 2
After this patch, a page fault looks like:
pandafault[6303]: segfault 11 at
100007d0 nip
1000061c lr
7fff93c55100 code 2 in pandafault[
10000000+10000]
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:17 +0000 (18:33 -0300)]
powerpc/traps: Use %lx format in show_signal_msg()
Use %lx format to print registers. This avoids having two different
formats and avoids checking for MSR_64BIT, improving readability of the
function.
Even though we could have used %px, which is functionally equivalent to %lx
as per Documentation/core-api/printk-formats.rst, it is not semantically
correct because the data printed are not pointers. And using %px requires
casting data to (void *).
Besides that, %lx matches the format used in show_regs().
Before this patch:
pandafault[4808]: unhandled signal 11 at
0000000010000718 nip
0000000010000574 lr
00007fff935e7a6c code 2
After this patch:
pandafault[4732]: unhandled signal 11 at
10000718 nip
10000574 lr
7fff86697a6c code 2
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:16 +0000 (18:33 -0300)]
powerpc/traps: Use an explicit ratelimit state for show_signal_msg()
Replace printk_ratelimited() by printk() and a default rate limit
burst to limit displaying unhandled signals messages.
This will allow us to call print_vma_addr() in a future patch, which
does not work with printk_ratelimited().
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Murilo Opsfelder Araujo [Wed, 1 Aug 2018 21:33:15 +0000 (18:33 -0300)]
powerpc/traps: Print unhandled signals in a separate function
Isolate the logic of printing unhandled signals out of _exception_pkey().
No functional change, only code rearrangement.
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 31 Jul 2018 12:08:42 +0000 (22:08 +1000)]
selftests/powerpc: Add more version checks to alignment_handler test
The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Michael Ellerman [Tue, 31 Jul 2018 12:08:41 +0000 (22:08 +1000)]
selftests/powerpc: Skip earlier in alignment_handler test
Currently the alignment_handler test prints "Can't open /dev/fb0"
about 80 times per run, which is a little annoying.
Refactor it to check earlier if it can open /dev/fb0 and skip if not,
this results in each test printing something like:
test: test_alignment_handler_vsx_206
tags: git_version:
v4.18-rc3-134-gfb21a48904aa
[SKIP] Test skipped on line 291
skip: test_alignment_handler_vsx_206
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Michael Ellerman [Thu, 26 Jul 2018 12:42:44 +0000 (22:42 +1000)]
powerpc/64s: Make rfi_flush_fallback a little more robust
Because rfi_flush_fallback runs immediately before the return to
userspace it currently runs with the user r1 (stack pointer). This
means if we oops in there we will report a bad kernel stack pointer in
the exception entry path, eg:
Bad kernel stack pointer
7ffff7150e40 at
c0000000000023b4
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1246 Comm: klogd Not tainted
4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
NIP:
c0000000000023b4 LR:
0000000010053e00 CTR:
0000000000000040
REGS:
c0000000fffe7d40 TRAP: 4100 Not tainted (
4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
MSR:
9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR:
44000442 XER:
20000000
CFAR:
c00000000000bac8 IRQMASK:
c0000000f1e66a80
GPR00:
0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020
...
NIP [
c0000000000023b4] rfi_flush_fallback+0x34/0x80
LR [
0000000010053e00] 0x10053e00
Although the NIP tells us where we were, and the TRAP number tells us
what happened, it would still be nicer if we could report the actual
exception rather than barfing about the stack pointer.
We an do that fairly simply by loading the kernel stack pointer on
entry and restoring the user value before returning. That way we see a
regular oops such as:
Unrecoverable exception 4100 at
c00000000000239c
Oops: Unrecoverable exception, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1251 Comm: klogd Not tainted
4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
NIP:
c00000000000239c LR:
0000000010053e00 CTR:
0000000000000040
REGS:
c0000000f1e17bb0 TRAP: 4100 Not tainted (
4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
MSR:
9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR:
44000442 XER:
20000000
CFAR:
c00000000000bac8 IRQMASK: 0
...
NIP [
c00000000000239c] rfi_flush_fallback+0x3c/0x80
LR [
0000000010053e00] 0x10053e00
Call Trace:
[
c0000000f1e17e30] [
c00000000000b9e4] system_call+0x5c/0x70 (unreliable)
Note this shouldn't make the kernel stack pointer vulnerable to a
meltdown attack, because it should be flushed from the cache before we
return to userspace. The user r1 value will be in the cache, because
we load it in the return path, but that is harmless.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Michael Ellerman [Mon, 23 Jul 2018 15:07:56 +0000 (01:07 +1000)]
powerpc/powernv: Query firmware for count cache flush settings
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 23 Jul 2018 15:07:55 +0000 (01:07 +1000)]
powerpc/pseries: Query hypervisor for count cache flush settings
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 23 Jul 2018 15:07:54 +0000 (01:07 +1000)]
powerpc/64s: Add support for software count cache flush
Some CPU revisions support a mode where the count cache needs to be
flushed by software on context switch. Additionally some revisions may
have a hardware accelerated flush, in which case the software flush
sequence can be shortened.
If we detect the appropriate flag from firmware we patch a branch
into _switch() which takes us to a count cache flush sequence.
That sequence in turn may be patched to return early if we detect that
the CPU supports accelerating the flush sequence in hardware.
Add debugfs support for reporting the state of the flush, as well as
runtime disabling it.
And modify the spectre_v2 sysfs file to report the state of the
software flush.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 23 Jul 2018 15:07:53 +0000 (01:07 +1000)]
powerpc/64s: Add new security feature flags for count cache flush
Add security feature flags to indicate the need for software to flush
the count cache on context switch, and for the presence of a hardware
assisted count cache flush.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Mon, 23 Jul 2018 15:07:52 +0000 (01:07 +1000)]
powerpc/asm: Add a patch_site macro & helpers for patching instructions
Add a macro and some helper C functions for patching single asm
instructions.
The gas macro means we can do something like:
1: nop
patch_site 1b, patch__foo
Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.
These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:39 +0000 (09:06 +1000)]
Documentation: Add nospectre_v1 parameter
Currently only supported on powerpc.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:38 +0000 (09:06 +1000)]
powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
Used barrier_nospec to sanitize the syscall table.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:37 +0000 (09:06 +1000)]
powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
Implement the barrier_nospec as a isync;sync instruction sequence.
The implementation uses the infrastructure built for BOOK3S 64.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:36 +0000 (09:06 +1000)]
powerpc/64: Make meltdown reporting Book3S 64 specific
In a subsequent patch we will enable building security.c for Book3E.
However the NXP platforms are not vulnerable to Meltdown, so make the
Meltdown vulnerability reporting PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 27 Jul 2018 23:06:35 +0000 (09:06 +1000)]
powerpc/64: Call setup_barrier_nospec() from setup_arch()
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Fri, 27 Jul 2018 23:06:34 +0000 (09:06 +1000)]
powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
Add a config symbol to encode which platforms support the
barrier_nospec speculation barrier. Currently this is just Book3S 64
but we will add Book3E in a future patch.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:33 +0000 (09:06 +1000)]
powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
NXP Book3E platforms are not vulnerable to speculative store
bypass, so make the mitigations PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diana Craciun [Fri, 27 Jul 2018 23:06:32 +0000 (09:06 +1000)]
powerpc/64: Disable the speculation barrier from the command line
The speculation barrier can be disabled from the command line
with the parameter: "nospectre_v1".
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:17 +0000 (23:07 +1000)]
powerpc/64s: Don't use __MASKABLE_EXCEPTION unnecessarily
We only need to use __MASKABLE_EXCEPTION in one of the four cases for
hardware interrupt, so use the helper macros in the other cases.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:16 +0000 (23:07 +1000)]
powerpc/64s: Drop unused loc parameter to MASKABLE_EXCEPTION macros
We pass the "loc" (location) parameter to MASKABLE_EXCEPTION and
friends, but it's not used, so drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:15 +0000 (23:07 +1000)]
powerpc/64s: Remove PSERIES naming from the MASKABLE macros
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:14 +0000 (23:07 +1000)]
powerpc/64s: Drop _MASKABLE_RELON_EXCEPTION_PSERIES()
_MASKABLE_RELON_EXCEPTION_PSERIES() does nothing useful, update all
callers to use __MASKABLE_RELON_EXCEPTION_PSERIES() directly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:13 +0000 (23:07 +1000)]
powerpc/64s: Drop _MASKABLE_EXCEPTION_PSERIES()
_MASKABLE_EXCEPTION_PSERIES() does nothing useful, update all callers
to use __MASKABLE_EXCEPTION_PSERIES() directly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:12 +0000 (23:07 +1000)]
powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES to EXCEPTION_PROLOG
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:11 +0000 (23:07 +1000)]
powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES
To just EXCEPTION_RELON_PROLOG().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:10 +0000 (23:07 +1000)]
powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1
The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as
EXCEPTION_PROLOG_2 (which we just recently created), except for
"RELON" (relocation on) exceptions.
So rename it as such.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:09 +0000 (23:07 +1000)]
powerpc/64s: Remove PSERIES from the NORI macros
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:08 +0000 (23:07 +1000)]
powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2
As with the other patches in this series, we are removing the
"PSERIES" from the name as it's no longer meaningful.
In this case it's not simply a case of removing the "PSERIES" as that
would result in a clash with the existing EXCEPTION_PROLOG_1.
Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in
sequence after 0 and 1.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:07 +0000 (23:07 +1000)]
powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to STD_RELON_EXCEPTION_OOL
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:06 +0000 (23:07 +1000)]
powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTION
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:05 +0000 (23:07 +1000)]
powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOL
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:04 +0000 (23:07 +1000)]
powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTION
The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros
from the legacy iSeries versions, which are called
STD_EXCEPTION_ISERIES. It is not anything to do with pseries vs
powernv or powermac etc.
We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623x
("powerpc: Remove the main legacy iSerie platform code").
So remove "PSERIES" from the macros.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:03 +0000 (23:07 +1000)]
powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()
EXCEPTION_RELON_PROLOG_PSERIES() only has two users,
STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of
which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into
EXCEPTION_RELON_PROLOG_PSERIES().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 13:07:02 +0000 (23:07 +1000)]
powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()
EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES()
and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just
move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Darren Stevens [Wed, 25 Jul 2018 20:55:18 +0000 (21:55 +0100)]
powerpc/pasemi: Search for PCI root bus by compatible property
Pasemi arch code finds the root of the PCI-e bus by searching the
device-tree for a node called 'pxp'. But the root bus has a compatible
property of 'pasemi,rootbus' so search for that instead.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 1 Aug 2018 09:01:16 +0000 (09:01 +0000)]
selftests/powerpc: Update strlen() test to test the new assembly function for PPC32
This patch adds a test for testing the new assembly strlen() for PPC32
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix 64-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 1 Aug 2018 09:01:14 +0000 (09:01 +0000)]
powerpc/lib: Implement strlen() in assembly for PPC32
The generic implementation of strlen() reads strings byte per byte.
This patch implements strlen() in assembly based on a read of entire
words, in the same spirit as what some other arches and glibc do.
On a 8xx the time spent in strlen is reduced by 3/4 for long strings.
strlen() selftest on an 8xx provides the following values:
Before the patch (ie with the generic strlen() in lib/string.c):
len 256 : time = 1.195055
len 016 : time = 0.083745
len 008 : time = 0.046828
len 004 : time = 0.028390
After the patch:
len 256 : time = 0.272185 ==> 78% improvment
len 016 : time = 0.040632 ==> 51% improvment
len 008 : time = 0.033060 ==> 29% improvment
len 004 : time = 0.029149 ==> 2% degradation
On a 832x:
Before the patch:
len 256 : time = 0.236125
len 016 : time = 0.018136
len 008 : time = 0.011000
len 004 : time = 0.007229
After the patch:
len 256 : time = 0.094950 ==> 60% improvment
len 016 : time = 0.013357 ==> 26% improvment
len 008 : time = 0.010586 ==> 4% improvment
len 004 : time = 0.008784
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 1 Aug 2018 09:01:12 +0000 (09:01 +0000)]
selftests/powerpc: Add test for strlen()
This patch adds a test for strlen()
string.c contains a copy of strlen() from lib/string.c
The test first tests the correctness of strlen() by comparing
the result with libc strlen(). It tests all cases of alignment.
It them tests the duration of an aligned strlen() on a 4 bytes string,
on a 16 bytes string and on a 256 bytes string.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Drop change log from copy of string.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Wed, 1 Aug 2018 09:01:10 +0000 (09:01 +0000)]
selftests/powerpc: Add test for 32 bits memcmp
This patch renames memcmp test to memcmp_64 and adds a memcmp_32 test
for testing the 32 bits version of memcmp()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix 64-bit build by adding build_32bit test]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Wed, 4 Jul 2018 17:57:21 +0000 (23:27 +0530)]
powerpc/pseries: Defer the logging of rtas error to irq work queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.
On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.
Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.
Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Mahesh Salgaonkar [Wed, 4 Jul 2018 17:57:02 +0000 (23:27 +0530)]
powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX.
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.
Fixes: d368514c3097 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 11 Apr 2018 05:18:01 +0000 (15:18 +1000)]
powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Benjamin Herrenschmidt [Wed, 11 Apr 2018 05:18:00 +0000 (15:18 +1000)]
powerpc/xive: Remove now useless pr_debug statements
Those overly verbose statement in the setup of the pool VP
aren't particularly useful (esp. considering we don't actually
use the pool, we configure it bcs HW requires it only). So
remove them which improves the code readability.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 25 Jul 2018 09:54:28 +0000 (19:54 +1000)]
powerpc/64s: free page table caches at exit_mmap time
The kernel page table caches are tied to init_mm, so there is no
more need for them after userspace is finished.
destroy_context() gets called when we drop the last reference for an
mm, which can be much later than the task exit due to other lazy mm
references to it. We can free the page table cache pages on task exit
because they only cache the userspace page tables and kernel threads
should not access user space addresses.
The mapping for kernel threads itself is maintained in init_mm and
page table cache for that is attached to init_mm.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Merge change log additions from Aneesh]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Wed, 25 Jul 2018 13:58:06 +0000 (23:58 +1000)]
powerpc/64s/radix: tlb do not flush on page size when fullmm
When the mm is being torn down there will be a full PID flush so
there is no need to flush the TLB on page size changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 12:24:59 +0000 (22:24 +1000)]
selftests/powerpc: Give some tests longer to run
Some of these long running tests can time out on heavily loaded
systems, give them longer to run.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 12:24:58 +0000 (22:24 +1000)]
selftests/powerpc: Only run some tests on ppc64le
These tests are currently failing on (some) big endian systems. Until
we can fix that, skip them unless we're on ppc64le.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 26 Jul 2018 12:24:57 +0000 (22:24 +1000)]
selftests/powerpc: Add a helper for checking if we're on ppc64le
Some of our selftests have only been tested on ppc64le and crash or
behave weirdly on ppc64/ppc32. So add a helper for checking the UTS
machine.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Tue, 24 Jul 2018 14:03:46 +0000 (00:03 +1000)]
powerpc: Add a checkpatch wrapper with our preferred settings
This makes it easy to run checkpatch with settings that I like.
Usage is eg:
$ ./arch/powerpc/tools/checkpatch.sh -g origin/master..
To check all commits since origin/master.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Michael Ellerman [Mon, 6 Aug 2018 10:30:49 +0000 (20:30 +1000)]
powerpc/64: Disable irq restore warning for now
We recently added a warning in arch_local_irq_restore() to check that
the soft masking state matches reality.
Unfortunately it trips in a few places, which are not entirely trivial
to fix. The key problem is if we're doing function_graph tracing of
restore_math(), the warning pops and then seems to recurse. It's not
entirely clear because the system continuously oopses on all CPUs,
with the output interleaved and unreadable.
It's also been observed on a G5 coming out of idle.
Until we can fix those cases disable the warning for now.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reza Arbab [Fri, 3 Aug 2018 04:03:36 +0000 (23:03 -0500)]
powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.
The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.
What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.
There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.
Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christoph Hellwig [Mon, 30 Jul 2018 07:37:21 +0000 (09:37 +0200)]
powerpc: Do not redefine NEED_DMA_MAP_STATE
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from CONFIG_PPC using the same condition as an if guard.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: Move it under PPC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Guenter Roeck [Tue, 31 Jul 2018 01:44:14 +0000 (18:44 -0700)]
powerpc/4xx: Fix error return path in ppc4xx_msi_probe()
An arbitrary error in ppc4xx_msi_probe() quite likely results in a
crash similar to the following, seen after dma_alloc_coherent()
returned an error.
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc001bff0
Oops: Kernel access of bad area, sig: 11 [#1]
BE Canyonlands
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G W
4.18.0-rc6-00010-gff33d1030a6c #1
NIP:
c001bff0 LR:
c001c418 CTR:
c01faa7c
REGS:
cf82db40 TRAP: 0300 Tainted: G W
(
4.18.0-rc6-00010-gff33d1030a6c)
MSR:
00029000 <CE,EE,ME> CR:
28002024 XER:
00000000
DEAR:
00000000 ESR:
00000000
GPR00:
c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4
GPR08:
c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000
GPR16:
00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a
GPR24:
00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001
NIP [
c001bff0] ppc4xx_of_msi_remove+0x48/0xa0
LR [
c001c418] ppc4xx_msi_probe+0x294/0x3b8
Call Trace:
[
cf82dbf0] [
00029000] 0x29000 (unreliable)
[
cf82dc10] [
c001c418] ppc4xx_msi_probe+0x294/0x3b8
[
cf82dc70] [
c0209fbc] platform_drv_probe+0x40/0x9c
[
cf82dc90] [
c0208240] driver_probe_device+0x2a8/0x350
[
cf82dcc0] [
c0206204] bus_for_each_drv+0x60/0xac
[
cf82dcf0] [
c0207e88] __device_attach+0xe8/0x160
[
cf82dd20] [
c02071e0] bus_probe_device+0xa0/0xbc
[
cf82dd40] [
c02050c8] device_add+0x404/0x5c4
[
cf82dd90] [
c0288978] of_platform_device_create_pdata+0x88/0xd8
[
cf82ddb0] [
c0288b70] of_platform_bus_create+0x134/0x220
[
cf82de10] [
c0288bcc] of_platform_bus_create+0x190/0x220
[
cf82de70] [
c0288cf4] of_platform_bus_probe+0x98/0xec
[
cf82de90] [
c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54
[
cf82dea0] [
c0002404] do_one_initcall+0x40/0x188
[
cf82df00] [
c043daec] kernel_init_freeable+0x130/0x1d0
[
cf82df30] [
c0002600] kernel_init+0x18/0x104
[
cf82df40] [
c000c23c] ret_from_kernel_thread+0x14/0x1c
Instruction dump:
90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800
409d002c 813e000c 57ea103a 3bff0001 <
7c69502e>
2f830000 419effe0 4803b26d
---[ end trace
8cf551077ecfc42a ]---
Fix it up. Specifically,
- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
up after itself, and only access hardware after all possible error
conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nicholas Piggin [Thu, 2 Aug 2018 15:39:51 +0000 (01:39 +1000)]
powernv/cpuidle: Fix idle states all being marked invalid
Commit
9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into
global structure") parses dt idle states into structs, but never marks
them valid. This results in all idle states being lost.
Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Sam Bobroff [Mon, 30 Jul 2018 01:59:14 +0000 (11:59 +1000)]
powerpc/pseries: fix EEH recovery of some IOV devices
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)
Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not. This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.
Correct this by zeroing the resource flags for IOV BARs when they
can't be configured (this is the same method used by sriov_init() and
__pci_read_base()).
VFs cleared this way can't be enabled later, because that requires
another device tree property, "ibm,number-of-configurable-vfs" as well
as support for the RTAS function "ibm_map_pes". These are all part of
hypervisor support for IOV and it seems unlikely that a hypervisor
would ever partially, but not fully, support it. (None are currently
provided by QEMU/KVM.)
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Shilpasri G Bhat [Tue, 24 Jul 2018 09:13:09 +0000 (14:43 +0530)]
hwmon: (ibmpowernv) Add attributes to enable/disable sensor groups
OPAL firmware provides the facility for some groups of sensors to be
enabled/disabled at runtime to give the user the option of using the
system resources for collecting these sensors or not.
For example, on POWER9 systems, the On Chip Controller (OCC) gathers
various system and chip level sensors and maintains their values in
main memory.
This patch provides support for enabling/disabling the sensor groups
like power, temperature, current and voltage.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: Commit message]
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Shilpasri G Bhat [Tue, 24 Jul 2018 09:13:08 +0000 (14:43 +0530)]
powerpc/powernv: Add support to enable sensor groups
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Akshay Adiga [Thu, 5 Jul 2018 11:40:22 +0000 (17:10 +0530)]
powernv/cpuidle: Use parsed device tree values for cpuidle_init
Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Akshay Adiga [Thu, 5 Jul 2018 11:40:21 +0000 (17:10 +0530)]
powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu: Disambiguate interrupt statistics
Some of the event counters are overloaded which makes it very
difficult to interpret their values.
Counter 0 is supposed to report CB1 interrupts but it can also count
PMU_INT_WAITING_CHARGER events.
Counter 1 is supposed to report GPIO interrupts but it can also count
other events (depending upon the value of the PMU_INT_ADB bit).
Disambiguate these statistics with dedicated counters for GPIO and
CB1 interrupts.
Comments in the MkLinux source code say that the type 0 and type 1
interrupts are model-specific. Label them as "unknown".
This change to the contents of /proc/pmu/interrupts is by necessity
visible in userland. However, packages which interact with the PMU
(that is, pbbuttonsd, pmac-utils and pmud) don't open this file.
AFAIK, user software has no need to poll these counters.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu: Clean up interrupt statistics
Replace an open-coded ffs() with the function call.
Simplify an if-else cascade using a switch statement.
Correct a typo and an indentation issue.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver
Now that the PowerMac via-pmu driver supports m68k PowerBooks,
switch over to that driver and remove the via-pmu68k driver.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu68k: Don't load driver on unsupported hardware
Don't load the via-pmu68k driver on early PowerBooks. The M50753 PMU
device found in those models was never supported by this driver.
Attempting to load the driver usually causes a boot hang.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu: Explicitly specify CONFIG_PPC_PMAC dependencies
At present, CONFIG_ADB_PMU depends on CONFIG_PPC_PMAC. When this gets
relaxed to CONFIG_PPC_PMAC || CONFIG_MAC, those Kconfig symbols with
implicit deps on PPC_PMAC will need explicit deps. Add them now.
No functional change.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:19 +0000 (04:21 -0400)]
macintosh/via-pmu: Add support for m68k PowerBooks
Put #ifdefs around the Open Firmware, xmon, interrupt dispatch,
battery and suspend code. Add the necessary interrupt handling to
support m68k PowerBooks.
The pmu_kind value is available to userspace using the
PMU_IOC_GET_MODEL ioctl. It is not clear yet what hardware classes
are be needed to describe m68k PowerBook models, so pmu_kind is given
the provisional value PMU_UNKNOWN.
To find out about the hardware, user programs can use /proc/bootinfo
or /proc/hardware, or send the PMU_GET_VERSION command using /dev/adb.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:18 +0000 (04:21 -0400)]
macintosh/via-pmu: Replace via pointer with via1 and via2 pointers
On most PowerPC Macs, the PMU driver uses the shift register and
IO port B from a single VIA chip.
On 68k and early PowerPC PowerBooks, the driver uses the shift register
from one VIA chip together with IO port B from another.
Replace via with via1 and via2 to accommodate this. For the
CONFIG_PPC_PMAC case, set via1 = via2 so there is no change.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:18 +0000 (04:21 -0400)]
macintosh/via-pmu: Enhance state machine with new 'uninitialized' state
On 68k Macs, the via/vias pointer can't be used to determine whether
the PMU driver has been initialized. For portability, add a new state
to indicate that via_find_pmu() succeeded.
After via_find_pmu() executes, testing vias == NULL is equivalent to
testing via == NULL. Replace these tests with pmu_state == uninitialized
which is simpler and more consistent. No functional change.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:18 +0000 (04:21 -0400)]
macintosh/via-pmu: Don't clear shift register interrupt flag twice
The shift register interrupt flag gets cleared in via_pmu_interrupt()
and once again in pmu_sr_intr(). Fix this theoretical race condition.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:18 +0000 (04:21 -0400)]
macintosh/via-pmu: Add missing mmio accessors
Add missing in_8() accessors to init_pmu() and pmu_sr_intr().
This fixes several sparse warnings:
drivers/macintosh/via-pmu.c:536:29: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:537:33: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1455:17: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1456:69: warning: dereference of noderef expression
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Finn Thain [Mon, 2 Jul 2018 08:21:18 +0000 (04:21 -0400)]
macintosh/via-pmu: Fix section mismatch warning
The pmu_init() function has the __init qualifier, but the ops struct
that holds a pointer to it does not. This causes a build warning.
The driver works fine because the pointer is only dereferenced early.
The function is so small that there's negligible benefit from using
the __init qualifier. Remove it to fix the warning, consistent with
the other ADB drivers.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Alexey Spirkov [Thu, 26 Jul 2018 12:52:50 +0000 (12:52 +0000)]
powerpc/44x: Mark mmu_init_secondary() as __init
mmu_init_secondary() calls ppc44x_pin_tlb() which is marked __init,
leading to a warning:
The function mmu_init_secondary() references
the function __init ppc44x_pin_tlb().
There's no CPU hotplug support on 44x so mmu_init_secondary() will
only be called at boot. Therefore we should mark it as __init.
Signed-off-by: Alexey Spirkov <alexeis@astrosoft.ru>
[mpe: Flesh out change log details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael Ellerman [Thu, 19 Jul 2018 14:33:16 +0000 (00:33 +1000)]
powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Paul Menzel reported that kmemleak was producing reports such as:
unreferenced object 0xc0000000f8b80000 (size 16384):
comm "init", pid 1, jiffies
4294937416 (age 312.240s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
00000000d997deb7>] __pud_alloc+0x80/0x190
[<
0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
[<
00000000091e51c2>] shift_arg_pages+0xc0/0x210
[<
00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
[<
0000000060871529>] load_elf_binary+0x41c/0x1648
[<
00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
[<
0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
[<
000000005f953a6e>] sys_execve+0x58/0x70
[<
000000009700a858>] system_call+0x5c/0x70
Indicating that a PUD was being leaked.
However what's really happening is that kmemleak is not able to
recognise the references from the PGD to the PUD, because they are not
fully qualified pointers.
We can confirm that in xmon, eg:
Find the task struct for pid 1 "init":
0:mon> P
task_struct ->thread.ksp PID PPID S P CMD
c0000001fe7c0000 c0000001fe803960 1 0 S 13 systemd
Dump virtual address 0 to find the PGD:
0:mon> dv 0
c0000001fe7c0000
pgd @ 0xc0000000f8b01000
Dump the memory of the PGD:
0:mon> d
c0000000f8b01000
c0000000f8b01000 00000000f8b90000 0000000000000000 |................|
c0000000f8b01010 0000000000000000 0000000000000000 |................|
c0000000f8b01020 0000000000000000 0000000000000000 |................|
c0000000f8b01030 0000000000000000 00000000f8b80000 |................|
^^^^^^^^^^^^^^^^
There we can see the reference to our supposedly leaked PUD. But
because it's missing the leading 0xc, kmemleak won't recognise it.
We can confirm it's still in use by translating an address that is
mapped via it:
0:mon> dv
7fff94000000 c0000001fe7c0000
pgd @ 0xc0000000f8b01000
pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
Maps physical address = 0x00000001d5ce0000
Flags = Accessed Dirty Read Write
The fix is fairly simple. We need to tell kmemleak to ignore PUD
allocations and never report them as leaks. We can also tell it not to
scan the PGD, because it will never find pointers in there. However it
will still notice if we allocate a PGD and then leak it.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 5 Jul 2018 16:25:21 +0000 (16:25 +0000)]
powerpc: split asm/tlbflush.h
Split asm/tlbflush.h into:
asm/nohash/tlbflush.h
asm/book3s/32/tlbflush.h
asm/book3s/64/tlbflush.h (already existing)
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Christophe Leroy [Thu, 5 Jul 2018 16:25:19 +0000 (16:25 +0000)]
powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>