openwrt/staging/blogic.git
6 years agopowerpc/numa: Ensure nodes initialized for hotplug
Michael Bringmann [Tue, 28 Nov 2017 22:58:40 +0000 (16:58 -0600)]
powerpc/numa: Ensure nodes initialized for hotplug

This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot. The
problems of interest include:

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
  them are allowed to be 'possible' and 'online'. Memory allocations
  for those nodes are taken from another node that does have memory
  until and if memory is hot-added to the node.

* Nodes which have no resources assigned at boot, but which may still
  be referenced subsequently by affinity or associativity attributes,
  are kept in the list of 'possible' nodes for powerpc. Hot-add of
  memory or CPUs to the system can reference these nodes and bring
  them online instead of redirecting the references to one of the set
  of nodes known to have memory at boot.

Note that this software operates under the context of CPU hotplug. We
are not doing memory hotplug in this code, but rather updating the
kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology). We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU
hotplug operation. CPU hotplug operations are protected by a range of
APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected by
mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap. Using HMC hot-add/hot-remove operations, we have been able to
add and remove CPUs to any possible node without failures. HMC
operations involve a degree self-serialization, though.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
Michael Bringmann [Tue, 28 Nov 2017 22:58:36 +0000 (16:58 -0600)]
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes

On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes that
were not used for these resources at bootup. In the kernel, any node
that is used must be defined and initialized. These empty nodes may
occur when,

* Dedicated vs. shared resources. Shared resources require information
  such as the VPHN hcall for CPU assignment to nodes. Associativity
  decisions made based on dedicated resource rules, such as
  associativity properties in the device tree, may vary from decisions
  made using the values returned by the VPHN hcall.

* memoryless nodes at boot. Nodes need to be defined as 'possible' at
  boot for operation with other code modules. Previously, the powerpc
  code would limit the set of possible nodes to those which have
  memory assigned at boot, and were thus online. Subsequent add/remove
  of CPUs or memory would only work with this subset of possible
  nodes.

* memoryless nodes with CPUs at boot. Due to the previous restriction
  on nodes, nodes that had CPUs but no memory were being collapsed
  into other nodes that did have memory at boot. In practice this
  meant that the node assignment presented by the runtime kernel
  differed from the affinity and associativity attributes presented by
  the device tree or VPHN hcalls. Nodes that might be known to the
  pHyp were not 'possible' in the runtime kernel because they did not
  have memory at boot.

This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot. This patch
set fixes a couple of problems.

* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
  them are allowed to be 'possible' and 'online'. Memory allocations
  for those nodes are taken from another node that does have memory
  until and if memory is hot-added to the node. * Nodes which have no
  resources assigned at boot, but which may still be referenced
  subsequently by affinity or associativity attributes, are kept in
  the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs
  to the system can reference these nodes and bring them online
  instead of redirecting to one of the set of nodes that were known to
  have memory at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system. This new setting will
override the instruction:

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the "ibm,max-associativity-domains" property is not present at
boot, no operation will be performed to define or enable additional
nodes, or enable the above 'nodes_and()'.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/kernel: Block interrupts when updating TIDR
Sukadev Bhattiprolu [Tue, 28 Nov 2017 19:39:43 +0000 (13:39 -0600)]
powerpc/kernel: Block interrupts when updating TIDR

clear_thread_tidr() is called in interrupt context as a part of delayed
put of the task structure (i.e as a part of timer interrupt). To prevent
a deadlock, block interrupts when holding vas_thread_id_lock to set/
clear TIDR for a task.

Fixes: ec233ede4c86 ("powerpc: Add support for setting SPRN_TIDR")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
Alexey Kardashevskiy [Mon, 23 Oct 2017 08:07:02 +0000 (19:07 +1100)]
powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn

The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/mm/nohash: do not flush the entire mm when range is a single page
Christophe Leroy [Tue, 23 Jan 2018 13:22:50 +0000 (14:22 +0100)]
powerpc/mm/nohash: do not flush the entire mm when range is a single page

Most of the time, flush_tlb_range() is called on single pages.
At the time being, flush_tlb_range() inconditionnaly calls
flush_tlb_mm() which flushes at least the entire PID pages and on
older CPUs like 4xx or 8xx it flushes the entire TLB table.

This patch calls flush_tlb_page() instead of flush_tlb_mm() when
the range is a single page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries: Add Initialization of VF Bars
Bryant G. Ly [Fri, 5 Jan 2018 16:45:52 +0000 (10:45 -0600)]
powerpc/pseries: Add Initialization of VF Bars

When enabling SR-IOV in pseries platform, the VF bar properties for a
PF are reported on the device node in the device tree.

This patch adds the IOV Bar resources to Linux structures from the
device tree for later use when configuring SR-IOV by PF driver.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
Bryant G. Ly [Fri, 5 Jan 2018 16:45:51 +0000 (10:45 -0600)]
powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV

After initial validation of SR-IOV resources, firmware will associate
PEs to the dynamic VFs created within this call. This patch adds the
association of PEs to the PF array of PE numbers indexed by VF.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/eeh: Add EEH notify resume sysfs
Bryant G. Ly [Fri, 5 Jan 2018 16:45:50 +0000 (10:45 -0600)]
powerpc/eeh: Add EEH notify resume sysfs

Introduce a method for notify resume to be called from sysfs. In this
patch one can now call notify resume from sysfs when is supported by
platform.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
[mpe: Add NULL check, add empty versions to avoid #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/eeh: Add EEH operations to notify resume
Bryant G. Ly [Fri, 5 Jan 2018 16:45:49 +0000 (10:45 -0600)]
powerpc/eeh: Add EEH operations to notify resume

When pseries SR-IOV is enabled and after a PF driver has resumed from
EEH, platform has to be notified of the event so the child VFs can be
allowed to resume their normal recovery path.

This patch makes the EEH operation allow unfreeze platform dependent
code and adds the call to pseries EEH code.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries: Set eeh_pe of EEH_PE_VF type
Bryant G. Ly [Fri, 5 Jan 2018 16:45:48 +0000 (10:45 -0600)]
powerpc/pseries: Set eeh_pe of EEH_PE_VF type

To correctly use EEH code one has to make sure that the EEH_PE_VF is
set for dynamic created VFs. Therefore this patch allocates an eeh_pe
of eeh type EEH_PE_VF and associates PE with parent.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoPCI/AER: Add uevents in AER and EEH error/resume
Bryant G. Ly [Fri, 5 Jan 2018 16:45:47 +0000 (10:45 -0600)]
PCI/AER: Add uevents in AER and EEH error/resume

Devices can go offline when erors reported. This patch adds a change
to the kernel object and lets udev know of error. When device resumes,
a change is also set reporting device as online. Therefore, EEH and
AER events are better propagated to user space for PCI devices in all
arches.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/eeh: Update VF config space after EEH
Bryant G. Ly [Fri, 5 Jan 2018 16:45:46 +0000 (10:45 -0600)]
powerpc/eeh: Update VF config space after EEH

Add EEH platform operations for pseries to update VF config space.
With this change after EEH, the VF will have updated config space for
pseries platform.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: add MAINTAINERS entry
Frederic Barrat [Tue, 23 Jan 2018 11:31:48 +0000 (12:31 +0100)]
ocxl: add MAINTAINERS entry

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Documentation
Frederic Barrat [Tue, 23 Jan 2018 11:31:47 +0000 (12:31 +0100)]
ocxl: Documentation

ocxl.rst gives a quick, high-level view of opencapi.

Update ioctl-number.txt to reflect ioctl numbers being used by the
ocxl driver

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
[mpe: Fix up mixed whitespace as spotted by gregkh]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agocxl: Remove support for "Processing accelerators" class
Frederic Barrat [Tue, 23 Jan 2018 11:31:46 +0000 (12:31 +0100)]
cxl: Remove support for "Processing accelerators" class

The cxl driver currently declares in its table of supported PCI
devices the class "Processing accelerators". Therefore it may be
called to probe for opencapi devices, which generates errors, as the
config space of a cxl device is not compatible with opencapi.

So remove support for the generic class, as we now have (at least) two
drivers for devices of the same class. Most cxl devices are FPGAs with
a PSL which will show a known device ID of 0x477. Other devices are
really supported by the cxlflash driver and are already listed in the
table. So removing the class is expected to go unnoticed.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Add Makefile and Kconfig
Frederic Barrat [Tue, 23 Jan 2018 11:31:45 +0000 (12:31 +0100)]
ocxl: Add Makefile and Kconfig

OCXL_BASE triggers the platform support needed by the driver.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Add trace points
Frederic Barrat [Tue, 23 Jan 2018 11:31:44 +0000 (12:31 +0100)]
ocxl: Add trace points

Define a few trace points so that we can use the standard tracing
mechanism for debug and/or monitoring.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Add a kernel API for other opencapi drivers
Frederic Barrat [Tue, 23 Jan 2018 11:31:43 +0000 (12:31 +0100)]
ocxl: Add a kernel API for other opencapi drivers

Some of the functions done by the generic driver should also be needed
by other opencapi drivers: attaching a context to an adapter,
translation fault handling, AFU interrupt allocation...

So to avoid code duplication, the driver provides a kernel API that
other drivers can use, similar to calling a in-kernel library.

It is still a bit theoretical, for lack of real hardware, and will
likely need adjustements down the road. But we used the cxlflash
driver as a guinea pig.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Add AFU interrupt support
Frederic Barrat [Tue, 23 Jan 2018 11:31:42 +0000 (12:31 +0100)]
ocxl: Add AFU interrupt support

Add user APIs through ioctl to allocate, free, and be notified of an
AFU interrupt.

For opencapi, an AFU can trigger an interrupt on the host by sending a
specific command targeting a 64-bit object handle. On POWER9, this is
implemented by mapping a special page in the address space of a
process and a write to that page will trigger an interrupt.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoocxl: Driver code for 'generic' opencapi devices
Frederic Barrat [Tue, 23 Jan 2018 11:31:41 +0000 (12:31 +0100)]
ocxl: Driver code for 'generic' opencapi devices

Add an ocxl driver to handle generic opencapi devices. Of course, it's
not meant to be the only opencapi driver, any device is free to
implement its own. But if a host application only needs basic services
like attaching to an opencapi adapter, have translation faults handled
or allocate AFU interrupts, it should suffice.

The AFU config space must follow the opencapi specification and use
the expected vendor/device ID to be seen by the generic driver.

The driver exposes the device AFUs as a char device in /dev/ocxl/

Note that the driver currently doesn't handle memory attached to the
opencapi device.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Capture actag information for the device
Frederic Barrat [Tue, 23 Jan 2018 11:31:40 +0000 (12:31 +0100)]
powerpc/powernv: Capture actag information for the device

In the opencapi protocol, host memory contexts are referenced by a
'actag'. During setup, a driver must tell the device how many actags
it can used, and what values are acceptable.

On POWER9, the NPU can handle 64 actags per link, so they must be
shared between all the PCI functions of the link. To get a global
picture of how many actags are used by each AFU of every function, we
capture some data at the end of PCI enumeration, so that actags can be
shared fairly if needed.

This is not powernv specific per say, but rather a consequence of the
opencapi configuration specification being quite general. The number
of available actags on POWER9 makes it more likely to be hit. This is
somewhat mitigated by the fact that existing AFUs are coded by
requesting a reasonable count of actags and existing devices carry
only one AFU.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Add platform-specific services for opencapi
Frederic Barrat [Tue, 23 Jan 2018 11:31:39 +0000 (12:31 +0100)]
powerpc/powernv: Add platform-specific services for opencapi

Implement a few platform-specific calls which can be used by drivers:

- provide the Transaction Layer capabilities of the host, so that the
  driver can find some common ground and configure the device and host
  appropriately.

- provide the hw interrupt to be used for translation faults raised by
  the NPU

- map/unmap some NPU mmio registers to get the fault context when the
  NPU raises an address translation fault

The rest are wrappers around the previously-introduced opal calls.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Add opal calls for opencapi
Frederic Barrat [Tue, 23 Jan 2018 11:31:38 +0000 (12:31 +0100)]
powerpc/powernv: Add opal calls for opencapi

Add opal calls to interact with the NPU:

OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA)
The SPA is a table containing one entry (Process Element) per memory
context which can be accessed by the opencapi device.

OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must
be cleared.

OPAL_NPU_TL_SET: configure the Transaction Layer
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and
at what rates those messages can be sent.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Set correct configuration space size for opencapi devices
Andrew Donnellan [Tue, 23 Jan 2018 11:31:37 +0000 (12:31 +0100)]
powerpc/powernv: Set correct configuration space size for opencapi devices

The configuration space for opencapi devices doesn't have a PCI
Express capability, therefore confusing linux in thinking it's of an
old PCI type with a 256-byte configuration space size, instead of the
desired 4k. So add a PCI fixup to declare the correct size.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Introduce new PHB type for opencapi links
Frederic Barrat [Tue, 23 Jan 2018 11:31:36 +0000 (12:31 +0100)]
powerpc/powernv: Introduce new PHB type for opencapi links

The NPU was already abstracted by opal as a virtual PHB for nvlink,
but it helps to be able to differentiate between a nvlink or opencapi
PHB, as it's not completely transparent to linux. In particular, PE
assignment differs and we'll also need the information in later
patches.

So rename existing PNV_PHB_NPU type to PNV_PHB_NPU_NVLINK and add a
new type PNV_PHB_NPU_OCAPI.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Improve RFI L1-D cache flush fallback
Nicholas Piggin [Wed, 17 Jan 2018 13:58:18 +0000 (23:58 +1000)]
powerpc/64s: Improve RFI L1-D cache flush fallback

The fallback RFI flush is used when firmware does not provide a way
to flush the cache. It's a "displacement flush" that evicts useful
data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation specific cache
replacment policies, so the recipe has been in flux. The initial
slow but conservative approach is to touch all lines of a congruence
class, with dependencies between each load. It has since been
determined that a linear pattern of loads without dependencies is
sufficient, and is significantly faster.

Measuring the speed of a null syscall with RFI fallback flush enabled
gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache
geometries.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries, ps3: panic flush kernel messages before halting system
Nicholas Piggin [Sat, 23 Dec 2017 16:49:23 +0000 (02:49 +1000)]
powerpc/pseries, ps3: panic flush kernel messages before halting system

Platforms with a panic handler that halts the system can have problems
getting kernel messages out, because the panic notifiers are called
before kernel/panic.c does its flushing of printk buffers an console
etc.

This was attempted to be solved with commit a3b2cb30f252 ("powerpc: Do
not call ppc_md.panic in fadump panic notifier"), but that wasn't the
right approach and caused other problems, and was reverted by commit
ab9dbf771ff9.

Instead, the powernv shutdown paths have already had a similar
problem, fixed by taking the message flushing sequence from
kernel/panic.c. That's a little bit ugly, but while we have the code
duplicated, it will work for this case as well. So have ppc panic
handlers do the same flushing before they terminate.

Without this patch, a qemu pseries_le_defconfig guest stops silently
when issued the nmi command when xmon is off and no crash dumpers
enabled. Afterwards, an oops is printed by each CPU as expected.

Fixes: ab9dbf771ff9 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/selftests: Check endianness on trap in TM
Gustavo Romero [Sun, 31 Dec 2017 23:20:46 +0000 (18:20 -0500)]
powerpc/selftests: Check endianness on trap in TM

Add a selftest to check if endianness is flipped inadvertently to BE
(MSR.LE set to zero) on BE and LE machines when a trap is caught in
transactional mode and load_fp and load_vec are zero, i.e. when MSR.FP
and MSR.VEC are zeroed (disabled).

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/tm: Fix endianness flip on trap
Gustavo Romero [Sun, 31 Dec 2017 23:20:45 +0000 (18:20 -0500)]
powerpc/tm: Fix endianness flip on trap

Currently it's possible that a thread on PPC64 LE has its endianness
flipped inadvertently to Big-Endian resulting in a crash once the process
is back from the signal handler.

If giveup_all() is called when regs->msr has the bits MSR.FP and MSR.VEC
disabled (and hence MSR.VSX disabled too) it returns without calling
check_if_tm_restore_required() which copies regs->msr to ckpt_regs->msr if
the process caught a signal whilst in transactional mode. Then once in
setup_tm_sigcontexts() MSR from ckpt_regs.msr is used, but since
check_if_tm_restore_required() was not called previuosly, gp_regs[PT_MSR]
gets a copy of invalid MSR bits as MSR in ckpt_regs was not updated from
regs->msr and so is zeroed. Later when leaving the signal handler once in
sys_rt_sigreturn() the TS bits of gp_regs[PT_MSR] are checked to determine
if restore_tm_sigcontexts() must be called to pull in the correct MSR state
into the user context. Because TS bits are zeroed
restore_tm_sigcontexts() is never called and MSR restored from the user
context on returning from the signal handler has the MSR.LE (the endianness
bit) forced to zero (Big-Endian). That leads, for instance, to 'nop' being
treated as an illegal instruction in the following sequence:

tbegin.
beq 1f
trap
tend.
1: nop

on PPC64 LE machines and the process dies just after returning from the
signal handler.

PPC64 BE is also affected but in a subtle way since forcing Big-Endian on
a BE machine does not change the endianness.

This commit fixes the issue described above by ensuring that once in
setup_tm_sigcontexts() the MSR used is from regs->msr instead of from
ckpt_regs->msr and by ensuring that we pull in only the MSR.FP, MSR.VEC,
and MSR.VSX bits from ckpt_regs->msr.

The fix was tested both on LE and BE machines and no regression regarding
the powerpc/tm selftests was observed.

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Expose TSCR via sysfs
Anton Blanchard [Thu, 7 Sep 2017 17:11:12 +0000 (03:11 +1000)]
powerpc: Expose TSCR via sysfs

The thread switch control register (TSCR) is a per core register
that configures how the CPU shares resources between SMT threads.

Exposing it via sysfs allows us to tune it at run time.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/radix: Remove trace_tlbie call from radix__flush_tlb_all
Mahesh Salgaonkar [Thu, 30 Nov 2017 09:05:54 +0000 (14:35 +0530)]
powerpc/radix: Remove trace_tlbie call from radix__flush_tlb_all

radix__flush_tlb_all() is called only in kexec path in real mode and any
tracepoints at this stage will make kexec to fail if enabled.

To verify enable tlbie trace before kexec.

$ echo 1 > /sys/kernel/debug/tracing/events/powerpc/tlbie/enable
== kexec into new kernel and kexec fails.

Fix this by not calling trace_tlbie from radix__flush_tlb_all().

Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Add ppc_pci_reset_phbs parameter to issue a PHB reset
Guilherme G. Piccoli [Fri, 17 Nov 2017 18:58:59 +0000 (16:58 -0200)]
powerpc/powernv: Add ppc_pci_reset_phbs parameter to issue a PHB reset

During a kdump kernel boot in PowerPC, we request a reset of the PHBs
to the FW. It makes sense, since if we are booting a kdump kernel it
means we had some trouble before and we cannot rely in the adapters'
health; they could be in a bad state, hence the reset is needed.

But this reset is useful not only in kdump - there are situations,
specially when debugging drivers, that we could break an adapter in
a way it requires such reset. One can tell to just go ahead and
reboot the machine, but happens that many times doing kexec is much
faster, and so preferable than a full power cycle.

This patch adds the ppc_pci_reset_phbs parameter to perform such reset.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/pseries: Don't give a warning when HPT resizing isn't available
David Gibson [Fri, 17 Mar 2017 01:11:19 +0000 (12:11 +1100)]
powerpc/pseries: Don't give a warning when HPT resizing isn't available

As of 438cc81a41 "powerpc/pseries: Automatically resize HPT for memory hot
add/remove" when running on the pseries platform, we always attempt to
use the PAPR extension to resize the hashed page table (HPT) when we add
or remove memory.

This is fine, but when the extension is available we'll give a harmless,
but scary warning.  This patch suppresses the warning in this case.  It
will still warn if the feature is supposed to be available, but didn't
work.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agodt: booting-without-of: DT fix s/#interrupt-cell/#interrupt-cells/
Geert Uytterhoeven [Fri, 2 Jun 2017 12:38:46 +0000 (14:38 +0200)]
dt: booting-without-of: DT fix s/#interrupt-cell/#interrupt-cells/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoselftests/powerpc: Add alignment handler selftest
Andrew Donnellan [Mon, 16 Oct 2017 05:04:02 +0000 (16:04 +1100)]
selftests/powerpc: Add alignment handler selftest

Add a selftest to exercise the powerpc alignment fault handler.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[mpe: Add 32-bit support to the signal handler]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Use octal numbers for file permissions
Russell Currey [Thu, 12 Jan 2017 03:54:13 +0000 (14:54 +1100)]
powerpc: Use octal numbers for file permissions

Symbolic macros are unintuitive and hard to read, whereas octal constants
are much easier to interpret.  Replace macros for the basic permission
flags (user/group/other read/write/execute) with numeric constants
instead, across the whole powerpc tree.

Introducing a significant number of changes across the tree for no runtime
benefit isn't exactly desirable, but so long as these macros are still
used in the tree people will keep sending patches that add them.  Not only
are they hard to parse at a glance, there are multiple ways of coming to
the same value (as you can see with 0444 and 0644 in this patch) which
hurts readability.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/boot/dts: Remove leading 0x and 0s from bindings notation
Mathieu Malaterre [Thu, 14 Dec 2017 16:54:00 +0000 (17:54 +0100)]
powerpc/boot/dts: Remove leading 0x and 0s from bindings notation

Improve the DTS files by removing all the leading "0x" and zeros to
fix the following dtc warnings:

  Warning (unit_address_format): Node /XXX unit name should not have leading "0x"

and:

  Warning (unit_address_format): Node /XXX unit name should not have leading 0s

Converted using the following command:

  find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +

For simplicity, two sed expressions were used to solve each warnings
separately.

To make the regex expression more robust a few other issues were
resolved, namely setting unit-address to lower case, and adding a
whitespace before the the opening curly brace:

  https://elinux.org/Device_Tree_Linux#Linux_conventions

This is a follow up to commit 4c9847b7375a ("dt-bindings: Remove
leading 0x from bindings notation")

Reported-by: David Daney <ddaney@caviumnetworks.com>
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agobacklight: Fix old-style function definition
Mathieu Malaterre [Tue, 26 Dec 2017 14:12:41 +0000 (15:12 +0100)]
backlight: Fix old-style function definition

Fix warning:

drivers/macintosh/via-pmu-backlight.c: In function â€˜pmu_backlight_init’:
drivers/macintosh/via-pmu-backlight.c:140:13: warning: old-style function definition [-Wold-style-definition]
 void __init pmu_backlight_init()
             ^~~~~~~~~~~~~~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Fix old-style function definition
Mathieu Malaterre [Tue, 26 Dec 2017 14:12:33 +0000 (15:12 +0100)]
powerpc: Fix old-style function definition

Fix warnings such as:

arch/powerpc/platforms/powermac/backlight.c: In function â€˜pmac_backlight_get_legacy_brightness’:
arch/powerpc/platforms/powermac/backlight.c:189:5: error: old-style function definition [-Werror=old-style-definition]
 int pmac_backlight_get_legacy_brightness()
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/xmon: Do not compute/store the major opcode
Mathieu Malaterre [Tue, 26 Dec 2017 13:25:47 +0000 (14:25 +0100)]
powerpc/xmon: Do not compute/store the major opcode

In commit 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation
changes)") usage of variable `op` has been removed. Completely remove opcode
computation since not used anymore.

Fix fatal warning:

arch/powerpc/xmon/ppc-dis.c: In function â€˜lookup_powerpc’:
arch/powerpc/xmon/ppc-dis.c:96:17: error: variable â€˜op’ set but not used [-Werror=unused-but-set-variable]
   unsigned long op;
                 ^~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/xive: Properly use static keyword for inline function
Mathieu Malaterre [Tue, 26 Dec 2017 13:00:17 +0000 (14:00 +0100)]
powerpc/xive: Properly use static keyword for inline function

Fix fatal warning during compilation:

In file included from arch/powerpc/xmon/xmon.c:54:0:
./arch/powerpc/include/asm/xive.h:157:20: error: no previous prototype for â€˜xive_smp_prepare_cpu’ [-Werror=missing-prototypes]
 extern inline int  xive_smp_prepare_cpu(unsigned int cpu) { return -EINVAL; }
                    ^

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoselftest/powerpc: Add additional option to mmap_bench test
Aneesh Kumar K.V [Tue, 28 Nov 2017 08:36:39 +0000 (14:06 +0530)]
selftest/powerpc: Add additional option to mmap_bench test

This patch adds --pgfault and --iterations options to mmap_bench test. With
--pgfault we touch every page mapped. This helps in measuring impact in the
page fault path with a patch series.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/hash: Skip non initialized page size in init_hpte_page_sizes
Aneesh Kumar K.V [Tue, 28 Nov 2017 08:34:40 +0000 (14:04 +0530)]
powerpc/hash: Skip non initialized page size in init_hpte_page_sizes

One of the easiest way to test config with 4K HPTE is to disable 64K hardware
page size like below.

int __init htab_dt_scan_page_sizes(unsigned long node,

  size -= 3; prop += 3;
  base_idx = get_idx_from_shift(base_shift);
- if (base_idx < 0) {
+ if (base_idx < 0 || base_idx == MMU_PAGE_64K) {
  /* skip the pte encoding also */
  prop += lpnum * 2; size -= lpnum * 2;

But then this results in error in other part of the code such as MPSS parsing
where we look at 4K base page size and 64K actual page size support.

This patch fix MPSS parsing by ignoring the actual page sizes marked
unsupported. In reality this can happen only with a corrupt device tree. But it
is good to tighten the error check.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoMerge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/scottwood...
Michael Ellerman [Sun, 21 Jan 2018 12:32:02 +0000 (23:32 +1100)]
Merge branch 'next' of https://git./linux/kernel/git/scottwood/linux into next

Freescale updates from Scott:

"Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a
 minor cleanup patch."

6 years agoMerge branch 'fixes' into next
Michael Ellerman [Sun, 21 Jan 2018 12:21:14 +0000 (23:21 +1100)]
Merge branch 'fixes' into next

Merge our fixes branch from the 4.15 cycle.

Unusually the fixes branch saw some significant features merged,
notably the RFI flush patches, so we want the code in next to be
tested against that, to avoid any surprises when the two are merged.

There's also some other work on the panic handling that was reverted
in fixes and we now want to do properly in next, which would conflict.

And we also fix a few other minor merge conflicts.

6 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Sun, 21 Jan 2018 11:43:43 +0000 (22:43 +1100)]
Merge branch 'topic/ppc-kvm' into next

Merge the topic branch we share with kvm-ppc, this brings in two xive
commits, one from Paul to rework HMI handling, and a minor cleanup to
drop an unused flag.

6 years agopowerpc/mm: Remove unused flag arg in global_invalidates
Aneesh Kumar K.V [Mon, 6 Nov 2017 12:27:44 +0000 (17:57 +0530)]
powerpc/mm: Remove unused flag arg in global_invalidates

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/sysdev: change CPM GPIO to platform_device
Christophe Leroy [Wed, 13 Dec 2017 11:26:23 +0000 (12:26 +0100)]
powerpc/sysdev: change CPM GPIO to platform_device

Since commit 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names()
to use device property accessors"), gpio chips have to have a
parent, otherwise devprop_gpiochip_set_names() prematurely exists
with message "GPIO chip parent is NULL" and doesn't proceed
'gpio-line-names' DT property.

This patch wraps the CPM GPIO into a platform driver to allow
assignment of the parent device.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
6 years agopowerpc: Enable support for ibm,drc-info devtree property
Michael Bringmann [Fri, 1 Dec 2017 23:19:55 +0000 (17:19 -0600)]
powerpc: Enable support for ibm,drc-info devtree property

To: linuxppc-dev@lists.ozlabs.org

From: Michael Bringmann <mwb@linux.vnet.ibm.com>

Cc: Michael Bringmann <mwb@linux.vnet.ibm.com>
Cc: nfont@linux.vnet.ibm.com
Subject: [PATCH V6 4/4] powerpc: Enable support for ibm,drc-info devtree property

prom_init.c: Enable support for new DRC device tree property
"ibm,drc-info" in initial handshake between the Linux kernel and
the front end processor.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agohotplug/drc-info: Add code to search ibm,drc-info property
Michael Bringmann [Fri, 1 Dec 2017 23:19:48 +0000 (17:19 -0600)]
hotplug/drc-info: Add code to search ibm,drc-info property

rpadlpar_core.c: Provide parallel routines to search the older device-
tree properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains"), or the new property "ibm,drc-info".

The interface to examine the DRC information is changed from a "get"
function that returns values for local verification elsewhere, to a
"check" function that validates the 'name' and/or 'type' of a device
node.  This update hides the format of the underlying device-tree
properties, and concentrates the value checks into a single function
without requiring the user to verify whether a search was successful.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopseries/drc-info: Search DRC properties for CPU indexes
Michael Bringmann [Fri, 1 Dec 2017 23:19:43 +0000 (17:19 -0600)]
pseries/drc-info: Search DRC properties for CPU indexes

pseries/drc-info: Provide parallel routines to convert between
drc_index and CPU numbers at runtime, using the older device-tree
properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains"), or the new property "ibm,drc-info".

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/firmware: Add definitions for new drc-info firmware feature
Michael Bringmann [Fri, 1 Dec 2017 23:19:40 +0000 (17:19 -0600)]
powerpc/firmware: Add definitions for new drc-info firmware feature

Firmware Features: Define new bit flag representing the presence of
new device tree property "ibm,drc-info".  The flag is used to tell
the front end processor whether the Linux kernel supports the new
property, and by the front end processor to tell the Linux kernel
that the new property is present in the device tree.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/fsl_pci: Fix ptr_ret.cocci warnings
Vasyl Gomonovych [Mon, 27 Nov 2017 21:37:33 +0000 (22:37 +0100)]
powerpc/fsl_pci: Fix ptr_ret.cocci warnings

arch/powerpc/sysdev/fsl_pci.c:1307:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
6 years agopowerpc/fsl_pci: Correct fsl_pci_mcheck_exception
Joakim Tjernlund [Tue, 5 Sep 2017 11:59:43 +0000 (13:59 +0200)]
powerpc/fsl_pci: Correct fsl_pci_mcheck_exception

get_user() had it args reversed causing NIP to be NULL:ed instead
of fixing up the PCI access.

Note: This still hangs my P1020 Freescale CPU hard, but at least
I get a NIP now.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
6 years agopowerpc/watchdog: improve watchdog comments
Nicholas Piggin [Wed, 1 Nov 2017 00:27:33 +0000 (11:27 +1100)]
powerpc/watchdog: improve watchdog comments

The overview comments in the powerpc watchdog are out of date after
several iterations and changes of the code. Bring them up to date.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/lib/feature-fixups: use raw_patch_instruction()
Christophe Leroy [Fri, 24 Nov 2017 07:31:09 +0000 (08:31 +0100)]
powerpc/lib/feature-fixups: use raw_patch_instruction()

feature fixups need to use patch_instruction() early in the boot,
even before the code is relocated to its final address, requiring
patch_instruction() to use PTRRELOC() in order to address data.

But feature fixups applies on code before it is set to read only,
even for modules. Therefore, feature fixups can use
raw_patch_instruction() instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/lib/code-patching: refactor patch_instruction()
Christophe Leroy [Fri, 24 Nov 2017 07:31:07 +0000 (08:31 +0100)]
powerpc/lib/code-patching: refactor patch_instruction()

patch_instruction() uses almost the same sequence as
__patch_instruction()

This patch refactor it so that patch_instruction() uses
__patch_instruction() instead of duplicating code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: restore alphabetic order in Kconfig
Christophe Leroy [Thu, 4 Jan 2018 15:35:25 +0000 (16:35 +0100)]
powerpc: restore alphabetic order in Kconfig

This patch restores the alphabetic order which was broken by
commit 1e0fc9d1eb2b0 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX
for some configs")

Fixes: 1e0fc9d1eb2b0 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/spufs: use timespec64 for timestamps
Arnd Bergmann [Tue, 16 Jan 2018 17:00:35 +0000 (18:00 +0100)]
powerpc/spufs: use timespec64 for timestamps

The switch log prints the tv_sec portion of timespec as a 32-bit
number, while overflows in 2106. It also uses the timespec type,
which is safe on 64-bit architectures, but deprecated because
it causes overflows in 2038 elsewhere.

This changes it to timespec64 and printing a 64-bit number for
consistency.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/mpic_timer: avoid struct timeval
Arnd Bergmann [Tue, 16 Jan 2018 17:01:50 +0000 (18:01 +0100)]
powerpc/mpic_timer: avoid struct timeval

In an effort to remove all instances of 'struct timeval'
from the kernel, I'm changing the powerpc mpic_timer interface
to use plain seconds instead. There is only one user of this
interface, and that doesn't use the microseconds portion, so
the code gets noticeably simpler in the process.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv/ioda: Finish removing explicit max window size check
Alexey Kardashevskiy [Thu, 18 Jan 2018 02:51:03 +0000 (13:51 +1100)]
powerpc/powernv/ioda: Finish removing explicit max window size check

9003a2498 removed checn from the DMA window pages allocator, however
the VFIO driver tests limits before doing so by calling
the get_table_size hook which was left behind; this fixes it.

Fixes: 9003a2498 "powerpc/powernv/ioda: Remove explicit max window size check"
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/mm: Invalidate subpage_prot() system call on radix platforms
Anshuman Khandual [Mon, 4 Dec 2017 05:49:22 +0000 (11:19 +0530)]
powerpc/mm: Invalidate subpage_prot() system call on radix platforms

Radix enabled platforms don't support subpage_prot() system calls. But
at present the system call goes through without an error and fails
later on while validating expected subpage accesses. Lets not allow
the system call on powerpc radix platforms to begin with to prevent
this confusion in user space.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: sys_pkey_mprotect() system call
Ram Pai [Fri, 19 Jan 2018 01:50:46 +0000 (17:50 -0800)]
powerpc: sys_pkey_mprotect() system call

Patch provides the ability for a process to
associate a pkey with a address range.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: sys_pkey_alloc() and sys_pkey_free() system calls
Ram Pai [Fri, 19 Jan 2018 01:50:45 +0000 (17:50 -0800)]
powerpc: sys_pkey_alloc() and sys_pkey_free() system calls

Finally this patch provides the ability for a process to
allocate and free a protection key.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Enable pkey subsystem
Ram Pai [Fri, 19 Jan 2018 01:50:44 +0000 (17:50 -0800)]
powerpc: Enable pkey subsystem

PAPR defines 'ibm,processor-storage-keys' property. It exports two
values. The first value holds the number of data-access keys and the
second holds the number of instruction-access keys. Due to a bug in
the firmware, instruction-access keys is always reported as zero.
However any key can be configured to disable data-access and/or
disable execution-access. The inavailablity of the second value is not
a big handicap, though it could have been used to determine if the
platform supported disable-execution-access.

Non-PAPR platforms do not define this property in the device tree yet.
Fortunately power8 is the only released Non-PAPR platform that is
supported. Here, we hardcode the number of supported pkey to 32, by
consulting the PowerISA3.0

This patch calculates the number of keys supported by the platform.
Also it determines the platform support for read/write/execution
access support for pkeys.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Use a PVR check instead of CPU_FTR for execute. Restrict to
 Power7/8/9 for now until older CPUs are tested.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/ptrace: Add memory protection key regset
Thiago Jung Bauermann [Fri, 19 Jan 2018 01:50:43 +0000 (17:50 -0800)]
powerpc/ptrace: Add memory protection key regset

The AMR/IAMR/UAMOR are part of the program context.
Allow it to be accessed via ptrace and through core files.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Deliver SEGV signal on pkey violation
Ram Pai [Fri, 19 Jan 2018 01:50:42 +0000 (17:50 -0800)]
powerpc: Deliver SEGV signal on pkey violation

The value of the pkey, whose protection got violated,
is made available in si_pkey field of the siginfo structure.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: introduce get_mm_addr_key() helper
Ram Pai [Fri, 19 Jan 2018 01:50:41 +0000 (17:50 -0800)]
powerpc: introduce get_mm_addr_key() helper

get_mm_addr_key() helper returns the pkey associated with
an address corresponding to a given mm_struct.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Handle exceptions caused by pkey violation
Ram Pai [Fri, 19 Jan 2018 01:50:40 +0000 (17:50 -0800)]
powerpc: Handle exceptions caused by pkey violation

Handle Data and  Instruction exceptions caused by memory
protection-key.

The CPU will detect the key fault if the HPTE is already
programmed with the key.

However if the HPTE is not  hashed, a key fault will not
be detected by the hardware. The software will detect
pkey violation in such a case.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: implementation for arch_vma_access_permitted()
Ram Pai [Fri, 19 Jan 2018 01:50:39 +0000 (17:50 -0800)]
powerpc: implementation for arch_vma_access_permitted()

This patch provides the implementation for
arch_vma_access_permitted(). Returns true if the
requested access is allowed by pkey associated with the
vma.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: check key protection for user page access
Ram Pai [Fri, 19 Jan 2018 01:50:38 +0000 (17:50 -0800)]
powerpc: check key protection for user page access

Make sure that the kernel does not access user pages without
checking their key-protection.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Integrate with upstream version of pte_access_permitted()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: helper to validate key-access permissions of a pte
Ram Pai [Fri, 19 Jan 2018 01:50:37 +0000 (17:50 -0800)]
powerpc: helper to validate key-access permissions of a pte

helper function that checks if the read/write/execute is allowed
on the pte.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Program HPTE key protection bits
Ram Pai [Fri, 19 Jan 2018 01:50:36 +0000 (17:50 -0800)]
powerpc: Program HPTE key protection bits

Map the PTE protection key bits to the HPTE key protection bits,
while creating HPTE  entries.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: map vma key-protection bits to pte key bits.
Ram Pai [Fri, 19 Jan 2018 01:50:35 +0000 (17:50 -0800)]
powerpc: map vma key-protection bits to pte key bits.

Map  the  key  protection  bits of the vma to the pkey bits in
the PTE.

The PTE  bits used  for pkey  are  3,4,5,6  and 57. The  first
four bits are the same four bits that were freed up  initially
in this patch series. remember? :-) Without those four bits
this patch wouldn't be possible.

BUT, on 4k kernel, bit 3, and 4 could not be freed up. remember?
Hence we have to be satisfied with 5, 6 and 7.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: implementation for arch_override_mprotect_pkey()
Ram Pai [Fri, 19 Jan 2018 01:50:34 +0000 (17:50 -0800)]
powerpc: implementation for arch_override_mprotect_pkey()

arch independent code calls arch_override_mprotect_pkey()
to return a pkey that best matches the requested protection.

This patch provides the implementation.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: ability to associate pkey to a vma
Ram Pai [Fri, 19 Jan 2018 01:50:33 +0000 (17:50 -0800)]
powerpc: ability to associate pkey to a vma

arch-independent code expects the arch to  map
a  pkey  into the vma's protection bit setting.
The patch provides that ability.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: introduce execute-only pkey
Ram Pai [Fri, 19 Jan 2018 01:50:32 +0000 (17:50 -0800)]
powerpc: introduce execute-only pkey

This patch provides the implementation of execute-only pkey.
The architecture-independent layer expects the arch-dependent
layer, to support the ability to create and enable a special
key which has execute-only permission.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: store and restore the pkey state across context switches
Ram Pai [Fri, 19 Jan 2018 01:50:31 +0000 (17:50 -0800)]
powerpc: store and restore the pkey state across context switches

Store and restore the AMR, IAMR and UAMOR register state of the task
before scheduling out and after scheduling in, respectively.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: ability to create execute-disabled pkeys
Ram Pai [Fri, 19 Jan 2018 01:50:30 +0000 (17:50 -0800)]
powerpc: ability to create execute-disabled pkeys

powerpc has hardware support to disable execute on a pkey.
This patch enables the ability to create execute-disabled
keys.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: implementation for arch_set_user_pkey_access()
Ram Pai [Fri, 19 Jan 2018 01:50:29 +0000 (17:50 -0800)]
powerpc: implementation for arch_set_user_pkey_access()

This patch provides the detailed implementation for
a user to allocate a key and enable it in the hardware.

It provides the plumbing, but it cannot be used till
the system call is implemented. The next patch  will
do so.

Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: cleanup AMR, IAMR when a key is allocated or freed
Ram Pai [Fri, 19 Jan 2018 01:50:28 +0000 (17:50 -0800)]
powerpc: cleanup AMR, IAMR when a key is allocated or freed

Cleanup the bits corresponding to a key in the AMR, and IAMR
register, when the key is newly allocated/activated or is freed.
We dont want some residual bits cause the hardware enforce
unintended behavior when the key is activated or freed.

Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: helper functions to initialize AMR, IAMR and UAMOR registers
Ram Pai [Fri, 19 Jan 2018 01:50:27 +0000 (17:50 -0800)]
powerpc: helper functions to initialize AMR, IAMR and UAMOR registers

Introduce  helper functions that can initialize the bits in the AMR,
IAMR and UAMOR register; the bits that correspond to the given pkey.

Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: helper function to read, write AMR, IAMR, UAMOR registers
Ram Pai [Fri, 19 Jan 2018 01:50:26 +0000 (17:50 -0800)]
powerpc: helper function to read, write AMR, IAMR, UAMOR registers

Implements helper functions to read and write the key related
registers; AMR, IAMR, UAMOR.

AMR register tracks the read,write permission of a key
IAMR register tracks the execute permission of a key
UAMOR register enables and disables a key

Acked-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: track allocation status of all pkeys
Ram Pai [Fri, 19 Jan 2018 01:50:25 +0000 (17:50 -0800)]
powerpc: track allocation status of all pkeys

Total 32 keys are available on power7 and above. However
pkey 0,1 are reserved. So effectively we  have  30 pkeys.

On 4K kernels, we do not  have  5  bits  in  the  PTE to
represent  all the keys; we only have 3bits. Two of those
keys are reserved; pkey 0 and pkey 1. So effectively  we
have 6 pkeys.

This patch keeps track of reserved keys, allocated  keys
and keys that are currently free.

Also it  adds  skeletal  functions  and macros, that the
architecture-independent code expects to be available.

Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: initial pkey plumbing
Ram Pai [Fri, 19 Jan 2018 01:50:24 +0000 (17:50 -0800)]
powerpc: initial pkey plumbing

Basic  plumbing  to   initialize  the   pkey  system.
Nothing is enabled yet. A later patch will enable it
once all the infrastructure is in place.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Rework copyrights to use SPDX tags]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agocxl: Add support for ASB_Notify on POWER9
Christophe Lombard [Thu, 11 Jan 2018 08:55:25 +0000 (09:55 +0100)]
cxl: Add support for ASB_Notify on POWER9

The POWER9 core supports a new feature: ASB_Notify which requires the
support of the Special Purpose Register: TIDR.

The ASB_Notify command, generated by the AFU, will attempt to
wake-up the host thread identified by the particular LPID:PID:TID.

This patch assign a unique TIDR (thread id) for the current thread which
will be used in the process element entry.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/perf: Change the data type for the variable 'ncpu' in IMC code
Anju T Sudhakar [Tue, 31 Oct 2017 09:52:00 +0000 (15:22 +0530)]
powerpc/perf: Change the data type for the variable 'ncpu' in IMC code

Change the data type for the variable 'ncpu' in ppc_core_imc_cpu_offline(),
since cpumask_any_but() returns an 'int' value.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/powernv: Add debugfs interface for imc-mode and imc-command
Anju T Sudhakar [Wed, 13 Dec 2017 06:09:54 +0000 (11:39 +0530)]
powerpc/powernv: Add debugfs interface for imc-mode and imc-command

In memory Collection (IMC) counter pmu driver controls the ucode's
execution state. At the system boot, IMC perf driver pause the ucode.
Ucode state is changed to "running" only when any of the nest units
are monitored or profiled using perf tool.

Nest units support only limited set of hardware counters and ucode is
always programmed in the "production mode" ("accumulation") mode. This
mode is configured to provide key performance metric data for most of
the nest units.

But ucode also supports other modes which would be used for "debug" to
drill down specific nest units. That is, ucode when switched to
"powerbus" debug mode (for example), will dynamically reconfigure the
nest counters to target only "powerbus" related events in the hardware
counters. This allows the IMC nest unit to focus on powerbus related
transactions in the system in more detail. At this point, production
mode events may or may not be counted.

IMC nest counters has both in-band (ucode access) and out of band
access to it. Since not all nest counter configurations are supported
by ucode, out of band tools are used to characterize other nest
counter configurations.

Patch provides an interface via "debugfs" to enable the switching of
ucode modes in the system. To switch ucode mode, one has to first
pause the microcode (imc_cmd), and then write the target mode value to
the "imc_mode" file.

Proposed Approach:

In the proposed approach, the function (export_imc_mode_and_cmd) which
creates the debugfs interface for imc mode and command is implemented
in opal-imc.c. Thus we can use imc_get_mem_addr() to get the homer
base address for each chip.

The interface to expose imc mode and command is required only if we
have nest pmu units registered. Employing the existing data structures
to track whether we have any nest units registered will require to
extend data from perf side to opal-imc.c. Instead an integer is
introduced to hold that information by counting successful nest unit
registration. Debugfs interface is removed based on the integer count.

Example for the interface:

  $ ls /sys/kernel/debug/imc
  imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/perf: Pass struct imc_events as a parameter to imc_parse_event()
Anju T Sudhakar [Mon, 11 Dec 2017 05:58:37 +0000 (11:28 +0530)]
powerpc/perf: Pass struct imc_events as a parameter to imc_parse_event()

Remove the allocation of struct imc_events from imc_parse_event().
Instead pass imc_events as a parameter to imc_parse_event(), which is
a pointer to a slot in the array allocated in
update_events_in_group().

Reported-by: Dan Carpenter ("powerpc/perf: Fix a sizeof() typo so we allocate less memory")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/perf: IMC code cleanup with some code refactoring
Anju T Sudhakar [Mon, 11 Dec 2017 05:58:36 +0000 (11:28 +0530)]
powerpc/perf: IMC code cleanup with some code refactoring

Factor out memory freeing part for attribute elements from
imc_common_cpuhp_mem_free().

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/perf: Remove thread_imc_pmu global variable from
Anju T Sudhakar [Mon, 11 Dec 2017 05:58:35 +0000 (11:28 +0530)]
powerpc/perf: Remove thread_imc_pmu global variable from

Remove the global variable 'thread_imc_pmu', since it is not used in the code.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Implement local_t using irq soft masking
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:57 +0000 (09:25 +0530)]
powerpc/64s: Implement local_t using irq soft masking

local_t is used for atomic modifications for per-CPU data, versus
re-entrant modifications via interrupts.

local_t read-modify-write atomic operations are currently implemented
with hardware atomics (larx/stcx), which are quite slow. This patch
implements them by masking all types of interrupts that may do local_t
operations ("standard" and perf interrupts).

Rusty's benchmark (https://lkml.org/lkml/2008/12/16/450) gives the
following timings for the local_t test, in nanoseconds per iteration:

             larx/stcx   irq+pmu disable
_inc                38                10
_add                38                10
_read                4                 4
_add_return         38                10

There are still some interrupt types (system reset, machine check, and
watchdog), which can not safely use local_t operations, because they
are not masked.

An alternative approach was proposed, using a CR bit to mark a critical
section, which is tested in the interrupt return path, and would then
branch to a fixup handler (similar to exception fixups), which re-starts
the operation. The problem with this was the complexity of the fixup
handler and the latency of the slow path.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-November/123024.html

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: use generic atomic implementation for local_t
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:56 +0000 (09:25 +0530)]
powerpc: use generic atomic implementation for local_t

powerpc implements local_t with atomic operations. There is already
an asm-generic implementation which does this using atomic_t.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Add new set of irq_soft_mask_ functions for PMI masking
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:55 +0000 (09:25 +0530)]
powerpc/64s: Add new set of irq_soft_mask_ functions for PMI masking

To support soft-masking of the performance monitor interrupt, a set of
new powerpc_local_irq_pmu_save() and powerpc_local_irq_restore()
functions are added. And powerpc_local_irq_save() implemented, by
adding a new irq_soft_mask manipulation function
irq_soft_mask_or_return().

Local_irq_pmu_* macros are provided to access these
powerpc_local_irq_pmu* functions which includes
trace_hardirqs_on|off() to match what we have in
include/linux/irqflags.h.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUG
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:54 +0000 (09:25 +0530)]
powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUG

New Kconfig is added "CONFIG_PPC_IRQ_SOFT_MASK_DEBUG" to add WARN_ON
to alert the invalid transitions. Also moved the code under the
CONFIG_TRACE_IRQFLAGS in arch_local_irq_restore() to new Kconfig.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Fix name of CONFIG option in change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Add support to mask perf interrupts and replay them
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:53 +0000 (09:25 +0530)]
powerpc/64s: Add support to mask perf interrupts and replay them

Two new bit mask field "IRQ_DISABLE_MASK_PMU" is introduced to support
the masking of PMI and "IRQ_DISABLE_MASK_ALL" to aid interrupt masking
checking.

Couple of new irq #defs "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*" added
to use in the exception code to check for PMI interrupts.

In the masked_interrupt handler, for PMIs we reset the MSR[EE] and
return. In the __check_irq_replay(), replay the PMI interrupt by
calling performance_monitor_common handler.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Add support to take additional parameter in MASKABLE_* macro
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:52 +0000 (09:25 +0530)]
powerpc/64s: Add support to take additional parameter in MASKABLE_* macro

To support addition of "bitmask" to MASKABLE_* macros, factor out the
EXCPETION_PROLOG_1 macro.

Make it explicit the interrupt masking supported by a gievn interrupt
handler. Patch correspondingly extends the MASKABLE_* macros with an
addition's parameter. "bitmask" parameter is passed to SOFTEN_TEST
macro to decide on masking the interrupt.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64s: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:51 +0000 (09:25 +0530)]
powerpc/64s: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*

Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1 in
the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_* to
use only __EXCEPTION_PROLOG_1. There is not logic change.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64: Rename soft_enabled to irq_soft_mask
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:50 +0000 (09:25 +0530)]
powerpc/64: Rename soft_enabled to irq_soft_mask

Rename the paca->soft_enabled to paca->irq_soft_mask as it is no
longer used as a flag for interrupt state, but a mask.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/64: Change soft_enabled from flag to bitmask
Madhavan Srinivasan [Wed, 20 Dec 2017 03:55:49 +0000 (09:25 +0530)]
powerpc/64: Change soft_enabled from flag to bitmask

"paca->soft_enabled" is used as a flag to mask some of interrupts.
Currently supported flags values and their details:

soft_enabled    MSR[EE]

0               0       Disabled (PMI and HMI not masked)
1               1       Enabled

"paca->soft_enabled" is initialized to 1 to make the interripts as
enabled. arch_local_irq_disable() will toggle the value when
interrupts needs to disbled. At this point, the interrupts are not
actually disabled, instead, interrupt vector has code to check for the
flag and mask it when it occurs. By "mask it", it update interrupt
paca->irq_happened and return. arch_local_irq_restore() is called to
re-enable interrupts, which checks and replays interrupts if any
occured.

Now, as mentioned, current logic doesnot mask "performance monitoring
interrupts" and PMIs are implemented as NMI. But this patchset depends
on local_irq_* for a successful local_* update. Meaning, mask all
possible interrupts during local_* update and replay them after the
update.

So the idea here is to reserve the "paca->soft_enabled" logic. New
values and details:

soft_enabled    MSR[EE]

1               0       Disabled  (PMI and HMI not masked)
0               1       Enabled

Reason for the this change is to create foundation for a third mask
value "0x2" for "soft_enabled" to add support to mask PMIs. When
->soft_enabled is set to a value "3", PMI interrupts are mask and when
set to a value of "1", PMI are not mask. With this patch also extends
soft_enabled as interrupt disable mask.

Current flags are renamed from IRQ_[EN?DIS}ABLED to
IRQS_ENABLED and IRQS_DISABLED.

Patch also fixes the ptrace call to force the user to see the softe
value to be alway 1. Reason being, even though userspace has no
business knowing about softe, it is part of pt_regs. Like-wise in
signal context.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>