openwrt/staging/blogic.git
6 years agospi: cadence: Fix missing clk_disable_unprepare() on error in cnds_runtime_resume()
Wei Yongjun [Wed, 11 Jul 2018 13:18:59 +0000 (13:18 +0000)]
spi: cadence: Fix missing clk_disable_unprepare() on error in cnds_runtime_resume()

Fix the missing clk_disable_unprepare() before return
from cnds_runtime_resume() in the error handling case.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-fsl-dspi: Fixup regmap configuration
Esben Haabendal [Wed, 20 Jun 2018 07:34:36 +0000 (09:34 +0200)]
spi: spi-fsl-dspi: Fixup regmap configuration

Mark volatile registers to avoid caching bugs.

Note: SPI_MCR is marked volatile because of CLR_TXF and CLR_RXF bits.

Signed-off-by: Esben Haabendal <eha@deif.com>
Acked-by: Martin Hundebøll <martin@geanix.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: orion: fix CS GPIO handling again
Jan Kundrát [Mon, 4 Jun 2018 14:34:25 +0000 (16:34 +0200)]
spi: orion: fix CS GPIO handling again

The code did not de-assert any CS GPIOs before probing slaves. This
means that several CS signals could be active at once, garbling the
communication. Whether this was actually a problem depended on the type
of the SPI device attached (so my "spidev" for userspace access worked
correctly because its probe was effectively a no-op), and on the state
of the GPIO pins at SoC's boot.

The code was already iterating through all DT children of the SPI
controller, so this change re-uses that loop for CS GPIO setup as well.
This means that this might change the number of the HW CS signal which
is picked for all GPIO CS devices. Previously, the lowest one was used,
but we now use the first one from the DT.

With this move of the code, we can also finally initialize each GPIO CS
lane before registering the SPI controller (which in turn probes for
slaves).

I tried to fix this in 544248623b95 already, but that only did it half
way by registering the GPIOs properly. That patch failed to set their
logic signals early enough, though.

Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: cadence: Change usleep_range() to udelay(), for atomic context
Janek Kotas [Mon, 4 Jun 2018 11:24:44 +0000 (11:24 +0000)]
spi: cadence: Change usleep_range() to udelay(), for atomic context

The path "spi: cadence: Add usleep_range() for
cdns_spi_fill_tx_fifo()" added a usleep_range() function call,
which cannot be used in atomic context.
However the cdns_spi_fill_tx_fifo() function can be called during
an interrupt which may result in a kernel panic:

BUG: scheduling while atomic: grep/561/0x00010002
Modules linked in:
Preemption disabled at:
[<ffffff800858ea28>] wait_for_common+0x48/0x178
CPU: 0 PID: 561 Comm: grep Not tainted 4.17.0 #1
Hardware name: Cadence CSP (DT)
Call trace:
 dump_backtrace+0x0/0x198
 show_stack+0x14/0x20
 dump_stack+0x8c/0xac
 __schedule_bug+0x6c/0xb8
 __schedule+0x570/0x5d8
 schedule+0x34/0x98
 schedule_hrtimeout_range_clock+0x98/0x110
 schedule_hrtimeout_range+0x10/0x18
 usleep_range+0x64/0x98
 cdns_spi_fill_tx_fifo+0x70/0xb0
 cdns_spi_irq+0xd0/0xe0
 __handle_irq_event_percpu+0x9c/0x128
 handle_irq_event_percpu+0x34/0x88
 handle_irq_event+0x48/0x78
 handle_fasteoi_irq+0xbc/0x1b0
 generic_handle_irq+0x24/0x38
 __handle_domain_irq+0x84/0xf8
 gic_handle_irq+0xc4/0x180

This patch replaces the function call with udelay() which can be
used in an atomic context, like an interrupt.

Signed-off-by: Jan Kotas <jank@cadence.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
6 years agospi: sh-msiof: Make sure all DMA operations have completed
Geert Uytterhoeven [Wed, 13 Jun 2018 08:41:15 +0000 (10:41 +0200)]
spi: sh-msiof: Make sure all DMA operations have completed

In case of a bi-directional transfer, receive DMA may complete in the
rcar-dmac driver before transmit DMA, due to scheduling latencies.
As the MSIOF driver waits for completion of the receive DMA only, it may
submit the next transmit DMA request before the previous one has
completed.

Make the driver more robust by waiting for the completion of both
receive and transmit DMA, when applicable.

Based on a patch in the BSP by Ryo Kataoka.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agoMerge branch 'spi-4.17' into spi-4.18 for the merge window
Mark Brown [Mon, 4 Jun 2018 10:51:12 +0000 (11:51 +0100)]
Merge branch 'spi-4.17' into spi-4.18 for the merge window

6 years agospi: Fix typo on SPI_MEM help text
Fabio Estevam [Wed, 30 May 2018 19:29:15 +0000 (16:29 -0300)]
spi: Fix typo on SPI_MEM help text

The correct form is "a high-level", so fix it accordingly.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC
Geert Uytterhoeven [Wed, 23 May 2018 09:02:04 +0000 (11:02 +0200)]
spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC

According to section 59.2.4 MSIOF Receive Mode Register 1 (SIRMDR1) in
the R-Car Gen3 datasheet Rev.1.00, the value of the SIRMDR1.SYNCAC bit
must match the value of the SITMDR1.SYNCAC bit.  However,
sh_msiof_spi_setup() changes only the latter.

Fix this by updating the SIRMDR1 register like the SITMDR1 register,
taking into account register bits that exist in SITMDR1 only.

Reported-by: Renesas BSP team via Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: 7ff0b53c4051145d ("spi: sh-msiof: Avoid writing to registers from spi_master.setup()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agomtd: devices: m25p80: Use spi_mem_set_drvdata() instead of spi_set_drvdata()
Boris Brezillon [Tue, 22 May 2018 10:55:14 +0000 (12:55 +0200)]
mtd: devices: m25p80: Use spi_mem_set_drvdata() instead of spi_set_drvdata()

SPI mem drivers should use spi_mem_set_drvdata() not spi_set_drvdata()
to store their private data. Using spi_set_drvdata() will mess the
spi -> spi-mem link up and cause a kernel panic at shutdown or
device removal time.

Fixes: 4120f8d158ef ("mtd: spi-nor: Use the spi_mem_xx() API")
Reported-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Marek Vasut <marek.vasut+renesas@gmail.com> on R8A7791 Porter
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: omap2-mcspi: Remove unnecessary pm_runtime_force_suspend()
Tony Lindgren [Fri, 18 May 2018 17:30:08 +0000 (10:30 -0700)]
spi: omap2-mcspi: Remove unnecessary pm_runtime_force_suspend()

Commit 5a686b2c9ed4 ("spi: omap2-mcspi: Idle hardware during suspend
and resume") added calls for pm_runtime_force_suspend() and
pm_runtime_force_resume() to make sure spi is idled between
device_prepare() and device_complete().

But testing Linux next, I now noticed that we will get the following:

spi_master spi0: Failed to power device: -13

Looking at things more turns out we can just remove this non-standard
code. I was probably testing with some extra experimental patches
earlier when I thought we need pm_runtime_force_suspend() and
pm_runtime_force_resume().

Fixes: 5a686b2c9ed4 ("spi: omap2-mcspi: Idle hardware during suspend
and resume")
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Add missing pm_runtime_put_noidle() after failed get
Tony Lindgren [Fri, 18 May 2018 17:30:07 +0000 (10:30 -0700)]
spi: Add missing pm_runtime_put_noidle() after failed get

If pm_runtime_get_sync() fails we should call pm_runtime_put_noidle().
This is probably not a critical fix as we should only hit this when
things are broken elsewhere.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: ti-qspi: Make sure res_mmap != NULL before dereferencing it
Boris Brezillon [Mon, 14 May 2018 09:11:29 +0000 (11:11 +0200)]
spi: ti-qspi: Make sure res_mmap != NULL before dereferencing it

resource_size() is dereferencing the res without checking that it is
not NULL, so we need to do the check before calling resource_size().

Fixes: b95cb394ab59 ("spi: ti-qspi: Implement the spi_mem interface")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Fix system resume support
Marek Szyprowski [Wed, 16 May 2018 08:42:39 +0000 (10:42 +0200)]
spi: spi-s3c64xx: Fix system resume support

Since Linux v4.10 release (commit 1d9174fbc55e "PM / Runtime: Defer
resuming of the device in pm_runtime_force_resume()"),
pm_runtime_force_resume() function doesn't runtime resume device if it was
not runtime active before system suspend. Thus, driver should not do any
register access after pm_runtime_force_resume() without checking the
runtime status of the device. To fix this issue, simply move
s3c64xx_spi_hwinit() call to s3c64xx_spi_runtime_resume() to ensure that
hardware is always properly initialized. This fixes Synchronous external
abort issue on system suspend/resume cycle on newer Exynos SoCs.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
6 years agospi: bcm-qspi: Fix build failure caused by spi_flash_read() API removal
Boris Brezillon [Sat, 12 May 2018 06:24:54 +0000 (08:24 +0200)]
spi: bcm-qspi: Fix build failure caused by spi_flash_read() API removal

Patch http://patchwork.ozlabs.org/patch/905205/ has been partially
applied, and changes to the bcm-qspi driver have been lost somehow
(probably due to a conflict when applying the patch).

Remove the ->spi_flash_read() bits from this driver to fix the build
error.

Fixes: c1f5ba70decf ("spi: Get rid of the spi_flash_read() API")
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Get rid of the spi_flash_read() API
Boris Brezillon [Thu, 26 Apr 2018 16:18:20 +0000 (18:18 +0200)]
spi: Get rid of the spi_flash_read() API

This API has been replaced by the spi_mem_xx() one, its only user
(spi-nor) has been converted to spi_mem_xx() and all SPI controller
drivers that were implementing the ->spi_flash_xxx() hooks are also
implementing the spi_mem ones. So we can safely get rid of this API.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Tested-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agomtd: spi-nor: Use the spi_mem_xx() API
Boris Brezillon [Thu, 26 Apr 2018 16:18:19 +0000 (18:18 +0200)]
mtd: spi-nor: Use the spi_mem_xx() API

The spi_mem_xxx() API has been introduced to replace the
spi_flash_read() one. Make use of it so we can get rid of
spi_flash_read().

Note that using spi_mem_xx() also simplifies the code because this API
takes care of using the regular spi_sync() interface when the optimized
->mem_ops interface is not implemented by the controller.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Tested-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: ti-qspi: Implement the spi_mem interface
Boris Brezillon [Thu, 26 Apr 2018 16:18:18 +0000 (18:18 +0200)]
spi: ti-qspi: Implement the spi_mem interface

The spi_mem interface is meant to replace the spi_flash_read() one.
Implement the ->exec_op() method so that we can smoothly get rid of the
old interface.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Vignesh R <vigneshr@ti.com>
Tested-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: bcm-qspi: Implement the spi_mem interface
Boris Brezillon [Thu, 26 Apr 2018 16:18:16 +0000 (18:18 +0200)]
spi: bcm-qspi: Implement the spi_mem interface

The spi_mem interface is meant to replace the ->spi_flash_read() one.
Implement the ->exec_op() method to ease removal of the old interface.

Not that ->spi_flash_read() is now implemented as a wrapper around the
new bcm_qspi_exec_mem_op() function so that we can easily get rid of
it when ->spi_flash_read() is removed.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Kamal Dasu <kdasu.kdev@gmail.com>
Tested-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Make support for regular transfers optional when ->mem_ops != NULL
Boris Brezillon [Thu, 26 Apr 2018 16:18:15 +0000 (18:18 +0200)]
spi: Make support for regular transfers optional when ->mem_ops != NULL

Some SPI/QuadSPI controllers only expose a high-level SPI memory
interface, thus preventing any regular SPI transfers from being done.

In that case, SPI controller drivers can leave all ->transfer_xxx()
hooks empty and only implement the spi_mem_ops interface.

Adjust the core to allow such situations:
- extend spi_controller_check_ops() to accept situations where all
  ->transfer_xxx() pointers are NULL only if ->mem_ops != NULL
- make sure we do not initialize the SPI message queue if
  ctlr->transfer_one and ctlr->transfer_one_message are missing
- return -ENOTSUPP if someone tries to do a regular SPI transfer on
  a controller that does not support it

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Tested-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Extend the core to ease integration of SPI memory controllers
Boris Brezillon [Thu, 26 Apr 2018 16:18:14 +0000 (18:18 +0200)]
spi: Extend the core to ease integration of SPI memory controllers

Some controllers are exposing high-level interfaces to access various
kind of SPI memories. Unfortunately they do not fit in the current
spi_controller model and usually have drivers placed in
drivers/mtd/spi-nor which are only supporting SPI NORs and not SPI
memories in general.

This is an attempt at defining a SPI memory interface which works for
all kinds of SPI memories (NORs, NANDs, SRAMs).

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Tested-by: Frieder Schrempf <frieder.schrempf@exceet.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: remove forgotten CONFIG_SPI_BCM53XX
Rafał Miłecki [Thu, 10 May 2018 05:27:58 +0000 (07:27 +0200)]
spi: remove forgotten CONFIG_SPI_BCM53XX

I accidentally sent an early version of patch removing spi-bcm53xx
driver which got rid of .c and .h files *only*. I amended local commit
but forgot to re-format the patch.

This commit removes leftovers of dropped driver.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: remove the older/duplicated bcm53xx driver
Rafał Miłecki [Mon, 7 May 2018 09:27:03 +0000 (11:27 +0200)]
spi: remove the older/duplicated bcm53xx driver

This driver was added by commit 0fc6a323e1917 ("spi: bcm53xx: driver for
SPI controller on Broadcom bcma SoC") back in 2014. It was needed to
provide a minimal support for SPI controller on BCM5301X (AKA Northstar)
devices.

An alternative driver was added by Kamal in commit fa236a7ef2404 ("spi:
bcm-qspi: Add Broadcom MSPI driver") 2 years later. It supports the same
hardware but for some reason a new driver has been developed for it.

At this point the new driver supports: more modes, setting a speed,
setting bits per word and uses IRQs instead of polling. DTS file for
BCM5301X has also been updated in the commit 1c8f406507238 ("ARM: dts:
BCM5301X: convert to iProc QSPI") - over a year ago.

That explained I see no reason to keep the old driver alive.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: check clk_prepare_enable() return value
Tobias Jordan [Mon, 30 Apr 2018 14:30:06 +0000 (16:30 +0200)]
spi: pxa2xx: check clk_prepare_enable() return value

clk_prepare_enable() can fail, so its return value should be checked and
acted upon.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 3343b7a6d2cd ("spi/pxa2xx: convert to the common clk framework")
Signed-off-by: Tobias Jordan <Tobias.Jordan@elektrobit.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: bcm2835aux: ensure interrupts are enabled for shared handler
Rob Herring [Thu, 3 May 2018 18:09:44 +0000 (13:09 -0500)]
spi: bcm2835aux: ensure interrupts are enabled for shared handler

The BCM2835 AUX SPI has a shared interrupt line (with AUX UART).
Downstream fixes this with an AUX irqchip to demux the IRQ sources and a
DT change which breaks compatibility with older kernels. The AUX irqchip
was already rejected for upstream[1] and the DT change would break
working systems if the DTB is updated to a newer one. The latter issue
was brought to my attention by Alex Graf.

The root cause however is a bug in the shared handler. Shared handlers
must check that interrupts are actually enabled before servicing the
interrupt. Add a check that the TXEMPTY or IDLE interrupts are enabled.

[1] https://patchwork.kernel.org/patch/9781221/

Cc: Alexander Graf <agraf@suse.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ray Jui <rjui@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: linux-spi@vger.kernel.org
Cc: linux-rpi-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: lpspi: Switch to SPDX identifier
Fabio Estevam [Wed, 2 May 2018 19:18:29 +0000 (16:18 -0300)]
spi: lpspi: Switch to SPDX identifier

Adopt the SPDX license identifier headers to ease license compliance
management.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: mxs: Switch to SPDX identifier
Fabio Estevam [Wed, 2 May 2018 19:18:28 +0000 (16:18 -0300)]
spi: mxs: Switch to SPDX identifier

Adopt the SPDX license identifier headers to ease license compliance
management.

Cc: Marek Vasut <marex@denx.de>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: imx: Switch to SPDX identifier
Fabio Estevam [Wed, 2 May 2018 19:18:27 +0000 (16:18 -0300)]
spi: imx: Switch to SPDX identifier

Adopt the SPDX license identifier headers to ease license compliance
management.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: omap2-mcspi: Idle hardware during suspend and resume
Tony Lindgren [Fri, 27 Apr 2018 15:50:07 +0000 (08:50 -0700)]
spi: omap2-mcspi: Idle hardware during suspend and resume

We currently are calling mcspi suspend and resume without considering
that mcspi might provide resources for other device driver such as
regulators. This means resume can fail and will produce -EACCES if
errors if anything calls mcspi functions between device_prepare()
and device_complete().

To fix the issue, let's do the following changes:

1. Let's add checking for return values for pm_runtime_get calls,
   and call pm_runtime_put_noidle() on errors. Things still fail
   after this change, but at least we see something is wrong as
   we now see -EACCES errors on resume.

2. Let's use noirq level for suspend and resume as other drivers
   can still call SPI related functions on suspend and resume. This
   still won't fix the -EACCES issue, but gets us to something a bit
   saner.

3. Finally, let's modify suspend and resume to call to make sure
   the device is idled properly on suspend. We have device_prepare()
   call pm_runtime_get_noresume() that won't get released until in
   device_complete() when it calls pm_runtime_put(). So if SPI is
   still active on entering suspend, it will never get idled unless
   we add calls to pm_runtime_force_suspend() and resume. This also
   fixes the -EACCES errors on resume together with changes 1 and 2
   above.

And since we're already rewriting suspend resume functions, let's
arrange the order of suspend and resume functions to be like they
usually are with suspend first.

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: omap2-mcspi: Restore context always in runtime_resume
Tony Lindgren [Wed, 25 Apr 2018 14:08:43 +0000 (07:08 -0700)]
spi: omap2-mcspi: Restore context always in runtime_resume

We can have the SoC enter off mode also during idle, not just
during suspend. Currently we are handling the CS restore properly
for unused CS only for resume and not for runtime resume.

Let's just move all the context related restore to runtime_resume().

Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: meson-spicc: Fix error handling in meson_spicc_probe()
Alexey Khoroshilov [Sat, 28 Apr 2018 22:46:23 +0000 (01:46 +0300)]
spi: meson-spicc: Fix error handling in meson_spicc_probe()

If devm_spi_register_master() fails in meson_spicc_probe(),
spicc->core is left undisabled. The patch fixes that.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: bcm-qspi: Always read and set BSPI_MAST_N_BOOT_CTRL
Kamal Dasu [Thu, 26 Apr 2018 18:48:01 +0000 (14:48 -0400)]
spi: bcm-qspi: Always read and set BSPI_MAST_N_BOOT_CTRL

Always confirm the BSPI_MAST_N_BOOT_CTRL bit when enabling
or disabling BSPI transfers.

Fixes: 4e3b2d236fe00 ("spi: bcm-qspi: Add BSPI spi-nor flash controller driver")
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
6 years agospi: bcm-qspi: Avoid setting MSPI_CDRAM_PCS for spi-nor master
Kamal Dasu [Thu, 26 Apr 2018 18:48:00 +0000 (14:48 -0400)]
spi: bcm-qspi: Avoid setting MSPI_CDRAM_PCS for spi-nor master

Added fix for probing of spi-nor device non-zero chip selects. Set
MSPI_CDRAM_PCS (peripheral chip select) with spi master for MSPI
controller and not for MSPI/BSPI spi-nor master controller. Ensure
setting of cs bit in chip select register on chip select change.

Fixes: fa236a7ef24048 ("spi: bcm-qspi: Add Broadcom MSPI driver")
Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
6 years agospi: mpc52xx: Use gpio_is_valid()
Arvind Yadav [Fri, 27 Apr 2018 11:02:17 +0000 (16:32 +0530)]
spi: mpc52xx: Use gpio_is_valid()

Replace the manual validity checks for the GPIO with the
gpio_is_valid().

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Check presence the of ->transfer[_xxx]() before registering a controller
Boris Brezillon [Tue, 10 Apr 2018 22:44:30 +0000 (00:44 +0200)]
spi: Check presence the of ->transfer[_xxx]() before registering a controller

Right now, no checks are done on the presence of a ->transfer[_xxx]()
method, which can lead to a NULL pointer dereference when someone
starts sending something on the bus.

Do the check at registration time and refuse to add the controller if
all ->transfer[_xxx]() pointers are NULL.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi/bcm63xx-hspi: Enable the clock before calling clk_get_rate().
Stefan Potyra [Thu, 26 Apr 2018 07:28:02 +0000 (09:28 +0200)]
spi/bcm63xx-hspi: Enable the clock before calling clk_get_rate().

Enable the clock prior to calling clk_get_rate(), because clk_get_rate()
should only be called if the clock is enabled.

Additionally, prepare/enable the pll_clk before calling clk_get_rate()
for the same reason.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 142168eba9dc ("spi: bcm63xx-hsspi: add bcm63xx HSSPI driver")
Signed-off-by: Stefan Potyra <Stefan.Potyra@elektrobit.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: s3c64xx: samsung: Remove support for Exynos5440
Krzysztof Kozlowski [Tue, 24 Apr 2018 20:32:37 +0000 (22:32 +0200)]
spi: s3c64xx: samsung: Remove support for Exynos5440

The Exynos5440 is not actively developed, there are no development
boards available and probably there are no real products with it.
Remove wide-tree support for Exynos5440.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: Allow 64-bit DMA
Andy Shevchenko [Thu, 19 Apr 2018 16:53:32 +0000 (19:53 +0300)]
spi: pxa2xx: Allow 64-bit DMA

Currently the 32-bit device address only is supported for DMA. However,
starting from Intel Sunrisepoint PCH the DMA address of the device FIFO
can be 64-bit.

Change the respective variable to be compatible with DMA engine
expectations, i.e. to phys_addr_t.

Fixes: 34cadd9c1bcb ("spi: pxa2xx: Add support for Intel Sunrisepoint")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
6 years agospi: Add an helper to flush the message queue
Boris Brezillon [Sun, 22 Apr 2018 18:35:15 +0000 (20:35 +0200)]
spi: Add an helper to flush the message queue

This is needed by the spi-mem logic to force all messages that have been
queued before a memory operation to be sent before we start the memory
operation. We do that in order to guarantee that spi-mem operations do
not preempt regular SPI transfers.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Expose spi_{map,unmap}_buf() for internal use
Boris Brezillon [Sun, 22 Apr 2018 18:35:14 +0000 (20:35 +0200)]
spi: Expose spi_{map,unmap}_buf() for internal use

spi_{map,unmap}_buf() are needed by the spi-mem logic that is about to
be introduced to prepare data buffer for DMA operations.

Remove the static specifier on these functions and add their prototypes
to drivers/spi/internals.h. We do not export the symbols here because
both SPI_MEM and SPI can't be enabled as modules and we'd like to
prevent controller/device drivers from using these functions.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: simplify getting .drvdata
Wolfram Sang [Thu, 19 Apr 2018 14:06:16 +0000 (16:06 +0200)]
spi: simplify getting .drvdata

We should get drvdata from struct device directly. Going via
platform_device is an unneeded step back and forth.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: Allow 64-bit DMA
Andy Shevchenko [Thu, 19 Apr 2018 16:53:32 +0000 (19:53 +0300)]
spi: pxa2xx: Allow 64-bit DMA

Currently the 32-bit device address only is supported for DMA. However,
starting from Intel Sunrisepoint PCH the DMA address of the device FIFO
can be 64-bit.

Change the respective variable to be compatible with DMA engine
expectations, i.e. to phys_addr_t.

Fixes: 34cadd9c1bcb ("spi: pxa2xx: Add support for Intel Sunrisepoint")
Cc: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Allow higher transfer lengths in polling IO mode
Sylwester Nawrocki [Tue, 17 Apr 2018 14:29:54 +0000 (16:29 +0200)]
spi: spi-s3c64xx: Allow higher transfer lengths in polling IO mode

Some variants of the SPI controller have no DMA support, in such case
SPI transfers longer than the FIFO length are not currently properly
handled by the driver. Fix it by doing multiple transfers in the
s3c64xx_spi_transfer_one() function if the SPI transfer length exceeds
the FIFO size.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Use local variable for FIFO length
Sylwester Nawrocki [Tue, 17 Apr 2018 14:29:53 +0000 (16:29 +0200)]
spi: spi-s3c64xx: Use local variable for FIFO length

More references to fifo_len are added in subsequent patch.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: pxa2xx_spi_transfer_one() can be static
kbuild test robot [Tue, 17 Apr 2018 19:53:23 +0000 (03:53 +0800)]
spi: pxa2xx: pxa2xx_spi_transfer_one() can be static

Fixes: d5898e19c0d7 ("spi: pxa2xx: Use core message processing loop")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Add missing s3c64xx_ prefix to function names
Sylwester Nawrocki [Tue, 17 Apr 2018 14:29:51 +0000 (16:29 +0200)]
spi: spi-s3c64xx: Add missing s3c64xx_ prefix to function names

Add a s3c64xx_ prefix to remaining generic function names so it is clear
the code is part of the driver when grepping or looking at debug logs.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Drop unused enable_datapath() function argument
Sylwester Nawrocki [Tue, 17 Apr 2018 14:29:50 +0000 (16:29 +0200)]
spi: spi-s3c64xx: Drop unused enable_datapath() function argument

The spi pointer argument is not used now so remove it.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: cadence: Add usleep_range() for cdns_spi_fill_tx_fifo()
sxauwsk [Mon, 16 Apr 2018 20:01:27 +0000 (04:01 +0800)]
spi: cadence: Add usleep_range() for cdns_spi_fill_tx_fifo()

In case of xspi work in busy condition, may send bytes failed.
once something wrong, spi controller did't work any more

My test found this situation appear in both of read/write process.
so when TX FIFO is full, add one byte delay before send data;

Signed-off-by: sxauwsk <sxauwsk@163.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: Remove depends on HAS_DMA in case of platform dependency
Geert Uytterhoeven [Tue, 17 Apr 2018 17:49:18 +0000 (19:49 +0200)]
spi: Remove depends on HAS_DMA in case of platform dependency

Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
In most cases this other symbol is an architecture or platform specific
symbol, or PCI.

Generic symbols and drivers without platform dependencies keep their
dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
cannot work anyway.

This simplifies the dependencies, and allows to improve compile-testing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: Use core message processing loop
Jarkko Nikula [Tue, 17 Apr 2018 14:20:02 +0000 (17:20 +0300)]
spi: pxa2xx: Use core message processing loop

Convert the pump_transfers() transfer tasklet to transfer_one() hook the
SPI core calls to process single transfer instead of handling message
processing and chip select handling in the driver. This not only
simplifies the driver but also brings transfer statistics from the core.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: Remove pump_transfers string from dev_ prints
Jarkko Nikula [Tue, 17 Apr 2018 14:20:01 +0000 (17:20 +0300)]
spi: pxa2xx: Remove pump_transfers string from dev_ prints

We are going to rename and modify pump_transfers(). Prepare for it by
removing the string "pump_transfers:" from error and warning prints.

While at it make these user-visible strings single line in sources as it
helps source grepping from error reports.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: pxa2xx: Remove unused argument from pxa2xx_spi_dma_prepare()
Jarkko Nikula [Tue, 17 Apr 2018 14:20:00 +0000 (17:20 +0300)]
spi: pxa2xx: Remove unused argument from pxa2xx_spi_dma_prepare()

Current DMA engine implementation of pxa2xx_spi_dma_prepare() don't use
the dma_burst argument. Remove it since it became unused after
commit 6356437e65c2 ("spi: spi-pxa2xx: remove legacy PXA DMA bits").

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Fix indentation in the register offset definitions
Sylwester Nawrocki [Mon, 16 Apr 2018 15:40:20 +0000 (17:40 +0200)]
spi: spi-s3c64xx: Fix indentation in the register offset definitions

Change indentation so register address offset and register bit definitions
are aligned to same column.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Do not ignore timeout errors in polling I/O mode
Sylwester Nawrocki [Mon, 16 Apr 2018 15:40:19 +0000 (17:40 +0200)]
spi: spi-s3c64xx: Do not ignore timeout errors in polling I/O mode

Currently timeout errors in polling I/O mode transfer are silently ignored.
Fix it by returning an error when we time out waiting on the RX FIFO level
to reach the transfer length.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Remove unused s3c64xx_spi_hwinit() function argument
Sylwester Nawrocki [Mon, 16 Apr 2018 15:40:17 +0000 (17:40 +0200)]
spi: spi-s3c64xx: Remove unused s3c64xx_spi_hwinit() function argument

The channel argument is not used and anyway it could be retrieved from
the passed driver data structure.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: spi-s3c64xx: Remove unused driver data structure tgl_spi field
Sylwester Nawrocki [Mon, 16 Apr 2018 15:40:16 +0000 (17:40 +0200)]
spi: spi-s3c64xx: Remove unused driver data structure tgl_spi field

The tgl_spi pointer is now unused so remove it.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Andi Shyti <andi@etezian.org>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: zynqmp: Add pm runtime support
Naga Sureshkumar Relli [Mon, 26 Mar 2018 13:04:20 +0000 (18:34 +0530)]
spi: zynqmp: Add pm runtime support

This patch adds runtime pm functions.

Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>
Signed-off-by: Naga Sureshkumar Relli <nagasure@xilinx.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: sh-msiof: Simplify calculation of divisors for transfer rate
Vladimir Zapolskiy [Fri, 13 Apr 2018 12:44:17 +0000 (15:44 +0300)]
spi: sh-msiof: Simplify calculation of divisors for transfer rate

The change updates sh_msiof_spi_set_clk_regs() function by iterating
over BRDV power values. Note that the change is a functional one, namely
prescaler output x 1/1 set in BRDV bit field (0b111) for MSO division
rate set to 2 is substituted by BRDV = 0b000 and BRPS = 0b0, in terms
of written values to TSCR setting of 0x0107 is substituted by 0x0000,
and for all input parameter cases this is the only functional change,
which touches the controller.

As a result of the rework the function is supposed to be slightly more
efficient and more readable and maintainable in case of any further
extensions.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: stm32: Fix error handling in stm32_spi_probe()
Alexey Khoroshilov [Fri, 30 Mar 2018 19:54:44 +0000 (22:54 +0300)]
spi: stm32: Fix error handling in stm32_spi_probe()

clk_get_rate() is below clk_prepare_enable(), so
its error should lead to goto err_clk_disable, not to err_master_put.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: sh-msiof: Fix bit field overflow writes to TSCR/RSCR
Vladimir Zapolskiy [Fri, 13 Apr 2018 12:44:16 +0000 (15:44 +0300)]
spi: sh-msiof: Fix bit field overflow writes to TSCR/RSCR

The change fixes a bit field overflow which allows to write to higher
bits while calculating SPI transfer clock and setting BRPS and BRDV
bit fields, the problem is reproduced if 'parent_rate' to 'spi_hz'
ratio is greater than 1024, for instance

  p->min_div      = 2,
  MSO rate        = 33333333,
  SPI device rate = 10000

results in

  k          = 5, i.e. BRDV = 0b100 or 1/32 prescaler output,
  BRPS       = 105,
  TSCR value = 0x6804, thus MSSEL and MSIMM bit fields are non-zero.

Fixes: 65d5665bb260 ("spi: sh-msiof: Update calculation of frequency dividing")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agospi: imx: Update MODULE_DESCRIPTION to "SPI Controller driver"
wangbo [Thu, 12 Apr 2018 08:58:08 +0000 (16:58 +0800)]
spi: imx: Update MODULE_DESCRIPTION to "SPI Controller driver"

Now i.MX SPI controller can work in Slave mode.
Update MODULE_DESCRIPTION to "SPI Controller driver".

Signed-off-by: wangbo <wang.bo116@zte.com.cn>
Signed-off-by: Mark Brown <broonie@kernel.org>
6 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Thu, 12 Apr 2018 01:58:27 +0000 (18:58 -0700)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio update from Michael Tsirkin:
 "This adds reporting hugepage stats to virtio-balloon"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: export hugetlb page allocation counts

6 years agoMerge tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 12 Apr 2018 01:50:41 +0000 (18:50 -0700)]
Merge tag 'iommu-updates-v4.17' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - OF_IOMMU support for the Rockchip iommu driver so that it can use
   generic DT bindings

 - rework of locking in the AMD IOMMU interrupt remapping code to make
   it work better in RT kernels

 - support for improved iotlb flushing in the AMD IOMMU driver

 - support for 52-bit physical and virtual addressing in the ARM-SMMU

 - various other small fixes and cleanups

* tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits)
  iommu/io-pgtable-arm: Avoid warning with 32-bit phys_addr_t
  iommu/rockchip: Support sharing IOMMU between masters
  iommu/rockchip: Add runtime PM support
  iommu/rockchip: Fix error handling in init
  iommu/rockchip: Use OF_IOMMU to attach devices automatically
  iommu/rockchip: Use IOMMU device for dma mapping operations
  dt-bindings: iommu/rockchip: Add clock property
  iommu/rockchip: Control clocks needed to access the IOMMU
  iommu/rockchip: Fix TLB flush of secondary IOMMUs
  iommu/rockchip: Use iopoll helpers to wait for hardware
  iommu/rockchip: Fix error handling in attach
  iommu/rockchip: Request irqs in rk_iommu_probe()
  iommu/rockchip: Fix error handling in probe
  iommu/rockchip: Prohibit unbind and remove
  iommu/amd: Return proper error code in irq_remapping_alloc()
  iommu/amd: Make amd_iommu_devtable_lock a spin_lock
  iommu/amd: Drop the lock while allocating new irq remap table
  iommu/amd: Factor out setting the remap table for a devid
  iommu/amd: Use `table' instead `irt' as variable name in amd_iommu_update_ga()
  iommu/amd: Remove the special case from alloc_irq_table()
  ...

6 years agoMerge tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Thu, 12 Apr 2018 00:03:20 +0000 (17:03 -0700)]
Merge tag 'pm-4.17-rc1-2' of git://git./linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These include one big-ticket item which is the rework of the idle loop
  in order to prevent CPUs from spending too much time in shallow idle
  states. It reduces idle power on some systems by 10% or more and may
  improve performance of workloads in which the idle loop overhead
  matters. This has been in the works for several weeks and it has been
  tested and reviewed quite thoroughly.

  Also included are changes that finalize the cpufreq cleanup moving
  frequency table validation from drivers to the core, a few fixes and
  cleanups of cpufreq drivers, a cpuidle documentation update and a PM
  QoS core update to mark the expected switch fall-throughs in it.

  Specifics:

   - Rework the idle loop in order to prevent CPUs from spending too
     much time in shallow idle states by making it stop the scheduler
     tick before putting the CPU into an idle state only if the idle
     duration predicted by the idle governor is long enough.

     That required the code to be reordered to invoke the idle governor
     before stopping the tick, among other things (Rafael Wysocki,
     Frederic Weisbecker, Arnd Bergmann).

   - Add the missing description of the residency sysfs attribute to the
     cpuidle documentation (Prashanth Prakash).

   - Finalize the cpufreq cleanup moving frequency table validation from
     drivers to the core (Viresh Kumar).

   - Fix a clock leak regression in the armada-37xx cpufreq driver
     (Gregory Clement).

   - Fix the initialization of the CPU performance data structures for
     shared policies in the CPPC cpufreq driver (Shunyong Yang).

   - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a
     bit (Viresh Kumar, Rafael Wysocki).

   - Mark the expected switch fall-throughs in the PM QoS core (Gustavo
     Silva)"

* tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  tick-sched: avoid a maybe-uninitialized warning
  cpufreq: Drop cpufreq_table_validate_and_show()
  cpufreq: SCMI: Don't validate the frequency table twice
  cpufreq: CPPC: Initialize shared perf capabilities of CPUs
  cpufreq: armada-37xx: Fix clock leak
  cpufreq: CPPC: Don't set transition_latency
  cpufreq: ti-cpufreq: Use builtin_platform_driver()
  cpufreq: intel_pstate: Do not include debugfs.h
  PM / QoS: mark expected switch fall-throughs
  cpuidle: Add definition of residency to sysfs documentation
  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
  nohz: Avoid duplication of code related to got_idle_tick
  nohz: Gather tick_sched booleans under a common flag field
  cpuidle: menu: Avoid selecting shallow states with stopped tick
  cpuidle: menu: Refine idle state selection for running tick
  sched: idle: Select idle state before stopping the tick
  time: hrtimer: Introduce hrtimer_next_event_without()
  time: tick-sched: Split tick_nohz_stop_sched_tick()
  cpuidle: Return nohz hint from cpuidle_select()
  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  ...

6 years agoMerge tag 'ktest-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 11 Apr 2018 23:42:27 +0000 (16:42 -0700)]
Merge tag 'ktest-v4.17' of git://git./linux/kernel/git/rostedt/linux-ktest

Pull ktest updates from Steven Rostedt:
 "These commits have either been sitting in my INBOX or have been in my
  local tree for some time. I need to push them upstream:

   - Separate out config-bisect.pl from ktest.pl.

     This allows users to do config bisects without full ktest setup.

   - Email on status change.

     Allow the user to be emailed on test start, finish, failure, etc.

   - Other small fixes and enhancements"

* tag 'ktest-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: (24 commits)
  ktest: Take submenu into account for grub2 menus
  ktest.pl: Add MAIL_COMMAND option to define how to send email
  ktest.pl: Use run_command to execute sending mail
  ktest.pl: Allow dodie be recursive
  ktest.pl: Kill test if mailer is not supported
  ktest.pl: Add MAIL_PATH option to define where to find the mailer
  ktest.pl: No need to print no mailer is specified when mailto is not
  Ktest: add email options to sample.config
  Ktest: Use dodie for critical falures
  Ktest: Add SigInt handling
  Ktest: Add email support
  ktest.pl: Detect if a config-bisect was interrupted
  ktest.pl: Make finding config-bisect.pl dynamic
  ktest.pl: Have ktest.pl pass -r to config-bisect.pl to reset bisect
  ktest.pl: Use diffconfig if available for failed config bisects
  ktest.pl: Allow for the config-bisect.pl output to display to console
  ktest: Use config-bisect.pl in ktest.pl
  ktest: Add standalone config-bisect.pl program
  ktest: Set do_not_reboot=y for CONFIG_BISECT_TYPE=build
  ktest: Set buildonly=1 for CONFIG_BISECT_TYPE=build
  ...

6 years agoMerge tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs
Linus Torvalds [Wed, 11 Apr 2018 23:39:34 +0000 (16:39 -0700)]
Merge tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI and UBIFS updates from Richard Weinberger:
 "Minor bug fixes and improvements"

* tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs:
  ubi: Reject MLC NAND
  ubifs: Remove useless parameter of lpt_heap_replace
  ubifs: Constify struct ubifs_lprops in scan_for_leb_for_idx
  ubifs: remove unnecessary assignment
  ubi: Fix error for write access
  ubi: fastmap: Don't flush fastmap work on detach
  ubifs: Check ubifs_wbuf_sync() return code

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Linus Torvalds [Wed, 11 Apr 2018 23:36:47 +0000 (16:36 -0700)]
Merge git://git./pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - a new and faster epoll based IRQ controller and NIC driver

 - misc fixes and janitorial updates

* git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  Fix vector raw inintialization logic
  Migrate vector timers to new timer API
  um: Compile with modern headers
  um: vector: Fix an error handling path in 'vector_parse()'
  um: vector: Fix a memory allocation check
  um: vector: fix missing unlock on error in vector_net_open()
  um: Add missing EXPORT for free_irq_by_fd()
  High Performance UML Vector Network Driver
  Epoll based IRQ controller
  um: Use POSIX ucontext_t instead of struct ucontext
  um: time: Use timespec64 for persistent clock
  um: Restore symbol versions for __memcpy and memcpy

6 years agoMerge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Wed, 11 Apr 2018 23:12:21 +0000 (16:12 -0700)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here is a very small set of fixes for inclusion in linux-4.17-rc1: Two
  changes for the maintainer file, and one more fix for the newly added
  npcm platform, to enable the level 2 cache controller"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Update ASPEED entry with details
  MAINTAINERS: Migrate oxnas list to groups.io
  arm: npcm: enable L2 cache in NPCM7xx architecture

6 years agoMerge tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan...
Linus Torvalds [Wed, 11 Apr 2018 23:02:18 +0000 (16:02 -0700)]
Merge tag 'nios2-v4.17-rc1' of git://git./linux/kernel/git/lftan/nios2

Pull nios2 update from Ley Foon Tan:
 "Use read_persistent_clock64() instead of read_persistent_clock()"

* tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: Use read_persistent_clock64() instead of read_persistent_clock()

6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 11 Apr 2018 17:51:26 +0000 (10:51 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:

 - almost all of the rest of MM

 - kasan updates

 - lots of procfs work

 - misc things

 - lib/ updates

 - checkpatch

 - rapidio

 - ipc/shm updates

 - the start of willy's XArray conversion

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (140 commits)
  page cache: use xa_lock
  xarray: add the xa_lock to the radix_tree_root
  fscache: use appropriate radix tree accessors
  export __set_page_dirty
  unicore32: turn flush_dcache_mmap_lock into a no-op
  arm64: turn flush_dcache_mmap_lock into a no-op
  mac80211_hwsim: use DEFINE_IDA
  radix tree: use GFP_ZONEMASK bits of gfp_t for flags
  linux/const.h: refactor _BITUL and _BITULL a bit
  linux/const.h: move UL() macro to include/linux/const.h
  linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI
  xen, mm: allow deferred page initialization for xen pv domains
  elf: enforce MAP_FIXED on overlaying elf segments
  fs, elf: drop MAP_FIXED usage from elf_map
  mm: introduce MAP_FIXED_NOREPLACE
  MAINTAINERS: update bouncing aacraid@adaptec.com addresses
  fs/dcache.c: add cond_resched() in shrink_dentry_list()
  include/linux/kfifo.h: fix comment
  ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops
  kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param
  ...

6 years agopage cache: use xa_lock
Matthew Wilcox [Tue, 10 Apr 2018 23:36:56 +0000 (16:36 -0700)]
page cache: use xa_lock

Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink:
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoxarray: add the xa_lock to the radix_tree_root
Matthew Wilcox [Tue, 10 Apr 2018 23:36:52 +0000 (16:36 -0700)]
xarray: add the xa_lock to the radix_tree_root

This results in no change in structure size on 64-bit machines as it
fits in the padding between the gfp_t and the void *.  32-bit machines
will grow the structure from 8 to 12 bytes.  Almost all radix trees are
protected with (at least) a spinlock, so as they are converted from
radix trees to xarrays, the data structures will shrink again.

Initialising the spinlock requires a name for the benefit of lockdep, so
RADIX_TREE_INIT() now needs to know the name of the radix tree it's
initialising, and so do IDR_INIT() and IDA_INIT().

Also add the xa_lock() and xa_unlock() family of wrappers to make it
easier to use the lock.  If we could rely on -fplan9-extensions in the
compiler, we could avoid all of this syntactic sugar, but that wasn't
added until gcc 4.6.

Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofscache: use appropriate radix tree accessors
Matthew Wilcox [Tue, 10 Apr 2018 23:36:48 +0000 (16:36 -0700)]
fscache: use appropriate radix tree accessors

Don't open-code accesses to data structure internals.

Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoexport __set_page_dirty
Matthew Wilcox [Tue, 10 Apr 2018 23:36:44 +0000 (16:36 -0700)]
export __set_page_dirty

XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agounicore32: turn flush_dcache_mmap_lock into a no-op
Matthew Wilcox [Tue, 10 Apr 2018 23:36:40 +0000 (16:36 -0700)]
unicore32: turn flush_dcache_mmap_lock into a no-op

Unicore doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-5-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoarm64: turn flush_dcache_mmap_lock into a no-op
Matthew Wilcox [Tue, 10 Apr 2018 23:36:36 +0000 (16:36 -0700)]
arm64: turn flush_dcache_mmap_lock into a no-op

ARM64 doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomac80211_hwsim: use DEFINE_IDA
Matthew Wilcox [Tue, 10 Apr 2018 23:36:33 +0000 (16:36 -0700)]
mac80211_hwsim: use DEFINE_IDA

This is preferred to opencoding an IDA_INIT.

Link: http://lkml.kernel.org/r/20180313132639.17387-2-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoradix tree: use GFP_ZONEMASK bits of gfp_t for flags
Matthew Wilcox [Tue, 10 Apr 2018 23:36:28 +0000 (16:36 -0700)]
radix tree: use GFP_ZONEMASK bits of gfp_t for flags

Patch series "XArray", v9.  (First part thereof).

This patchset is, I believe, appropriate for merging for 4.17.  It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.

This conversion keeps the radix tree and XArray data structures in sync
at all times.  That allows us to convert the page cache one function at
a time and should allow for easier bisection.  Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines.  I have changes planned to the XArray data
structure, but those will happen in future patches.

Improvements the XArray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'. But what most users really want is an automatically
   resizing array, and so it makes more sense to give users an API that
   is like an array -- 'load' and 'store'. We still have an 'insert'
   operation for users that really want that semantic.

 - The XArray considers locking as part of its API. This simplifies a
   lot of users who formerly had to manage their own locking just for
   the radix tree. It also improves code generation as we can now tell
   RCU that we're holding a lock and it doesn't need to generate as much
   fencing code. The other advantage is that tree nodes can be moved
   (not yet implemented).

 - GFP flags are now parameters to calls which may need to allocate
   memory. The radix tree forced users to decide what the allocation
   flags would be at creation time. It's much clearer to specify them at
   allocation time.

 - Memory is not preloaded; we don't tie up dozens of pages on the off
   chance that the slab allocator fails. Instead, we drop the lock,
   allocate a new node and retry the operation. We have to convert all
   the radix tree, IDA and IDR preload users before we can realise this
   benefit, but I have not yet found a user which cannot be converted.

 - The XArray provides a cmpxchg operation. The radix tree forces users
   to roll their own (and at least four have).

 - Iterators take a 'max' parameter. That simplifies many users and will
   reduce the amount of iteration done.

 - Iteration can proceed backwards. We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.

 - RCU-protected pointers are not exposed as part of the API. There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.

 - Value entries gain an extra bit compared to radix tree exceptional
   entries. That gives us the extra bit we need to put huge page swap
   entries in the page cache.

 - Some iterators now take a 'filter' argument instead of having
   separate iterators for tagged/untagged iterations.

The page cache is improved by this:

 - Shorter, easier to read code

 - More efficient iterations

 - Reduction in size of struct address_space

 - Fewer walks from the top of the data structure; the XArray API
   encourages staying at the leaf node and conducting operations there.

This patch (of 8):

None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.

Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolinux/const.h: refactor _BITUL and _BITULL a bit
Masahiro Yamada [Tue, 10 Apr 2018 23:36:24 +0000 (16:36 -0700)]
linux/const.h: refactor _BITUL and _BITULL a bit

Minor cleanups available by _UL and _ULL.

Link: http://lkml.kernel.org/r/1519301715-31798-5-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolinux/const.h: move UL() macro to include/linux/const.h
Masahiro Yamada [Tue, 10 Apr 2018 23:36:19 +0000 (16:36 -0700)]
linux/const.h: move UL() macro to include/linux/const.h

ARM, ARM64 and UniCore32 duplicate the definition of UL():

  #define UL(x) _AC(x, UL)

This is not actually arch-specific, so it will be useful to move it to a
common header.  Currently, we only have the uapi variant for
linux/const.h, so I am creating include/linux/const.h.

I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
the form either _AC(..., UL) or _AC(..., ULL).  I expect they will be
replaced in follow-up cleanups.  The underscore-prefixed ones should
be used for exported headers.

Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolinux/const.h: prefix include guard of uapi/linux/const.h with _UAPI
Masahiro Yamada [Tue, 10 Apr 2018 23:36:15 +0000 (16:36 -0700)]
linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI

Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(),
BIT() etc", v3.

ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL).  More
architectures may introduce it in the future.

UL() is arch-agnostic, and useful. So let's move it to
include/linux/const.h

Currently, <asm/memory.h> must be included to use UL().  It pulls in more
bloats just for defining some bit macros.

I posted V2 one year ago.

The previous posts are:
https://patchwork.kernel.org/patch/9498273/
https://patchwork.kernel.org/patch/9498275/
https://patchwork.kernel.org/patch/9498269/
https://patchwork.kernel.org/patch/9498271/

At that time, what blocked this series was a comment from
David Howells:
  You need to be very careful doing this.  Some userspace stuff
  depends on the guard macro names on the kernel header files.

(https://patchwork.kernel.org/patch/9498275/)

Looking at the code closer, I noticed this is not a problem.

See the following line.
https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40

scripts/headers_install.sh rips off _UAPI prefix from guard macro names.

I ran "make headers_install" and confirmed the result is what I expect.

So, we can prefix the include guard of include/uapi/linux/const.h,
and add a new include/linux/const.h.

This patch (of 4):

I am going to add include/linux/const.h for the kernel space.

Add _UAPI to the include guard of include/uapi/linux/const.h to
prepare for that.

Please notice the guard name of the exported one will be kept as-is.
So, this commit has no impact to the userspace even if some userspace
stuff depends on the guard macro names.

scripts/headers_install.sh processes exported headers by SED, and
rips off "_UAPI" from guard macro names.

  #ifndef _UAPI_LINUX_CONST_H
  #define _UAPI_LINUX_CONST_H

will be turned into

  #ifndef _LINUX_CONST_H
  #define _LINUX_CONST_H

Link: http://lkml.kernel.org/r/1519301715-31798-2-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoxen, mm: allow deferred page initialization for xen pv domains
Pavel Tatashin [Tue, 10 Apr 2018 23:36:10 +0000 (16:36 -0700)]
xen, mm: allow deferred page initialization for xen pv domains

Juergen Gross noticed that commit f7f99100d8d ("mm: stop zeroing memory
during allocation in vmemmap") broke XEN PV domains when deferred struct
page initialization is enabled.

This is because the xen's PagePinned() flag is getting erased from
struct pages when they are initialized later in boot.

Juergen fixed this problem by disabling deferred pages on xen pv
domains.  It is desirable, however, to have this feature available as it
reduces boot time.  This fix re-enables the feature for pv-dmains, and
fixes the problem the following way:

The fix is to delay setting PagePinned flag until struct pages for all
allocated memory are initialized, i.e.  until after free_all_bootmem().

A new x86_init.hyper op init_after_bootmem() is called to let xen know
that boot allocator is done, and hence struct pages for all the
allocated memory are now initialized.  If deferred page initialization
is enabled, the rest of struct pages are going to be initialized later
in boot once page_alloc_init_late() is called.

xen_after_bootmem() walks page table's pages and marks them pinned.

Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoelf: enforce MAP_FIXED on overlaying elf segments
Michal Hocko [Tue, 10 Apr 2018 23:36:05 +0000 (16:36 -0700)]
elf: enforce MAP_FIXED on overlaying elf segments

Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from
elf_map" applied, some ELF binaries in his environment fail to start
with

 [   23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
 [   23.423706] requested [1003000010040000] mapped [1003000010040000] 100073 anon

The reason is that the above binary has overlapping elf segments:

  LOAD           0x0000000000000000 0x0000000010000000 0x0000000010000000
                 0x0000000000013a8c 0x0000000000013a8c  R E    10000
  LOAD           0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
                 0x00000000000002c0 0x00000000000005e8  RW     10000
  LOAD           0x0000000000020328 0x0000000010030328 0x0000000010030328
                 0x0000000000000384 0x00000000000094a0  RW     10000

That binary has two RW LOAD segments, the first crosses a page border
into the second

  0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

Handle this situation by enforcing MAP_FIXED when we establish a
temporary brk VMA to handle overlapping segments.  All other mappings
will still use MAP_FIXED_NOREPLACE.

Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs, elf: drop MAP_FIXED usage from elf_map
Michal Hocko [Tue, 10 Apr 2018 23:36:01 +0000 (16:36 -0700)]
fs, elf: drop MAP_FIXED usage from elf_map

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that.  This is
however dangerous thing prone to silent data corruption which can be
even exploitable.

Let's take CVE-2017-1000253 as an example.  At the time (before commit
eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away).  Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix
bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4: "mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM").  So we should be
safe and any attack should be impractical.  On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous.  Moreover it shouldn't be even needed.  We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address should
succeed even without MAP_FIXED.  Something is terribly wrong if this is
not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_NOREPLACE.  This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

[mhocko@suse.com: fix build]
[akpm@linux-foundation.org: coding-style fixes]
[avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC]
Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: introduce MAP_FIXED_NOREPLACE
Michal Hocko [Tue, 10 Apr 2018 23:35:57 +0000 (16:35 -0700)]
mm: introduce MAP_FIXED_NOREPLACE

Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.

This has started as a follow up discussion [3][4] resulting in the
runtime failure caused by hardening patch [5] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber an existing underlying mapping (e.g.  stack).
The reason for the failure is that some architectures enforce an
alignment for the given address hint without MAP_FIXED used (e.g.  for
shared or file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6].  The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution.  We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one.  The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of the backward
compatibility.  We really want a never-clobber semantic even on older
kernels which do not recognize the flag.  Unfortunately mmap sucks
wrt flags evaluation because we do not EINVAL on unknown flags.  On
those kernels we would simply use the traditional hint based semantic so
the caller can still get a different address (which sucks) but at least
not silently corrupt an existing mapping.  I do not see a good way
around that.  Except we won't export expose the new semantic to the
userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7]

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them a completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned CUDA example
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in the multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
should follow.  Actually real MAP_FIXED usages should be docummented
properly and they should be more of an exception.

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

This patch (of 2):

MAP_FIXED is used quite often to enforce mapping at the particular range.
The main problem of this flag is, however, that it is inherently dangerous
because it unmaps existing mappings covered by the requested range.  This
can cause silent memory corruptions.  Some of them even with serious
security implications.  While the current semantic might be really
desiderable in many cases there are others which would want to enforce the
given range but rather see a failure than a silent memory corruption on a
clashing range.  Please note that there is no guarantee that a given range
is obeyed by the mmap even when it is free - e.g.  arch specific code is
allowed to apply an alignment.

Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
request with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping.  We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of the backward compatibility.  We really want a
never-clobber semantic even on older kernels which do not recognize the
flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
EINVAL on unknown flags.  On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping.  I do not see a good way around that.

[mpe@ellerman.id.au: fix whitespace]
[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <jasone@google.com>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMAINTAINERS: update bouncing aacraid@adaptec.com addresses
Joe Perches [Tue, 10 Apr 2018 23:35:53 +0000 (16:35 -0700)]
MAINTAINERS: update bouncing aacraid@adaptec.com addresses

Adaptec is now part of Microsemi.

Commit 2a81ffdd9da1 ("MAINTAINERS: Update email address for aacraid")
updated only one of the driver maintainer addresses.

Update the other two sections as the aacraid@adaptec.com address
bounces.

Link: http://lkml.kernel.org/r/1522103936.12357.27.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs/dcache.c: add cond_resched() in shrink_dentry_list()
Nikolay Borisov [Tue, 10 Apr 2018 23:35:49 +0000 (16:35 -0700)]
fs/dcache.c: add cond_resched() in shrink_dentry_list()

As previously reported (https://patchwork.kernel.org/patch/8642031/)
it's possible to call shrink_dentry_list with a large number of dentries
(> 10000).  This, in turn, could trigger the softlockup detector and
possibly trigger a panic.  In addition to the unmount path being
vulnerable to this scenario, at SuSE we've observed similar situation
happening during process exit on processes that touch a lot of dentries.
Here is an excerpt from a crash dump.  The number after the colon are
the number of dentries on the list passed to shrink_dentry_list:

PID 99760: 10722
PID 107530: 215
PID 108809: 24134
PID 108877: 21331
PID 141708: 16487

So we want to kill between 15k-25k dentries without yielding.

And one possible call stack looks like:

4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8
5 [ffff8839ece41db0] evict at ffffffff811c3026
6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258
7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593
8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830
9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61
10 [ffff8839ece41ec0] release_task at ffffffff81059ebd
11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce
12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53
13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909

While some of the callers of shrink_dentry_list do use cond_resched,
this is not sufficient to prevent softlockups.  So just move
cond_resched into shrink_dentry_list from its callers.

David said: I've found hundreds of occurrences of warnings that we emit
when need_resched stays set for a prolonged period of time with the
stack trace that is included in the change log.

Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoinclude/linux/kfifo.h: fix comment
Valentin Vidic [Tue, 10 Apr 2018 23:35:46 +0000 (16:35 -0700)]
include/linux/kfifo.h: fix comment

Clean up unusual formatting in the note about locking.

Link: http://lkml.kernel.org/r/20180324002630.13046-1-Valentin.Vidic@CARNet.hr
Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Sean Young <sean@mess.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops
Andrew Morton [Tue, 10 Apr 2018 23:35:42 +0000 (16:35 -0700)]
ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops

This was added by the recent "ipc/shm.c: add split function to
shm_vm_ops", but it is not necessary.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param
Waiman Long [Tue, 10 Apr 2018 23:35:38 +0000 (16:35 -0700)]
kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param

Kdoc comments are added to the do_proc_dointvec_minmax_conv_param and
do_proc_douintvec_minmax_conv_param structures thare are used internally
for range checking.

The error codes returned by proc_dointvec_minmax() and
proc_douintvec_minmax() are also documented.

Link: http://lkml.kernel.org/r/1519926220-7453-3-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agofs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()
Waiman Long [Tue, 10 Apr 2018 23:35:35 +0000 (16:35 -0700)]
fs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()

Patch series "ipc: Clamp *mni to the real IPCMNI limit", v3.

The sysctl parameters msgmni, shmmni and semmni have an inherent limit
of IPC_MNI (32k).  However, users may not be aware of that because they
can write a value much higher than that without getting any error or
notification.  Reading the parameters back will show the newly written
values which are not real.

Enforcing the limit by failing sysctl parameter write, however, can
break existing user applications.  To address this delemma, a new flags
field is introduced into the ctl_table.  The value CTL_FLAGS_CLAMP_RANGE
can be added to any ctl_table entries to enable a looser range clamping
without returning any error.  For example,

  .flags = CTL_FLAGS_CLAMP_RANGE,

This flags value are now used for the range checking of shmmni, msgmni
and semmni without breaking existing applications.  If any out of range
value is written to those sysctl parameters, the following warning will
be printed instead.

  Kernel parameter "shmmni" was set out of range [0, 32768], clamped to 32768.

Reading the values back will show 32768 instead of some fake values.

This patch (of 6):

Fix a typo.

Link: http://lkml.kernel.org/r/1519926220-7453-2-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoipc/msg: introduce msgctl(MSG_STAT_ANY)
Davidlohr Bueso [Tue, 10 Apr 2018 23:35:30 +0000 (16:35 -0700)]
ipc/msg: introduce msgctl(MSG_STAT_ANY)

There is a permission discrepancy when consulting msq ipc object
metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl
command.  The later does permission checks for the object vs S_IRUGO.
As such there can be cases where EACCESS is returned via syscall but the
info is displayed anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the msq metadata), this behavior goes way back and showing
all the objects regardless of the permissions was most likely an
overlook - so we are stuck with it.  Furthermore, modifying either the
syscall or the procfs file can cause userspace programs to break (ie
ipcs).  Some applications require getting the procfs info (without root
privileges) and can be rather slow in comparison with a syscall -- up to
500x in some reported cases for shm.

This patch introduces a new MSG_STAT_ANY command such that the msq ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Robert Kettler <robert.kettler@outlook.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoipc/sem: introduce semctl(SEM_STAT_ANY)
Davidlohr Bueso [Tue, 10 Apr 2018 23:35:26 +0000 (16:35 -0700)]
ipc/sem: introduce semctl(SEM_STAT_ANY)

There is a permission discrepancy when consulting shm ipc object
metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl
command.  The later does permission checks for the object vs S_IRUGO.
As such there can be cases where EACCESS is returned via syscall but the
info is displayed anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the sma metadata), this behavior goes way back and showing
all the objects regardless of the permissions was most likely an
overlook - so we are stuck with it.  Furthermore, modifying either the
syscall or the procfs file can cause userspace programs to break (ie
ipcs).  Some applications require getting the procfs info (without root
privileges) and can be rather slow in comparison with a syscall -- up to
500x in some reported cases for shm.

This patch introduces a new SEM_STAT_ANY command such that the sem ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

Link: http://lkml.kernel.org/r/20180215162458.10059-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Robert Kettler <robert.kettler@outlook.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoipc/shm: introduce shmctl(SHM_STAT_ANY)
Davidlohr Bueso [Tue, 10 Apr 2018 23:35:23 +0000 (16:35 -0700)]
ipc/shm: introduce shmctl(SHM_STAT_ANY)

Patch series "sysvipc: introduce STAT_ANY commands", v2.

The following patches adds the discussed (see [1]) new command for shm
as well as for sems and msq as they are subject to the same
discrepancies for ipc object permission checks between the syscall and
via procfs.  These new commands are justified in that (1) we are stuck
with this semantics as changing syscall and procfs can break userland;
and (2) some users can benefit from performance (for large amounts of
shm segments, for example) from not having to parse the procfs
interface.

Once merged, I will submit the necesary manpage updates.  But I'm thinking
something like:

: diff --git a/man2/shmctl.2 b/man2/shmctl.2
: index 7bb503999941..bb00bbe21a57 100644
: --- a/man2/shmctl.2
: +++ b/man2/shmctl.2
: @@ -41,6 +41,7 @@
:  .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new
:  .\" attaches to a segment that has already been marked for deletion.
:  .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions.
: +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description.
:  .\"
:  .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual"
:  .SH NAME
: @@ -242,6 +243,18 @@ However, the
:  argument is not a segment identifier, but instead an index into
:  the kernel's internal array that maintains information about
:  all shared memory segments on the system.
: +.TP
: +.BR SHM_STAT_ANY " (Linux-specific)"
: +Return a
: +.I shmid_ds
: +structure as for
: +.BR SHM_STAT .
: +However, the
: +.I shm_perm.mode
: +is not checked for read access for
: +.IR shmid ,
: +resembing the behaviour of
: +/proc/sysvipc/shm.
:  .PP
:  The caller can prevent or allow swapping of a shared
:  memory segment with the following \fIcmd\fP values:
: @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the
:  kernel's internal array recording information about all
:  shared memory segments.
:  (This information can be used with repeated
: -.B SHM_STAT
: +.B SHM_STAT/SHM_STAT_ANY
:  operations to obtain information about all shared memory segments
:  on the system.)
:  A successful
: @@ -328,7 +341,7 @@ isn't accessible.
:  \fIshmid\fP is not a valid identifier, or \fIcmd\fP
:  is not a valid command.
:  Or: for a
: -.B SHM_STAT
: +.B SHM_STAT/SHM_STAT_ANY
:  operation, the index value specified in
:  .I shmid
:  referred to an array slot that is currently unused.

This patch (of 3):

There is a permission discrepancy when consulting shm ipc object metadata
between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command.  The
later does permission checks for the object vs S_IRUGO.  As such there can
be cases where EACCESS is returned via syscall but the info is displayed
anyways in the procfs files.

While this might have security implications via info leaking (albeit no
writing to the shm metadata), this behavior goes way back and showing all
the objects regardless of the permissions was most likely an overlook - so
we are stuck with it.  Furthermore, modifying either the syscall or the
procfs file can cause userspace programs to break (ie ipcs).  Some
applications require getting the procfs info (without root privileges) and
can be rather slow in comparison with a syscall -- up to 500x in some
reported cases.

This patch introduces a new SHM_STAT_ANY command such that the shm ipc
object permissions are ignored, and only audited instead.  In addition,
I've left the lsm security hook checks in place, as if some policy can
block the call, then the user has no other choice than just parsing the
procfs file.

[1] https://lkml.org/lkml/2017/12/19/220

Link: http://lkml.kernel.org/r/20180215162458.10059-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Robert Kettler <robert.kettler@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/params.c: downgrade warning for unsafe parameters
Chris Wilson [Tue, 10 Apr 2018 23:35:18 +0000 (16:35 -0700)]
kernel/params.c: downgrade warning for unsafe parameters

As using an unsafe module parameter is, by its very definition, an
expected user action, emitting a warning is overkill.  Nothing has yet
gone wrong, and we add a taint flag for any future oops should something
actually go wrong.  So instead of having a user controllable pr_warn,
downgrade it to a pr_notice for "a normal, but significant condition".

We make use of unsafe kernel parameters in igt
(https://cgit.freedesktop.org/drm/igt-gpu-tools/) (we have not yet
succeeded in removing all such debugging options), which generates a
warning and taints the kernel.  The warning is unhelpful as we then need
to filter it out again as we check that every test themselves do not
provoke any kernel warnings.

Link: http://lkml.kernel.org/r/20180226151919.9674-1-chris@chris-wilson.co.uk
Fixes: 91f9d330cc14 ("module: make it possible to have unsafe, tainting module params")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Petri Latvala <petri.latvala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/sysctl.c: fix sizeof argument to match variable name
Randy Dunlap [Tue, 10 Apr 2018 23:35:14 +0000 (16:35 -0700)]
kernel/sysctl.c: fix sizeof argument to match variable name

Fix sizeof argument to be the same as the data variable name.  Probably
a copy/paste error.

Mostly harmless since both variables are unsigned int.

Fixes kernel bugzilla #197371:
  Possible access to unintended variable in "kernel/sysctl.c" line 1339
https://bugzilla.kernel.org/show_bug.cgi?id=197371

Link: http://lkml.kernel.org/r/e0d0531f-361e-ef5f-8499-32743ba907e1@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Petru Mihancea <petrum@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agorapidio: use a reference count for struct mport_dma_req
Ioan Nicu [Tue, 10 Apr 2018 23:35:10 +0000 (16:35 -0700)]
rapidio: use a reference count for struct mport_dma_req

Once the dma request is passed to the DMA engine, the DMA subsystem
would hold a pointer to this structure and could call the completion
callback after do_dma_request() has timed out.

The current code deals with this by putting timed out SYNC requests to a
pending list and freeing them later, when the mport cdev device is
released.  This still does not guarantee that the DMA subsystem is
really done with those transfers, so in theory
dma_xfer_callback/dma_req_free could be called after
mport_cdev_release_dma and could potentially access already freed
memory.

This patch simplifies the current handling by using a kref in the mport
dma request structure, so that it gets freed only when nobody uses it
anymore.

This also simplifies the code a bit, as FAF transfers are now handled in
the same way as SYNC and ASYNC transfers.  There is no need anymore for
the pending list and for the dma workqueue which was used in case of FAF
transfers, so we remove them both.

Link: http://lkml.kernel.org/r/20180405203342.GA16191@nokia.com
Signed-off-by: Ioan Nicu <ioan.nicu.ext@nokia.com>
Acked-by: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Barry Wood <barry.wood@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Frank Kunz <frank.kunz@nokia.com>
Cc: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agodrivers/rapidio/rio-scan.c: fix typo in comment
Vasyl Gomonovych [Tue, 10 Apr 2018 23:35:06 +0000 (16:35 -0700)]
drivers/rapidio/rio-scan.c: fix typo in comment

Fix typo in the words 'receiver', 'specified', 'during'

Link: http://lkml.kernel.org/r/20180321211035.8904-1-gomonovych@gmail.com
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoexec: pin stack limit during exec
Kees Cook [Tue, 10 Apr 2018 23:35:01 +0000 (16:35 -0700)]
exec: pin stack limit during exec

Since the stack rlimit is used in multiple places during exec and it can
be changed via other threads (via setrlimit()) or processes (via
prlimit()), the assumption that the value doesn't change cannot be made.
This leads to races with mm layout selection and argument size
calculations.  This changes the exec path to use the rlimit stored in
bprm instead of in current.  Before starting the thread, the bprm stack
rlimit is stored back to current.

Link: http://lkml.kernel.org/r/1518638796-20819-4-git-send-email-keescook@chromium.org
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoexec: introduce finalize_exec() before start_thread()
Kees Cook [Tue, 10 Apr 2018 23:34:57 +0000 (16:34 -0700)]
exec: introduce finalize_exec() before start_thread()

Provide a final callback into fs/exec.c before start_thread() takes
over, to handle any last-minute changes, like the coming restoration of
the stack limit.

Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Greg KH <greg@kroah.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoexec: pass stack rlimit into mm layout functions
Kees Cook [Tue, 10 Apr 2018 23:34:53 +0000 (16:34 -0700)]
exec: pass stack rlimit into mm layout functions

Patch series "exec: Pin stack limit during exec".

Attempts to solve problems with the stack limit changing during exec
continue to be frustrated[1][2].  In addition to the specific issues
around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
other places during exec where the stack limit is used and is assumed to
be unchanging.  Given the many places it gets used and the fact that it
can be manipulated/raced via setrlimit() and prlimit(), I think the only
way to handle this is to move away from the "current" view of the stack
limit and instead attach it to the bprm, and plumb this down into the
functions that need to know the stack limits.  This series implements
the approach.

[1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()")
[2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
[3] to security@kernel.org, "Subject: existing rlimit races?"

This patch (of 3):

Since it is possible that the stack rlimit can change externally during
exec (either via another thread calling setrlimit() or another process
calling prlimit()), provide a way to pass the rlimit down into the
per-architecture mm layout functions so that the rlimit can stay in the
bprm structure instead of sitting in the signal structure until exec is
finalized.

Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>