git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge tag 'powerpc-4.17-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

- Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

- Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

- Fix checkstops caused by invalid tlbiel on Power9 Radix.

- Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
  powerpc/8xx: Fix build with hugetlbfs enabled
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/fscr: Enable interrupts earlier before calling get_user()
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic

Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

- various hotfixes

- kexec_file updates and feature work

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  kernel/kexec_file.c: move purgatories sha256 to common code
  kernel/kexec_file.c: allow archs to set purgatory load address
  kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
  kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: split up __kexec_load_puragory
  kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
  kernel/kexec_file.c: search symbols in read-only kexec_purgatory
  kernel/kexec_file.c: make purgatory_info->ehdr const
  kernel/kexec_file.c: remove checks in kexec_purgatory_load
  include/linux/kexec.h: silence compile warnings
  kexec_file, x86: move re-factored code to generic side
  x86: kexec_file: clean up prepare_elf64_headers()
  x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
  x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
  x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
  kexec_file,x86,powerpc: factor out kexec_file_ops functions
  kexec_file: make use of purgatory optional
  proc: revalidate misc dentries
  mm, slab: reschedule cache_reap() on the same CPU
  ...

kernel/kexec_file.c: move purgatories sha256 to common code

The code to verify the new kernels sha digest is applicable for all
architectures.  Move it to common code.

One problem is the string.c implementation on x86.  Currently sha256
includes x86/boot/string.h which defines memcpy and memset to be gcc
builtins.  By moving the sha256 implementation to common code and
changing the include to linux/string.h both functions are no longer
defined.  Thus definitions have to be provided in x86/purgatory/string.c

Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: allow archs to set purgatory load address

For s390 new kernels are loaded to fixed addresses in memory before they
are booted.  With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address.  In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.

Luckily there is a simple workaround for this problem.  By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
allows the architecture to set kbuf->mem by hand.  While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.

Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it.  With this change
architectures have access to the buffer and can edit it as they need.

A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field.  As now the information
stored there can directly be accessed from kbuf->mem.

Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load

The current code uses the sh_offset field in purgatory_info->sechdrs to
store a pointer to the current load address of the section. Depending
whether the section will be loaded or not this is either a pointer into
purgatory_info->purgatory_buf or kexec_purgatory. This is not only a
violation of the ELF standard but also makes the code very hard to
understand as you cannot tell if the memory you are using is read-only
or not.

Remove this misuse and store the offset of the section in
pugaroty_info->purgatory_buf in sh_offset.

Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs

The main loop currently uses quite a lot of variables to update the
section headers. Some of them are unnecessary. So clean them up a
little.

Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs

To update the entry point there is an extra loop over all section
headers although this can be done in the main loop. So move it there
and eliminate the extra loop and variable to store the 'entry section
index'.

Also, in the main loop, move the usual case, i.e. non-bss section, out
of the extra if-block.

Link: http://lkml.kernel.org/r/20180321112751.22196-8-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: split up __kexec_load_puragory

When inspecting __kexec_load_purgatory you find that it has two tasks

1) setting up the kexec_buffer for the new kernel and,
2) setting up pi->sechdrs for the final load address.

The two tasks are independent of each other. To improve readability
split up __kexec_load_purgatory into two functions, one for each task,
and call them directly from kexec_load_purgatory.

Link: http://lkml.kernel.org/r/20180321112751.22196-7-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*

When the relocations are applied to the purgatory only the section the
relocations are applied to is writable. The other sections, i.e. the
symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this
by marking the corresponding variables as 'const'.

While at it also change the signatures of arch_kexec_apply_relocations* to
take section pointers instead of just the index of the relocation section.
This removes the second lookup and sanity check of the sections in arch
code.

Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: search symbols in read-only kexec_purgatory

The stripped purgatory does not contain a symtab. So when looking for
symbols this is done in read-only kexec_purgatory. Highlight this by
marking the corresponding variables as 'const'.

Link: http://lkml.kernel.org/r/20180321112751.22196-5-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: make purgatory_info->ehdr const

The kexec_purgatory buffer is read-only. Thus all pointers into
kexec_purgatory are read-only, too. Point this out by explicitly
marking purgatory_info->ehdr as 'const' and update the comments in
purgatory_info.

Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/kexec_file.c: remove checks in kexec_purgatory_load

Before the purgatory is loaded several checks are done whether the ELF
file in kexec_purgatory is valid or not. These checks are incomplete.
For example they don't check for the total size of the sections defined
in the section header table or if the entry point actually points into
the purgatory.

On the other hand the purgatory, although an ELF file on its own, is
part of the kernel. Thus not trusting the purgatory means not trusting
the kernel build itself.

So remove all validity checks on the purgatory and just trust the kernel
build.

Link: http://lkml.kernel.org/r/20180321112751.22196-3-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

include/linux/kexec.h: silence compile warnings

Patch series "kexec_file: Clean up purgatory load", v2.

Following the discussion with Dave and AKASHI, here are the common code
patches extracted from my recent patch set (Add kexec_file_load support
to s390) [1].  The patches were extracted to allow upstream integration
together with AKASHI's common code patches before the arch code gets
adjusted to the new base.

The reason for this series is to prepare common code for adding
kexec_file_load to s390 as well as cleaning up the mis-use of the
sh_offset field during purgatory load.  In detail this series contains:

Patch #1&2: Minor cleanups/fixes.

Patch #3-9: Clean up the purgatory load/relocation code.  Especially
remove the mis-use of the purgatory_info->sechdrs->sh_offset field,
currently holding a pointer into either kexec_purgatory (ro) or
purgatory_buf (rw) depending on the section.  With these patches the
section address will be calculated verbosely and sh_offset will contain
the offset of the section in the stripped purgatory binary
(purgatory_buf).

Patch #10: Allows architectures to set the purgatory load address.  This
patch is important for s390 as the kernel and purgatory have to be
loaded to fixed addresses.  In current code this is impossible as the
purgatory load is opaque to the architecture.

Patch #11: Moves x86 purgatories sha implementation to common lib/
directory to allow reuse in other architectures.

This patch (of 11)

When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a
compile warning multiple times.

  In file included from <path>/linux/init/initramfs.c:526:0:
  <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default]
           unsigned long cmdline_len);
           ^

This is because the typedefs for kexec_file_load uses struct kimage
before it is declared.  Fix this by simply forward declaring struct
kimage.

Link: http://lkml.kernel.org/r/20180321112751.22196-2-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec_file, x86: move re-factored code to generic side

In the previous patches, commonly-used routines, exclude_mem_range() and
prepare_elf64_headers(), were carved out. Now place them in kexec
common code. A prefix "crash_" is given to each of their names to avoid
possible name collisions.

Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: kexec_file: clean up prepare_elf64_headers()

Removing bufp variable in prepare_elf64_headers() makes the code simpler
and more understandable.

Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer

While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number
array is not a good idea in general.

In this patch, size of crash_mem buffer is calculated as before and the
buffer is now dynamically allocated. This change also allows removing
crash_elf_data structure.

Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()

The code guarded by CONFIG_X86_64 is necessary on some architectures
which have a dedicated kernel mapping outside of linear memory mapping.
(arm64 is among those.)

In this patch, an additional argument, kernel_map, is added to enable/
disable the code removing #ifdef.

Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: kexec_file: purge system-ram walking from prepare_elf64_headers()

While prepare_elf64_headers() in x86 looks pretty generic for other
architectures' use, it contains some code which tries to list crash
memory regions by walking through system resources, which is not always
architecture agnostic. To make this function more generic, the related
code should be purged.

In this patch, prepare_elf64_headers() simply scans crash_mem buffer
passed and add all the listed regions to elf header as a PT_LOAD
segment. So walk_system_ram_res(prepare_elf64_headers_callback) have
been moved forward before prepare_elf64_headers() where the callback,
prepare_elf64_headers_callback(), is now responsible for filling up
crash_mem buffer.

Meanwhile exclude_elf_header_ranges() used to be called every time in
this callback it is rather redundant and now called only once in
prepare_elf_headers() as well.

Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec_file,x86,powerpc: factor out kexec_file_ops functions

As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array. So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec_file: make use of purgatory optional

Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.

This is a preparatory patchset for adding kexec_file support on arm64.

It was originally included in a arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.

So these common parts were extracted and put into a separate patch set
for better integration.  What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.

As such, the resulting code is basically identical with my original, and
the only *visible* differences are:

- renaming of _kexec_kernel_image_probe() and  _kimage_file_post_load_cleanup()

- change one of types of arguments at prepare_elf64_headers()

Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].

Patch #1 allows making a use of purgatory optional, particularly useful
for arm64.

Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.

Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), to be made best re-use of.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html

This patch (of 7):

On arm64, crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only.  It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory.  The resulting code is so
simple as it doesn't require a bit ugly re-linking/relocation stuff,
i.e.  arch_kexec_apply_relocations_add().

Please see:

   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html

All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.

As such, it doesn't make sense to have trampline code between old kernel
and new kernel on arm64.

This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.

[takahiro.akashi@linaro.org: fix trivial screwup]
Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: revalidate misc dentries

If module removes proc directory while another process pins it by
chdir'ing to it, then subsequent recreation of proc entry and all
entries down the tree will not be visible to any process until pinning
process unchdir from directory and unpins everything.

Steps to reproduce:

proc_mkdir("aaa", NULL);
proc_create("aaa/bbb", ...);

chdir("/proc/aaa");

remove_proc_entry("aaa/bbb", NULL);
remove_proc_entry("aaa", NULL);

proc_mkdir("aaa", NULL);
# inaccessible because "aaa" dentry still points
# to the original "aaa".
proc_create("aaa/bbb", ...);

Fix is to implement ->d_revalidate and ->d_delete.

Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, slab: reschedule cache_reap() on the same CPU

cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled
via schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.

Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND
work on wq_unbound_cpumask CPUs") there is no guarantee the future
iterations will run on the originally intended cpu, although it's still
preferred.  I was able to demonstrate this with
/sys/module/workqueue/parameters/debug_force_rr_cpu.  IIUC, it may also
happen due to migrating timers in nohz context.  As a result, some cpu's
would be calling cache_reap() more frequently and others never.

This patch uses schedule_delayed_work_on() with the current cpu when
scheduling the next iteration.

Link: http://lkml.kernel.org/r/20180411070007.32225-1-vbabka@suse.cz
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec: export PG_swapbacked to VMCOREINFO

Since commit 6326fec1122c ("mm: Use owner_priv bit for PageSwapCache,
valid when PageSwapBacked"), PG_swapcache is an alias for
PG_owner_priv_1, which may be also used for other purposes.

To know whether the bit indeed has the PG_swapcache meaning, it is
necessary to check PG_swapbacked, hence this bit must be exported.

Link: http://lkml.kernel.org/r/20180410161345.142e142d@ezekiel.suse.cz
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Xunlei Pang <xlpang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Marc-Andr Lureau" <marcandre.lureau@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/shm: fix use-after-free of shm file via remap_file_pages()

syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().

Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.

Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd->file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).

Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>

int main()
{
int is_parent = (fork() != 0);
srand(getpid());
for (;;) {
int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
if (is_parent) {
void *addr = shmat(id, NULL, 0);
usleep(rand() % 50);
while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
} else {
usleep(rand() % 50);
shmctl(id, IPC_RMID, NULL);
}
}
}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ebiggers@google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/filemap.c: provide dummy filemap_page_mkwrite() for NOMMU

Building orangefs on MMU-less machines now results in a link error
because of the newly introduced use of the filemap_page_mkwrite()
function:

ERROR: "filemap_page_mkwrite" [fs/orangefs/orangefs.ko] undefined!

This adds a dummy version for it, similar to the existing
generic_file_mmap and generic_file_readonly_mmap stubs in the same file,
to avoid the link error without adding #ifdefs in each file system that
uses these.

Link: http://lkml.kernel.org/r/20180409105555.2439976-1-arnd@arndb.de
Fixes: a5135eeab2e5 ("orangefs: implement vm_ops->fault")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/gup.c: document return value

__get_user_pages_fast handles errors differently from
get_user_pages_fast: the former always returns the number of pages
pinned, the later might return a negative error code.

Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

get_user_pages_fast(): return -EFAULT on access_ok failure

get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages.  As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to return 0
when passed a positive number of pages, since its documentation says:

* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.

When get_user_pages_fast fall back on get_user_pages this is exactly
what happens.  Unfortunately the implementation is inconsistent: it
returns 0 if passed a kernel address, confusing callers: for example,
the following is pretty common but does not appear to do the right thing
with a kernel address:

        ret = get_user_pages_fast(addr, 1, writeable, &page);
        if (ret < 0)
                return ret;

Change get_user_pages_fast to return -EFAULT when supplied a kernel
address to make it match expectations.

All callers have been audited for consistency with the documented
semantics.

Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/gup_benchmark: handle gup failures

Patch series "mm/get_user_pages_fast fixes, cleanups", v2.

Turns out get_user_pages_fast and __get_user_pages_fast return different
values on error when given a single page: __get_user_pages_fast returns
0. get_user_pages_fast returns either 0 or an error.

Callers of get_user_pages_fast expect an error so fix it up to return an
error consistently.

Stress the difference between get_user_pages_fast and
__get_user_pages_fast to make sure callers aren't confused.

This patch (of 3):

__gup_benchmark_ioctl does not handle the case where get_user_pages_fast
fails:

- a negative return code will cause a buffer overrun

- returning with partial success will cause use of uninitialized
memory.

[akpm@linux-foundation.org: simplification]
Link: http://lkml.kernel.org/r/1522962072-182137-3-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

resource: fix integer overflow at reallocation

We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be the invalid PCI resource assigned after
reallocation.  __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained.  Here the end address may overflow the integer,
although resource_contains() still returns true because the function
validates only start and end address.  So this ends up with returning an
invalid resource (start > end).

There was already an attempt to cover such a problem in the commit
47ea91b4052d ("Resource: fix wrong resource window calculation"), but
this case is an overseen one.

This patch adds the validity check of the newly calculated resource for
avoiding the integer overflow problem.

Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de
Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reported-by: Michael Henders <hendersm@shaw.ca>
Tested-by: Michael Henders <hendersm@shaw.ca>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"In addition to bug fixes and cleanups there are two new features from
  Amir:

   - Consistent inode number support for the case when layers are not
     all on the same filesystem (feature is dubbed "xino").

   - Optimize overlayfs file handle decoding. This one touches the
     exportfs interface to allow detecting the disconnected directory
     case"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update documentation w.r.t "xino" feature
  ovl: add support for "xino" mount and config options
  ovl: consistent d_ino for non-samefs with xino
  ovl: consistent i_ino for non-samefs with xino
  ovl: constant st_ino for non-samefs with xino
  ovl: allocate anon bdev per unique lower fs
  ovl: factor out ovl_map_dev_ino() helper
  ovl: cleanup ovl_update_time()
  ovl: add WARN_ON() for non-dir redirect cases
  ovl: cleanup setting OVL_INDEX
  ovl: set d->is_dir and d->opaque for last path element
  ovl: Do not check for redirect if this is last layer
  ovl: lookup in inode cache first when decoding lower file handle
  ovl: do not try to reconnect a disconnected origin dentry
  ovl: disambiguate ovl_encode_fh()
  ovl: set lower layer st_dev only if setting lower st_ino
  ovl: fix lookup with middle layer opaque dir and absolute path redirects
  ovl: Set d->last properly during lookup
  ovl: set i_ino to the value of st_ino for NFS export

Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux

Pull thermal management update from Zhang Rui:

- Fix race condition in imx_thermal_probe() (Mikhail Lappo)

- Add cooling device's statistics in sysfs (Viresh Kumar)

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: Add cooling device's statistics in sysfs
thermal: imx: Fix race condition in imx_thermal_probe()

Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging

Pull dmi updates from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi_scan: Use lowercase letters for UUID
  firmware: dmi_scan: Add DMI_OEM_STRING support to dmi_matches
  firmware: dmi_scan: Fix UUID length safety check

Merge tag 'chrome-platform-for-linus-4.17' of git://git./linux/kernel/git/bleung/chrome-platform

Pull chrome platform updates from Benson Leung:

- a series from Dmitry to remove platform data from chromeos_laptop.c,
   which was the only user of platform data for the atmel_mxt_ts driver.

- a series to clean up sysfs and debugfs for cros_ec

- other misc cleanups

* tag 'chrome-platform-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: (22 commits)
  platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angle
  platform/chrome: cros_ec_debugfs: Add PD port info to debugfs
  platform/chrome: cros_ec_debugfs: Use octal permissions '0444'
  platform/chrome: cros_ec_sysfs: use permission-specific DEVICE_ATTR variants
  platform/chrome: cros_ec_sysfs: introduce to_cros_ec_dev define.
  platform/chrome: cros_ec_sysfs: Modify error handling
  platform/chrome: cros_ec_lpc: Add support for Google devices using custom coreboot firmware
  platform/chrome: cros_ec_lpc: wake up from s2idle on Chrome EC
  Input: atmel_mxt_ts - remove platform data support
  platform/chrome: chromeos_laptop - discard data for unneeded boards
  platform/chrome: chromeos_laptop - use device properties for Pixel
  platform/chrome: chromeos_laptop - rely on I2C to set up interrupt trigger
  platform/chrome: chromeos_laptop - use I2C notifier to create devices
  platform/chrome: chromeos_laptop - parse DMI IRQ data once
  platform/chrome: chromeos_laptop - rework i2c peripherals initialization
  platform/chrome: chromeos_laptop - factor out getting IRQ from DMI
  platform/chrome: chromeos_laptop - introduce pr_fmt()
  platform/chrome: chromeos_laptop - stop setting suspend mode for Atmel devices
  platform/chrome: chromeos_laptop - add SPDX identifier
  Input: atmel_mxt_ts - switch ChromeOS ACPI devices to generic props
  ...

Merge tag 'clk-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk updates from Stephen Boyd:
"The large diff this time around is from the addition of a new clk
  driver for the TI Davinci family of SoCs. So far those clks have been
  supported with a custom implementation of the clk API in the arch port
  instead of in the CCF. With this driver merged we're one step closer
  to having a single clk API implementation.

  The other large diff is from the Amlogic clk driver that underwent
  some major surgery to use regmap. Beyond that, the biggest hitter is
  Samsung which needed some reworks to properly handle clk provider
  power domains and a bunch of PLL rate updates.

  The core framework was fairly quiet this round, just getting some
  cleanups and small fixes for some of the more esoteric features. And
  the usual set of driver non-critical fixes, cleanups, and minor
  additions are here as well.

  Core:
   - Rejig clk_ops::init() to be a little earlier for phase/accuracy ops
   - debugfs ops macroized to shave some lines of boilerplate code
   - Always calculate the phase instead of caching it in clk_get_phase()
   - More __must_check on bulk clk APIs

  New Drivers:
   - TI's Davinci family of SoCs
   - Intel's Stratix10 SoC
   - stm32mp157 SoC
   - Allwinner H6 CCU
   - Silicon Labs SI544 clock generator chip
   - Renesas R-Car M3-N and V3H SoCs
   - i.MX6SLL SoCs

  Removed Drivers:
   - ST-Ericsson AB8540/9540

  Updates:
   - Mediatek MT2701 and MT7622 audsys support and MT2712 updates
   - STM32F469 DSI and STM32F769 sdmmc2 support
   - GPIO clks can sleep now
   - Spreadtrum SC9860 RTC clks
   - Nvidia Tegra MBIST workarounds and various minor fixes
   - Rockchip phase handling fixes and a memory leak plugged
   - Renesas drivers switch to readl/writel from clk_readl/clk_writel
   - Renesas gained CPU (Z/Z2) and watchdog support
   - Rockchip rk3328 display clks and rk3399 1.6GHz PLL support
   - Qualcomm PM8921 PMIC XO buffers
   - Amlogic migrates to regmap APIs
   - TI Keystone clk latching support
   - Allwinner H3 and H5 video clk fixes
   - Broadcom BCM2835 PLLs needed another bit to enable
   - i.MX6SX CKO mux fix and i.MX7D Video PLL divider fix
   - i.MX6UL/ULL epdc_podf support
   - Hi3798CV200 COMBPHY0 and USB2_OTG_UTMI and phase support for eMMC"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (233 commits)
  clk: davinci: add a reset lookup table for psc0
  clk: imx: add clock driver for imx6sll
  dt-bindings: imx: update clock doc for imx6sll
  clk: imx: add new gate/gate2 wrapper funtion
  clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux
  clk: cs2000: set pm_ops in hibernate-compatible way
  clk: bcm2835: De-assert/assert PLL reset signal when appropriate
  clk: imx7d: Move clks_init_on before any clock operations
  clk: imx7d: Correct ahb clk parent select
  clk: imx7d: Correct dram pll type
  clk: imx7d: Add USB clock information
  clk: socfpga: stratix10: add clock driver for Stratix10 platform
  dt-bindings: documentation: add clock bindings information for Stratix10
  clk: ti: fix flag space conflict with clkctrl clocks
  clk: uniphier: add additional ethernet clock lines for Pro4
  clk: uniphier: add SATA clock control support
  clk: uniphier: add PCIe clock control support
  clk: Add driver for the si544 clock generator chip
  clk: davinci: Remove redundant dev_err calls
  clk: uniphier: add ethernet clock control support for PXs3
  ...

Merge tag 'pwm/for-4.17-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
"This set of changes adds support for more generations of the RCar
  controller as well as runtime PM support. The JZ4740 driver gains
  support for device tree and can now be used on all Ingenic SoCs.

  Rounding things off is a random assortment of fixes and cleanups all
  across the board"

* tag 'pwm/for-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (29 commits)
  pwm: rcar: Add suspend/resume support
  pwm: rcar: Use PM Runtime to control module clock
  dt-bindings: pwm: rcar: Add bindings for R-Car M3N support
  pwm: rcar: Fix a condition to prevent mismatch value setting to duty
  pwm: sysfs: Use put_device() instead of kfree()
  dt-bindings: pwm: sunxi: Add new compatible strings
  pwm: sun4i: Simplify controller mapping
  pwm: sun4i: Drop unused .has_rdy member
  pwm: sun4i: Properly check current state
  pwm: Remove depends on AVR32
  pwm: stm32: LPTimer: Use 3 cells ->of_xlate()
  dt-bindings: pwm-stm32-lp: Add #pwm-cells
  pwm: stm32: Protect common prescaler for all channels
  pwm: stm32: Remove unused struct device
  pwm: mediatek: Improve precision in rate calculation
  pwm: mediatek: Remove redundant MODULE_ALIAS entries
  pwm: mediatek: Fix up PWM4 and PWM5 malfunction on MT7623
  pwm: jz4740: Enable for all Ingenic SoCs
  pwm: jz4740: Add support for devicetree
  pwm: jz4740: Implement ->set_polarity()
  ...

Merge tag 'linux-watchdog-4.17-rc1' of git://linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

- Add Nuvoton NPCM watchdog driver

- renesas_wdt: Add R-Car Gen2 support

- renesas_wdt: add suspend/resume and restart handler support

- hpwdt: convert to watchdog core and improve NMI

- improve timeout setting/handling in various drivers

- coh901327: make license text and module licence match

- fix error handling in asm9260_wdt, sprd_wdt and davinci_wdt

- aspeed imrovements

- dw improvements (for control register & suspend/resume)

- add SPDX identifiers for watchdog subsystem

* tag 'linux-watchdog-4.17-rc1' of git://www.linux-watchdog.org/linux-watchdog: (35 commits)
  watchdog: davinci_wdt: fix error handling in davinci_wdt_probe()
  watchdog: add SPDX identifiers for watchdog subsystem
  watchdog: aspeed: Allow configuring for alternate boot
  watchdog: Add Nuvoton NPCM watchdog driver
  dt-bindings: watchdog: Add Nuvoton NPCM description
  watchdog: dw: save/restore control and timeout across suspend/resume
  watchdog: dw: RMW the control register
  watchdog: sprd_wdt: Fix error handling in sprd_wdt_enable()
  watchdog: aspeed: Fix translation of reset mode to ctrl register
  watchdog: renesas_wdt: Add restart handler
  watchdog: renesas_wdt: Add R-Car Gen2 support
  watchdog: renesas_wdt: Add suspend/resume support
  watchdog: f71808e_wdt: Fix WD_EN register read
  watchdog: hpwdt: Update driver version.
  watchdog: hpwdt: Add dynamic debug
  watchdog: hpwdt: Programable Pretimeout NMI
  watchdog: hpwdt: remove allow_kdump module parameter.
  watchdog: hpwdt: condition early return of NMI handler on iLO5
  watchdog: hpwdt: Modify to use watchdog core.
  watchdog: hpwdt: Update nmi_panic message.
  ...

Merge tag 'apparmor-pr-2018-04-10' of git://git./linux/kernel/git/jj/linux-apparmor

Pull apparmor updates from John Johansen:
"Features:
  - add base infrastructure for socket mediation. ABI bump and
    additional checks to ensure only v8 compliant policy uses socket af
    mediation.
  - improve and cleanup dfa verification
  - improve profile attachment logic
     - improve overlapping expression handling
     - add the xattr matching to the attachment logic
  - improve signal mediation handling with stacked labels
  - improve handling of no_new_privs in a label stack

  Cleanups and changes:
  - use dfa to parse string split
  - bounded version of label_parse
  - proper line wrap nulldfa.in
  - split context out into task and cred naming to better match usage
  - simplify code in aafs

  Bug fixes:
  - fix display of .ns_name for containers
  - fix resource audit messages when auditing peer
  - fix logging of the existence test for signals
  - fix resource audit messages when auditing peer
  - fix display of .ns_name for containers
  - fix an error code in verify_table_headers()
  - fix memory leak on buffer on error exit path
  - fix error returns checks by making size a ssize_t"

* tag 'apparmor-pr-2018-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: (36 commits)
  apparmor: fix memory leak on buffer on error exit path
  apparmor: fix dangling symlinks to policy rawdata after replacement
  apparmor: Fix an error code in verify_table_headers()
  apparmor: fix error returns checks by making size a ssize_t
  apparmor: update MAINTAINERS file git and wiki locations
  apparmor: remove POLICY_MEDIATES_SAFE
  apparmor: add base infastructure for socket mediation
  apparmor: improve overlapping domain attachment resolution
  apparmor: convert attaching profiles via xattrs to use dfa matching
  apparmor: Add support for attaching profiles via xattr, presence and value
  apparmor: cleanup: simplify code to get ns symlink name
  apparmor: cleanup create_aafs() error path
  apparmor: dfa split verification of table headers
  apparmor: dfa add support for state differential encoding
  apparmor: dfa move character match into a macro
  apparmor: update domain transitions that are subsets of confinement at nnp
  apparmor: move context.h to cred.h
  apparmor: move task related defines and fns to task.X files
  apparmor: cleanup, drop unused fn __aa_task_is_confined()
  apparmor: cleanup fixup description of aa_replace_profiles
  ...

Merge tag 'for-linus-20180413' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Followup fixes for this merge window. This contains:

   - Series from Ming, fixing corner cases in our CPU <-> queue mapping.

     This triggered repeated warnings on especially s390, but I also hit
     it in cpu hot plug/unplug testing while doing IO on NVMe on x86-64.

   - Another fix from Ming, ensuring that we always order budget and
     driver tag identically, avoiding a deadlock on QD=1 devices.

   - Loop locking regression fix from this merge window, from Omar.

   - Another loop locking fix, this time missing an unlock, from Tetsuo
     Handa.

   - Fix for racing IO submission with device removal from Bart.

   - sr reference fix from me, fixing a case where disk change or
     getevents can race with device removal.

   - Set of nvme fixes by way of Keith, from various contributors"

* tag 'for-linus-20180413' of git://git.kernel.dk/linux-block: (28 commits)
  nvme: expand nvmf_check_if_ready checks
  nvme: Use admin command effects for admin commands
  nvmet: fix space padding in serial number
  nvme: check return value of init_srcu_struct function
  nvmet: Fix nvmet_execute_write_zeroes sector count
  nvme-pci: Separate IO and admin queue IRQ vectors
  nvme-pci: Remove unused queue parameter
  nvme-pci: Skip queue deletion if there are no queues
  nvme: target: fix buffer overflow
  nvme: don't send keep-alives to the discovery controller
  nvme: unexport nvme_start_keep_alive
  nvme-loop: fix kernel oops in case of unhandled command
  nvme: enforce 64bit offset for nvme_get_log_ext fn
  sr: get/drop reference to device in revalidate and check_events
  blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped"
  blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
  backing: silence compiler warning using __printf
  blk-mq: remove code for dealing with remapping queue
  blk-mq: reimplement blk_mq_hw_queue_mapped
  blk-mq: don't check queue mapped in __blk_mq_delay_run_hw_queue()
  ...

Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:

- hot bugfix for i801 to make laptops with strange BIOS reboot again
   when using SMBUS Host notify

- change to MAINTAINERS creating a specific fallback entry for I2C host
   drivers and settings its status to "Odd fixes"

- a long overdue param checking for the I2C core

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: add param sanity check to i2c_transfer()
  MAINTAINERS: add maintainer for Renesas I2C related drivers
  MAINTAINERS: remove me as maintainer for I2C host drivers
  i2c: i801: Restore configuration at shutdown
  i2c: i801: Save register SMBSLVCMD value only once

Merge tag 'sh-for-4.17' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker:
"Fixes for bugs in futex, device tree, and userspace breakpoint traps,
  and for PCI issues on SH7786"

* tag 'sh-for-4.17' of git://git.libc.org/linux-sh:
  arch/sh: pcie-sh7786: handle non-zero DMA offset
  arch/sh: pcie-sh7786: adjust the memory mapping
  arch/sh: pcie-sh7786: adjust PCI MEM and IO regions
  arch/sh: pcie-sh7786: exclude unusable PCI MEM areas
  arch/sh: pcie-sh7786: mark unavailable PCI resource as disabled
  arch/sh: pci: don't use disabled resources
  arch/sh: make the DMA mapping operations observe dev->dma_pfn_offset
  arch/sh: add sh7786_mm_sel() function
  sh: fix debug trap failure to process signals before return to user
  sh: fix memory corruption of unflattened device tree
  sh: fix futex FUTEX_OP_SET op on userspace addresses

Merge tag 'arm64-upstream' of git://git./linux/kernel/git/arm64/linux

Pull more arm64 updates from Will Deacon:
"A few late updates to address some issues arising from conflicts with
  other trees:

   - Removal of Qualcomm-specific Spectre-v2 mitigation in favour of the
     generic SMCCC-based firmware call

   - Fix EL2 hardening capability checking, which was bodged to reduce
     conflicts with the KVM tree

   - Add some currently unused assembler macros for managing SIMD
     registers which will be used by some crypto code in the next merge
     window"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: assembler: add macros to conditionally yield the NEON under PREEMPT
  arm64: assembler: add utility macros to push/pop stack frames
  arm64: Move the content of bpi.S to hyp-entry.S
  arm64: Get rid of __smccc_workaround_1_hvc_*
  arm64: capabilities: Rework EL2 vector hardening entry
  arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening

Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull  more s390 updates from Martin Schwidefsky:
"Three notable larger changes next to the usual bug fixing:

   - update the email addresses in MAINTAINERS for the s390 folks to use
     the simpler linux.ibm.com domain instead of the old
     linux.vnet.ibm.com

   - an update for the zcrypt device driver that removes some old and
     obsolete interfaces and add support for up to 256 crypto adapters

   - a rework of the IPL aka boot code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (23 commits)
  s390: correct nospec auto detection init order
  s390/zcrypt: Support up to 256 crypto adapters.
  s390/zcrypt: Remove deprecated zcrypt proc interface.
  s390/zcrypt: Remove deprecated ioctls.
  s390/zcrypt: Make ap init functions static.
  MAINTAINERS: update s390 maintainers email addresses
  s390/ipl: remove reipl_method and dump_method
  s390/ipl: correct kdump reipl block checksum calculation
  s390/ipl: remove non-existing functions declaration
  s390: assume diag308 set always works
  s390/ipl: avoid adding scpdata to cmdline during ftp/dvd boot
  s390/ipl: correct ipl parmblock valid checks
  s390/ipl: rely on diag308 store to get ipl info
  s390/ipl: move ipl_flags to ipl.c
  s390/ipl: get rid of ipl_ssid and ipl_devno
  s390/ipl: unite diag308 and scsi boot ipl blocks
  s390/ipl: ensure loadparm valid flag is set
  s390/qdio: lock device while installing IRQ handler
  s390/qdio: clear intparm during shutdown
  s390/ccwgroup: require at least one ccw device
  ...

powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features

The cpu_has_feature() mechanism has an optimisation where at build
time we construct a mask of the CPU feature bits that will always be
true for the given .config, based on the platform/bitness/etc. that we
are building for.

That is incompatible with DT CPU features, where the set of CPU
features is dependent on feature flags that are given to us by
firmware.

The result is that some feature bits can not be *disabled* by DT CPU
features. Or more accurately, they can be disabled but they will still
appear in the ALWAYS mask, meaning cpu_has_feature() will always
return true for them.

In the past this hasn't really been a problem because on Book3S
64 (where we support DT CPU features), the set of ALWAYS bits has been
very small. That was because we always built for POWER4 and later,
meaning the set of common bits was small.

The only bit that could be cleared by DT CPU features that was also in
the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
the alignment handler to create a fake DSISR. That code was itself
deleted in 31bfdb036f12 ("powerpc: Use instruction emulation
infrastructure to handle alignment faults") (Sep 2017).

However the set of ALWAYS features changed with the recent commit
db5ae1c155af ("powerpc/64s: Refine feature sets for little endian
builds") which restricted the set of feature flags when building
little endian to Power7 or later. That caused the ALWAYS mask to
become much larger for little endian builds.

The result is that the following feature bits can currently not
be *disabled* by DT CPU features:

  CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
  CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
  CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

To fix it we need to mask the set of ALWAYS features with the base set
of DT CPU features, ie. the features that are always enabled by DT CPU
features. That way there are no bits in the ALWAYS mask that are not
also always set by DT CPU features.

Fixes: db5ae1c155af ("powerpc/64s: Refine feature sets for little endian builds")
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

firmware: dmi_scan: Use lowercase letters for UUID

RFC 4122 asks for letters a-f in UUID to be lowercase. Follow this
recommendation.

Suggested by Paul Dagnelie at:
https://savannah.nongnu.org/bugs/index.php?53569

Signed-off-by: Jean Delvare <jdelvare@suse.de>

firmware: dmi_scan: Add DMI_OEM_STRING support to dmi_matches

OEM strings are defined by each OEM and they contain customized and
useful OEM information. Supporting it provides more flexible uses of
the dmi_matches function.

Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>

firmware: dmi_scan: Fix UUID length safety check

The test which ensures that the DMI type 1 structure is long enough
to hold the UUID is off by one. It would fail if the structure is
exactly 24 bytes long, while that's sufficient to hold the UUID.

I don't expect this bug to cause problem in practice because all
implementations I have seen had length 8, 25 or 27 bytes, in line
with the SMBIOS specifications. But let's fix it still.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: a814c3597a6b ("firmware: dmi_scan: Check DMI structure length")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>

Merge branches 'thermal-core' and 'thermal-soc' into next

Merge tag 'drm-fixes-for-v4.17-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"One omap, and one alsa pm fix (we merged the breaking patch via drm
  tree).

  Otherwise it's two bunches of amdgpu fixes, removing an unneeded file,
  some DC fixes, HDMI audio regression fix, and some vega12 fixes"

* tag 'drm-fixes-for-v4.17-rc1' of git://people.freedesktop.org/~airlied/linux: (27 commits)
  Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"
  Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"
  drm/amd/display: Fix regamma not affecting full-intensity color values
  drm/amd/display: Fix FBC text console corruption
  drm/amd/display: Only register backlight device if embedded panel connected
  drm/amd/display: fix brightness level after resume from suspend
  drm/amd/display: HDMI has no sound after Panel power off/on
  drm/amdgpu: add MP1 and THM hw ip base reg offset
  drm/amdgpu: fix null pointer panic with direct fw loading on gpu reset
  drm/radeon: add PX quirk for Asus K73TK
  drm/omap: fix crash if there's no video PLL
  drm/amdgpu: Fix memory leaks at amdgpu_init() error path
  drm/amdgpu: Fix PCIe lane width calculation
  drm/radeon: Fix PCIe lane width calculation
  drm/amdgpu/si: implement get/set pcie_lanes asic callback
  drm/amdgpu: Add support for SRBM selection v3
  Revert "drm/amdgpu: Don't change preferred domian when fallback GTT v5"
  drm/amd/powerply: fix power reading on Fiji
  drm/amd/powerplay: Enable ACG SS feature
  drm/amdgpu/sdma: fix mask in emit_pipeline_sync
  ...

Merge tag 'trace-v4.17-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"A few clean ups and bug fixes:

   - replace open coded "ARRAY_SIZE()" with macro

   - updates to uprobes

   - bug fix for perf event filter on error path"

* tag 'trace-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Enforce passing in filter=NULL to create_filter()
  trace_uprobe: Simplify probes_seq_show()
  trace_uprobe: Use %lx to display offset
  tracing/uprobe: Add support for overlayfs
  tracing: Use ARRAY_SIZE() macro instead of open coding it

proc: fixup copyright sign

Add copyright in two files before they get autorubberstamped.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'pci-v4.17-changes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

- mark Extended Tags as broken on Broadcom HT1100 and HT2000 Root Ports
   to fix drm/Xorg hangs and unresponsive keyboards (Sinan Kaya)

- remove useless messages during resource reassignment (Desnes A. Nunes
   do Rosario)

* tag 'pci-v4.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Remove messages about reassigning resources
  PCI: Mark Broadcom HT1100 and HT2000 Root Port Extended Tags as broken

Merge branch 'parisc-4.17-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc updates from Helge Deller:

- fix panic when halting system via "shutdown -h now"

- drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF
   implementation

- add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup
   for Eric Biedermans siginfo patches

- move some functions to .init and some to .text.hot linker sections

* 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Prevent panic at system halt
  parisc: Switch to generic COMPAT_BINFMT_ELF
  parisc: Move cache flush functions into .text.hot section
  parisc/signal: Add FPE_CONDTRAP for conditional trap handling

arch/sh: pcie-sh7786: handle non-zero DMA offset

On SuperH, the base of the physical memory might be different from
zero. In this case, PCI address zero will map to a non-zero physical
address. In order to make sure that the DMA mapping API takes care of
this DMA offset, we must fill in the dev->dma_pfn_offset field for PCI
devices. This gets done in the pcibios_bus_add_device() hook, called
for each new PCI device detected.

The dma_pfn_offset global variable is re-calculated for every PCI
controller available on the platform, but that's not an issue because
its value will each time be exactly the same, as it only depends on
the memory start address and memory size.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: pcie-sh7786: adjust the memory mapping

The code setting up the PCI -> SuperHighway mapping doesn't take into
account the fact that the address stored in PCIELARx must be aligned
with the size stored in PCIELAMRx.

For example, when your physical memory starts at 0x0800_0000 (128 MB),
a size of 64 MB or 128 MB is fine. However, if you have 256 MB of
memory, it doesn't work because the base address is not aligned on the
size.

In such situation, we have to round down the base address to make sure
it is aligned on the size of the area. For for a 0x0800_0000 base
address with 256 MB of memory, we will round down to 0x0, and extend
the size of the mapping to 512 MB.

This allows the mapping to work on platforms that have 256 MB of
RAM. The current setup would only work with 128 MB of RAM or less.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: pcie-sh7786: adjust PCI MEM and IO regions

The current definition of the PCIe IO and MEM resources for SH7786
doesn't match what the datasheet says. For example, for PCIe0
0xfe100000 is advertised by the datasheet as a PCI IO region, while
0xfd000000 is advertised as a PCI MEM region. The code currently
inverts the two.

The SH4A_PCIEPARL and SH4A_PCIEPTCTLR registers allow to define the
base address and role of the different regions (including whether it's
a MEM or IO region). However, practical experience on a SH7786 shows
that if 0xfe100000 is used for LEL and 0xfd000000 for IO, a PCIe
device using two MEM BARs cannot be accessed at all. Simply using
0xfe100000 for IO and 0xfd000000 for MEM makes the PCIe device
accessible.

It is very likely that this was never seen because there are two other
PCI MEM region listed in the resources. However, for different
reasons, none of the two other MEM regions are usable on the specific
SH7786 platform the problem was encountered. Therefore, the last MEM
region at 0xfe100000 was used to place the BARs, making the device
non-functional.

This commit therefore adjusts those PCI MEM and IO resources
definitions so that they match what the datasheet says. They have only
been tested with PCIe 0.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: pcie-sh7786: exclude unusable PCI MEM areas

Depending on the physical memory layout, some PCI MEM areas are not
usable. According to the SH7786 datasheet, the PCI MEM area from
1000_0000 to 13FF_FFFF is only usable if the physical memory layout
(in MMSELR) is 1, 2, 5 or 6. In all other configurations, this PCI MEM
area is not usable (because it overlaps with DRAM).

Therefore, this commit adjusts the PCI SH7786 initialization to mark
the relevant PCI resource as IORESOURCE_DISABLED if we can't use it.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: pcie-sh7786: mark unavailable PCI resource as disabled

Some PCI MEM resources are marked as IORESOURCE_MEM_32BIT, which means
they are only usable when the SH core runs in 32-bit mode. In 29-bit
mode, such memory regions are not usable.

The existing code for SH7786 properly skips such regions when
configuring the PCIe controller registers. However, because such
regions are still described in the resource array, the
pcibios_scanbus() function in the SuperH pci.c will register them to
the PCI core. Due to this, the PCI core will allocate MEM areas from
this resource, and assign BARs pointing to this area, even though it's
unusable.

In order to prevent this from happening, we mark such regions as
IORESOURCE_DISABLED, which tells the SuperH pci.c pcibios_scanbus()
function to skip them.

Note that we separate marking the region as disabled from skipping it,
because other regions will be marked as disabled in follow-up patches.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: pci: don't use disabled resources

In pcibios_scanbus(), we provide to the PCI core the usable MEM and IO
regions using pci_add_resource_offset(). We travel through all
resources available in the "struct pci_channel".

Also, in register_pci_controller(), we travel through all resources to
request them, making sure they don't conflict with already requested
resources.

However, some resources may be disabled, in which case they should not
be requested nor provided to the PCI core.

In the current situation, none of the resources are disabled. However,
follow-up patches in this series will make some resources disabled,
making this preliminary change necessary.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: make the DMA mapping operations observe dev->dma_pfn_offset

Some devices may have a non-zero DMA offset, i.e an offset between the
DMA address and the physical address. Such an offset can be encoded
into the dma_pfn_offset field of "struct device", but the SuperH
implementation of the DMA mapping API does not observe this
information.

This commit fixes that by ensuring the DMA address is properly
calculated depending on this DMA offset.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

arch/sh: add sh7786_mm_sel() function

The SH7786 has different physical memory layout configurations,
configurable through the MMSELR register. The configuration is
typically defined by the bootloader, so Linux generally doesn't care.

Except that depending on the configuration, some PCI MEM areas may or
may not be available. This commit adds a helper function that allows
to retrieve the current physical memory layout configuration. It will
be used in a following patch to exclude unusable PCI MEM areas during
the PCI initialization.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>

sh: fix debug trap failure to process signals before return to user

When responding to a debug trap (breakpoint) in userspace, the
kernel's trap handler raised SIGTRAP but returned from the trap via a
code path that ignored pending signals, resulting in an infinite loop
re-executing the trapping instruction.

Signed-off-by: Rich Felker <dalias@libc.org>

sh: fix memory corruption of unflattened device tree

unflatten_device_tree() makes use of memblock allocation, and
therefore must be called before paging_init() migrates the memblock
allocation data to the bootmem framework. Otherwise the record of the
allocation for the expanded device tree will be lost, and will
eventually be clobbered when allocated for another use.

Signed-off-by: Rich Felker <dalias@libc.org>

sh: fix futex FUTEX_OP_SET op on userspace addresses

Commit 00b73d8d1b71 ("sh: add working futex atomic ops on userspace
addresses for smp") changed the futex_atomic_op_inuser function to
use a loop. In case of the FUTEX_OP_SET op with a userspace address
containing a value different of 0, this loop is an endless loop.

Fix that by loading the value of oldval from the userspace before doing
the cmpxchg op, also for the FUTEX_OP_SET case.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Rich Felker <dalias@libc.org>

Merge branch 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-next

- Add a PX quirk for radeon
- Fix flickering and stability issues with DC on some platforms
- Fix HDMI audio regression
- Few other misc DC and base driver fixes

* 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux:
  Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"
  Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"
  drm/amd/display: Fix regamma not affecting full-intensity color values
  drm/amd/display: Fix FBC text console corruption
  drm/amd/display: Only register backlight device if embedded panel connected
  drm/amd/display: fix brightness level after resume from suspend
  drm/amd/display: HDMI has no sound after Panel power off/on
  drm/amdgpu: add MP1 and THM hw ip base reg offset
  drm/amdgpu: fix null pointer panic with direct fw loading on gpu reset
  drm/radeon: add PX quirk for Asus K73TK

Merge tag 'drm-misc-next-fixes-2018-04-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

omap: Fix crash on AM4 EVM, and all OMAP2/3 boards (Tomi)

Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
* tag 'drm-misc-next-fixes-2018-04-11' of git://anongit.freedesktop.org/drm/drm-misc:
drm/omap: fix crash if there's no video PLL

Merge tag 'xfs-4.17-merge-4' of git://git./fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
"Most of these are code cleanups, but there are a couple of notable
  use-after-free bug fixes.

  This series has been run through a full xfstests run over the week and
  through a quick xfstests run against this morning's master, with no
  major failures reported.

   - clean up unnecessary function call parameters

   - fix a use-after-free bug when aborting logging intents

   - refactor filestreams state data to avoid use-after-free bug

   - fix incorrect removal of cow extents when truncating extended
     attributes.

   - refactor open-coded __set_page_dirty in favor of using vfs
     function.

   - fix a deadlock when fstrim and fs shutdown race"

* tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Force log to disk before reading the AGF during a fstrim
  Export __set_page_dirty
  xfs: only cancel cow blocks when truncating the data fork
  xfs: non-scrub - remove unused function parameters
  xfs: remove filestream item xfs_inode reference
  xfs: fix intent use-after-free on abort
  xfs: Remove "committed" argument of xfs_dir_ialloc

Merge tag 'gfs2-4.17.fixes2' of git://git./linux/kernel/git/gfs2/linux-gfs2

Pull more gfs2 updates from Bob Peterson:
"We decided to request the latest three patches to be merged into this
  merge window while it's still open.

   - The first patch adds a new function to lockref:
     lockref_put_not_zero

   - The second patch fixes GFS2's glock dump code so it uses the new
     lockref function. This fixes a problem whereby lock dumps could
     miss glocks.

   - I made a minor patch to update some comments and fix the lock
     ordering text in our gfs2-glocks.txt Documentation file"

* tag 'gfs2-4.17.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Minor improvements to comments and documentation
  gfs2: Stop using rhashtable_walk_peek
  lockref: Add lockref_put_not_zero

Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
   - xprtrdma: Fix corner cases when handling device removal # v4.12+
   - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+

  Features:
   - New sunrpc tracepoint for RPC pings
   - Finer grained NFSv4 attribute checking
   - Don't unnecessarily return NFS v4 delegations

  Other bugfixes and cleanups:
   - Several other small NFSoRDMA cleanups
   - Improvements to the sunrpc RTT measurements
   - A few sunrpc tracepoint cleanups
   - Various fixes for NFS v4 lock notifications
   - Various sunrpc and NFS v4 XDR encoding cleanups
   - Switch to the ida_simple API
   - Fix NFSv4.1 exclusive create
   - Forget acl cache after setattr operation
   - Don't advance the nfs_entry readdir cookie if xdr decoding fails"

* tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits)
  NFS: advance nfs_entry cookie only after decoding completes successfully
  NFSv3/acl: forget acl cache after setattr
  NFSv4.1: Fix exclusive create
  NFSv4: Declare the size up to date after it was set.
  nfs: Use ida_simple API
  NFSv4: Fix the nfs_inode_set_delegation() arguments
  NFSv4: Clean up CB_GETATTR encoding
  NFSv4: Don't ask for attributes when ACCESS is protected by a delegation
  NFSv4: Add a helper to encode/decode struct timespec
  NFSv4: Clean up encode_attrs
  NFSv4; Clean up XDR encoding of type bitmap4
  NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group
  SUNRPC: Add a helper for encoding opaque data inline
  SUNRPC: Add helpers for decoding opaque and string types
  NFSv4: Ignore change attribute invalidations if we hold a delegation
  NFS: More fine grained attribute tracking
  NFS: Don't force unnecessary cache invalidation in nfs_update_inode()
  NFS: Don't redirty the attribute cache in nfs_wcc_update_inode()
  NFS: Don't force a revalidation of all attributes if change is missing
  NFS: Convert NFS_INO_INVALID flags to unsigned long
  ...

Merge branch 'work.thaw' of git://git./linux/kernel/git/viro/vfs

Pull vfs thaw updates from Al Viro:
"An ancient series that has fallen through the cracks in the previous
  cycle"

* 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  buffer.c: call thaw_super during emergency thaw
  vfs: factor sb iteration out of do_emergency_remount

Revert "drm/amd/display: disable CRTCs with NULL FB on their primary plane (V2)"

This seems to cause flickering and lock-ups for a wide range of users.
Revert until we've found a proper fix for the flickering and lock-ups.

This reverts commit 36cc549d59864b7161f0e23d710c1c4d1b9cf022.

Cc: Shirish S <shirish.s@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge branch 'afs-dh' of git://git./linux/kernel/git/viro/vfs

Pull AFS updates from Al Viro:
"The AFS series posted by dhowells depended upon lookup_one_len()
  rework; now that prereq is in the mainline, that series had been
  rebased on top of it and got some exposure and testing..."

* 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Do better accretion of small writes on newly created content
  afs: Add stats for data transfer operations
  afs: Trace protocol errors
  afs: Locally edit directory data for mkdir/create/unlink/...
  afs: Adjust the directory XDR structures
  afs: Split the directory content defs into a header
  afs: Fix directory handling
  afs: Split the dynroot stuff out and give it its own ops tables
  afs: Keep track of invalid-before version for dentry coherency
  afs: Rearrange status mapping
  afs: Make it possible to get the data version in readpage
  afs: Init inode before accessing cache
  afs: Introduce a statistics proc file
  afs: Dump bad status record
  afs: Implement @cell substitution handling
  afs: Implement @sys substitution handling
  afs: Prospectively look up extra files when doing a single lookup
  afs: Don't over-increment the cell usage count when pinning it
  afs: Fix checker warnings
  vfs: Remove the const from dir_context::actor

Revert "drm/amd/display: fix dereferencing possible ERR_PTR()"

This reverts commit cd2d6c92a8e39d7e50a5af9fcc67d07e6a89e91d.

Cc: Shirish S <shirish.s@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix regamma not affecting full-intensity color values

Hardware understands the regamma LUT as a piecewise linear function,
with points spaced exponentially along the range. We previously
programmed the LUT for range [2^-10, 2^0). This causes (normalized)
color values of 1 (=2^0) to miss the programmed LUT, and fall onto the
end region.

For DCE, the end region is extrapolated using a single (base, slope)
pair, using the max y-value from the last point in the curve as base.
This presents a problem, since this value affects all three color
channels. Scaling down the intensity of say - the blue regamma curve -
will not affect it's end region. This is especially noticiable when
using RedShift. It scales down the blue and green channels, but leaves
full-intensity colors unshifted.

Therefore, extend the range to cover [2^-10, 2^1) by programming another
hardware segment, containing only one point. That way, we won't be
hitting the end region.

Note that things are a bit different for DCN, since the end region can
be set per-channel.

Signed-off-by: Leo (Sunpeng) Li <sunpeng.li@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix FBC text console corruption

Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Only register backlight device if embedded panel connected

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Tony Cheng <Tony.Cheng@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) In ip_gre tunnel, handle the conflict between TUNNEL_{SEQ,CSUM} and
    GSO/LLTX properly. From Sabrina Dubroca.

2) Stop properly on error in lan78xx_read_otp(), from Phil Elwell.

3) Don't uncompress in slip before rstate is initialized, from Tejaswi
    Tanikella.

4) When using 1.x firmware on aquantia, issue a deinit before we
    hardware reset the chip, otherwise we break dirty wake WOL. From
    Igor Russkikh.

5) Correct log check in vhost_vq_access_ok(), from Stefan Hajnoczi.

6) Fix ethtool -x crashes in bnxt_en, from Michael Chan.

7) Fix races in l2tp tunnel creation and duplicate tunnel detection,
    from Guillaume Nault.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
  l2tp: fix race in duplicate tunnel detection
  l2tp: fix races in tunnel creation
  tun: send netlink notification when the device is modified
  tun: set the flags before registering the netdevice
  lan78xx: Don't reset the interface on open
  bnxt_en: Fix NULL pointer dereference at bnxt_free_irq().
  bnxt_en: Need to include RDMA rings in bnxt_check_rings().
  bnxt_en: Support max-mtu with VF-reps
  bnxt_en: Ignore src port field in decap filter nodes
  bnxt_en: do not allow wildcard matches for L2 flows
  bnxt_en: Fix ethtool -x crash when device is down.
  vhost: return bool from *_access_ok() functions
  vhost: fix vhost_vq_access_ok() log check
  vhost: Fix vhost_copy_to_user()
  net: aquantia: oops when shutdown on already stopped device
  net: aquantia: Regression on reset with 1.x firmware
  cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
  slip: Check if rstate is initialized before uncompressing
  lan78xx: Avoid spurious kevent 4 "error"
  lan78xx: Correctly indicate invalid OTP
  ...

Merge tag 'for-linus-4.17-rc1-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"A few fixes of Xen related core code and drivers"

* tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
  xen/acpi: off by one in read_acpi_id()
  xen/acpi: upload _PSD info for non Dom0 CPUs too
  x86/xen: Delay get_cpu_cap until stack canary is established
  xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END
  xen: xenbus: Catch closing of non existent transactions
  xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling

Merge tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
"Fix for one swiotlb regression in 2.16 from Takashi"

* tag 'dma-mapping-4.17-2' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: fix unexpected swiotlb_alloc_coherent failures

Merge tag 'mmc-v4.17-2' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
"MMC core:
   - Prevent bus reference leak in mmc_blk_init()

  MMC host:
   - tmio: Fix error handling when issuing CMD23
   - jz4740: Fix race condition in IRQ mask update"

* tag 'mmc-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: tmio: Fix error handling when issuing CMD23
  mmc: core: Prevent bus reference leak in mmc_blk_init()
  mmc: jz4740: Fix race condition in IRQ mask update

Merge tag 'for_linus-4.16' of git://git./linux/kernel/git/jwessel/kgdb

Pull kdb updates from Jason Wessel:

- fix 2032 time access issues and new compiler warnings

- minor regression test cleanup

- formatting fixes for end user use of kdb

* tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  kdb: use memmove instead of overlapping memcpy
  kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts()
  kdb: bl: don't use tab character in output
  kdb: drop newline in unknown command output
  kdb: make "mdr" command repeat
  kdb: use __ktime_get_real_seconds instead of __current_kernel_time
  misc: kgdbts: Display progress of asynchronous tests

Merge tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze

Pull microblaze updates from Michal Simek:
"Use generic pci_mmap_resource_range()"

* tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Use generic pci_mmap_resource_range()
microblaze: Provide pgprot_device/writecombine macros for nommu

GFS2: Minor improvements to comments and documentation

This patch simply fixes some comments and the gfs2-glocks.txt file:
Places where i_rwsem was called i_mutex, and adding i_rw_mutex.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

gfs2: Stop using rhashtable_walk_peek

Function rhashtable_walk_peek is problematic because there is no
guarantee that the glock previously returned still exists; when that key
is deleted, rhashtable_walk_peek can end up returning a different key,
which will cause an inconsistent glock dump. Fix this by keeping track
of the current glock in the seq file iterator functions instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

lockref: Add lockref_put_not_zero

Put a lockref unless the lockref is dead or its count would become zero.
This is the same as lockref_put_or_lock except that the lock is never
left held.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>

Merge tag 'asm-generic' of git://git./linux/kernel/git/arnd/asm-generic

Pull asm-generic fixes from Arnd Bergmann:
"I have one regression fix for a minor build problem after the
  architecture removal series, plus a rework of the barriers in the
  readl/writel functions, thanks to work by Sinan Kaya:

  This started from a discussion on the linuxpcc and rdma mailing
  lists[1]. To summarize, we decided that architectures are responsible
  to serialize readl() and writel() accesses on a device MMIO space
  relative to DMA performed by that device.

  This series provides a pessimistic implementation of that behavior for
  asm-generic/io.h, which is in turn used by a number of architectures
  (h8300, microblaze, nios2, openrisc, s390, sparc, um, unicore32, and
  xtensa). Some of those presumably need no extra barriers, or something
  weaker than rmb()/wmb(), and they are advised to override the new
  default for better performance.

  For inb()/outb(), the same barriers are used, but architectures might
  want to add another barrier to outb() here if that can guarantee
  non-posted behavior (some architectures can, others cannot do that).

  The readl_relaxed()/writel_relaxed() family of functions retains the
  existing behavior with no extra barriers"

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-March/170481.html

* tag 'asm-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  io: change writeX_relaxed() to remove barriers
  io: change readX_relaxed() to remove barriers
  dts: remove cris & metag dts hard link file
  io: change inX() to have their own IO barrier overrides
  io: change outX() to have their own IO barrier overrides
  io: define stronger ordering for the default writeX() implementation
  io: define stronger ordering for the default readX() implementation
  io: define several IO & PIO barrier types for the asm-generic version

nvme: expand nvmf_check_if_ready checks

The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows a lot of cases to fail ios during windows
of reset or re-connection. In cases where there are not multi-path
options present, the error goes back to the callee - the filesystem
or application. Not good.

The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.

The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted. When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.

Admin commands issued via ioctl share a live admin queue with commands
from the transport for controller init. The ioctls could be intermixed
with the initialization commands. It's possible for the ioctl cmd to
be issued prior to the controller being enabled. To block this, the
ioctl admin commands need to be distinguished from admin commands used
for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
reflect this division and set it on ioctls requests. As the
nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
ensure that commands allocated by the ioctl path (actually anything
in core.c) preps the nvme_req(req) before starting the io. This will
preserve the USERCMD flag during execution and/or retry.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: Use admin command effects for admin commands

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet: fix space padding in serial number

Commit 42de82a8b544 previously attempted to fix this, and it did
correctly pad the MN and FR fields with spaces, but the SN field still
contains 0 bytes. The current code fills out the first 16 bytes with
hex2bin, leaving the last 4 bytes zeroed. Rather than adding a lot of
error-prone math to avoid overwriting SN twice, just set the whole thing
to spaces up front (it's only 20 bytes).

Fixes: 42de82a8b544 ("nvmet: don't report 0-bytes in serial number")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: check return value of init_srcu_struct function

Also add error flow in case srcu initialization function fails.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet: Fix nvmet_execute_write_zeroes sector count

We have to increment the number of logical blocks to a 1's based value
in the native format prior to converting to 512b units.

Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
[changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: Separate IO and admin queue IRQ vectors

The admin and first IO queues shared the first irq vector, which has an
affinity mask including cpu0. If a system allows cpu0 to be offlined,
the admin queue may not be usable if no other CPUs in the affinity mask
are online. This is a problem since unlike IO queues, there is only
one admin queue that always needs to be usable.

To fix, this patch allocates one pre_vector for the admin queue that
is assigned all CPUs, so will always be accessible. The IO queues are
assigned the remaining managed vectors.

In case a controller has only one interrupt vector available, the admin
and IO queues will share the pre_vector with all CPUs assigned.

Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: Remove unused queue parameter

All the queue memory is allocated up front. We don't take the node
into consideration when creating queues anymore, so removing the unused
parameter.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: Skip queue deletion if there are no queues

User reported controller always retains CSTS.RDY to 1, which fails
controller disabling when resetting the controller. This is also before
the admin queue is allocated, and trying to disable an unallocated queue
results in a NULL dereference.

Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: target: fix buffer overflow

nvmet_execute_get_disc_log_page() passes a fixed-length string into
nvmet_format_discovery_entry(), which then does a longer memcpy() on
it, as pointed out by gcc-8:

In function 'nvmet_format_discovery_entry',
inlined from 'nvmet_execute_get_disc_log_page' at drivers/nvme/target/discovery.c:126:4:
drivers/nvme/target/discovery.c:62:2: error: 'memcpy' forming offset [38, 223] is out of the bounds [0, 37] [-Werror=array-bounds]
memcpy(e->subnqn, subsys_nqn, NVMF_NQN_SIZE);

Using strncpy() will make this well-defined, filling the rest of the
buffer with zeroes, under the assumption that the input is either
a NUL-terminated string, or a byte sequence containing no zeroes.
If the input is a string that is longer than NVMF_NQN_SIZE, we
continue to have no NUL-termination in the output.

Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: don't send keep-alives to the discovery controller

NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
Command Support" Figure 31 "Discovery Controller – Admin Commands"
explicitly listst all commands but "Get Log Page" and "Identify" as
reserved, but NetApp report the Linux host is sending Keep Alive
commands to the discovery controller, which is a violation of the
Spec.

We're already checking for discovery controllers when configuring the
keep alive timeout but when creating a discovery controller we're not
hard wiring the keep alive timeout to 0 and thus remain on
NVME_DEFAULT_KATO for the discovery controller.

This can be easily remproduced when issuing a direct connect to the
discovery susbsystem using:
'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 07bfcd09a288 ("nvme-fabrics: add a generic NVMe over Fabrics library")
Reported-by: Martin George <marting@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: unexport nvme_start_keep_alive

nvme_start_keep_alive() isn't used outside core.c so unexport it and
make it static.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-loop: fix kernel oops in case of unhandled command

When nvmet_req_init() fails, __nvmet_req_complete() is called
to handle the target request via .queue_response(), so
nvme_loop_queue_response() shouldn't be called again for
handling the failure.

This patch fixes this case by the following way:

- move blk_mq_start_request() before nvmet_req_init(), so
nvme_loop_queue_response() may work well to complete this
host request

- don't call nvme_cleanup_cmd() which is done in nvme_loop_complete_rq()

- don't call nvme_loop_queue_response() which is done via
.queue_response()

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[trimmed changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: enforce 64bit offset for nvme_get_log_ext fn

Compiling on 32 bits system produces a warning for the shift width
when shifting 32 bit integer with 64bit integer.

Make sure that offset always is 64bit, and use macros for retrieving
lower and upper bits of the offset.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

powerpc/mm/radix: Fix checkstops caused by invalid tlbiel

In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
construct tlbiel instructions. The instruction takes 5 fields, two of
which are registers, and the others are constants. But because it's
constructed with inline asm the compiler doesn't know that.

We got the constraint wrong on the 'r' field, using "r" tells the
compiler to put the value in a register. The value we then get in the
macro is the *register number*, not the value of the field.

That means when we mask the register number with 0x1 we get 0 or 1
depending on which register the compiler happens to put the constant
in, eg:

  li      r10,1
  tlbiel  r8,r9,2,0,0

  li      r7,1
  tlbiel  r10,r6,0,0,1

If we're unlucky we might generate an invalid instruction form, for
example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been
observed to cause machine checks:

  Oops: Machine check, sig: 7 [#1]
  CPU: 24 PID: 0 Comm: swapper
  NIP:  00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f
  REGS: c00000000110bb40 TRAP: 0200
  MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48002222  XER: 20040000
  CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1

If the machine check happens early in boot while we have MSR_ME=0 it
will escalate into a checkstop and kill the box entirely.

To fix it we could change the inline asm constraint to "i" which
tells the compiler the value is a constant. But a better fix is to just
pass a literal 1 into the macro, which bypasses any problems with inline
asm constraints.

Fixes: d4748276ae14 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

ovl: update documentation w.r.t "xino" feature

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>