Colin Ian King [Tue, 27 Sep 2016 19:13:18 +0000 (12:13 -0700)]
s390/dasd: add missing \n to end of dev_err messages
Trivial fix: the dev_err messages are missing a trailing \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Michael Holzheu [Mon, 26 Sep 2016 17:13:04 +0000 (19:13 +0200)]
s390/config: Enable config options for Docker
The following config options are required/recommended for running Docker:
Networking:
- CONFIG_NF_NAT_MASQUERADE_IPV4=m
- CONFIG_NF_NAT_MASQUERADE_IPV6=m
- CONFIG_IPVLAN=m
- CONFIG_CGROUP_NET_PRIO=y
Storage drivers:
- CONFIG_DM_THIN_PROVISIONING=m
- CONFIG_OVERLAY_FS=m
Scheduling:
- CONFIG_FAIR_GROUP_SCHED=y
- CONFIG_CFS_BANDWIDTH=y
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Thu, 22 Sep 2016 08:49:40 +0000 (10:49 +0200)]
s390/dasd: make query host access interruptible
If the DASD device gets blocked for any reason, e.g. because it is reserved
somewhere, the host_access_count sysfs entry or the host_access_list
debugfs entry may sleep forever. Make it interruptible so that userspace
can use ^C to abort the operation.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Tue, 20 Sep 2016 08:42:38 +0000 (10:42 +0200)]
s390/dasd: fix panic during offline processing
A DASD device consists of the device itself and a discipline with a
corresponding private structure. These fields are set up during online
processing right after the device is created and before it is processed by
the state machine and made available for I/O.
During offline processing the discipline pointer and the private data get
freed within the state machine and without protection of the existing
reference count. This might lead to a kernel panic because a function might
have taken a device reference and accesses the discipline pointer and/or
private data of the device while this is already freed.
Fix by freeing the discipline pointer and the private data after ensuring
that there is no reference to the device left.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Tue, 20 Sep 2016 08:29:22 +0000 (10:29 +0200)]
s390/dasd: fix hanging offline processing
Internal I/O is processed by the _sleep_on function which might wait for a
device to get operational. During offline processing this will never happen
and therefore the refcount of the device will not drop to zero and the
offline processing blocks as well.
Fix by letting requests fail in the _sleep_on function during offline
processing. No further handling of the requests is necessary since this is
internal I/O and the device is thrown away afterwards.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Thu, 8 Sep 2016 11:25:01 +0000 (13:25 +0200)]
s390/pci_dma: improve lazy flush for unmap
Lazy unmap (defer tlb flush after unmap until dma address reuse) can
greatly reduce the number of RPCIT instructions in the best case. In
reality we are often far away from the best case scenario because our
implementation suffers from the following problem:
To create dma addresses we maintain an iommu bitmap and a pointer into
that bitmap to mark the start of the next search. That pointer moves from
the start to the end of that bitmap and we issue a global tlb flush
once that pointer wraps around. To prevent address reuse before we issue
the tlb flush we even have to move the next pointer during unmaps - when
clearing a bit > next. This could lead to a situation where we only use
the rear part of that bitmap and issue more tlb flushes than expected.
To fix this we no longer clear bits during unmap but maintain a 2nd
bitmap which we use to mark addresses that can't be reused until we issue
the global tlb flush after wrap around.
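The scheme described above can be sketched in user-space C. This is a simplified model, not the kernel code: the struct, the function names, the 8-page size and the per-page bool arrays (standing in for real bitmaps) are assumptions made for illustration, and the global flush is only counted, not issued.

```c
#include <stdbool.h>

#define IOMMU_PAGES 8	/* hypothetical, tiny for illustration */

struct iommu_sketch {
	bool used[IOMMU_PAGES];	/* allocated, or freed but not yet flushed */
	bool lazy[IOMMU_PAGES];	/* freed since the last global flush */
	int next;		/* start of the next search */
	int flushes;		/* number of global TLB flushes "issued" */
};

/* Find a free page at or after 'start'; return -1 if there is none. */
static int find_free(const struct iommu_sketch *s, int start)
{
	for (int i = start; i < IOMMU_PAGES; i++)
		if (!s->used[i])
			return i;
	return -1;
}

/* Allocate one page; returns its index or -1 if the space is exhausted. */
static int sketch_alloc(struct iommu_sketch *s)
{
	int idx = find_free(s, s->next);

	if (idx < 0) {
		/* Wrap around: one global flush makes all lazy pages reusable. */
		s->flushes++;
		for (int i = 0; i < IOMMU_PAGES; i++) {
			if (s->lazy[i])
				s->used[i] = false;
			s->lazy[i] = false;
		}
		idx = find_free(s, 0);
		if (idx < 0)
			return -1;
	}
	s->used[idx] = true;
	s->next = idx + 1;
	return idx;
}

/* Unmap: leave the bit set in 'used' and only record the page in 'lazy'. */
static void sketch_free(struct iommu_sketch *s, int idx)
{
	s->lazy[idx] = true;
}
```

Because an unmap no longer clears a bit in the allocation bitmap, the search pointer never has to be moved during unmap, and the whole address range is used before the next flush.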
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Mon, 5 Sep 2016 15:49:17 +0000 (17:49 +0200)]
s390/pci_dma: split dma_update_trans
Split dma_update_trans into __dma_update_trans which handles updating
the dma translation tables and __dma_purge_tlb which takes care of
purging associated entries in the dma translation lookaside buffer.
The map_sg API makes use of this split approach by calling
__dma_update_trans once per physically contiguous address range but
__dma_purge_tlb only once per dma contiguous address range.
This results in fewer invocations of the expensive RPCIT instruction
when using map_sg.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 19 Aug 2016 07:12:09 +0000 (09:12 +0200)]
s390/pci_dma: improve map_sg
Our map_sg implementation mapped sg entries independently of each other.
For ease of use and possible performance improvements this patch changes
the implementation to try to map as many (likely physically non-contiguous)
sglist entries as possible into a contiguous DMA segment.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Wed, 17 Aug 2016 11:51:11 +0000 (13:51 +0200)]
s390/pci_dma: simplify dma address calculation
Simplify the code we use to calculate dma addresses by putting
everything related in a dma_alloc_address function. Also provide
a dma_free_address counterpart.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Wed, 17 Aug 2016 11:39:46 +0000 (13:39 +0200)]
s390/pci_dma: remove dma address range check
We calculate dma addresses using an iommu bitmap. Since commit
69eea95c ("s390/pci_dma: fix DMA table corruption with > 4 TB main memory")
we've made sure that addresses created using that bitmap are below
the maximum reported by firmware. Thus the additional check for
that address to be within range can be removed.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Tue, 23 Aug 2016 13:59:15 +0000 (15:59 +0200)]
iommu/s390: simplify registration of I/O address translation parameters
When a new function is attached to an iommu domain we need to register
I/O address translation parameters. Since commit
69eea95c ("s390/pci_dma: fix DMA table corruption with > 4 TB main memory")
start_dma and end_dma correctly describe the range of usable I/O addresses.
Simplify the code by using these values directly.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Paul Gortmaker [Mon, 19 Sep 2016 21:54:56 +0000 (17:54 -0400)]
s390: migrate exception table users off module.h and onto extable.h
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
The additions of uaccess.h are to deal with implicit includes like:
arch/s390/kernel/traps.c: In function 'do_report_trap':
arch/s390/kernel/traps.c:56:4: error: implicit declaration of function 'extable_fixup' [-Werror=implicit-function-declaration]
arch/s390/kernel/traps.c: In function 'illegal_op':
arch/s390/kernel/traps.c:173:3: error: implicit declaration of function 'get_user' [-Werror=implicit-function-declaration]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 16 Sep 2016 15:01:46 +0000 (17:01 +0200)]
s390: export header for CLP ioctl
Export clp.h for usage by userspace.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Sebastian Ott [Fri, 19 Aug 2016 17:57:49 +0000 (19:57 +0200)]
s390/vmur: fix irq pointer dereference in int handler
"irq" in vmur's int handler can be an error pointer. Don't dereference
this pointer in that case.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Stefan Haberland [Wed, 31 Aug 2016 11:31:10 +0000 (13:31 +0200)]
s390/dasd: add missing KOBJ_CHANGE event for unformatted devices
The DASD device driver throws change events for the DASD blockdevice
after the online processing is done so that udev rules can take
actions after it.
The change event was missing for unformatted devices.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christian Borntraeger [Mon, 12 Sep 2016 12:37:20 +0000 (14:37 +0200)]
s390: enable UBSAN
This enables UBSAN for s390. We have to disable the null sanitizer
as s390 code does access memory via a null pointer (the prefix page).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christian Borntraeger [Mon, 12 Sep 2016 12:37:19 +0000 (14:37 +0200)]
ubsan: allow to disable the null sanitizer
Some architectures use a hardware defined structure at address zero.
Checking for a null pointer will result in many ubsan reports.
Allow users to disable the null sanitizer.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Masahiro Yamada [Mon, 12 Sep 2016 18:10:39 +0000 (03:10 +0900)]
s390/crashdump: use list_first_entry_or_null
The combo of list_empty() check and return list_first_entry()
can be replaced with list_first_entry_or_null().
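A user-space sketch of the pattern follows. The list helpers here are minimal stand-ins written for this example, not the kernel's definitions from list.h, and struct mem_chunk and first_chunk are hypothetical names.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Minimal user-space stand-ins for the kernel's list helpers. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? container_of((head)->next, type, member) : NULL)

struct mem_chunk {		/* hypothetical element type */
	int id;
	struct list_head list;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* One call replaces "if (list_empty(head)) return NULL; return list_first_entry(...);". */
static struct mem_chunk *first_chunk(struct list_head *head)
{
	return list_first_entry_or_null(head, struct mem_chunk, list);
}
```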
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christian Borntraeger [Mon, 12 Sep 2016 11:13:38 +0000 (13:13 +0200)]
s390: claim efficient unaligned access
Most unaligned accesses are reasonably efficient (no kernel emulation)
on s390, so let's announce it.
This also
- removes the ubsan false positives for unaligned accesses on s390 with
default config
- uses simpler arithmetic in several functions in several other areas
of the kernel like ethernet frame classification
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Tue, 6 Sep 2016 08:46:36 +0000 (10:46 +0200)]
s390: add assembler include path for vx-insn.h
With git commit
0eab11c7e0d30de14a15ccd8269eef238321a8e1
"s390/vx: allow to include vx-insn.h with .include"
and an older gcc we get errors like this:
{standard input}:6: Error: can't open asm/vx-insn.h for reading:
No such file or directory
arch/s390/kernel/fpu.c:57: Error: Unrecognized opcode: `vstm'
To solve this issue simply add the path to arch/s390/include to
all assembler runs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Colin Ian King [Tue, 6 Sep 2016 07:01:31 +0000 (09:01 +0200)]
s390/crypto: avoid returning garbage value
Static analysis with cppcheck detected that ret is not initialized
and hence garbage is potentially being returned in the case where
prng_data->ppnows.reseed_counter <= prng_reseed_limit.
Thanks to Martin Schwidefsky for spotting a mistake in my original
fix.
Fixes: 0177db01adf26cf9 ("s390/crypto: simplify return code handling")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Bhaktipriya Shridhar [Tue, 30 Aug 2016 20:27:20 +0000 (01:57 +0530)]
s390: Remove deprecated create_singlethread_workqueue
The workqueue "appldata_wq" has been replaced with an ordered dedicated
workqueue.
WQ_MEM_RECLAIM has not been set since the workqueue is not being used on
a memory reclaim path.
The adapter->work_queue queues multiple work items viz
&adapter->scan_work, &port->rport_work, &adapter->ns_up_work,
&adapter->stat_work, adapter->work_queue, &adapter->events.work,
&port->gid_pn_work, &port->test_link_work. Hence, an ordered
dedicated workqueue has been used.
WQ_MEM_RECLAIM has been set to ensure forward progress under memory
pressure.
Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Wed, 31 Aug 2016 07:27:35 +0000 (09:27 +0200)]
RAID/s390: provide raid6 recovery optimization
The XC instruction can be used to improve the speed of the raid6
recovery. The loops now operate on blocks of 256 bytes.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 15 Aug 2016 13:17:52 +0000 (15:17 +0200)]
s390/crypto: simplify CPACF encryption / decryption functions
The double while loops of the CTR mode encryption / decryption functions
are overly complex for little gain. Simplify the functions to a single
while loop at the cost of an additional memcpy of a few bytes for every
4K page worth of data.
Adapt the other crypto functions to make them all look alike.
Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Thu, 18 Aug 2016 10:59:46 +0000 (12:59 +0200)]
s390/crypto: cpacf function detection
The CPACF code makes some assumptions about the availability of hardware
support. E.g. if the machine supports KM(AES-256) without chaining it is
assumed that KMC(AES-256) with chaining is available as well. For the
existing CPUs this is true but the architecturally correct way is to
check each CPACF function on its own. This is what the query function
of each instruction is all about.
Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Thu, 18 Aug 2016 10:34:34 +0000 (12:34 +0200)]
s390/crypto: simplify init / exit functions
The aes and the des module register multiple crypto algorithms
dependent on the availability of specific CPACF instructions.
To simplify the deregistration with crypto_unregister_alg add
an array with pointers to the successfully registered algorithms
and use it for the error handling in the init function and in
the module exit function.
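A minimal user-space sketch of this bookkeeping is shown below. struct alg and the stub_register/stub_unregister helpers are hypothetical stand-ins for struct crypto_alg, crypto_register_alg and crypto_unregister_alg; only the array-of-pointers idea is taken from the description above.

```c
#include <stddef.h>

struct alg {			/* hypothetical stand-in for struct crypto_alg */
	const char *name;
	int registered;
	int fail;		/* makes the stub registration fail */
};

static int stub_register(struct alg *a)
{
	if (a->fail)
		return -1;
	a->registered = 1;
	return 0;
}

static void stub_unregister(struct alg *a)
{
	a->registered = 0;
}

#define MAX_ALGS 5
static struct alg *registered[MAX_ALGS];	/* successfully registered algorithms */
static int num_registered;

/* Register one algorithm and remember it only if registration succeeded. */
static int register_alg(struct alg *a)
{
	int ret;

	if (num_registered >= MAX_ALGS)
		return -1;
	ret = stub_register(a);
	if (!ret)
		registered[num_registered++] = a;
	return ret;
}

/* Shared by the init error path and the module exit path. */
static void unregister_all(void)
{
	while (num_registered > 0)
		stub_unregister(registered[--num_registered]);
}

static int init_algs(struct alg **algs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int ret = register_alg(algs[i]);

		if (ret) {
			unregister_all();
			return ret;
		}
	}
	return 0;
}
```

The cleanup code does not need to know which algorithms were skipped or failed; it simply walks the array of successful registrations backwards.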
Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 15 Aug 2016 08:41:52 +0000 (10:41 +0200)]
s390/crypto: simplify return code handling
The CPACF instructions can complete with three different condition codes:
CC=0 for successful completion, CC=1 if the protected key verification
failed, and CC=3 for partial completion.
The inline functions will restart the CPACF instruction for partial
completion, this removes the CC=3 case. The CC=1 case is only relevant
for the protected key functions of the KM, KMC, KMAC and KMCTR
instructions. As the protected key functions are not used by the
current code, there is no need for any kind of return code handling.
Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 15 Aug 2016 07:19:16 +0000 (09:19 +0200)]
s390/crypto: cleanup cpacf function codes
Use a separate define for the decryption modifier bit instead of
duplicating the function codes for encryption / decryption.
In addition use an unsigned type for the function code.
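As a rough illustration of the idea, one function code per algorithm plus a single modifier bit is enough to express both directions. The macro names and values below are assumptions chosen for this sketch, not copied from the kernel header.

```c
/* Illustrative names and values; the real definitions live in the s390 CPACF header. */
#define SKETCH_KM_AES_128	0x12u
#define SKETCH_DECRYPT		0x80u	/* modifier bit: run the function in decrypt mode */

/* Compose the function code passed to the instruction. */
static unsigned int km_function_code(unsigned int func, int decrypt)
{
	return decrypt ? (func | SKETCH_DECRYPT) : func;
}
```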
Reviewed-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Tue, 23 Aug 2016 11:30:24 +0000 (13:30 +0200)]
RAID/s390: add SIMD implementation for raid6 gen/xor
Using vector registers is slightly faster:
raid6: vx128x8 gen() 19705 MB/s
raid6: vx128x8 xor() 11886 MB/s
raid6: using algorithm vx128x8 gen() 19705 MB/s
raid6: .... xor() 11886 MB/s, rmw enabled
vs the software algorithms:
raid6: int64x1 gen() 3018 MB/s
raid6: int64x1 xor() 1429 MB/s
raid6: int64x2 gen() 4661 MB/s
raid6: int64x2 xor() 3143 MB/s
raid6: int64x4 gen() 5392 MB/s
raid6: int64x4 xor() 3509 MB/s
raid6: int64x8 gen() 4441 MB/s
raid6: int64x8 xor() 3207 MB/s
raid6: using algorithm int64x4 gen() 5392 MB/s
raid6: .... xor() 3509 MB/s, rmw enabled
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 22 Aug 2016 12:40:06 +0000 (14:40 +0200)]
s390/nmi: improve revalidation of fpu / vector registers
The machine check handler will do one of two things if the floating-point
control, a floating point register or a vector register can not be
revalidated:
1) if the PSW indicates user mode the process is terminated
2) if the PSW indicates kernel mode the system is stopped
To unconditionally stop the system for 2) is incorrect.
There are three possible outcomes if the floating-point control, a
floating point register or a vector register can not be revalidated:
1) The kernel is inside a kernel_fpu_begin/kernel_fpu_end block and
needs the register. The system is stopped.
2) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_FPU bit
is not set. The user space process needs the register and is killed.
3) No active kernel_fpu_begin/kernel_fpu_end block and the CIF_FPU bit
is set. Neither the kernel nor the user space process needs the
lost register. Just revalidate it and continue.
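The three outcomes can be written down as a small decision function. This is only a sketch of the logic listed above; the enum, the function name and the two boolean inputs are hypothetical and do not mirror the actual machine check handler.

```c
#include <stdbool.h>

enum mcck_action { MCCK_STOP_SYSTEM, MCCK_KILL_TASK, MCCK_CONTINUE };

/* Decide what to do when an FPU or vector register could not be revalidated. */
static enum mcck_action lost_register_action(bool in_kernel_fpu_block,
					      bool cif_fpu_set)
{
	if (in_kernel_fpu_block)
		return MCCK_STOP_SYSTEM;	/* 1) the kernel needs the register */
	if (!cif_fpu_set)
		return MCCK_KILL_TASK;		/* 2) user space needs the register */
	return MCCK_CONTINUE;			/* 3) nobody needs it, revalidate and go on */
}
```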
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 22 Aug 2016 10:06:21 +0000 (12:06 +0200)]
s390/fpu: improve kernel_fpu_[begin|end]
In case of nested use of the FPU or vector registers in the kernel
the current code uses the mask of the FPU/vector registers of the
previous contexts to decide which registers to save and restore.
E.g. if the previous context used KERNEL_VXR_V0V7 and the next
context wants to use KERNEL_VXR_V24V31 the first 8 vector registers
are stored to the FPU state structure. But this is not necessary
as the next context does not use these registers.
Rework the FPU/vector register save and restore code. The new code
does a few things differently:
1) A lowcore field is used instead of a per-cpu variable.
2) The kernel_fpu_end function now has two parameters just like
kernel_fpu_begin. The register flags are required by both
functions to save / restore the minimal register set.
3) The inline functions kernel_fpu_begin/kernel_fpu_end now do the
update of the register masks. If the user space FPU registers
have already been stored neither save_fpu_regs nor the
__kernel_fpu_begin/__kernel_fpu_end functions have to be called
for the first context. In this case kernel_fpu_begin adds 7
instructions and kernel_fpu_end adds 4 instructions.
4) The inline assemblies in __kernel_fpu_begin / __kernel_fpu_end
to save / restore the vector registers are simplified a bit.
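The "minimal register set" idea can be illustrated with plain bit masks: only registers that are both already in use by an outer context and wanted by the new context need to be saved. The flag names and values here are assumptions for the sketch, not the kernel's KERNEL_VXR_* definitions.

```c
/* Hypothetical flags, one bit per group of eight vector registers. */
#define SK_VXR_V0V7	0x01u
#define SK_VXR_V24V31	0x08u

/* Registers the new context must save: in use by an outer context AND wanted now. */
static unsigned int regs_to_save(unsigned int in_use, unsigned int wanted)
{
	return in_use & wanted;
}
```

With the example from the text, an outer context using V0-V7 and a nested context asking for V24-V31 yields an empty set, so nothing is stored.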
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Thu, 25 Aug 2016 08:40:19 +0000 (10:40 +0200)]
s390/vx: allow to include vx-insn.h with .include
To make the vx-insn.h more versatile avoid cpp preprocessor macros
and allow to use plain numbers for vector and general purpose register
operands. With that you can emit an .include from a C file into the
assembler text and then use the vx-insn macros in inline assemblies.
For example:
asm (".include \"asm/vx-insn.h\"");
static inline void xor_vec(int x, int y, int z)
{
asm volatile("VX %0,%1,%2"
: : "i" (x), "i" (y), "i" (z));
}
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David Hildenbrand [Mon, 18 Jul 2016 15:10:17 +0000 (17:10 +0200)]
s390/time: avoid races when updating tb_update_count
The increment might not be atomic and we're not holding the
timekeeper_lock. Therefore we might lose an update to count, resulting in
VDSO being trapped in a loop. As other archs also simply update the
values and count doesn't seem to have an impact on reloading of these
values in VDSO code, let's just remove the update of tb_update_count.
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David Hildenbrand [Thu, 14 Jul 2016 12:46:56 +0000 (14:46 +0200)]
s390/time: fixup the clock comparator on all cpus
Until now fixup_cc was left unset, so only the clock comparator of the cpu
actually doing the sync was fixed up.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David Hildenbrand [Thu, 14 Jul 2016 11:38:06 +0000 (13:38 +0200)]
s390/time: cleanup etr leftovers
There are still some etr leftovers and wrong comments, let's clean that up.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
David Hildenbrand [Thu, 14 Jul 2016 11:09:57 +0000 (13:09 +0200)]
s390/time: simplify stp time syncs
The way we call do_adjtimex() today is broken. It has 0 effect, as
ADJ_OFFSET_SINGLESHOT (0x0001) in the kernel maps to !ADJ_ADJTIME
(in contrast to user space where it maps to ADJ_OFFSET_SINGLESHOT |
ADJ_ADJTIME - 0x8001). !ADJ_ADJTIME will silently ignore all adjustments
without STA_PLL being active. We could switch to ADJ_ADJTIME or turn
STA_PLL on, but still we would run into some problems:
- Even when switching to nanoseconds, we lose accuracy.
- Successive calls to do_adjtimex() will simply overwrite any leftovers
from the previous call (if not fully handled)
- Anything that NTP does using the sysctl heavily interferes with our
use.
- !ADJ_ADJTIME will silently round stuff > or < than 0.5 seconds
Reusing do_adjtimex() here just feels wrong. The whole STP synchronization
works right now *somehow* only, as do_adjtimex() does nothing and our
TOD clock jumps in time, although it shouldn't. This is especially bad
as the clock could jump backwards in time. We will have to find another
way to fix this up.
As leap seconds are also not properly handled yet, let's just get rid of
all this complex logic altogether and use the correct clock_delta for
fixing up the clock comparator and keeping the sched_clock monotonic.
This change should have 0 effect on the current STP mechanism. Once we
know how to best handle sync events and leap second updates, we'll start
with a fresh implementation.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 26 Aug 2016 14:27:15 +0000 (16:27 +0200)]
Merge branch 's390forkvm' of git://git./linux/kernel/git/kvms390/linux
Pull facility mask patch from the KVM tree.
* tag 's390forkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: generate facility mask from readable list
Heiko Carstens [Tue, 16 Aug 2016 08:31:10 +0000 (10:31 +0200)]
KVM: s390: generate facility mask from readable list
Automatically generate the KVM facility mask out of a readable list.
Manually changing the masks is very error prone, especially if the
special IBM bit numbering has to be considered.
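To show why generating the mask helps, here is a small sketch that turns a list of facility numbers into 64-bit mask words using IBM bit numbering, where bit 0 is the most significant bit of the first word. The function name and the mask size are assumptions; the real generator is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define MASK_DWORDS 4	/* hypothetical size: 256 facility bits */

/* Build mask words from facility numbers (IBM numbering: bit 0 is the MSB). */
static void build_facility_mask(const unsigned int *bits, size_t count,
				uint64_t mask[MASK_DWORDS])
{
	for (size_t i = 0; i < MASK_DWORDS; i++)
		mask[i] = 0;
	for (size_t i = 0; i < count; i++) {
		unsigned int dword = bits[i] / 64;

		if (dword >= MASK_DWORDS)
			continue;	/* out of range for this sketch */
		mask[dword] |= UINT64_C(1) << (63 - bits[i] % 64);
	}
}
```

Writing the list as { 0, 1, 63, 64, 129 } is much easier to review than the equivalent hexadecimal words.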
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Markus Elfring [Sat, 20 Aug 2016 17:25:34 +0000 (19:25 +0200)]
s390/tape: Use memdup_user() rather than duplicating its implementation
Reuse existing functionality from memdup_user() instead of keeping
duplicate source code.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 18 Jul 2016 12:05:21 +0000 (14:05 +0200)]
s390/pci: add zpci_report_error interface
The 'report_error' interface for PCI devices found on s390 can be
used by a user space program to inject an adapter error notification.
Add a new kernel interface zpci_report_error to allow a PCI device
driver to inject these error notifications without a detour over
user space.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dong Jia Shi [Mon, 8 Aug 2016 02:27:15 +0000 (04:27 +0200)]
s390: cio: remove redundant cio_cancel declaration
cio_cancel was declared twice. Remove one of them.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Tue, 14 Jun 2016 10:41:35 +0000 (12:41 +0200)]
s390/mm: merge local / non-local IDTE helper
Merge the __p[m|u]dp_idte and __p[m|u]dp_idte_local functions into a
single __p[m|u]dp_idte function with an additional parameter.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Tue, 14 Jun 2016 10:38:40 +0000 (12:38 +0200)]
s390/mm: merge local / non-local IPTE helper
Merge the __ptep_ipte and __ptep_ipte_local functions into a single
__ptep_ipte function with an additional parameter. The __ptep_ipte_range
function is still extra as the while loop makes it hard to merge.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Mon, 13 Jun 2016 08:36:00 +0000 (10:36 +0200)]
s390/mm,kvm: flush gmap address space with IDTE
The __tlb_flush_mm() helper uses a global flush if the mm struct
has a gmap structure attached to it. Replace the global flush with
two individual flushes by means of the IDTE instruction if only a
single gmap is attached to the mm.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Martin Schwidefsky [Fri, 10 Jun 2016 08:56:44 +0000 (10:56 +0200)]
s390/mm: no local TLB flush for clearing-by-ASCE IDTE
The local-clearing control of the IDTE instruction does not have any effect
for the clearing-by-ASCE operation. Only the invalidation-and-clearing
operation respects the local-clearing bit.
Remove __tlb_flush_idte_local and simplify the batched TLB flushing code.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Linus Torvalds [Wed, 24 Aug 2016 00:24:27 +0000 (20:24 -0400)]
Merge tag 'for-f2fs-v4.8-rc4' of git://git./linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
- fsmark regression
- i_size race condition
- wrong conditions in f2fs_move_file_range
* tag 'for-f2fs-v4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: avoid potential deadlock in f2fs_move_file_range
f2fs: allow copying file range only in between regular files
Revert "f2fs: move i_size_write in f2fs_write_end"
Revert "f2fs: use percpu_rw_semaphore"
Linus Torvalds [Tue, 23 Aug 2016 18:32:38 +0000 (14:32 -0400)]
Merge tag 'usercopy-v4.8-rc4' of git://git./linux/kernel/git/kees/linux
Pull hardened usercopy fixes from Kees Cook:
- avoid signed math problems on unexpected compilers
- avoid false positives at very end of kernel text range checks
* tag 'usercopy-v4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usercopy: fix overlap check for kernel text
usercopy: avoid potentially undefined behavior in pointer math
Linus Torvalds [Tue, 23 Aug 2016 18:29:00 +0000 (14:29 -0400)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a number of memory corruption bugs in the newly added
sha256-mb/sha512-mb code"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512-mb - fix ctx pointer
crypto: sha256-mb - fix ctx pointer and digest copy
Josh Poimboeuf [Mon, 22 Aug 2016 16:53:59 +0000 (11:53 -0500)]
usercopy: fix overlap check for kernel text
When running with a local patch which moves the '_stext' symbol to the
very beginning of the kernel text area, I got the following panic with
CONFIG_HARDENED_USERCOPY:
usercopy: kernel memory exposure attempt detected from
ffff88103dfff000 (<linear kernel text>) (4096 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:79!
invalid opcode: 0000 [#1] SMP
...
CPU: 0 PID: 4800 Comm: cp Not tainted 4.8.0-rc3.after+ #1
Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.5.4 01/22/2016
task: ffff880817444140 task.stack: ffff880816274000
RIP: 0010:[<ffffffff8121c796>] __check_object_size+0x76/0x413
RSP: 0018:ffff880816277c40 EFLAGS: 00010246
RAX: 000000000000006b RBX: ffff88103dfff000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88081f80dfa8 RDI: ffff88081f80dfa8
RBP: ffff880816277c90 R08: 000000000000054c R09: 0000000000000000
R10: 0000000000000005 R11: 0000000000000006 R12: 0000000000001000
R13: ffff88103e000000 R14: ffff88103dffffff R15: 0000000000000001
FS: 00007fb9d1750800(0000) GS:ffff88081f800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000021d2000 CR3: 000000081a08f000 CR4: 00000000001406f0
Stack:
ffff880816277cc8 0000000000010000 000000043de07000 0000000000000000
0000000000001000 ffff880816277e60 0000000000001000 ffff880816277e28
000000000000c000 0000000000001000 ffff880816277ce8 ffffffff8136c3a6
Call Trace:
[<ffffffff8136c3a6>] copy_page_to_iter_iovec+0xa6/0x1c0
[<ffffffff8136e766>] copy_page_to_iter+0x16/0x90
[<ffffffff811970e3>] generic_file_read_iter+0x3e3/0x7c0
[<ffffffffa06a738d>] ? xfs_file_buffered_aio_write+0xad/0x260 [xfs]
[<ffffffff816e6262>] ? down_read+0x12/0x40
[<ffffffffa06a61b1>] xfs_file_buffered_aio_read+0x51/0xc0 [xfs]
[<ffffffffa06a6692>] xfs_file_read_iter+0x62/0xb0 [xfs]
[<ffffffff812224cf>] __vfs_read+0xdf/0x130
[<ffffffff81222c9e>] vfs_read+0x8e/0x140
[<ffffffff81224195>] SyS_read+0x55/0xc0
[<ffffffff81003a47>] do_syscall_64+0x67/0x160
[<ffffffff816e8421>] entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:[<00007fb9d0c33c00>] 0x7fb9d0c33c00
RSP: 002b:00007ffc9c262f28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: fffffffffff8ffff RCX: 00007fb9d0c33c00
RDX: 0000000000010000 RSI: 00000000021c3000 RDI: 0000000000000004
RBP: 00000000021c3000 R08: 0000000000000000 R09: 00007ffc9c264d6c
R10: 00007ffc9c262c50 R11: 0000000000000246 R12: 0000000000010000
R13: 00007ffc9c2630b0 R14: 0000000000000004 R15: 0000000000010000
Code: 81 48 0f 44 d0 48 c7 c6 90 4d a3 81 48 c7 c0 bb b3 a2 81 48 0f 44 f0 4d 89 e1 48 89 d9 48 c7 c7 68 16 a3 81 31 c0 e8 f4 57 f7 ff <0f> 0b 48 8d 90 00 40 00 00 48 39 d3 0f 83 22 01 00 00 48 39 c3
RIP [<ffffffff8121c796>] __check_object_size+0x76/0x413
RSP <ffff880816277c40>
The checked object's range [ffff88103dfff000, ffff88103e000000) is
valid, so there shouldn't have been a BUG. The hardened usercopy code
got confused because the range's ending address is the same as the
kernel's text starting address at 0xffff88103e000000. The overlap check
is slightly off.
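The boundary case can be reproduced with a small half-open interval check. This is a sketch with illustrative names, not the code in mm/usercopy.c; the upper bound of the text range used in the usage note is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Do the half-open ranges [obj, obj + n) and [low, high) share any byte?
 * Names are illustrative, not the kernel's.
 */
static bool ranges_overlap(uint64_t obj, uint64_t n, uint64_t low, uint64_t high)
{
	uint64_t obj_end = obj + n;	/* one past the last byte of the object */

	/* Entirely above or entirely below; "<=" because obj_end is exclusive. */
	if (obj >= high || obj_end <= low)
		return false;
	return true;
}
```

With obj = 0xffff88103dfff000, n = 4096 and low = 0xffff88103e000000, the object ends exactly where the text begins, so no overlap is reported. Comparing with "<" instead of "<=" would misclassify this case.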
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Eric Biggers [Fri, 19 Aug 2016 19:15:22 +0000 (12:15 -0700)]
usercopy: avoid potentially undefined behavior in pointer math
check_bogus_address() checked for pointer overflow using this expression,
where 'ptr' has type 'const void *':
ptr + n < ptr
Since pointer wraparound is undefined behavior, gcc at -O2 by default
treats it like the following, which would not behave as intended:
(long)n < 0
Fortunately, this doesn't currently happen for kernel code because kernel
code is compiled with -fno-strict-overflow. But the expression should be
fixed anyway to use well-defined integer arithmetic, since it could be
treated differently by different compilers in the future or could be
reported by tools checking for undefined behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
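A minimal sketch of the well-defined form of this check, assuming the pointer is first converted to an unsigned integer type. The helper name is hypothetical and this is not the kernel's check_bogus_address(); it only shows why unsigned arithmetic avoids the problem: unsigned addition wraps modulo 2^N by definition, so the compiler may not fold the comparison away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch assumes a length always fits in the integer form of a pointer. */
static_assert(sizeof(size_t) <= sizeof(uintptr_t),
              "size_t must not be wider than uintptr_t");

/* Returns true if [ptr, ptr + n) would wrap past the top of the address space. */
static bool demo_range_wraps(const void *ptr, size_t n)
{
    uintptr_t start = (uintptr_t)ptr;
    uintptr_t end = start + n;   /* unsigned: wraparound is well-defined */

    return end < start;
}
```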
Linus Torvalds [Mon, 22 Aug 2016 22:53:02 +0000 (17:53 -0500)]
Merge tag 'arc-4.8-rc4-fixes' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- support for Syscall ABI v4 with upstream gcc 6.x
- lockdep fix (Daniel Mentz)
- gdb register clobber (Liav Rehana)
- couple of missing exports for modules
- other fixes here and there
* tag 'arc-4.8-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: export __udivdi3 for modules
ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
ARC: export kmap
ARC: Support syscall ABI v4
ARC: use correct offset in pt_regs for saving/restoring user mode r25
ARC: Elide redundant setup of DMA callbacks
ARC: Call trace_hardirqs_on() before enabling irqs
Linus Torvalds [Mon, 22 Aug 2016 22:51:21 +0000 (17:51 -0500)]
Merge tag 'gpio-v4.8-2' of git://git./linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here are a few GPIO fixes for v4.8.
I was expecting some fallout from the new chardev rework but nothing
like that turned up at all. Instead a Kconfig confusion that I think
I have finally nailed, then some ordinary driver noise and trivia.
This fixes a Kconfig issue with UM: when I made GPIOLIB available to
all archs, that included UM, but the OF part of GPIOLIB requires
HAS_IOMEM, so we add HAS_IOMEM as a dependency to OF_GPIO.
This in turn exposed the fact that a few GPIO drivers were implicitly
assuming OF_GPIO as their dependency but instead depended on OF alone
(the typical problem being a pointer inside gpio_chip not existing
unless OF_GPIO is selected) and then UM would fail to compile with
these drivers instead. Then I lost patience and made any GPIO driver
depending on just OF depend on OF_GPIO instead, that is certainly what
they meant and the only thing that makes sense anyway. GPIO with just
OF but !OF_GPIO does not make sense.
Also a fix for the max730x driver data pointer, and a minor comment
fix for the GPIO tools"
* tag 'gpio-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: make any OF dependent driver depend on OF_GPIO
gpio: Fix OF build problem on UM
gpio: max730x: set gpiochip data pointer before using it
tools/gpio: fix gpio-event-mon header comment
Linus Torvalds [Sun, 21 Aug 2016 23:14:10 +0000 (16:14 -0700)]
Linux 4.8-rc3
Linus Torvalds [Sun, 21 Aug 2016 21:28:24 +0000 (14:28 -0700)]
Merge branch 'parisc-4.8-2' of git://git./linux/kernel/git/deller/parisc-linux
Pull two parisc fixes from Helge Deller:
"The first patch ensures that the high-res cr16 clocksource (which was
added in kernel 4.7) gets chosen as default clocksource for parisc.
The second patch moves the #define of EREFUSED down inside errno.h and
thus unbreaks building the gccgo compiler"
* 'parisc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix order of EREFUSED define in errno.h
parisc: Fix automatic selection of cr16 clocksource
Tony Luck [Sat, 20 Aug 2016 23:27:58 +0000 (16:27 -0700)]
EDAC, skx_edac: Add EDAC driver for Skylake
This is an entirely new driver instead of yet another set of patches
to sb_edac.c because:
1) Mapping from PCI devices to socket/memory controller is significantly
different. Skylake scatters devices on a socket across a number of
PCI buses.
2) There is an extra level of interleaving via the "mcroute" register
that would be a little messy to squeeze into the old driver.
3) Validation is getting too expensive. Changes to sb_edac need to
be checked against Sandy Bridge, Ivy Bridge, Haswell, Broadwell and
Knights Landing.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Sat, 20 Aug 2016 09:51:38 +0000 (11:51 +0200)]
parisc: Fix order of EREFUSED define in errno.h
When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.
Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.
Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
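The ordering issue can be illustrated with two hypothetical macros. The DEMO_ names and the numeric value are placeholders, not the parisc definitions. A C preprocessor expands macros lazily at the point of use, so either ordering compiles as C; a tool that reads the header one line at a time, which is how this commit describes the gccgo failure, needs the alias to come after its target.

```c
#include <assert.h>

/* Placeholder value for illustration; not the real parisc errno number. */
#define DEMO_ECONNREFUSED 111

/* Alias defined after its target, so a line-by-line reader can resolve it. */
#define DEMO_EREFUSED DEMO_ECONNREFUSED

static_assert(DEMO_EREFUSED == DEMO_ECONNREFUSED, "alias must match its target");
```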
Helge Deller [Fri, 19 Aug 2016 20:39:02 +0000 (22:39 +0200)]
parisc: Fix automatic selection of cr16 clocksource
Commit 54b66800907 (parisc: Add native high-resolution sched_clock()
implementation) added support to use the CPU-internal cr16 counters as a reliable
clocksource with the help of HAVE_UNSTABLE_SCHED_CLOCK.
Sadly the commit failed to remove the hack which prevented cr16 from becoming the
default clocksource even on SMP systems.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.7+
Vineet Gupta [Fri, 19 Aug 2016 20:59:02 +0000 (13:59 -0700)]
ARC: export __udivdi3 for modules
Some module using div_u64() was failing to link because the libgcc 64-bit
divide assist routine was not being exported for modules.
Reported-by: avinashp@quantenna.com
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Linus Torvalds [Fri, 19 Aug 2016 19:47:01 +0000 (12:47 -0700)]
Make the hardened user-copy code depend on having a hardened allocator
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 19 Aug 2016 19:10:06 +0000 (12:10 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has some pretty standard driver bugfixes and one minor cleanup"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: meson: Use complete() instead of complete_all()
i2c: brcmstb: Use complete() instead of complete_all()
i2c: bcm-kona: Use complete() instead of complete_all()
i2c: bcm-iproc: Use complete() instead of complete_all()
i2c: at91: fix support of the "alternative command" feature
i2c: ocores: add missed clk_disable_unprepare() on failure paths
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
i2c: mux: demux-pinctrl: properly roll back when adding adapter fails
Vineet Gupta [Wed, 17 Aug 2016 01:27:07 +0000 (18:27 -0700)]
ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
| CC mm/memory.o
| In file included from ../mm/memory.c:53:0:
| ../include/linux/pfn_t.h: In function ‘pfn_t_pte’:
| ../include/linux/pfn_t.h:78:2: error: conversion to non-scalar type requested
| return pfn_pte(pfn_t_to_pfn(pfn), pgprot);
With STRICT_MM_TYPECHECKS pte_t is a struct and the offending code
forces a cast which ends up shifting a struct and hence the gcc error.
Note that in recent past some of the arches (aarch64, s390) made
STRICT_MM_TYPECHECKS default, but we don't for ARC as this leads to slightly
worse generated code, given ARC ABI definition of returning structs
(which pte_t would become)
Quoting from ARC ABI...
"Results of type struct are returned in a caller-supplied temporary
variable whose address is passed in r0.
For such functions, the arguments are shifted so that they are
passed in r1 and up."
So
- struct to be returned would be allocated on stack requiring extra
code at call sites
- callee updates stack memory to facilitate the return (vs. simple
MOV into return reg r0)
Hence STRICT_MM_TYPECHECKS is not enabled by default for ARC
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
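The effect of strict type checking on pte_t can be sketched as follows. All demo_ names and the page shift are hypothetical stand-ins, not the ARC definitions. The point is that once the page table entry is a struct, the scalar value has to be computed first and wrapped afterwards; a direct cast to the struct type is what gcc rejects with "conversion to non-scalar type requested".

```c
#include <assert.h>

/* Hypothetical stand-in for pte_t under strict type checks. */
typedef struct { unsigned long pte; } demo_pte_t;

#define DEMO_PAGE_SHIFT 13UL              /* assumed value, illustration only */
#define demo_pte_val(x)  ((x).pte)        /* unwrap: struct -> scalar */
#define demo_make_pte(v) ((demo_pte_t){ (v) })  /* wrap: scalar -> struct */

/* The wrapper costs no storage; only the return convention differs. */
static_assert(sizeof(demo_pte_t) == sizeof(unsigned long),
              "wrapper must not add padding");

/*
 * Build the scalar value first, then wrap it. Writing
 * (demo_pte_t)((pfn << DEMO_PAGE_SHIFT) | prot) instead would not compile,
 * because C does not allow a cast to a struct type.
 */
static demo_pte_t demo_pfn_pte(unsigned long pfn, unsigned long prot)
{
    return demo_make_pte((pfn << DEMO_PAGE_SHIFT) | prot);
}
```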
Vineet Gupta [Thu, 18 Aug 2016 00:34:46 +0000 (17:34 -0700)]
ARC: export kmap
| MODPOST 7 modules
| ERROR: "kmap" [fs/ext2/ext2.ko] undefined!
| ../scripts/Makefile.modpost:91: recipe for target '__modpost' failed
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Vineet Gupta [Wed, 10 Aug 2016 21:10:57 +0000 (14:10 -0700)]
ARC: Support syscall ABI v4
The syscall ABI includes the gcc functional calling ABI since a syscall
implies userland caller and kernel callee.
The current gcc ABI (v3) for ARCv2 ISA required 64-bit data be passed in
even-odd register pairs, (potentially punching reg holes when passing such
values as args). This was partly driven by the fact that the double-word
LDD/STD instructions in ARCv2 expect the register alignment and thus gcc
forcing this avoids extra MOV at the cost of a few unused registers (of which
we have plenty anyway).
This however was rejected as part of upstreaming gcc port to HS. So the new
ABI v4 doesn't enforce the even-odd reg restriction.
Do note that for ARCompact ISA builds v3 and v4 are practically the same in
terms of gcc code generation.
In terms of change management, we infer the new ABI if gcc 6.x onwards
is used for building the kernel.
This also needs a stable backport to enable older kernels to work with
new tools/user-space
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Liav Rehana [Tue, 16 Aug 2016 07:55:35 +0000 (10:55 +0300)]
ARC: use correct offset in pt_regs for saving/restoring user mode r25
User mode callee regs are explicitly collected before signal delivery or
breakpoint trap. r25 is special for kernel as it serves as task pointer,
so user mode value is clobbered very early. It is saved in pt_regs where
generally only scratch (aka caller saved) regs are saved.
The code to access the corresponding pt_regs location had a subtle bug as
it was using load/store with scaling of offset, whereas the offset was already
byte wise correct. So fix this by replacing LD.AS with a standard LD
Cc: <stable@vger.kernel.org>
Signed-off-by: Liav Rehana <liavr@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: rewrote title and commit log]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
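The bug class described here, scaling an offset that is already expressed in bytes, can be modelled in plain C. This is only a sketch: the helper names and the register array are hypothetical, and it assumes 32-bit loads where the scaled form multiplies the offset by the access size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_WORDS 16   /* size of the hypothetical register save area */

/* Plain load: 'byte_off' is used as-is, like a standard LD. */
static uint32_t demo_load(const uint32_t *base, size_t byte_off)
{
    uint32_t val;

    assert(byte_off + sizeof(val) <= DEMO_WORDS * sizeof(uint32_t));
    memcpy(&val, (const unsigned char *)base + byte_off, sizeof(val));
    return val;
}

/* Scaled load: the offset is multiplied by the access size first, like LD.AS. */
static uint32_t demo_load_scaled(const uint32_t *base, size_t off)
{
    return demo_load(base, off * sizeof(uint32_t));
}
```

Passing a byte offset of 12 to the scaled helper reads from byte 48 rather than byte 12, so the wrong slot is accessed even though the offset itself was correct.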
Linus Torvalds [Fri, 19 Aug 2016 16:32:48 +0000 (09:32 -0700)]
Merge tag 'dm-4.8-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix for DM round robin multipath path selector to disable
preemption before using this_cpu_ptr()
- a slight increase in DM crypt's mempool reserves to make swap on top
of DM crypt more performant
- a few DM raid fixes to issues found while testing changes that were
merged in v4.8-rc1
* tag 'dm-4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: support raid0 with missing metadata devices
dm raid: enhance attempt_restore_of_faulty_devices() to support more devices
dm raid: fix restoring of failed devices regression
dm raid: fix frozen recovery regression
dm crypt: increase mempool reserve to better support swapping
dm round robin: do not use this_cpu_ptr() without having preemption disabled
Linus Torvalds [Fri, 19 Aug 2016 16:22:50 +0000 (09:22 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fairly small fixes. The ipr, mpt3sas and ses ones all trigger
oopses. The megaraid one fixes an attach failure on io mapped only
cards, the fcoe one is an obvious problem in the error path and the
aacraid one is a theoretical security issue (ability to trick the
kernel into a buffer overrun)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
ses: Fix racy cleanup of /sys in remove_dev()
mpt3sas: Fix resume on WarpDrive flash cards
ipr: Fix sync scsi scan
megaraid_sas: Fix probing cards without io port
aacraid: Check size values after double-fetch from user
fcoe: Use kfree_skb() instead of kfree()
Linus Torvalds [Fri, 19 Aug 2016 16:21:24 +0000 (09:21 -0700)]
Merge tag 'usb-4.8-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes for reported issues for your tree.
The normal amount of gadget fixes, xhci fixes, new device ids, and a
few other minor things. All of them have been in linux-next for a
while, the full details are in the shortlog below"
* tag 'usb-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (43 commits)
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: really enqueue zero length TRBs.
xhci: always handle "Command Ring Stopped" events
cdc-acm: fix wrong pipe type on rx interrupt xfers
usb: misc: usbtest: add fix for driver hang
usb: dwc3: gadget: stop processing on HWO set
usb: dwc3: don't set last bit for ISOC endpoints
usb: gadget: rndis: free response queue during REMOTE_NDIS_RESET_MSG
usb: udc: core: fix error handling
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
usb/gadget: fix gadgetfs aio support.
usb: gadget: composite: Fix return value in case of error
usb: gadget: uvc: Fix return value in case of error
usb: gadget: fix check in sync read from ep in gadgetfs
usb: misc: usbtest: usbtest_do_ioctl may return positive integer
usb: dwc3: fix missing platform_set_drvdata() in dwc3_of_simple_probe()
usb: phy: omap-otg: Fix missing platform_set_drvdata() in omap_otg_probe()
usb: gadget: configfs: add mutex lock before unregister gadget
usb: gadget: u_ether: fix dereference after null check coverify warning
...
Linus Torvalds [Fri, 19 Aug 2016 16:06:41 +0000 (09:06 -0700)]
Merge tag 'xfs-iomap-for-linus-4.8-rc3' of git://git./linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:
Regression fixes for XFS changes introduced in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"
* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement
Linus Torvalds [Fri, 19 Aug 2016 15:52:17 +0000 (08:52 -0700)]
Merge tag 'hwmon-for-linus-v4.8-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix a bug in it87 driver and URLs in ftsteutates driver"
* tag 'hwmon-for-linus-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ftsteutates) Correct ftp urls in driver documentation
hwmon: (it87) Features mask must be 32 bit wide
Linus Walleij [Tue, 16 Aug 2016 10:07:31 +0000 (12:07 +0200)]
gpio: make any OF dependent driver depend on OF_GPIO
The drivers that depend on OF but not OF_GPIO are wreaking havoc
with the autobuilders for archs that have all requirements for
OF but not for OF_GPIO, particularly the UM (Usermode) arch does
not have iomem (NO_IOMEM) which result in configuring GPIOLIB but
without OF_GPIO which is wrong if the driver is using the .of_node
of the gpiochip, which only appears with OF_GPIO.
After a brief look at the drivers just depending on OF it seems
most if not all of them actually require stuff from gpiolib-of so
the dependency is wrong in the first place.
This simply patches the Kconfig so that all GPIO drivers using OF
depend on OF_GPIO rather than just OF.
Cc: Rabin Vincent <rabin@rab.in>
Cc: Pramod Gurav <pramod.gurav@smartplayin.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Laxman Dewangan <ldewangan@nvidia.com>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Phil Reid <preid@electromag.com.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Walleij [Tue, 16 Aug 2016 07:58:25 +0000 (09:58 +0200)]
gpio: Fix OF build problem on UM
The UserMode (UM) Linux build was failing in gpiolib-of as it requires
ioremap()/iounmap() to exist, which is absent from UM. The non-existence
of IO memory is negatively defined as CONFIG_NO_IOMEM which means we
need to depend on HAS_IOMEM.
Cc: stable@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Fri, 19 Aug 2016 02:38:18 +0000 (19:38 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux
Pull more drm fixes from Dave Airlie:
"Daniel pointed out I'd missed some i915 fixes, and I also found a
single etnaviv fix I missed.
So here they are"
* tag 'drm-fixes-for-4.8-rc3-2' of git://people.freedesktop.org/~airlied/linux:
drm/etnaviv: take GPU lock later in the submit process
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Linus Torvalds [Fri, 19 Aug 2016 02:31:08 +0000 (19:31 -0700)]
Merge tag 'devicetree-fixes-for-4.8' of git://git./linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- a couple of DT node ref counting fixes
- fix __unflatten_device_tree for PPC PCI hotplug case
- rework marking irq controllers as OF_POPULATED in cases where real
driver is used.
- disable of_platform_default_populate_init on PPC. The change in
initcall order causes problems which need to be sorted out later.
* tag 'devicetree-fixes-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: fix reference counting in of_graph_get_endpoint_by_regs
of/platform: disable the of_platform_default_populate_init() for all the ppc boards
ARM: imx6: mark GPC node as not populated after irq init to probe pm domain driver
of/irq: Mark interrupt controllers as populated before initialisation
drivers/of: Validate device node in __unflatten_device_tree()
of: Delete an unnecessary check before the function call "of_node_put"
Chao Yu [Thu, 4 Aug 2016 12:13:03 +0000 (20:13 +0800)]
f2fs: avoid potential deadlock in f2fs_move_file_range
Thread A                        Thread B
- inode_lock fileA
                                - inode_lock fileB
                                - inode_lock fileA
- inode_lock fileB
We may encounter the potential deadlock above when moving file ranges
concurrently. This patch fixes the issue by using inode_trylock
instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
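The trylock approach from this commit can be sketched outside the kernel as follows. The lock type and every demo_ helper are hypothetical stand-ins built on a C11 atomic flag, not the inode locking API, and returning -EBUSY on contention is an assumption of this sketch. The idea is that the second lock is only attempted, never waited for, so two threads taking the locks in opposite order cannot block each other forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode lock. */
struct demo_lock { atomic_flag held; };

static bool demo_trylock(struct demo_lock *l)
{
    /* test_and_set returns the previous state: false means we got the lock. */
    return !atomic_flag_test_and_set(&l->held);
}

static void demo_lock_acquire(struct demo_lock *l)
{
    while (!demo_trylock(l))
        ;   /* spin; a real lock would sleep instead */
}

static void demo_unlock(struct demo_lock *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * Take 'src' unconditionally, but only try 'dst'. If 'dst' is busy, back
 * out and report -EBUSY instead of waiting while still holding 'src'.
 */
static int demo_lock_pair(struct demo_lock *src, struct demo_lock *dst)
{
    assert(src && dst);
    demo_lock_acquire(src);
    if (src != dst && !demo_trylock(dst)) {
        demo_unlock(src);
        return -EBUSY;
    }
    return 0;
}

static void demo_unlock_pair(struct demo_lock *src, struct demo_lock *dst)
{
    if (src != dst)
        demo_unlock(dst);
    demo_unlock(src);
}
```

The trade-off is that the caller may see a spurious failure under contention and has to retry or report the error, in exchange for never holding one lock while sleeping on the other.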
Chao Yu [Thu, 4 Aug 2016 12:13:02 +0000 (20:13 +0800)]
f2fs: allow copying file range only in between regular files
Allow copying a data range only when both input files are regular files;
otherwise, deny it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Chao Yu [Sat, 6 Aug 2016 13:09:41 +0000 (21:09 +0800)]
Revert "f2fs: move i_size_write in f2fs_write_end"
This reverts commit a2ee0a300344a6da76186129b078113354fe13d2.
When testing with generic/032 of the xfstests suite, a failure message is
reported as below:
generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
--- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800
+++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
@@ -1,5 +1,5 @@
QA output created by 032
-100 iterations
-0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
-*
-0100000
+1: [768..775]: unwritten
+Unwritten extents found!
...
(Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests
In write_end(), we should update i_size of the inode before unlocking the
page; otherwise, we will lose newly updated data in the following race
condition.
Thread A                        Thread B
- write_end
 - unlock page
                                - writepages
                                 - lock_page
                                 - writepage
                                   if page is out-of-range of file size,
                                   we will skip writing the page.
 - update i_size
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Jaegeuk Kim [Thu, 4 Aug 2016 18:38:25 +0000 (11:38 -0700)]
Revert "f2fs: use percpu_rw_semaphore"
LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] also shows a slight regression for DWAL.
[1] https://github.com/sslab-gatech/fxmark
This reverts commit ec795418c41850056feb956534edf059dc1155d4.
Linus Torvalds [Fri, 19 Aug 2016 01:54:40 +0000 (18:54 -0700)]
Merge tag '4.8-doc-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three small fixes for Sphinx-formatted documentation generation"
* tag '4.8-doc-fixes' of git://git.lwn.net/linux:
doc-rst: customize RTD theme, drop padding of inline literal
docs: kernel-documentation: remove some highlight directives
docs: Set the Sphinx default highlight language to "guess"
Dave Airlie [Thu, 18 Aug 2016 22:51:13 +0000 (08:51 +1000)]
Merge tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Collection of i915 fixes.
* tag 'drm-intel-fixes-2016-08-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix modeset handling during gpu reset, v5.
drm/i915: fix aliasing_ppgtt leak
drm/i915: fix WaInsertDummyPushConstPs
drm/i915: Fix iboost setting for SKL Y/U DP DDI buffer translation entry 2
drm/i915/gen9: Give one extra block per line for SKL plane WM calculations
drm/i915: Acquire audio powerwell for HD-Audio registers
drm/i915: Add missing rpm wakelock to GGTT pread
drm/i915/fbc: FBC causes display flicker when VT-d is enabled on Skylake
drm/i915: Clean up the extra RPM ref on CHV with i915.enable_rc6=0
drm/i915: Program iboost settings for HDMI/DVI on SKL
drm/i915: Fix iboost setting for DDI with 4 lanes on SKL
drm/i915: Handle ENOSPC after failing to insert a mappable node
drm/i915: Flush GT idle status upon reset
Dave Airlie [Thu, 18 Aug 2016 22:50:42 +0000 (08:50 +1000)]
Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux into drm-fixes
Single GPU recovery fix
* 'drm-etnaviv-fixes' of git://git.pengutronix.de/git/lst/linux:
drm/etnaviv: take GPU lock later in the submit process
Linus Torvalds [Thu, 18 Aug 2016 22:09:41 +0000 (15:09 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"An initrd microcode loading fix, and an SMP bootup topology setup fix
to resolve crashes on SGI/UV systems if the BIOS is configured in a
certain way"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Fix __max_logical_packages value setup
x86/microcode/AMD: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
Linus Torvalds [Thu, 18 Aug 2016 22:08:31 +0000 (15:08 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Three clocksource driver fixes"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/mips-gic-timer: Make gic_clocksource_of_init() return int
clocksource/drivers/kona: Fix get_counter() error handling
clocksource/drivers/time-armada-370-xp: Fix the clock reference
Linus Torvalds [Thu, 18 Aug 2016 22:07:21 +0000 (15:07 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two cputime fixes - hopefully the last ones"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cputime: Resync steal time when guest & host lose sync
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Linus Torvalds [Thu, 18 Aug 2016 22:04:53 +0000 (15:04 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, but also start/stop filter related fixes, a perf
event read() fix, a fix uncovered by fuzzing, and an uprobes leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Check return value of the perf_event_read() IPI
perf/core: Enable mapping of the stop filters
perf/core: Update filters only on executable mmap
perf/core: Fix file name handling for start/stop filters
perf/core: Fix event_function_local()
uprobes: Fix the memcg accounting
perf intel-pt: Fix occasional decoding errors when tracing system-wide
tools: Sync kvm related header files for arm64 and s390
perf probe: Release resources on error when handling exit paths
perf probe: Check for dup and fdopen failures
perf symbols: Fix annotation of objects with debuginfo files
perf script: Don't disable use_callchain if input is pipe
perf script: Show proper message when failed list scripts
perf jitdump: Add the right header to get the major()/minor() definitions
perf ppc64le: Fix build failure when libelf is not present
perf tools mem: Fix -t store option for record command
perf intel-pt: Fix ip compression
Linus Torvalds [Thu, 18 Aug 2016 20:45:48 +0000 (13:45 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two lockless_dereference() related fixes"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/barriers: Suppress sparse warnings in lockless_dereference()
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
Linus Torvalds [Thu, 18 Aug 2016 18:17:13 +0000 (11:17 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Avoid a literal load with the MMU off on the CPU resume path
(potential inconsistency between cache and RAM)
- Build error with CONFIG_ACPI=n fixed
- Compiler warning in the arch/arm64/mm/dump.c code fixed
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix shift warning in arch/arm64/mm/dump.c
arm64: kernel: avoid literal load of virtual address with MMU off
arm64: Fix NUMA build error when !CONFIG_ACPI
Linus Torvalds [Thu, 18 Aug 2016 18:13:20 +0000 (11:13 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Only three fixes this time:
- Emil found an overflow problem with the memory layout sanity check.
- Ard Biesheuvel noticed that late-allocated page tables (for EFI)
weren't being properly constructed.
- Guenter Roeck reported a problem found on qemu caused by the recent
addr_limit changes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: fix address limit restoration for undefined instructions
ARM: 8591/1: mm: use fully constructed struct pages for EFI pgd allocations
ARM: 8590/1: sanity_check_meminfo(): avoid overflow on vmalloc_limit
Linus Torvalds [Thu, 18 Aug 2016 18:09:43 +0000 (11:09 -0700)]
Merge tag 'pm-4.8-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"More hibernation-related material: one fix for a recent regression in
the core, one small cleanup of the x86-64 resume code and a
documentation update.
Specifics:
- Fix a hibernate core regression resulting from uncovering a latent
bug in its implementation of memory bitmaps by a recent commit
(James Morse).
- Use __pa() to compute a physical address in the x86-64 code
finalizing resume from hibernation (Rafael Wysocki).
- Update power management documentation related to system sleep
states to remove outdated information from it and to add a
description of a recently introduced hibernation debug feature to
it (Rafael Wysocki)"
* tag 'pm-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
x86/power/64: Use __pa() for physical address computation
PM / sleep: Update some system sleep documentation
Linus Torvalds [Thu, 18 Aug 2016 17:58:50 +0000 (10:58 -0700)]
Merge tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Pretty quiet so far:
- a few amdgpu/radeon fixup for pcie pm changes
- a couple of amdgpu fixes
- some build fixes
- printk fix"
* tag 'drm-fixes-for-4.8-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: Change GART offset to 64-bit
drm/mediatek: add ARM_SMCCC dependency
drm/mediatek: add CONFIG_OF dependency
drm/mediatek: add COMMON_CLK dependency
drm/amdgpu: Fix memory trashing if UVD ring test fails
drm/amdgpu: fix vm init error path
drm/amdkfd: print doorbell offset as a hex value
Revert "drm/radeon: work around lack of upstream ACPI support for D3cold"
Revert "drm/amdgpu: work around lack of upstream ACPI support for D3cold"
Johannes Berg [Thu, 11 Aug 2016 09:50:22 +0000 (11:50 +0200)]
locking/barriers: Suppress sparse warnings in lockless_dereference()
After Peter's commit:
331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
... we get a lot of sparse warnings (one for every rcu_dereference, and more)
since the expression here is assigning to the wrong address space.
Instead of validating that 'p' is a pointer this way, instead make
it fail compilation when it's not by using sizeof(*(p)). This will
not cause any sparse warnings (tested, likely since the address
space is irrelevant for sizeof), and will fail compilation when
'p' isn't a pointer type.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 331b6d8c7afc ("locking/barriers: Validate lockless_dereference() is used on a pointer type")
Link: http://lkml.kernel.org/r/1470909022-687-2-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
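The sizeof(*(p)) idea can be sketched with a small macro. The DEMO_ and demo_ names are hypothetical and this is not the kernel's lockless_dereference(); it only shows the compile-time check. Because sizeof does not evaluate its operand, the check has no runtime cost, and passing a non-pointer such as an int makes the dereference inside sizeof ill-formed, so compilation fails.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Evaluates to 'p', but refuses to compile unless 'p' can be dereferenced.
 * The operand of sizeof is never evaluated, so 'p' is read only once.
 */
#define DEMO_LOAD_POINTER(p) ((void)sizeof(*(p)), (p))

static int demo_read_through(int *ptr)
{
    assert(ptr != NULL);
    return *DEMO_LOAD_POINTER(ptr);
    /* DEMO_LOAD_POINTER(42) would be rejected by the compiler. */
}
```

One consequence of this design is that the pointed-to type must be complete at the point of use, since sizeof cannot be applied to an incomplete struct type.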
Johannes Berg [Thu, 11 Aug 2016 09:50:21 +0000 (11:50 +0200)]
Revert "drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference"
This reverts commit:
fa7d81bb3c269 ("drm/fb-helper: Reduce READ_ONCE(master) to lockless_dereference")
As Peter explained:
[...] lockless_dereference() is _stronger_ than READ_ONCE(), not weaker.
[...]
Also, clue is in the name: 'dereference', you don't actually dereference
the pointer here, only load it.
My next patch breaks the compile without this revert, because it assumes
you want to dereference and thus also need the struct type visible (which
it isn't here), so revert it.
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1470909022-687-1-git-send-email-johannes@sipsolutions.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Catalin Marinas [Fri, 5 Dec 2014 12:34:54 +0000 (12:34 +0000)]
arm64: Fix shift warning in arch/arm64/mm/dump.c
When building with 48-bit VAs and 16K page configuration, it's possible
to get the following warning when building the arm64 page table dumping
code:
arch/arm64/mm/dump.c: In function ‘walk_pud’:
arch/arm64/mm/dump.c:274:102: warning: right shift count >= width of type [-Wshift-count-overflow]
This is because pud_offset(pgd, 0) performs a shift to the right by 36
while the value 0 has the type 'int' by default, therefore 32-bit.
This patch modifies all the p*_offset() uses in arch/arm64/mm/dump.c to
use 0UL for the address argument.
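The effect of the 0UL spelling can be sketched as below. EXAMPLE_SHIFT, EXAMPLE_INDEX and the 11-bit mask are illustrative assumptions, not the arm64 definitions; only the shift count of 36 is taken from the description above, and a 64-bit 'unsigned long' (LP64) is assumed.

```c
/*
 * Illustrative stand-in for a p*_offset()-style index computation; the
 * shift of 36 comes from the description above, the 11-bit mask is assumed.
 */
#define EXAMPLE_SHIFT		36
#define EXAMPLE_INDEX(addr)	(((addr) >> EXAMPLE_SHIFT) & 0x7ffUL)

/* Index for address 0, spelled 0UL so the shifted operand is 64 bits wide. */
unsigned long index_of_zero(void)
{
	return EXAMPLE_INDEX(0UL);
}

/* Same computation for an arbitrary 64-bit address. */
unsigned long index_of(unsigned long addr)
{
	return EXAMPLE_INDEX(addr);
}
```

Passing a plain 0 instead would make the shifted operand a 32-bit 'int', which is what the compiler warns about.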
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Wanpeng Li [Wed, 17 Aug 2016 02:05:46 +0000 (10:05 +0800)]
sched/cputime: Resync steal time when guest & host lose sync
Commit:
57430218317e ("sched/cputime: Count actually elapsed irq & softirq time")
... fixed a bug but also triggered a regression:
On an i5 laptop with 4 pCPUs and 4 vCPUs for one full dynticks guest, four
CPU hog processes (for loops) run in the guest. I hot-unplug the pCPUs
on the host one by one until there is only one left, then observe CPU
utilization via 'top' in the guest. It shows:
100% st for cpu0(housekeeping)
75% st for other CPUs (nohz full mode)
However, w/o this commit it shows the correct 75% for all four CPUs.
When a guest is interrupted for a longer amount of time, missed clock ticks
are not redelivered later. Because of that, we should not limit the amount
of steal time accounted to the amount of time that the calling functions
think have passed.
However, the interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval in get_vtime_delta() that it
is subtracted from is, so the max cputime limit is required to avoid underflow.
This patch fixes the regression by limiting the account_other_time() from
get_vtime_delta() to avoid underflow, and lets the other three call sites
(in account_other_time() and steal_account_process_time()) account however
much steal time the host told us elapsed.
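The underflow argument can be sketched as follows. This is a simplified model, not the kernel code: account_capped() and remaining_delta() are hypothetical names, and times are plain 64-bit counters.

```c
#include <stdint.h>

/* Account at most 'maxtime' of the time that was reported. */
uint64_t account_capped(uint64_t reported, uint64_t maxtime)
{
	return reported < maxtime ? reported : maxtime;
}

/*
 * Model of the get_vtime_delta() style caller: 'delta' was rounded down
 * to a jiffy while 'other_reported' was not, so the reported value can be
 * larger than 'delta'. Capping it keeps the subtraction from wrapping.
 */
uint64_t remaining_delta(uint64_t delta, uint64_t other_reported)
{
	uint64_t other = account_capped(other_reported, delta);

	return delta - other;	/* other <= delta, so no underflow */
}
```

Callers that do not subtract the result from a rounded interval can pass a very large limit (for example UINT64_MAX) and account everything the host reported.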
Suggested-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 15 Aug 2016 16:38:42 +0000 (18:38 +0200)]
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
Mike reports:
Roughly 10% of the time, ltp testcase getrusage04 fails:
getrusage04 0 TINFO : Expected timers granularity is 4000 us
getrusage04 0 TINFO : Using 1 as multiply factor for max [us]time increment (1000+4000us)!
getrusage04 0 TINFO : utime: 0us; stime: 179us
getrusage04 0 TINFO : utime: 3751us; stime: 0us
getrusage04 1 TFAIL : getrusage04.c:133: stime increased > 5000us:
And tracked it down to the case where the task simply doesn't get
_any_ [us]time ticks.
Update the code to assume all rtime is utime when we lack information,
thus ensuring a task that elides the tick gets time accounted.
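A rough sketch of that fallback, under the assumption that rtime is normally split in proportion to the sampled tick counts. struct cputime_split and split_rtime() are hypothetical names; the real code also enforces monotonicity, which is omitted here, and overflow of the multiplication is ignored for brevity.

```c
#include <stdint.h>

struct cputime_split {
	uint64_t utime;
	uint64_t stime;
};

/* Split 'rtime' between user and system time based on sampled ticks. */
struct cputime_split split_rtime(uint64_t utime_ticks, uint64_t stime_ticks,
				 uint64_t rtime)
{
	struct cputime_split out;
	uint64_t total = utime_ticks + stime_ticks;

	if (total == 0) {
		/* No tick was ever seen: treat all runtime as user time. */
		out.utime = rtime;
		out.stime = 0;
		return out;
	}

	/* Otherwise scale rtime by the observed stime/utime ratio. */
	out.stime = rtime * stime_ticks / total;
	out.utime = rtime - out.stime;
	return out;
}
```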
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: stable@vger.kernel.org # 4.3+
Fixes: 9d7fb0427648 ("sched/cputime: Guarantee stime + utime == rtime")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
David Carrillo-Cisneros [Wed, 17 Aug 2016 20:55:04 +0000 (13:55 -0700)]
perf/core: Check return value of the perf_event_read() IPI
The call to smp_call_function_single() in perf_event_read() may fail if
an invalid or offline CPU index is passed. Warn the user if such a bug is
present and return an error.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1471467307-61171-2-git-send-email-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:07 +0000 (10:43 -0600)]
perf/core: Enable mapping of the stop filters
At this time the perf_addr_filter_needs_mmap() function will _not_
return true on a user space 'stop' filter. But stop filters need
exactly the same kind of mapping that range and start filters get.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:06 +0000 (10:43 -0600)]
perf/core: Update filters only on executable mmap
Function perf_event_mmap() is called by the MM subsystem each time
part of a binary is loaded in memory. There can be several mappings
for a binary, many times unrelated to the code section.
Each time a section of a binary is mapped address filters are
updated, even when the map doesn't pertain to the code section.
The end result is that filters are configured based on the last map
event that was received rather than the last mapping of the code
segment.
For example, suppose we have an executable 'main' that calls library
'libcstest.so.1.0', and we want to collect traces on code
that is in that library. The perf cmd line for this scenario
would be:
perf record -e cs_etm// --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
Resulting in binaries being mapped this way:
root@linaro-nano:~# cat /proc/1950/maps
00400000-00401000 r-xp 00000000 08:02 33169 /home/linaro/main
00410000-00411000 r--p 00000000 08:02 33169 /home/linaro/main
00411000-00412000 rw-p 00001000 08:02 33169 /home/linaro/main
7fa2464000-7fa2474000 rw-p 00000000 00:00 0
7fa2474000-7fa25a4000 r-xp 00000000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25a4000-7fa25b3000 ---p 00130000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b3000-7fa25b7000 r--p 0012f000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b7000-7fa25b9000 rw-p 00133000 08:02 543 /lib/aarch64-linux-gnu/libc-2.21.so
7fa25b9000-7fa25bd000 rw-p 00000000 00:00 0
7fa25bd000-7fa25be000 r-xp 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25be000-7fa25cd000 ---p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cd000-7fa25ce000 r--p 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25ce000-7fa25cf000 rw-p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
7fa25cf000-7fa25eb000 r-xp 00000000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25ef000-7fa25f2000 rw-p 00000000 00:00 0
7fa25f7000-7fa25f9000 rw-p 00000000 00:00 0
7fa25f9000-7fa25fa000 r--p 00000000 00:00 0 [vvar]
7fa25fa000-7fa25fb000 r-xp 00000000 00:00 0 [vdso]
7fa25fb000-7fa25fc000 r--p 0001c000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7fa25fc000-7fa25fe000 rw-p 0001d000 08:02 574 /lib/aarch64-linux-gnu/ld-2.21.so
7ff2ea8000-7ff2ec9000 rw-p 00000000 00:00 0 [stack]
root@linaro-nano:~#
Before 'main()' can execute, 'libcstest.so.1.0' has to be loaded in
memory. Once that has been done perf_event_mmap() has been called
4 times, with the last map starting at address 0x7fa25ce000 and
the address filter configured to start filtering when the
IP has passed over address 0x7fa25ce72c (0x7fa25ce000 + 0x72c).
But that is wrong since the code segment for library 'libcstest.so.1.0'
has been mapped at 0x7fa25bd000, resulting in traces not being
collected.
This patch corrects the situation by requesting that address
filters be updated only if the mapped event is for a code
segment.
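The address arithmetic can be modelled with the sketch below. struct map_event and filter_start() are hypothetical names used only for illustration; the perf core tracks considerably more state than a start address and an executable flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mmap event for the filtered binary. */
struct map_event {
	uint64_t start;		/* start address of the mapping */
	bool executable;	/* true for r-xp style mappings */
};

/*
 * Return the absolute filter address: the filter's file offset added to
 * the start of the most recent executable mapping. Non-executable
 * mappings are skipped, so they cannot overwrite the base address.
 */
uint64_t filter_start(const struct map_event *events, size_t n,
		      uint64_t offset)
{
	uint64_t base = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (events[i].executable)
			base = events[i].start;
	}
	return base + offset;
}
```

Fed with the four 'libcstest.so.1.0' mappings listed above and offset 0x72c, this yields 0x7fa25bd72c rather than 0x7fa25ce72c.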
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Poirier [Mon, 18 Jul 2016 16:43:05 +0000 (10:43 -0600)]
perf/core: Fix file name handling for start/stop filters
Binary file names have to be supplied for both range and start/stop
filters but the current code only processes the filename if an
address range filter is specified. This code adds processing of
the filename for start/stop filters.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1468860187-318-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 16 Aug 2016 11:33:26 +0000 (13:33 +0200)]
perf/core: Fix event_function_local()
Vincent reported triggering the WARN_ON_ONCE() in event_function_local().
While thinking through cases I noticed that by using event_function()
directly, we miss the inactive case usually handled by
event_function_call().
Therefore construct a blend of event_function_call() and
event_function() that handles the cases relevant to
event_function_local().
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 4.5+
Fixes: fae3fde65138 ("perf: Collapse and fix event_function_call() users")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Mon, 15 Aug 2016 10:17:00 +0000 (12:17 +0200)]
x86/smp: Fix __max_logical_packages value setup
Frank reported a kernel panic when he disabled several cores in BIOS
via the following option:
Core Disable Bitmap(Hex) [0]
with number 0xFFE, which leaves 16 CPUs in system (out of 48).
The kernel panic below goes along with following messages:
smpboot: Max logical packages: 2
smpboot: APIC(0) Converting physical 0 to logical package 0
smpboot: APIC(20) Converting physical 1 to logical package 1
smpboot: APIC(40) Package 2 exceeds logical package map
smpboot: CPU 8 APICId 40 disabled
smpboot: APIC(60) Package 3 exceeds logical package map
smpboot: CPU 12 APICId 60 disabled
...
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 15 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc5+ #1
Hardware name: SGI UV300/UV300, BIOS SGI UV 300 series BIOS 05/25/2016
task: ffff8801673e0000 ti: ffff8801673ac000 task.ti: ffff8801673ac000
RIP: 0010:[<ffffffff81014d54>] [<ffffffff81014d54>] uncore_change_context+0xd4/0x180
...
[<ffffffff810158ac>] uncore_event_init_cpu+0x6c/0x70
[<ffffffff81d8c91c>] intel_uncore_init+0x1c2/0x2dd
[<ffffffff81d8c75a>] ? uncore_cpu_setup+0x17/0x17
[<ffffffff81002190>] do_one_initcall+0x50/0x190
[<ffffffff810ab193>] ? parse_args+0x293/0x480
[<ffffffff81d87365>] kernel_init_freeable+0x1a5/0x249
[<ffffffff81d86a35>] ? set_debug_rodata+0x12/0x12
[<ffffffff816dc19e>] kernel_init+0xe/0x110
[<ffffffff816e93bf>] ret_from_fork+0x1f/0x40
[<ffffffff816dc190>] ? rest_init+0x80/0x80
The reason for the panic is the wrong value of __max_logical_packages,
which leaves logical_package_map uninitialized while the uncore code
relies on this map being properly initialized (maybe we should
add some safety checks there as well).
The __max_logical_packages is computed as:
DIV_ROUND_UP(total_cpus, ncpus);
- ncpus being number of cores
With the above BIOS setup we get total_cpus == 16, which sets
__max_logical_packages to 2 (ncpus is 12).
Once topology_update_package_map processes CPU with logical
pkg over 2 we display above messages and fail to initialize
the physical_to_logical_pkg map, which makes the uncore code
crash.
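The arithmetic behind that value can be checked with a small sketch. DIV_ROUND_UP is written out here in its usual form; max_logical_packages() is a hypothetical wrapper, not a kernel function.

```c
/* Integer division rounding up, for positive operands. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical wrapper mirroring the computation quoted above. */
unsigned int max_logical_packages(unsigned int total_cpus, unsigned int ncpus)
{
	return DIV_ROUND_UP(total_cpus, ncpus);
}
```

With total_cpus == 16 and ncpus == 12 this gives (16 + 11) / 12 == 2, while the boot log above shows physical packages 0 through 3, so packages 2 and 3 do not fit.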
The fix is to remove logical_package_map bitmap completely
and keep and update the logical_packages number instead.
After we enumerate all the present CPUs, we check if the
enumerated logical packages count is within its computed
maximum from BIOS data.
If it's not the case, we set this maximum to the new enumerated
value and freeze any new addition of logical packages.
The freeze is because a lot of init code like uncore/rapl/cqm
depends on having the maximum logical package value set to allocate
their data, so we can't change it later on.
Prarit Bhargava tested the patch and confirms that it solves
the problem:
From dmidecode:
Core Count: 24
Core Enabled: 24
Thread Count: 48
Orig kernel boot log:
[ 0.464981] smpboot: Max logical packages: 19
[ 0.469861] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.477261] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.484760] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.492258] smpboot: APIC(c0) Converting physical 3 to logical package 3
1. nr_cpus=8, should stop enumerating in package 0:
[ 0.533664] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.539596] smpboot: Max logical packages: 19
2. maxcpus=8, should still enumerate all packages:
[ 0.526494] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.532428] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.538456] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.544486] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.550524] smpboot: Max logical packages: 19
3. nr_cpus=49 ( 2 socket + 1 core on 3rd socket), should stop enumerating in
package 2:
[ 0.521378] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.527314] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.533345] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.539368] smpboot: Max logical packages: 19
4. maxcpus=49, should still enumerate all packages:
[ 0.525591] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.531525] smpboot: APIC(40) Converting physical 1 to logical package 1
[ 0.537547] smpboot: APIC(80) Converting physical 2 to logical package 2
[ 0.543579] smpboot: APIC(c0) Converting physical 3 to logical package 3
[ 0.549624] smpboot: Max logical packages: 19
5. kdump (nr_cpus=1) works as well.
Reported-by: Frank Ramsay <framsay@redhat.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160815101700.GA30090@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>