Peter Zijlstra [Mon, 18 Jan 2016 14:27:07 +0000 (15:27 +0100)]
sched/rt: Fix PI handling vs. sched_setscheduler()
Andrea Parri reported:
> I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not
> handled correctly:
>
> T1 (prio = 20)
> lock(rtmutex);
>
> T2 (prio = 20)
> blocks on rtmutex (rt_nr_boosted = 0 on T1's rq)
>
> T1 (prio = 20)
> sys_set_scheduler(prio = 0)
> [new_effective_prio == oldprio]
> T1 prio = 20 (rt_nr_boosted = 0 on T1's rq)
>
> The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted());
> in particular, if we continue with
>
> T1 (prio = 20)
> unlock(rtmutex)
> wakeup(T2)
> adjust_prio(T1)
> [prio != rt_mutex_getprio(T1)]
> dequeue(T1)
> rt_nr_boosted = (unsigned long)(-1)
> ...
> T1 prio = 0
>
> then we end up leaving rt_nr_boosted in an "inconsistent" state.
>
> The simple program attached could reproduce the previous scenario; note
> that, as a consequence of the presence of this state, the "assertion"
>
> WARN_ON(!rt_nr_running && rt_nr_boosted)
>
> from dec_rt_group() may trigger.
So normally we dequeue/enqueue tasks in sched_setscheduler(), which
would ensure the accounting stays correct. However in the early PI path
we fail to do so.
So this was introduced at around v3.14, by:
c365c292d059 ("sched: Consider pi boosting in setscheduler()")
which fixed another problem exactly because that dequeue/enqueue, joy.
Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it
preserve runqueue location with that option. This requires decoupling
the on_rt_rq() state from being on the list.
In order to allow for explicit movement during the SAVE/RESTORE,
introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these
cases to preserve other invariants.
Respecting the SAVE/RESTORE flags also has the (nice) side-effect that
things like sys_nice()/sys_sched_setaffinity() also do not reorder
FIFO tasks (whereas they used to before this patch).
Reported-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dongsheng Yang [Wed, 13 Jan 2016 08:42:38 +0000 (16:42 +0800)]
sched/core: Remove duplicated sched_group_set_shares() prototype
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452674558-31897-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Frederic Weisbecker [Wed, 13 Jan 2016 16:01:29 +0000 (17:01 +0100)]
sched/fair: Consolidate nohz CPU load update code
Lets factorize a bit of code there. We'll even have a third user soon.
While at it, standardize the idle update function name against the
others.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Byungchul Park <byungchul.park@lge.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452700891-21807-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Byungchul Park [Fri, 15 Jan 2016 07:07:49 +0000 (16:07 +0900)]
sched/fair: Avoid using decay_load_missed() with a negative value
decay_load_missed() cannot handle nagative values, so we need to prevent
using the function with a negative value.
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: perterz@infradead.org
Fixes: 59543275488d ("sched/fair: Prepare __update_cpu_load() to handle active tickless")
Link: http://lkml.kernel.org/r/20160115070749.GA1914@X58A-UD3R
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Mon, 29 Feb 2016 08:42:07 +0000 (09:42 +0100)]
Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 23 Feb 2016 12:28:22 +0000 (13:28 +0100)]
sched/deadline: Always calculate end of period on sched_yield()
Steven noticed that occasionally a sched_yield() call would not result
in a wait for the next period edge as expected.
It turns out that when we call update_curr_dl() and end up with
delta_exec <= 0, we will bail early and fail to throttle.
Further inspection of the yield code revealed that yield_task_dl()
clearing dl.runtime is wrong too, it will not account the last bit of
runtime which could result in dl.runtime < 0, which in turn means that
replenish would gift us with too much runtime.
Fix both issues by not relying on the dl.runtime value for yield.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160223122822.GP6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 21 Jan 2016 21:24:16 +0000 (22:24 +0100)]
sched/cgroup: Fix cgroup entity load tracking tear-down
When a cgroup's CPU runqueue is destroyed, it should remove its
remaining load accounting from its parent cgroup.
The current site for doing so it unsuited because its far too late and
unordered against other cgroup removal (->css_free() will be, but we're also
in an RCU callback).
Put it in the ->css_offline() callback, which is the start of cgroup
destruction, right after the group has been made unavailable to
userspace. The ->css_offline() callbacks are called in hierarchical order
after the following v4.4 commit:
aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sun, 28 Feb 2016 16:41:20 +0000 (08:41 -0800)]
Linux 4.5-rc6
Linus Torvalds [Sun, 28 Feb 2016 15:52:00 +0000 (07:52 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A rather largish series of 12 patches addressing a maze of race
conditions in the perf core code from Peter Zijlstra"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Robustify task_function_call()
perf: Fix scaling vs. perf_install_in_context()
perf: Fix scaling vs. perf_event_enable()
perf: Fix scaling vs. perf_event_enable_on_exec()
perf: Fix ctx time tracking by introducing EVENT_TIME
perf: Cure event->pending_disable race
perf: Fix race between event install and jump_labels
perf: Fix cloning
perf: Only update context time when active
perf: Allow perf_release() with !event->ctx
perf: Do not double free
perf: Close install vs. exit race
Linus Torvalds [Sun, 28 Feb 2016 15:49:23 +0000 (07:49 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"This update contains:
- Hopefully the last ASM CLAC fixups
- A fix for the Quark family related to the IMR lock which makes
kexec work again
- A off-by-one fix in the MPX code. Ironic, isn't it?
- A fix for X86_PAE which addresses once more an unsigned long vs
phys_addr_t hickup"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Fix off-by-one comparison with nr_registers
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
x86/entry/compat: Add missing CLAC to entry_INT80_32
x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
x86/platform/intel/quark: Change the kernel's IMR lock bit to false
Linus Torvalds [Sun, 28 Feb 2016 15:48:01 +0000 (07:48 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixlet from Thomas Gleixner:
"A trivial printk typo fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix trivial typo in printk() message
Linus Torvalds [Sun, 28 Feb 2016 15:45:58 +0000 (07:45 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Four small fixes for irqchip drivers:
- Add missing low level irq handler initialization on mxs, so
interrupts can acutally be delivered
- Add a missing barrier to the GIC driver
- Two fixes for the GIC-V3-ITS driver, addressing a double EOI write
and a cache flush beyond the actual region"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Add missing barrier to 32bit version of gic_read_iar()
irqchip/mxs: Add missing set_handle_irq()
irqchip/gicv3-its: Avoid cache flush beyond ITS_BASERn memory size
irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
Linus Torvalds [Sun, 28 Feb 2016 15:39:15 +0000 (07:39 -0800)]
Merge tag 'staging-4.5-rc6' of git://git./linux/kernel/git/gregkh/staging
Pull staging/android fix from Greg KH:
"Here is one patch, for the android binder driver, to resolve a
reported problem. Turns out it has been around for a while (since
3.15), so it is good to finally get it resolved.
It has been in linux-next for a while with no reported issues"
* tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
Linus Torvalds [Sun, 28 Feb 2016 15:37:30 +0000 (07:37 -0800)]
Merge tag 'usb-4.5-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few USB fixes for 4.5-rc6
They fix a reported bug for some USB 3 devices by reverting the recent
patch, a MAINTAINERS change for some drivers, some new device ids, and
of course, the usual bunch of USB gadget driver fixes.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
MAINTAINERS: drop OMAP USB and MUSB maintainership
usb: musb: fix DMA for host mode
usb: phy: msm: Trigger USB state detection work in DRD mode
usb: gadget: net2280: fix endpoint max packet for super speed connections
usb: gadget: gadgetfs: unregister gadget only if it got successfully registered
usb: gadget: remove driver from pending list on probe error
Revert "usb: hub: do not clear BOS field during reset device"
usb: chipidea: fix return value check in ci_hdrc_pci_probe()
usb: chipidea: error on overflow for port_test_write
USB: option: add "4G LTE usb-modem U901"
USB: cp210x: add IDs for GE B650V3 and B850V3 boards
USB: option: add support for SIM7100E
usb: musb: Fix DMA desired mode for Mentor DMA engine
usb: gadget: fsl_qe_udc: fix IS_ERR_VALUE usage
usb: dwc2: USB_DWC2 should depend on HAS_DMA
usb: dwc2: host: fix the data toggle error in full speed descriptor dma
usb: dwc2: host: fix logical omissions in dwc2_process_non_isoc_desc
usb: dwc3: Fix assignment of EP transfer resources
usb: dwc2: Add extra delay when forcing dr_mode
Linus Torvalds [Sun, 28 Feb 2016 01:10:32 +0000 (17:10 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_last(): ELOOP failure exit should be done after leaving RCU mode
should_follow_link(): validate ->d_seq after having decided to follow
namei: ->d_inode of a pinned dentry is stable only for positives
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
fs: return -EOPNOTSUPP if clone is not supported
hpfs: don't truncate the file when delete fails
Linus Torvalds [Sun, 28 Feb 2016 00:58:32 +0000 (16:58 -0800)]
Merge tag 'armsoc-fixes' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We didn't have a batch last week, so this one is slightly larger.
None of them are scary though, a handful of fixes for small DT pieces,
replacing properties with newer conventions.
Highlights:
- N900 fix for setting system revision
- onenand init fix to avoid filesystem corruption
- Clock fix for audio on Beaglebone-x15
- Fixes on shmobile to deal with CONFIG_DEBUG_RODATA (default y in 4.6)
+ misc smaller stuff"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Extend info, add wiki and ml for meson arch
MAINTAINERS: alpine: add a new maintainer and update the entry
ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
Revert "regulator: tps65217: remove tps65217.dtsi file"
ARM: shmobile: Remove shmobile_boot_arg
ARM: shmobile: Move shmobile_smp_{mpidr, fn, arg}[] from .text to .bss
ARM: shmobile: r8a7779: Remove remainings of removed SCU boot setup code
ARM: shmobile: Move shmobile_scu_base from .text to .bss
ARM: OMAP2+: Fix omap_device for module reload on PM runtime forbid
ARM: OMAP2+: Improve omap_device error for driver writers
ARM: DTS: am57xx-beagle-x15: Select SYS_CLK2 for audio clocks
ARM: dts: am335x/am57xx: replace gpio-key,wakeup with wakeup-source property
ARM: OMAP2+: Set system_rev from ATAGS for n900
ARM: dts: orion5x: fix the missing mtd flash on linkstation lswtgl
ARM: dts: kirkwood: use unique machine name for ds112
ARM: dts: imx6: remove bogus interrupt-parent from CAAM node
Al Viro [Sun, 28 Feb 2016 00:37:37 +0000 (19:37 -0500)]
do_last(): ELOOP failure exit should be done after leaving RCU mode
... or we risk seeing a bogus value of d_is_symlink() there.
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 28 Feb 2016 00:31:01 +0000 (19:31 -0500)]
should_follow_link(): validate ->d_seq after having decided to follow
... otherwise d_is_symlink() above might have nothing to do with
the inode value we've got.
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 28 Feb 2016 00:23:16 +0000 (19:23 -0500)]
namei: ->d_inode of a pinned dentry is stable only for positives
both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier. Usually ends up oopsing soon
after that...
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 28 Feb 2016 00:17:33 +0000 (19:17 -0500)]
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.
Cc: stable@vger.kernel.org # v3.6+, at least
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Christoph Hellwig [Fri, 26 Feb 2016 17:53:12 +0000 (18:53 +0100)]
fs: return -EOPNOTSUPP if clone is not supported
-EBADF is a rather confusing error if an operations is not supported,
and nfsd gets rather upset about it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mikulas Patocka [Thu, 25 Feb 2016 17:17:38 +0000 (18:17 +0100)]
hpfs: don't truncate the file when delete fails
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.
If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit
7dd29d8d865efdb00c0542a5d2c87af8c52ea6c7 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").
This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: stable@vger.kernel.org # 2.6.39+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 27 Feb 2016 20:46:16 +0000 (12:46 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Linus Torvalds [Sat, 27 Feb 2016 20:40:49 +0000 (12:40 -0800)]
Merge tag 'tags/ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext2/4 DAX fix from Ted Ts'o:
"This fixes a file system corruption bug with DAX"
* tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
Linus Torvalds [Sat, 27 Feb 2016 20:33:42 +0000 (12:33 -0800)]
Merge tag 'pci-v4.5-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Enumeration:
Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
Marvell MVEBU host bridge driver:
Restrict build to 32-bit ARM (Thierry Reding)"
* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Restrict build to 32-bit ARM
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
Ross Zwisler [Sat, 27 Feb 2016 19:01:13 +0000 (14:01 -0500)]
ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry. For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect. The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.
Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().
Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Linus Torvalds [Sat, 27 Feb 2016 18:30:14 +0000 (10:30 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One small fix to keep OMAP platforms working across a suspend/resume
cycle"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: ti: omap3+: dpll: use non-locking version of clk_get_rate
Ross Zwisler [Fri, 26 Feb 2016 23:19:55 +0000 (15:19 -0800)]
dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.
Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:52 +0000 (15:19 -0800)]
dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.
Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:49 +0000 (15:19 -0800)]
ext4: online defrag not supported with DAX
Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()
When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption. This was observed with xfstests ext4/307 and
ext4/308.
Fix this by only allowing online defrag for non-DAX files.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:46 +0000 (15:19 -0800)]
ext2, ext4: only set S_DAX for regular inodes
When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes. Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().
With the current code, though, this isn't always true. For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries. For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.
Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4. This allows us to have strict DAX vs non-DAX paths in the
writeback code.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Fri, 26 Feb 2016 23:19:43 +0000 (15:19 -0800)]
block: disable block device DAX by default
The recent *sync enabling discovered that we are inserting into the
block_device pagecache counter to the expectations of the dirty data
tracking for dax mappings. This can lead to data corruption.
We want to support DAX for block devices eventually, but it requires
wider changes to properly manage the pagecache.
dump_stack+0x85/0xc2
dax_writeback_mapping_range+0x60/0xe0
blkdev_writepages+0x3f/0x50
do_writepages+0x21/0x30
__filemap_fdatawrite_range+0xc6/0x100
filemap_write_and_wait+0x4a/0xa0
set_blocksize+0x70/0xd0
sb_set_blocksize+0x1d/0x50
ext4_fill_super+0x75b/0x3360
mount_bdev+0x180/0x1b0
ext4_mount+0x15/0x20
mount_fs+0x38/0x170
Mark the support broken so its disabled by default, but otherwise still
available for testing.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guozhonghua [Fri, 26 Feb 2016 23:19:40 +0000 (15:19 -0800)]
ocfs2: unlock inode if deleting inode from orphan fails
When doing append direct io cleanup, if deleting inode fails, it goes
out without unlocking inode, which will cause the inode deadlock.
This issue was introduced by commit
cf1776a9e834 ("ocfs2: fix a tiny
race when truncate dio orohaned entry").
Signed-off-by: Guozhonghua <guozhonghua@h3c.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Cashman [Fri, 26 Feb 2016 23:19:37 +0000 (15:19 -0800)]
mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long(). Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.
Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Cashman [Fri, 26 Feb 2016 23:19:34 +0000 (15:19 -0800)]
drivers: char: random: add get_random_long()
Commit
d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.
The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().
Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.
Finally, replace calls to get_random_int() with get_random_long() where
appropriate.
This patch (of 2):
Add get_random_long().
Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 26 Feb 2016 23:19:31 +0000 (15:19 -0800)]
mm: numa: quickly fail allocations for NUMA balancing on full nodes
Commit
4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
flag combination due to confusing semantics. It noted that
alloc_misplaced_dst_page() was one such user after changes made by
commit
e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify").
Unfortunately when GFP_THISNODE was removed, users of
alloc_misplaced_dst_page() started waking kswapd and entering direct
reclaim because the wrong GFP flags are cleared. The consequence is
that workloads that used to fit into memory now get reclaimed which is
addressed by this patch.
The problem can be demonstrated with "mutilate" that exercises memcached
which is software dedicated to memory object caching. The configuration
uses 80% of memory and is run 3 times for varying numbers of clients.
The results on a 4-socket NUMA box are
mutilate
4.4.0 4.4.0
vanilla numaswap-v1
Hmean 1 8394.71 ( 0.00%) 8395.32 ( 0.01%)
Hmean 4 30024.62 ( 0.00%) 34513.54 ( 14.95%)
Hmean 7 32821.08 ( 0.00%) 70542.96 (114.93%)
Hmean 12 55229.67 ( 0.00%) 93866.34 ( 69.96%)
Hmean 21 39438.96 ( 0.00%) 85749.21 (117.42%)
Hmean 30 37796.10 ( 0.00%) 50231.49 ( 32.90%)
Hmean 47 18070.91 ( 0.00%) 38530.13 (113.22%)
The metric is queries/second with the more the better. The results are
way outside of the noise and the reason for the improvement is obvious
from some of the vmstats
4.4.0 4.4.0
vanillanumaswap-v1r1
Minor Faults
1929399272 2146148218
Major Faults
19746529 3567
Swap Ins
57307366 9913
Swap Outs
50623229 17094
Allocation stalls 35909 443
DMA allocs 0 0
DMA32 allocs
72976349 170567396
Normal allocs
5306640898 5310651252
Movable allocs 0 0
Direct pages scanned
404130893 799577
Kswapd pages scanned
160230174 0
Kswapd pages reclaimed
55928786 0
Direct pages reclaimed
1843936 41921
Page writes file 2391 0
Page writes anon
50623229 17094
The vanilla kernel is swapping like crazy with large amounts of direct
reclaim and kswapd activity. The figures are aggregate but it's known
that the bad activity is throughout the entire test.
Note that simple streaming anon/file memory consumers also see this
problem but it's not as obvious. In those cases, kswapd is awake when
it should not be.
As there are at least two reclaim-related bugs out there, it's worth
spelling out the user-visible impact. This patch only addresses bugs
related to excessive reclaim on NUMA hardware when the working set is
larger than a NUMA node. There is a bug related to high kswapd CPU
usage but the reports are against laptops and other UMA hardware and is
not addressed by this patch.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Fri, 26 Feb 2016 23:19:28 +0000 (15:19 -0800)]
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thierry Reding [Thu, 18 Feb 2016 13:32:10 +0000 (14:32 +0100)]
PCI: mvebu: Restrict build to 32-bit ARM
This driver uses PCI glue that is only available on 32-bit ARM. This used
to work fine as long as ARCH_MVEBU and ARCH_DOVE were exclusively 32-bit,
but there's a patch in the pipe to make ARCH_MVEBU also available on 64-bit
ARM.
[bhelgaas: changelog; patch is coming but not merged yet]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Bjorn Helgaas [Wed, 17 Feb 2016 18:26:42 +0000 (12:26 -0600)]
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.
Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to
991de2e59090. Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3. In both cases booting with
"pci=routirq" is a workaround.
I think the problem is that after
991de2e59090, we no longer call
pcibios_enable_irq() for upstream bridges. Prior to
991de2e59090, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().
After
991de2e59090, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.
Revert
991de2e59090 to fix these driver regressions.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Reported-and-tested-by: Олег Мороз <oleg.moroz@mcc.vniiem.ru>
Reported-by: Sunjin Yang <fan4326@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
Colin Ian King [Fri, 26 Feb 2016 18:55:31 +0000 (18:55 +0000)]
x86/mpx: Fix off-by-one comparison with nr_registers
In the unlikely event that regno == nr_registers then we get an array
overrun on regoff because the invalid register check is currently
off-by-one. Fix this with a check that regno is >= nr_registers instead.
Detected with static analysis using CoverityScan.
Fixes: fcc7ffd67991 "x86, mpx: Decode MPX instruction to get bound violation information"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1456512931-3388-1-git-send-email-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Fri, 26 Feb 2016 17:35:03 +0000 (09:35 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There are two small messenger bug fixes and a log spam regression fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: don't spam dmesg with stray reply warnings
libceph: use the right footer size when skipping a message
libceph: don't bail early from try_read() when skipping a message
Linus Torvalds [Fri, 26 Feb 2016 17:27:21 +0000 (09:27 -0800)]
Merge tag 'sound-4.5-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Things got calmed down for rc6, as it seems, and we have only a few
HD-audio fixes at this time: a fix for Skylake codec probe errors, a
fix for missing interrupt handling, and a few Dell and HP quirks"
* tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Loop interrupt handling until really cleared
ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2
ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225
ALSA: hda - Fixing background noise on Dell Inspiron 3162
ALSA: hda - Apply clock gate workaround to Skylake, too
Linus Torvalds [Fri, 26 Feb 2016 17:21:48 +0000 (09:21 -0800)]
Merge tag 'pm+acpi-4.5-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are two reverts of recent PCI-related ACPI core changes (one of
which caused some systems to crash on boot and the other was a cleanup
on top of it) and a devfreq fix for Tegra.
Specifics:
- Revert an ACPI core change related to IRQ management in PCI that
introduced code relying on the use of kmalloc() which turned out to
also run during early init when that's not available yet and caused
some systems to crash on boot for this reason along with a cleanup
on top of it (Rafael Wysocki).
- Prevent devfreq from flooding the kernel log with useless messages
on Tegra (which started to happen after some recent changes in the
devfreq core) by fixing the driver to follow the documentation and
the core's expectations in its ->target callback (Tomeu Vizoso)"
* tag 'pm+acpi-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI, PCI, irq: remove interrupt count restriction"
Revert "ACPI / PCI: Simplify acpi_penalize_isa_irq()"
PM / devfreq: tegra: Set freq in rate callback
Rafael J. Wysocki [Fri, 26 Feb 2016 12:50:55 +0000 (13:50 +0100)]
Merge branches 'pm-devfreq' and 'acpi-pci'
* pm-devfreq:
PM / devfreq: tegra: Set freq in rate callback
* acpi-pci:
Revert "ACPI, PCI, irq: remove interrupt count restriction"
Revert "ACPI / PCI: Simplify acpi_penalize_isa_irq()"
James Morris [Fri, 26 Feb 2016 08:32:16 +0000 (19:32 +1100)]
Merge branch 'stable-4.5' of git://git.infradead.org/users/pcmoore/selinux into for-linus
Takashi Iwai [Tue, 23 Feb 2016 14:54:47 +0000 (15:54 +0100)]
ALSA: hda - Loop interrupt handling until really cleared
Currently the interrupt handler of HD-audio driver assumes that no irq
update is needed while processing the irq. But in reality, it has
been confirmed that the HW irq is issued even during the irq
handling. Since we clear the irq status at the beginning, process the
interrupt, then exits from the handler, the lately issued interrupt is
left untouched without being properly processed.
This patch changes the interrupt handler code to loop over the
check-and-process. The handler tries repeatedly as long as the IRQ
status are turned on, and either stream or CORB/RIRB is handled.
For checking the stream handling, snd_hdac_bus_handle_stream_irq()
returns a value indicating the stream indices bits. Other than that,
the change is only in the irq handler itself.
Reported-by: Libin Yang <libin.yang@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Fri, 26 Feb 2016 04:12:09 +0000 (20:12 -0800)]
Merge tag 'trace-fixes-v4.5-rc5-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Another small bug reported to me by Chunyu Hu.
When perf added a "reg" function to the function tracing event (not a
tracepoint), it caused that event to be displayed as a tracepoint and
could cause errors in tracepoint handling. That was solved by adding
a flag to ignore ftrace non-tracepoint events. But that flag was
missed when displaying events in available_events, which should only
contain tracepoint events.
This broke a documented way to enable all events with:
cat available_events > set_event
As the function non-tracepoint event would cause that to error out.
The commit here fixes that by having the available_events file not
list events that have the ignore flag set"
* tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix showing function event in available_events
Linus Torvalds [Fri, 26 Feb 2016 03:53:54 +0000 (19:53 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"KVM/ARM fixes:
- Fix per-vcpu vgic bitmap allocation
- Do not give copy random memory on MMIO read
- Fix GICv3 APR register restore order
KVM/x86 fixes:
- Fix ubsan warning
- Fix hardware breakpoints in a guest vs. preempt notifiers
- Fix Hurd
Generic:
- use __GFP_NOWARN together with GFP_NOWAIT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: MMU: fix ubsan index-out-of-range warning
arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2
KVM: async_pf: do not warn on page allocation failures
KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
KVM: x86: fix missed hardware breakpoints
arm/arm64: KVM: Feed initialized memory to MMIO accesses
KVM: arm/arm64: vgic: Ensure bitmaps are long enough
Linus Torvalds [Fri, 26 Feb 2016 03:47:01 +0000 (19:47 -0800)]
Merge tag 'renesas-sh-drivers-fixes-for-v4.5' of git://git./linux/kernel/git/horms/renesas
Pull SuperH driver fix from Simon Horman:
"Restore legacy clock domain on SuperH platforms"
* tag 'renesas-sh-drivers-fixes-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: Restore legacy clock domain on SuperH platforms
Linus Torvalds [Fri, 26 Feb 2016 03:41:53 +0000 (19:41 -0800)]
Merge tag 'powerpc-4.5-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- eeh: Fix partial hotplug criterion from Gavin Shan
- mm: Clear the invalid slot information correctly from Aneesh Kumar K.V
* tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Clear the invalid slot information correctly
powerpc/eeh: Fix partial hotplug criterion
Linus Torvalds [Fri, 26 Feb 2016 03:36:33 +0000 (19:36 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 bugfixes from Martin Schwidefsky:
"Two critical bug fixes for the signal handling"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/fpu: signals vs. floating point control register
s390/compat: correct restore of high gprs on signal return
Linus Torvalds [Fri, 26 Feb 2016 03:31:01 +0000 (19:31 -0800)]
Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"One fix for a bug that could cause a NULL write past the end of a
buffer in case of unusually long writes to some system interfaces used
by mountd and other nfs support utilities"
* tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: fix off-by-one in qword_get()
Linus Torvalds [Fri, 26 Feb 2016 03:01:42 +0000 (19:01 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is a bit larger than Id like, but I asked the Intel guys to pull
in some Skylake fixes in the possibly vain hope that Skylake might be
more functional now that I'm seeing production hardware shipping.
For i915, it's mostly the same patch in a few places, making sure the
hw doesn't turn off when we are programming it.
Apart from that are two nouveau fixes, one for a module defer bug, and
one for using nouveau on new Lenovo P50 models.
Then there are a bunch of AMDGPU fixes, one is a fix for v4.4 vblank
regressions, and some PM fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
drm/nouveau: platform: Fix deferred probe
drm/amdgpu: disable direct VM updates when vm_debug is set
amdgpu: fix NULL pointer dereference at tonga_check_states_equal
drm/i915/gen9: Verify and enforce dc6 state writes
drm/i915/gen9: Check for DC state mismatch
drm/radeon/pm: adjust display configuration after powerstate
drm/amdgpu/pm: adjust display configuration after powerstate
drm/amdgpu/pm: add some checks for PX
drm/amdgpu: fix locking in force performance level
drm/amdgpu/gfx8: fix priv reg interrupt enable
drm/i915/skl: Ensure HW is powered during DDB HW state readout
drm/i915/lvds: Ensure the HW is powered during HW state readout
drm/i915/hdmi: Ensure the HW is powered during HW state readout
drm/i915/dsi: Ensure the HW is powered during HW state readout
drm/i915/dp: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered when accessing the CRC HW block
drm/i915/ddi: Ensure the HW is powered during HW state readout
drm/i915/crt: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered during HW access in assert_pipe
...
Linus Torvalds [Fri, 26 Feb 2016 02:54:53 +0000 (18:54 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Two fixes for compatibility with the ACPI 6.1 specification.
Without these fixes multi-interface DIMMs will fail to be probed, and
address range scrub commands to find memory errors will give results
that the kernel will mis-interpret. For multi-interface DIMMs Linux
will accept either the original 6.0 implementation or 6.1.
For address range scrub we'll only support 6.1 since ACPI formalized
this DSM differently than the original example [1] implemented in
v4.2. The expectation is that production systems will only ever ship
the ACPI 6.1 address range scrub command definition.
- The wider async address range scrub work targeting 4.6 discovered
that the original synchronous implementation in 4.5 is not sizing its
return buffer correctly.
- Arnd caught that my recent fix to the size of the pfn_t flags missed
updating the flags variable used in the pmem driver.
- Toshi found that we mishandle the memremap() return value in
devm_memremap().
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: use 'u64' for pfn flags
devm_memremap: Fix error value when memremap failed
nfit: update address range scrub commands to the acpi 6.1 format
libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
Linus Torvalds [Fri, 26 Feb 2016 02:42:08 +0000 (18:42 -0800)]
Merge tag 'for-v4.5-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Add a regression fix for changed sysfs path of bq27xxx_battery and
update MAINTAINERS file"
* tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: bq27xxx_battery: Restore device name
MAINTAINERS: update bq27xxx driver
Dexuan Cui [Thu, 25 Feb 2016 09:58:12 +0000 (01:58 -0800)]
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
"
d1cd12108346: x86, pageattr: Prevent overflow in slow_virt_to_phys() for
X86_PAE" was unintentionally removed by the recent "
34437e67a672: x86/mm: Fix
slow_virt_to_phys() to handle large PAT bit".
And, the variable 'phys_addr' was defined as "unsigned long" by mistake -- it should
be "phys_addr_t".
As a result, Hyper-V network driver in 32-PAE Linux guest can't work again.
Fixes: commit 34437e67a672: "x86/mm: Fix slow_virt_to_phys() to handle large PAT bit"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: olaf@aepfle.de
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: driverdev-devel@linuxdriverproject.org
Cc: linux-mm@kvack.org
Cc: apw@canonical.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Link: http://lkml.kernel.org/r/1456394292-9030-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Takashi Iwai [Thu, 25 Feb 2016 13:31:59 +0000 (14:31 +0100)]
ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2
HP EliteBook 755 G2 with ALC3228 (ALC280) codec [103c:221c] requires
the known fixup (ALC269_FIXUP_HEADSET_MIC) for making the headset mic
working. Also, it suffers from the loopback noise problem, so we
should disable aamix path as well.
Reported-by: Derick Eddington <derick.eddington@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Paul Gortmaker [Fri, 19 Feb 2016 08:46:41 +0000 (09:46 +0100)]
rcu: Use simple wait queues where possible in rcutree
As of commit
dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads
GP waits") the RCU subsystem started making use of wait queues.
Here we convert all additions of RCU wait queues to use simple wait queues,
since they don't need the extra overhead of the full wait queue features.
Originally this was done for RT kernels[1], since we would get things like...
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
Pid: 8, comm: rcu_preempt Not tainted
Call Trace:
[<
ffffffff8106c8d0>] __might_sleep+0xd0/0xf0
[<
ffffffff817d77b4>] rt_spin_lock+0x24/0x50
[<
ffffffff8106fcf6>] __wake_up+0x36/0x70
[<
ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680
[<
ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50
[<
ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80
[<
ffffffff8105eabb>] kthread+0xdb/0xe0
[<
ffffffff8106b912>] ? finish_task_switch+0x52/0x100
[<
ffffffff817e0754>] kernel_thread_helper+0x4/0x10
[<
ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60
[<
ffffffff817e0750>] ? gs_change+0xb/0xb
...and hence simple wait queues were deployed on RT out of necessity
(as simple wait uses a raw lock), but mainline might as well take
advantage of the more streamline support as well.
[1] This is a carry forward of work from v3.10-rt; the original conversion
was by Thomas on an earlier -rt version, and Sebastian extended it to
additional post-3.10 added RCU waiters; here I've added a commit log and
unified the RCU changes into one, and uprev'd it to match mainline RCU.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Daniel Wagner [Fri, 19 Feb 2016 08:46:40 +0000 (09:46 +0100)]
rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock
rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently,
this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will
not enable the IRQs. lockdep is happy.
By switching over using swait this is not true anymore. swake_up_all()
enables the IRQs while processing the waiters. __do_softirq() can now
run and will eventually call rcu_process_callbacks() which wants to
grap nrp->lock.
Let's move the rcu_nocb_gp_cleanup() call outside the lock before we
switch over to swait.
If we would hold the rnp->lock and use swait, lockdep reports
following:
=================================
[ INFO: inconsistent lock state ]
4.2.0-rc5-00025-g9a73ba0 #136 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes:
(rcu_node_1){+.?...}, at: [<
ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
{IN-SOFTIRQ-W} state was registered at:
[<
ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0
[<
ffffffff8110be0f>] lock_acquire+0xdf/0x2b0
[<
ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0
[<
ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0
[<
ffffffff810b1a9d>] __do_softirq+0x14d/0x670
[<
ffffffff810b2214>] irq_exit+0x104/0x110
[<
ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60
[<
ffffffff81842e70>] apic_timer_interrupt+0x70/0x80
[<
ffffffff810dba66>] rq_attach_root+0xa6/0x100
[<
ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650
[<
ffffffff810e4b42>] build_sched_domains+0x942/0xb00
[<
ffffffff821777c2>] sched_init_smp+0x509/0x5c1
[<
ffffffff821551e3>] kernel_init_freeable+0x172/0x28f
[<
ffffffff8182cdce>] kernel_init+0xe/0xe0
[<
ffffffff8184231f>] ret_from_fork+0x3f/0x70
irq event stamp: 76
hardirqs last enabled at (75): [<
ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
hardirqs last disabled at (76): [<
ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90
softirqs last enabled at (0): [<
ffffffff810a8df2>] copy_process.part.26+0x602/0x1cf0
softirqs last disabled at (0): [< (null)>] (null)
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rcu_node_1);
<Interrupt>
lock(rcu_node_1);
*** DEADLOCK ***
1 lock held by rcu_preempt/8:
#0: (rcu_node_1){+.?...}, at: [<
ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0
stack backtrace:
CPU: 0 PID: 8 Comm: rcu_preempt Not tainted
4.2.0-rc5-00025-g9a73ba0 #136
Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014
0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0
0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b
0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f
Call Trace:
[<
ffffffff818379e0>] dump_stack+0x4f/0x7b
[<
ffffffff8110813b>] print_usage_bug+0x1db/0x1e0
[<
ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50
[<
ffffffff811087ad>] mark_lock+0x66d/0x6e0
[<
ffffffff81107790>] ? check_usage_forwards+0x150/0x150
[<
ffffffff81108898>] mark_held_locks+0x78/0xa0
[<
ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
[<
ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220
[<
ffffffff81108aed>] trace_hardirqs_on+0xd/0x10
[<
ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60
[<
ffffffff810fd1c7>] swake_up_all+0xb7/0xe0
[<
ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0
[<
ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220
[<
ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60
[<
ffffffff81137c30>] ? rcu_barrier+0x20/0x20
[<
ffffffff810d2014>] kthread+0x104/0x120
[<
ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60
[<
ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
[<
ffffffff8184231f>] ret_from_fork+0x3f/0x70
[<
ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Marcelo Tosatti [Fri, 19 Feb 2016 08:46:39 +0000 (09:46 +0100)]
KVM: Use simple waitqueue for vcpu->wq
The problem:
On -rt, an emulated LAPIC timer instances has the following path:
1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled
This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.
The solution:
Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.
Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.
cyclictest command line:
This patch reduces the average latency in my tests from 14us to 11us.
Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core
4.4.0-rc8-01134-g0905f04:
./x86-run x86/tscdeadline_latency.flat -cpu host
with idle=poll.
The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:
"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.
The mean shows an improvement indeed."
Before:
min max mean std
count 1000.000000 1000.000000 1000.000000 1000.000000
mean 5162.596000
2019270.084000 5824.491541 20681.645558
std 75.431231 622607.723969 89.575700 6492.272062
min 4466.000000 23928.000000 5537.926500 585.864966
25% 5163.000000
1613252.750000 5790.132275 16683.745433
50% 5175.000000
2281919.000000 5834.654000 23151.990026
75% 5190.000000
2382865.750000 5861.412950 24148.206168
max 5228.000000
4175158.000000 6254.827300 46481.048691
After
min max mean std
count 1000.000000 1000.00000 1000.000000 1000.000000
mean 5143.511000
2076886.10300 5813.312474 21207.357565
std 77.668322 610413.09583 86.541500 6331.915127
min 4427.000000 25103.00000 5529.756600 559.187707
25% 5148.000000
1691272.75000 5784.889825 17473.518244
50% 5160.000000
2308328.50000 5832.025000 23464.837068
75% 5172.000000
2393037.75000 5853.177675 24223.969976
max 5222.000000
3922458.00000 6186.720500 42520.379830
[Patch was originaly based on the swait implementation found in the -rt
tree. Daniel ported it to mainline's version and gathered the
benchmark numbers for tscdeadline_latency test.]
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Daniel Wagner [Fri, 19 Feb 2016 08:46:38 +0000 (09:46 +0100)]
kbuild: Add option to turn incompatible pointer check into error
With the introduction of the simple wait API we have two very
similar APIs in the kernel. For example wake_up() and swake_up()
is only one character away. Although the compiler will warn
happily the wrong usage it keeps on going an even links the kernel.
Thomas and Peter would rather like to see early missuses reported
as error early on.
In a first attempt we tried to wrap all swait and wait calls
into a macro which has an compile time type assertion. The result
was pretty ugly and wasn't able to catch all wrong usages.
woken_wake_function(), autoremove_wake_function() and wake_bit_function()
are assigned as function pointers. Wrapping them with a macro around is
not possible. Prefixing them with '_' was also not a real option
because there some users in the kernel which do use them as well.
All in all this attempt looked to intrusive and too ugly.
An alternative is to turn the pointer type check into an error which
catches wrong type uses. Obviously not only the swait/wait ones. That
isn't a bad thing either.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-3-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra (Intel) [Fri, 19 Feb 2016 08:46:37 +0000 (09:46 +0100)]
wait.[ch]: Introduce the simple waitqueue (swait) implementation
The existing wait queue support has support for custom wake up call
backs, wake flags, wake key (passed to call back) and exclusive
flags that allow wakers to be tagged as exclusive, for limiting
the number of wakers.
In a lot of cases, none of these features are used, and hence we
can benefit from a slimmed down version that lowers memory overhead
and reduces runtime overhead.
The concept originated from -rt, where waitqueues are a constant
source of trouble, as we can't convert the head lock to a raw
spinlock due to fancy and long lasting callbacks.
With the removal of custom callbacks, we can use a raw lock for
queue list manipulations, hence allowing the simple wait support
to be used in -rt.
[Patch is from PeterZ which is based on Thomas version. Commit message is
written by Paul G.
Daniel: - Fixed some compile issues
- Added non-lazy implementation of swake_up_locked as suggested
by Boqun Feng.]
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-2-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Henningsson [Thu, 25 Feb 2016 08:37:05 +0000 (09:37 +0100)]
ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225
On one of the machines we enable, we found that the actual speaker volume
did not always correspond to the volume set in alsamixer. This patch
fixes that problem.
This patch was orginally written by Kailang @ Realtek, I've rebased it
to fit sound git master.
Cc: stable@vger.kernel.org
BugLink: https://bugs.launchpad.net/bugs/1549660
Co-Authored-By: Kailang <kailang@realtek.com>
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Paolo Bonzini [Thu, 25 Feb 2016 08:53:55 +0000 (09:53 +0100)]
Merge tag 'kvm-arm-for-4.5-rc6' of git://git./linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM fixes for 4.5-rc6
- Fix per-vcpu vgic bitmap allocation
- Do not give copy random memory on MMIO read
- Fix GICv3 APR register restore order
Mike Krinkin [Wed, 24 Feb 2016 18:02:31 +0000 (21:02 +0300)]
KVM: x86: MMU: fix ubsan index-out-of-range warning
Ubsan reports the following warning due to a typo in
update_accessed_dirty_bits template, the patch fixes
the typo:
[ 168.791851] ================================================================================
[ 168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
[ 168.791866] index 4 is out of range for type 'u64 [4]'
[ 168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G O L 4.5.0-rc5-next-
20160222 #7
[ 168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
[ 168.791876]
0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
[ 168.791882]
ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
[ 168.791886]
0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
[ 168.791891] Call Trace:
[ 168.791899] [<
ffffffff81c9f780>] dump_stack+0xcc/0x12c
[ 168.791904] [<
ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
[ 168.791910] [<
ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
[ 168.791914] [<
ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
[ 168.791918] [<
ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
[ 168.791922] [<
ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
[ 168.791954] [<
ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
[ 168.791958] [<
ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
[ 168.791987] [<
ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
[ 168.792014] [<
ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[ 168.792019] [<
ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[ 168.792044] [<
ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
[ 168.792076] [<
ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
[ 168.792121] [<
ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
[ 168.792130] [<
ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792178] [<
ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
[ 168.792208] [<
ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
[ 168.792234] [<
ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
[ 168.792238] [<
ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792263] [<
ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
[ 168.792290] [<
ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
[ 168.792314] [<
ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
[ 168.792340] [<
ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
[ 168.792367] [<
ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
[ 168.792374] [<
ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
[ 168.792400] [<
ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
[ 168.792424] [<
ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
[ 168.792449] [<
ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
[ 168.792474] [<
ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
[ 168.792499] [<
ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
[ 168.792524] [<
ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
[ 168.792532] [<
ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
[ 168.792539] [<
ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
[ 168.792546] [<
ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
[ 168.792572] [<
ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[ 168.792597] [<
ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
[ 168.792621] [<
ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
[ 168.792627] [<
ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
[ 168.792651] [<
ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
[ 168.792656] [<
ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
[ 168.792681] [<
ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
[ 168.792704] [<
ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
[ 168.792727] [<
ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
[ 168.792732] [<
ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
[ 168.792735] [<
ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
[ 168.792740] [<
ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
[ 168.792744] [<
ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
[ 168.792747] [<
ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
[ 168.792751] [<
ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
[ 168.792756] [<
ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
[ 168.792759] [<
ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
[ 168.792763] [<
ffffffff8174aef3>] ? __fget+0x273/0x4a0
[ 168.792766] [<
ffffffff8174acd0>] ? __fget+0x50/0x4a0
[ 168.792770] [<
ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
[ 168.792773] [<
ffffffff81726bf9>] SyS_ioctl+0x79/0x90
[ 168.792777] [<
ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
[ 168.792780] ================================================================================
Signed-off-by: Mike Krinkin <krinkin.m.u@gmail.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kai-Heng Feng [Thu, 25 Feb 2016 07:19:38 +0000 (15:19 +0800)]
ALSA: hda - Fixing background noise on Dell Inspiron 3162
After login to the desktop on Dell Inspiron 3162,
there's a very loud background noise comes from the builtin speaker.
The noise does not go away even if the speaker is muted.
The noise disappears after using the aamix fixup.
Codec: Realtek ALC3234
Address: 0
AFG Function Id: 0x1 (unsol 1)
Vendor Id: 0x10ec0255
Subsystem Id: 0x10280725
Revision Id: 0x100002
No Modem Function Group found
BugLink: http://bugs.launchpad.net/bugs/1549620
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:51 +0000 (18:45 +0100)]
perf: Robustify task_function_call()
Since there is no serialization between task_function_call() doing
task_curr() and the other CPU doing context switches, we could end
up not sending an IPI even if we had to.
And I'm not sure I still buy my own argument we're OK.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.340031200@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:50 +0000 (18:45 +0100)]
perf: Fix scaling vs. perf_install_in_context()
Completely reworks perf_install_in_context() (again!) in order to
ensure that there will be no ctx time hole between add_event_to_ctx()
and any potential ctx_sched_in().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.279399438@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:49 +0000 (18:45 +0100)]
perf: Fix scaling vs. perf_event_enable()
Similar to the perf_enable_on_exec(), ensure that event timings are
consistent across perf_event_enable().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.218288698@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:48 +0000 (18:45 +0100)]
perf: Fix scaling vs. perf_event_enable_on_exec()
The recent commit
3e349507d12d ("perf: Fix perf_enable_on_exec() event
scheduling") caused this by moving task_ctx_sched_out() from before
__perf_event_mask_enable() to after it.
The overlooked consequence of that change is that task_ctx_sched_out()
would update the ctx time fields, and now __perf_event_mask_enable()
uses stale time.
In order to fix this, explicitly stop our context's time before
enabling the event(s).
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Fixes: 3e349507d12d ("perf: Fix perf_enable_on_exec() event scheduling")
Link: http://lkml.kernel.org/r/20160224174948.159242158@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:47 +0000 (18:45 +0100)]
perf: Fix ctx time tracking by introducing EVENT_TIME
Currently any ctx_sched_in() call will re-start the ctx time tracking,
this means that calls like:
ctx_sched_in(.event_type = EVENT_PINNED);
ctx_sched_in(.event_type = EVENT_FLEXIBLE);
will have a hole in their ctx time tracking. This is likely harmless
but can confuse things a little. By adding EVENT_TIME, we can have the
first ctx_sched_in() (is_active: 0 -> !0) start the time and any
further ctx_sched_in() will leave the timestamps alone.
Secondly, this allows for an early disable like:
ctx_sched_out(.event_type = EVENT_TIME);
which would update the ctx time (if the ctx is active) and any further
calls to ctx_sched_out() would not further modify the ctx time.
For ctx_sched_in() any 0 -> !0 transition will automatically include
EVENT_TIME.
For ctx_sched_out(), any transition that clears EVENT_ALL will
automatically clear EVENT_TIME.
These two rules ensure that under normal circumstances we need not
bother with EVENT_TIME and get natural ctx time behaviour.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.100446561@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:46 +0000 (18:45 +0100)]
perf: Cure event->pending_disable race
Because event_sched_out() checks event->pending_disable _before_
actually disabling the event, it can happen that the event fires after
it checks but before it gets disabled.
This would leave event->pending_disable set and the queued irq_work
will try and process it.
However, if the event trigger was during schedule(), the event might
have been de-scheduled by the time the irq_work runs, and
perf_event_disable_local() will fail.
Fix this by checking event->pending_disable _after_ we call
event->pmu->del(). This depends on the latter being a compiler
barrier, such that the compiler does not lift the load and re-creates
the problem.
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:45 +0000 (18:45 +0100)]
perf: Fix race between event install and jump_labels
perf_install_in_context() relies upon the context switch hooks to have
scheduled in events when the IPI misses its target -- after all, if
the task has moved from the CPU (or wasn't running at all), it will
have to context switch to run elsewhere.
This however doesn't appear to be happening.
It is possible for the IPI to not happen (task wasn't running) only to
later observe the task running with an inactive context.
The only possible explanation is that the context switch hooks are not
called. Therefore put in a sync_sched() after toggling the jump_label
to guarantee all CPUs will have them enabled before we install an
event.
A simple if (0->1) sync_sched() will not in fact work, because any
further increment can race and complete before the sync_sched().
Therefore we must jump through some hoops.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:44 +0000 (18:45 +0100)]
perf: Fix cloning
Alexander reported that when the 'original' context gets destroyed, no
new clones happen.
This can happen irrespective of the ctx switch optimization, any task
can die, even the parent, and we want to continue monitoring the task
hierarchy until we either close the event or no tasks are left in the
hierarchy.
perf_event_init_context() will attempt to pin the 'parent' context
during clone(). At that point current is the parent, and since current
cannot have exited while executing clone(), its context cannot have
passed through perf_event_exit_task_context(). Therefore
perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.
However, since inherit_event() does:
if (parent_event->parent)
parent_event = parent_event->parent;
it looks at the 'original' event when it does: is_orphaned_event().
This can return true if the context that contains the this event has
passed through perf_event_exit_task_context(). And thus we'll fail to
clone the perf context.
Fix this by adding a new state: STATE_DEAD, which is set by
perf_release() to indicate that the filedesc (or kernel reference) is
dead and there are no observers for our data left.
Only for STATE_DEAD will is_orphaned_event() be true and inhibit
cloning.
STATE_EXIT is otherwise preserved such that is_event_hup() remains
functional and will report when the observed task hierarchy becomes
empty.
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events")
Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:43 +0000 (18:45 +0100)]
perf: Only update context time when active
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.860690919@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:42 +0000 (18:45 +0100)]
perf: Allow perf_release() with !event->ctx
In the err_file: fput(event_file) case, the event will not yet have
been attached to a context. However perf_release() does assume it has
been. Cure this.
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.793996260@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:41 +0000 (18:45 +0100)]
perf: Do not double free
In case of: err_file: fput(event_file), we'll end up calling
perf_release() which in turn will free the event.
Do not then free the event _again_.
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 24 Feb 2016 17:45:40 +0000 (18:45 +0100)]
perf: Close install vs. exit race
Consider the following scenario:
CPU0 CPU1
ctx = find_get_ctx();
perf_event_exit_task_context()
mutex_lock(&ctx->mutex);
perf_install_in_context(ctx, ...);
/* NO-OP */
mutex_unlock(&ctx->mutex);
...
perf_release()
WARN_ON_ONCE(event->state != STATE_EXIT);
Since the event doesn't pass through perf_remove_from_context()
because perf_install_in_context() NO-OPs because the ctx is dead, and
perf_event_exit_task_context() will not observe the event because its
not attached yet, the event->state will not be set.
Solve this by revalidating ctx->task after we acquire ctx->mutex and
failing the event creation as a whole.
Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.626853419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andy Lutomirski [Wed, 24 Feb 2016 20:18:49 +0000 (12:18 -0800)]
x86/entry/compat: Add missing CLAC to entry_INT80_32
This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.
I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.
Stable maintainers: I can provide a backport to 4.3 and earlier if
needed. This needs to be backported all the way to 3.10.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.10 and later
Fixes: 63bcff2a307b ("x86, smap: Add STAC and CLAC instructions to control user space access")
Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Thu, 25 Feb 2016 03:17:50 +0000 (13:17 +1000)]
Merge branch 'linux-4.5' of git://github.com/skeggsb/linux into drm-fixes
single for for eDP panel issues on Lenovo P50
* 'linux-4.5' of git://github.com/skeggsb/linux:
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
Ben Skeggs [Wed, 17 Feb 2016 22:14:19 +0000 (08:14 +1000)]
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
This can happen under some annoying circumstances, and is a quick fix
until more substantial changes can be made.
Fixed eDP mode changes on (at least) the Lenovo P50.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Cc: stable@vger.kernel.org
Thierry Reding [Wed, 24 Feb 2016 17:34:43 +0000 (18:34 +0100)]
drm/nouveau: platform: Fix deferred probe
The error cleanup paths aren't quite correct and will crash upon
deferred probe.
Cc: stable@vger.kernel.org # v4.3+
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Geert Uytterhoeven [Wed, 24 Feb 2016 08:43:23 +0000 (09:43 +0100)]
drivers: sh: Restore legacy clock domain on SuperH platforms
CONFIG_ARCH_SHMOBILE is not only enabled for Renesas ARM platforms
(which are DT based and multi-platform), but also on a select set of
Renesas SuperH platforms (SH7722/SH7723/SH7724/SH7343/SH7366). Hence
since commit
0ba58de231066e47 ("drivers: sh: Get rid of
CONFIG_ARCH_SHMOBILE_MULTI"), the legacy clock domain is no longer
installed on these SuperH platforms, and module clocks may not be
enabled when needed, leading to driver failures.
To fix this, add an additional check for CONFIG_OF.
Fixes: 0ba58de231066e47 ("drivers: sh: Get rid of CONFIG_ARCH_SHMOBILE_MULTI").
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Dave Airlie [Wed, 24 Feb 2016 22:22:43 +0000 (08:22 +1000)]
Merge tag 'drm-intel-fixes-2016-02-22' of git://anongit.freedesktop.org/drm-intel into drm-fixes
This is a bit large, but it really helps Skylake bugs we are seeing
on a number of laptops.
Most of the commits are quite similar, ensuring the display power
doesn't vanish under us during hardware access. Also do note that it's
not just Skylake that's affected.
* tag 'drm-intel-fixes-2016-02-22' of git://anongit.freedesktop.org/drm-intel:
drm/i915/gen9: Verify and enforce dc6 state writes
drm/i915/gen9: Check for DC state mismatch
drm/i915/skl: Ensure HW is powered during DDB HW state readout
drm/i915/lvds: Ensure the HW is powered during HW state readout
drm/i915/hdmi: Ensure the HW is powered during HW state readout
drm/i915/dsi: Ensure the HW is powered during HW state readout
drm/i915/dp: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered when accessing the CRC HW block
drm/i915/ddi: Ensure the HW is powered during HW state readout
drm/i915/crt: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered during HW access in assert_pipe
drm/i915: Ensure the HW is powered when disabling VGA
drm/i915/ibx: Ensure the HW is powered during PLL HW readout
drm/i915: Ensure the HW is powered during display pipe HW readout
drm/i915: Add helper to get a display power ref if it was already enabled
Dave Airlie [Wed, 24 Feb 2016 22:21:33 +0000 (08:21 +1000)]
Merge branch 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few radeon and amdgpu fixes for 4.5. A few further fixes for the vblank
regressions in 4.4 and a couple of other minor fixes.
* 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable direct VM updates when vm_debug is set
amdgpu: fix NULL pointer dereference at tonga_check_states_equal
drm/radeon/pm: adjust display configuration after powerstate
drm/amdgpu/pm: adjust display configuration after powerstate
drm/amdgpu/pm: add some checks for PX
drm/amdgpu: fix locking in force performance level
drm/amdgpu/gfx8: fix priv reg interrupt enable
drm/amdgpu: Don't hang in amdgpu_flip_work_func on disabled crtc.
drm/radeon: Don't hang in radeon_flip_work_func on disabled crtc. (v2)
Linus Torvalds [Wed, 24 Feb 2016 22:06:17 +0000 (14:06 -0800)]
Merge tag 'arc-4.5-rc6-fixes-upd' of git://git./linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Fix for csd deadlock due to missing self IPI
- Accompanying IPI cleanups / optimization
- Brown paper bag bug in one of the cleanups above
- Boot reporting updates for new hardware features
- Don't force DEVTMPFS if INITRAMFS
* tag 'arc-4.5-rc6-fixes-upd' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: SMP: CONFIG_ARC_IPI_DBG cleanup
ARC: SMP: No need for CONFIG_ARC_IPI_DBG
ARCv2: Elide sending new cross core intr if receiver didn't ack prev
ARCv2: SMP: Push IPI_IRQ into IPI provider
ARC: [intc-compact] Remove IPI setup from ARCompact port
ARCv2: SMP: Emulate IPI to self using software triggered interrupt
arc: get rid of DEVTMPFS dependency on INITRAMFS_SOURCE
ARCv2: boot report CCMs (Closely Coupled Memories)
ARCv2: boot print Low Latency Memory
ARC: Assume multiplier is always present
Linus Torvalds [Wed, 24 Feb 2016 22:00:26 +0000 (14:00 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"Assorted fixes - xattr one from this cycle, the rest - stable fodder"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/pnode.c: treat zero mnt_group_id-s as unequal
affs_do_readpage_ofs(): just use kmap_atomic() around memcpy()
xattr handlers: plug a lock leak in simple_xattr_list
fs: allow no_seek_end_llseek to actually seek
Ilya Dryomov [Fri, 19 Feb 2016 10:38:57 +0000 (11:38 +0100)]
libceph: don't spam dmesg with stray reply warnings
Commit
d15f9d694b77 ("libceph: check data_len in ->alloc_msg()")
mistakenly bumped the log level on the "tid %llu unknown, skipping"
message. Turn it back into a dout() - stray replies are perfectly
normal when OSDs flap, crash, get killed for testing purposes, etc.
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Ilya Dryomov [Fri, 19 Feb 2016 10:38:57 +0000 (11:38 +0100)]
libceph: use the right footer size when skipping a message
ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Ilya Dryomov [Wed, 17 Feb 2016 19:04:08 +0000 (20:04 +0100)]
libceph: don't bail early from try_read() when skipping a message
The contract between try_read() and try_write() is that when called
each processes as much data as possible. When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more. try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Varada Kari <Varada.Kari@sandisk.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Varada Kari <Varada.Kari@sandisk.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Kirill A. Shutemov [Wed, 24 Feb 2016 15:58:03 +0000 (18:58 +0300)]
thp: call pmdp_invalidate() with correct virtual address
Sebastian Ott and Gerald Schaefer reported random crashes on s390.
It was bisected to my THP refcounting patchset.
The problem is that pmdp_invalidated() called with wrong virtual
address. It got offset up by HPAGE_PMD_SIZE by loop over ptes.
The solution is to introduce new variable to be used in loop and don't
touch 'haddr'.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reported-and-tested-by Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christian König [Fri, 19 Feb 2016 09:03:03 +0000 (10:03 +0100)]
drm/amdgpu: disable direct VM updates when vm_debug is set
That should make user space bugs more obvious.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Bradley Pankow [Tue, 23 Feb 2016 01:11:47 +0000 (20:11 -0500)]
amdgpu: fix NULL pointer dereference at tonga_check_states_equal
The event_data passed from pem_fini was not cleared upon initialization.
This caused NULL checks to pass and cast_const_phw_tonga_power_state to
attempt to dereference an invalid pointer. Clear the event_data in
pem_init and pem_fini before calling pem_handle_event.
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Bradley Pankow <btpankow@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Marc Zyngier [Wed, 17 Feb 2016 10:25:05 +0000 (10:25 +0000)]
arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2
The GICv3 architecture spec says:
Writing to the active priority registers in any order other than
the following order will result in UNPREDICTABLE behavior:
- ICH_AP0R<n>_EL2.
- ICH_AP1R<n>_EL2.
So let's not pointlessly go against the rule...
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Greg Kroah-Hartman [Wed, 24 Feb 2016 17:04:21 +0000 (09:04 -0800)]
Merge tag 'fixes-for-v4.5-rc6' of git./linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v4.5-rc6
The most important fixes here are:
a) yet another fix to dwc3's EP transfer resource
assignment logic. This time around we will be
pre-allocating transfer resources to avoid any
future issues;
b) two DMA fixes for the old MUSB driver.
c) dwc2's data toggle fix for FS
Other than these, we have a few other minor fixes
elsewhere.
Olof Johansson [Wed, 24 Feb 2016 16:48:22 +0000 (08:48 -0800)]
Merge tag 'renesas-soc-fixes-for-v4.5' of git://git./linux/kernel/git/horms/renesas into fixes
Renesas ARM Based SoC Fixes for v4.5
* Avoid writing to .text
* tag 'renesas-soc-fixes-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: Remove shmobile_boot_arg
ARM: shmobile: Move shmobile_smp_{mpidr, fn, arg}[] from .text to .bss
ARM: shmobile: r8a7779: Remove remainings of removed SCU boot setup code
ARM: shmobile: Move shmobile_scu_base from .text to .bss
Signed-off-by: Olof Johansson <olof@lixom.net>
Steven Rostedt (Red Hat) [Wed, 24 Feb 2016 14:04:24 +0000 (09:04 -0500)]
tracing: Fix showing function event in available_events
The ftrace:function event is only displayed for parsing the function tracer
data. It is not used to enable function tracing, and does not include an
"enable" file in its event directory.
Originally, this event was kept separate from other events because it did
not have a ->reg parameter. But perf added a "reg" parameter for its use
which caused issues, because it made the event available to functions where
it was not compatible for.
Commit
9b63776fa3ca9 "tracing: Do not enable function event with enable"
added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
from being enabled by normal trace events. But this commit missed keeping
the function event from being displayed by the "available_events" directory,
which is used to show what events can be enabled by set_event.
One documented way to enable all events is to:
cat available_events > set_event
But because the function event is displayed in the available_events, this
now causes an INVALID error:
cat: write error: Invalid argument
Reported-by: Chunyu Hu <chuhu@redhat.com>
Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable"
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Christian Borntraeger [Fri, 19 Feb 2016 12:11:46 +0000 (13:11 +0100)]
KVM: async_pf: do not warn on page allocation failures
In async_pf we try to allocate with NOWAIT to get an element quickly
or fail. This code also handle failures gracefully. Lets silence
potential page allocation failures under load.
qemu-system-s39: page allocation failure: order:0,mode:0x2200000
[...]
Call Trace:
([<
00000000001146b8>] show_trace+0xf8/0x148)
[<
000000000011476a>] show_stack+0x62/0xe8
[<
00000000004a36b8>] dump_stack+0x70/0x98
[<
0000000000272c3a>] warn_alloc_failed+0xd2/0x148
[<
000000000027709e>] __alloc_pages_nodemask+0x94e/0xb38
[<
00000000002cd36a>] new_slab+0x382/0x400
[<
00000000002cf7ac>] ___slab_alloc.constprop.30+0x2dc/0x378
[<
00000000002d03d0>] kmem_cache_alloc+0x160/0x1d0
[<
0000000000133db4>] kvm_setup_async_pf+0x6c/0x198
[<
000000000013dee8>] kvm_arch_vcpu_ioctl_run+0xd48/0xd58
[<
000000000012fcaa>] kvm_vcpu_ioctl+0x372/0x690
[<
00000000002f66f6>] do_vfs_ioctl+0x3be/0x510
[<
00000000002f68ec>] SyS_ioctl+0xa4/0xb8
[<
0000000000781c5e>] system_call+0xd6/0x264
[<
000003ffa24fa06a>] 0x3ffa24fa06a
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 19 Feb 2016 17:07:21 +0000 (18:07 +0100)]
KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
Commit
e8dd2d2d641c ("Silence compiler warning in arch/x86/kvm/emulate.c",
2015-09-06) broke boot of the Hurd. The bug is that the "default:"
case actually could modify "la", but after the patch this change is
not reflected in *linear.
The bug is visible whenever a non-zero segment base causes the linear
address to wrap around the 4GB mark.
Fixes: e8dd2d2d641cb2724ee10e76c0ad02e04289c017
Cc: stable@vger.kernel.org
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 10 Feb 2016 16:50:23 +0000 (17:50 +0100)]
KVM: x86: fix missed hardware breakpoints
Sometimes when setting a breakpoint a process doesn't stop on it.
This is because the debug registers are not loaded correctly on
VCPU load.
The following simple reproducer from Oleg Nesterov tries using debug
registers in two threads. To see the bug, run a 2-VCPU guest with
"taskset -c 0" and run "./bp 0 1" inside the guest.
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <asm/debugreg.h>
#include <assert.h>
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
{
unsigned long dr7;
dr7 = ((len | type) & 0xf)
<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
if (enable)
dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
return dr7;
}
int write_dr(int pid, int dr, unsigned long val)
{
return ptrace(PTRACE_POKEUSER, pid,
offsetof (struct user, u_debugreg[dr]),
val);
}
void set_bp(pid_t pid, void *addr)
{
unsigned long dr7;
assert(write_dr(pid, 0, (long)addr) == 0);
dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
assert(write_dr(pid, 7, dr7) == 0);
}
void *get_rip(int pid)
{
return (void*)ptrace(PTRACE_PEEKUSER, pid,
offsetof(struct user, regs.rip), 0);
}
void test(int nr)
{
void *bp_addr = &&label + nr, *bp_hit;
int pid;
printf("test bp %d\n", nr);
assert(nr < 16); // see 16 asm nops below
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
kill(getpid(), SIGSTOP);
for (;;) {
label: asm (
"nop; nop; nop; nop;"
"nop; nop; nop; nop;"
"nop; nop; nop; nop;"
"nop; nop; nop; nop;"
);
}
}
assert(pid == wait(NULL));
set_bp(pid, bp_addr);
for (;;) {
assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
assert(pid == wait(NULL));
bp_hit = get_rip(pid);
if (bp_hit != bp_addr)
fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
bp_hit - &&label, nr);
}
}
int main(int argc, const char *argv[])
{
while (--argc) {
int nr = atoi(*++argv);
if (!fork())
test(nr);
}
while (wait(NULL) > 0)
;
return 0;
}
Cc: stable@vger.kernel.org
Suggested-by: Nadav Amit <namit@cs.technion.ac.il>
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>