Linus Torvalds [Fri, 26 Jul 2019 17:32:12 +0000 (10:32 -0700)]
Merge tag 'for-linus-
20190726' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Several io_uring fixes/improvements:
- Blocking fix for O_DIRECT (me)
- Latter page slowness for registered buffers (me)
- Fix poll hang under certain conditions (me)
- Defer sequence check fix for wrapped rings (Zhengyuan)
- Mismatch in async inc/dec accounting (Zhengyuan)
- Memory ordering issue that could cause stall (Zhengyuan)
- Track sequential defer in bytes, not pages (Zhengyuan)
- NVMe pull request from Christoph
- Set of hang fixes for wbt (Josef)
- Redundant error message kill for libahci (Ding)
- Remove unused blk_mq_sched_started_request() and related ops (Marcos)
- drbd dynamic alloc shash descriptor to reduce stack use (Arnd)
- blkcg ->pd_stat() non-debug print (Tejun)
- bcache memory leak fix (Wei)
- Comment fix (Akinobu)
- BFQ perf regression fix (Paolo)
* tag 'for-linus-
20190726' of git://git.kernel.dk/linux-block: (24 commits)
io_uring: ensure ->list is initialized for poll commands
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
drbd: dynamically allocate shash descriptor
block: blk-mq: Remove blk_mq_sched_started_request and started_request
bcache: fix possible memory leak in bch_cached_dev_run()
io_uring: track io length in async_list based on bytes
io_uring: don't use iov_iter_advance() for fixed buffers
block: properly handle IOCB_NOWAIT for async O_DIRECT IO
blk-mq: allow REQ_NOWAIT to return an error inline
io_uring: add a memory barrier before atomic_read
rq-qos: use a mb for got_token
rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
rq-qos: don't reset has_sleepers on spurious wakeups
rq-qos: fix missed wake-ups in rq_qos_throttle
wait: add wq_has_single_sleeper helper
block, bfq: check also in-flight I/O in dispatch plugging
block: fix sysfs module parameters directory path in comment
...
Linus Torvalds [Fri, 26 Jul 2019 17:23:45 +0000 (10:23 -0700)]
Merge tag 'sound-5.3-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All relatively small changes:
- a regression fix for PCM link code with CONFIG_REFCOUNT_FULL;
stumbled on a slight difference between atomic_t and refcount_t
- a couple of HD-audio stabilization patches addressing the too slow
PM resume seen on some Intel chips
- a series of ALSA compress-offload API fixes, including the
regression by the previous capture stream support
- trivial LINE6 USB-audio driver fixes, a new Conexant HD-audio chip
coverage, and a fix in AC97 bus error path"
* tag 'sound-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add a conexant codec entry to let mute led work
ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
ALSA: ac97: Fix double free of ac97_codec_device
ALSA: compress: Be more restrictive about when a drain is allowed
ALSA: compress: Don't allow paritial drain operations on capture streams
ALSA: compress: Prevent bypasses of set_params
ALSA: compress: Fix regression on compressed capture streams
ALSA: line6: Fix a typo
ALSA: pcm: Fix refcount_inc() on zero usage
ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1
ALSA: hda - Optimize resume for codecs without jack detection
Linus Torvalds [Fri, 26 Jul 2019 17:04:19 +0000 (10:04 -0700)]
Merge tag 'iommu-fixes-v5.3-rc1' of git://git./linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- revert an Intel VT-d patch that caused boot problems on some machines
- fix AMD IOMMU interrupts with x2apic enabled
- fix a potential crash when Intel VT-d domain allocation fails
- fix crash in Intel VT-d driver when accessing a domain without a
flush queue
- formatting fix for new Intel VT-d debugfs code
- fix for use-after-free bug in IOVA code
- fix for a NULL-pointer dereference in Intel VT-d driver when PCI
hotplug is used
- compilation fix for one of the previous fixes
* tag 'iommu-fixes-v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Add support for X2APIC IOMMU interrupts
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
iommu/vt-d: Print pasid table entries MSB to LSB in debugfs
iommu/iova: Remove stale cached32_node
iommu/vt-d: Check if domain->pgd was allocated
iommu/vt-d: Don't queue_iova() if there is no flush queue
iommu/vt-d: Avoid duplicated pci dma alias consideration
Revert "iommu/vt-d: Consolidate domain_init() to avoid duplication"
Linus Torvalds [Fri, 26 Jul 2019 16:43:43 +0000 (09:43 -0700)]
Merge branch 'for-linus-5.3' of git://git./linux/kernel/git/konrad/ibft
Pull iscsi_ibft fix from Konrad Rzeszutek Wilk:
"One tiny fix to enable iSCSI IBFT to be compiled under ARM"
* 'for-linus-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: make ISCSI_IBFT depend on ACPI instead of ISCSI_IBFT_FIND
Linus Torvalds [Fri, 26 Jul 2019 16:36:01 +0000 (09:36 -0700)]
Merge tag 'hwmon-for-v5.3-rc2' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"A couple of hwmon bug fixes:
- Update k8temp documentation URL
- Register address fixes in nct6775 driver
- Fix potential division by zero in occ driver"
* tag 'hwmon-for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (k8temp) documentation: update URL of datasheet
hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
hwmon: (occ) Fix division by zero issue
Jens Axboe [Thu, 25 Jul 2019 16:23:15 +0000 (10:23 -0600)]
Merge branch 'nvme-5.3' of git://git.infradead.org/nvme into for-linus
Pull NVMe fixes from Christoph.
* 'nvme-5.3' of git://git.infradead.org/nvme:
Revert "nvme-pci: don't create a read hctx mapping without read queues"
nvme: fix multipath crash when ANA is deactivated
nvme: fix memory leak caused by incorrect subsystem free
nvme: ignore subnqn for ADATA SX6000LNP
Jens Axboe [Thu, 25 Jul 2019 16:20:18 +0000 (10:20 -0600)]
io_uring: ensure ->list is initialized for poll commands
Daniel reports that when testing an http server that uses io_uring
to poll for incoming connections, sometimes it hard crashes. This is
due to an uninitialized list member for the io_uring request. Normally
this doesn't trigger and none of the test cases caught it.
Reported-by: Daniel Kozak <kozzi11@gmail.com>
Tested-by: Daniel Kozak <kozzi11@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 25 Jul 2019 16:07:32 +0000 (09:07 -0700)]
Merge tag 'pm-5.3-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki
"These fix two issues related to the RAPL MMIO interface support added
recently and one cpufreq driver issue.
Specifics:
- Initialize the power capping subsystem and the RAPL driver earlier
in case the int340X thermal driver is built-in and attempts to
register an MMIO interface for RAPL which must not happen before
the requisite infrastructure is ready (Zhang Rui)
- Fix the int340X thermal driver's RAPL MMIO interface registration
error path (Rafael Wysocki)
- Fix possible use-after-free in the pasemi cpufreq driver (Wen
Yang)"
* tag 'pm-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
int340X/processor_thermal_device: Fix proc_thermal_rapl_remove()
powercap: Invoke powercap_init() and rapl_init() earlier
Linus Torvalds [Thu, 25 Jul 2019 16:02:34 +0000 (09:02 -0700)]
Merge tag 'riscv/for-v5.3-rc2' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"Four minor RISC-V-related changes:
- Add support for the new clone3 syscall for RV64, relying on the
generic support
- Add DT data for the gigabit Ethernet controller on the SiFive FU540
and the HiFive Unleashed board
- Update MAINTAINERS to add me to the arch/riscv maintainers' list
- Add support for PCIe message-signaled interrupts by reusing the
generic header file"
* tag 'riscv/for-v5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
riscv: include generic support for MSI irqdomains
MAINTAINERS: Add Paul as a RISC-V maintainer
riscv: enable sys_clone3 syscall for rv64
Linus Torvalds [Thu, 25 Jul 2019 15:58:32 +0000 (08:58 -0700)]
Merge tag 'ktest-v5.3' of git://git./linux/kernel/git/rostedt/linux-ktest
Pull ktest fixlets from Steven Rostedt:
"This contains only simple spelling fixes"
* tag 'ktest-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Fix some typos in config-bisect.pl
Linus Torvalds [Thu, 25 Jul 2019 15:36:29 +0000 (08:36 -0700)]
Merge branch 'access-creds'
The access() (and faccessat()) credentials change can cause an
unnecessary load on the RCU machinery because every access() call ends
up freeing the temporary access credential using RCU.
This isn't really noticeable on small machines, but if you have hundreds
of cores you can cause huge slowdowns due to RCU storms.
It's easy to avoid: the temporary access crededntials aren't actually
normally accessed using RCU at all, so we can avoid the whole issue by
just marking them as such.
* access-creds:
access: avoid the RCU grace period for the temporary subjective credentials
Rafael J. Wysocki [Thu, 25 Jul 2019 08:46:07 +0000 (10:46 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
Hui Wang [Thu, 25 Jul 2019 06:57:37 +0000 (14:57 +0800)]
ALSA: hda - Add a conexant codec entry to let mute led work
This conexant codec isn't in the supported codec list yet, the hda
generic driver can drive this codec well, but on a Lenovo machine
with mute/mic-mute leds, we need to apply CXT_FIXUP_THINKPAD_ACPI
to make the leds work. After adding this codec to the list, the
driver patch_conexant.c will apply THINKPAD_ACPI to this machine.
Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Fri, 19 Jul 2019 08:27:54 +0000 (10:27 +0200)]
ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
It turned out that the recent Intel HD-audio controller chips show a
significant stall during the system PM resume intermittently. It
doesn't happen so often and usually it may read back successfully
after one or more seconds, but in some rare worst cases the driver
went into fallback mode.
After trial-and-error, we found out that the communication stall seems
covered by issuing the sync after each verb write, as already done for
AMD and other chipsets. So this patch enables the write-sync flag for
the recent Intel chips, Skylake and onward, as a workaround.
Also, since Broxton and co have the very same driver flags as Skylake,
refer to the Skylake driver flags instead of defining the same
contents again for simplification.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201901
Reported-and-tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Masanari Iida [Tue, 23 Jul 2019 03:24:45 +0000 (12:24 +0900)]
ktest: Fix some typos in config-bisect.pl
This patch fixes some spelling typos in config-bisect.pl
Link: http://lkml.kernel.org/r/20190723032445.14220-1-standby24x7@gmail.com
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Linus Torvalds [Thu, 11 Jul 2019 16:54:40 +0000 (09:54 -0700)]
access: avoid the RCU grace period for the temporary subjective credentials
It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.
The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.
Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.
But it turns out that all of this is entirely unnecessary. Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all. Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.
So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this). We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.
Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards. It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.
It is possible that we should just remove the whole RCU markings for
->cred entirely. Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.
But this is a "minimal semantic changes" change for the immediate
problem.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 24 Jul 2019 16:58:39 +0000 (09:58 -0700)]
Merge tag 'powerpc-5.3-2' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"An assortment of non-regression fixes that have accumulated since the
start of the merge window.
- A fix for a user triggerable oops on machines where transactional
memory is disabled, eg. Power9 bare metal, Power8 with TM disabled
on the command line, or all Power7 or earlier machines.
- Three fixes for handling of PMU and power saving registers when
running nested KVM on Power9.
- Two fixes for bugs found while stress testing the XIVE interrupt
controller code, also on Power9.
- A fix to allow guests to boot under Qemu/KVM on Power9 using the
the Hash MMU with >= 1TB of memory.
- Two fixes for bugs in the recent DMA cleanup, one of which could
lead to checkstops.
- And finally three fixes for the PAPR SCM nvdimm driver.
Thanks to: Alexey Kardashevskiy, Andrea Arcangeli, Cédric Le Goater,
Christoph Hellwig, David Gibson, Gautham R. Shenoy, Michael Neuling,
Oliver O'Halloran, Satheesh Rajendran, Shawn Anastasio, Suraj Jitindar
Singh, Vaibhav Jain"
* tag 'powerpc-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
powerpc/pseries: Update SCM hcall op-codes in hvcall.h
powerpc/tm: Fix oops on sigreturn on systems without TM
powerpc/dma: Fix invalid DMA mmap behavior
KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails
powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask()
powerpc: fix off by one in max_zone_pfn initialization for ZONE_DMA
KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
powerpc/pmu: Set pmcregs_in_use in paca when running as LPAR
KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
powerpc/mm: Limit rma_size to 1TB when running without HV mode
Linus Torvalds [Wed, 24 Jul 2019 16:46:13 +0000 (09:46 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bugfixes, a pvspinlock optimization, and documentation moving"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
Documentation: move Documentation/virtual to Documentation/virt
KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
KVM: X86: Dynamically allocate user_fpu
KVM: X86: Fix fpu state crash in kvm guest
Revert "kvm: x86: Use task structs fpu field for user"
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Linus Torvalds [Wed, 24 Jul 2019 16:28:55 +0000 (09:28 -0700)]
Merge tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping regression fix from Christoph Hellwig:
"Ensure that dma_addressing_limited doesn't crash on devices without a
dma mask (Eric Auger)"
* tag 'dma-mapping-5.3-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: use dma_get_mask in dma_addressing_limited
Wanpeng Li [Wed, 24 Jul 2019 09:43:13 +0000 (17:43 +0800)]
KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption
Commit
11752adb (locking/pvqspinlock: Implement hybrid PV queued/unfair locks)
introduces hybrid PV queued/unfair locks
- queued mode (no starvation)
- unfair mode (good performance on not heavily contended lock)
The lock waiter goes into the unfair mode especially in VMs with over-commit
vCPUs since increaing over-commitment increase the likehood that the queue
head vCPU may have been preempted and not actively spinning.
However, reschedule queue head vCPU timely to acquire the lock still can get
better performance than just depending on lock stealing in over-subscribe
scenario.
Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM:
ebizzy -M
vanilla boosting improved
1VM 23520 25040 6%
2VM 8000 13600 70%
3VM 3100 5400 74%
The lock holder vCPU yields to the queue head vCPU when unlock, to boost queue
head vCPU which is involuntary preemption or the one which is voluntary halt
due to fail to acquire the lock after a short spin in the guest.
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Christoph Hellwig [Wed, 24 Jul 2019 07:24:49 +0000 (09:24 +0200)]
Documentation: move Documentation/virtual to Documentation/virt
Renaming docs seems to be en vogue at the moment, so fix on of the
grossly misnamed directories. We usually never use "virtual" as
a shortcut for virtualization in the kernel, but always virt,
as seen in the virt/ top-level directory. Fix up the documentation
to match that.
Fixes: ed16648eb5b8 ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Tue, 23 Jul 2019 22:34:59 +0000 (15:34 -0700)]
Merge branch 'parisc-5.3-3' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix build issues when kprobes are enabled
- Speed up ITLB/DTLB cache flushes when running on machines with
combined TLBs
* 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
parisc: add kprobe_fault_handler()
yangerkun [Tue, 23 Jul 2019 03:23:13 +0000 (11:23 +0800)]
Revert "nvme-pci: don't create a read hctx mapping without read queues"
This reverts commit
0298d5435276e7795b0b939d74827f6e775e7009.
With this patch, set 'poll_queues > hard queues' will lead to 'nr_read_queues = 0'
in nvme_calc_irq_sets. Then poll_queues setting can fail since dev->tagset.nr_maps
equals to 2 and nvme_pci_map_queues will not do map for poll queues.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Marta Rybczynska [Tue, 23 Jul 2019 05:41:20 +0000 (07:41 +0200)]
nvme: fix multipath crash when ANA is deactivated
Fix a crash with multipath activated. It happends when ANA log
page is larger than MDTS and because of that ANA is disabled.
The driver then tries to access unallocated buffer when connecting
to a nvme target. The signature is as follows:
[ 300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
[ 300.435387] nvme nvme0: disabling ANA support.
[ 300.437835] nvme nvme0: creating 4 I/O queues.
[ 300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
[ 300.464609] BUG: unable to handle kernel NULL pointer dereference at
0000000000000008
[ 300.466342] #PF error: [normal kernel read fault]
[ 300.467385] PGD 0 P4D 0
[ 300.467987] Oops: 0000 [#1] SMP PTI
[ 300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
[ 300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[ 300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[ 300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[ 300.477374] RSP: 0018:
ffffa50e80fd7cb8 EFLAGS:
00010296
[ 300.478334] RAX:
0000000000000001 RBX:
ffff9130f1872258 RCX:
0000000000000000
[ 300.479784] RDX:
ffffffffc06c4c30 RSI:
ffff9130edad4280 RDI:
ffff9130f1872258
[ 300.481488] RBP:
0000000000000000 R08:
0000000000000001 R09:
0000000000000044
[ 300.483203] R10:
0000000000000220 R11:
0000000000000040 R12:
ffff9130f18722c0
[ 300.484928] R13:
ffff9130f18722d0 R14:
ffff9130edad4280 R15:
ffff9130f18722c0
[ 300.486626] FS:
0000000000000000(0000) GS:
ffff9130f7b80000(0000) knlGS:
0000000000000000
[ 300.488538] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 300.489907] CR2:
0000000000000008 CR3:
00000002365e6000 CR4:
00000000000006e0
[ 300.491612] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 300.493303] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 300.494991] Call Trace:
[ 300.495645] nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
[ 300.496880] nvme_validate_ns+0x2ef/0x550 [nvme_core]
[ 300.498105] ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
[ 300.499539] nvme_scan_work+0x2b4/0x370 [nvme_core]
[ 300.500717] ? __switch_to_asm+0x35/0x70
[ 300.501663] process_one_work+0x171/0x380
[ 300.502340] worker_thread+0x49/0x3f0
[ 300.503079] kthread+0xf8/0x130
[ 300.503795] ? max_active_store+0x80/0x80
[ 300.504690] ? kthread_bind+0x10/0x10
[ 300.505502] ret_from_fork+0x35/0x40
[ 300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio
[ 300.514984] CR2:
0000000000000008
[ 300.515569] ---[ end trace
faa2eefad7e7f218 ]---
[ 300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[ 300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[ 300.520353] RSP: 0018:
ffffa50e80fd7cb8 EFLAGS:
00010296
[ 300.521229] RAX:
0000000000000001 RBX:
ffff9130f1872258 RCX:
0000000000000000
[ 300.522399] RDX:
ffffffffc06c4c30 RSI:
ffff9130edad4280 RDI:
ffff9130f1872258
[ 300.523560] RBP:
0000000000000000 R08:
0000000000000001 R09:
0000000000000044
[ 300.524734] R10:
0000000000000220 R11:
0000000000000040 R12:
ffff9130f18722c0
[ 300.525915] R13:
ffff9130f18722d0 R14:
ffff9130edad4280 R15:
ffff9130f18722c0
[ 300.527084] FS:
0000000000000000(0000) GS:
ffff9130f7b80000(0000) knlGS:
0000000000000000
[ 300.528396] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 300.529440] CR2:
0000000000000008 CR3:
00000002365e6000 CR4:
00000000000006e0
[ 300.530739] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 300.531989] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 300.533264] Kernel panic - not syncing: Fatal exception
[ 300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---
Condition check refactoring from Christoph Hellwig.
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Tested-by: Jean-Baptiste Riaux <jbriaux@kalray.eu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Logan Gunthorpe [Thu, 18 Jul 2019 23:53:50 +0000 (17:53 -0600)]
nvme: fix memory leak caused by incorrect subsystem free
When freeing the subsystem after finding another match with
__nvme_find_get_subsystem(), use put_device() instead of
__nvme_release_subsystem() which calls kfree() directly.
Per the documentation, put_device() should always be used
after device_initialization() is called. Otherwise, leaks
like the one below which was detected by kmemleak may occur.
Once the call of __nvme_release_subsystem() is removed it no
longer makes sense to keep the helper, so fold it back
into nvme_release_subsystem().
unreferenced object 0xffff8883d12bfbc0 (size 16):
comm "nvme", pid 2635, jiffies
4294933602 (age 739.952s)
hex dump (first 16 bytes):
6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2....
backtrace:
[<
000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
[<
0000000081169e5f>] kvasprintf+0xad/0x130
[<
0000000025626f25>] kvasprintf_const+0x47/0x120
[<
00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
[<
000000004881f8b3>] dev_set_name+0x98/0xc0
[<
000000007124dae3>] nvme_init_identify+0x1995/0x38e0
[<
000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
[<
000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
[<
00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
[<
000000002259b3d5>] __vfs_write+0x66/0x120
[<
000000002f6df81e>] vfs_write+0x154/0x490
[<
000000007e8cfc19>] ksys_write+0x10a/0x240
[<
00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
[<
00000000fee6d692>] do_syscall_64+0xaa/0x470
[<
00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: ab9e00cc72fa ("nvme: track subsystems")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Misha Nasledov [Mon, 15 Jul 2019 07:11:49 +0000 (00:11 -0700)]
nvme: ignore subnqn for ADATA SX6000LNP
The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a
system with more than one of these SSDs will only have one usable.
[ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C).
[ 0.943017] nvme nvme1: Removing after probe failure status: -22
02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01)
There are no firmware updates available from the vendor, unfortunately.
Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves
the issue, and they all work after this patch:
/dev/nvme0n1 2J1120050420 ADATA SX6000LNP [...]
/dev/nvme1n1 2J1120050540 ADATA SX6000LNP [...]
Signed-off-by: Misha Nasledov <misha@nasledov.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Eric Auger [Mon, 22 Jul 2019 16:51:49 +0000 (18:51 +0200)]
dma-mapping: use dma_get_mask in dma_addressing_limited
We currently have cases where the dma_addressing_limited() gets
called with dma_mask unset. This causes a NULL pointer dereference.
Use dma_get_mask() accessor to prevent the crash.
Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Suthikulpanit, Suravee [Tue, 16 Jul 2019 04:29:16 +0000 (04:29 +0000)]
iommu/amd: Add support for X2APIC IOMMU interrupts
AMD IOMMU requires IntCapXT registers to be setup in order to generate
its own interrupts (for Event Log, PPR Log, and GA Log) with 32-bit
APIC destination ID. Without this support, AMD IOMMU MSI interrupts
will not be routed correctly when booting the system in X2APIC mode.
Cc: Joerg Roedel <joro@8bytes.org>
Fixes: 90fcffd9cf5e ('iommu/amd: Add support for IOMMU XT mode')
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Arnd Bergmann [Mon, 22 Jul 2019 12:26:34 +0000 (14:26 +0200)]
drbd: dynamically allocate shash descriptor
Building with clang and KASAN, we get a warning about an overly large
stack frame on 32-bit architectures:
drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect'
[-Werror,-Wframe-larger-than=]
We already allocate other data dynamically in this function, so
just do the same for the shash descriptor, which makes up most of
this memory.
Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Marcos Paulo de Souza [Tue, 23 Jul 2019 03:27:41 +0000 (00:27 -0300)]
block: blk-mq: Remove blk_mq_sched_started_request and started_request
blk_mq_sched_completed_request is a function that checks if the elevator
related to the request has started_request implemented, but currently, none of
the available IO schedulers implement started_request, so remove both.
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ding Xiang [Tue, 23 Jul 2019 07:44:41 +0000 (15:44 +0800)]
ALSA: ac97: Fix double free of ac97_codec_device
put_device will call ac97_codec_release to free
ac97_codec_device and other resources, so remove the kfree
and other redundant code.
Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Charles Keepax [Mon, 22 Jul 2019 09:24:36 +0000 (10:24 +0100)]
ALSA: compress: Be more restrictive about when a drain is allowed
Draining makes little sense in the situation of hardware overrun, as the
hardware will have consumed all its available samples. Additionally,
draining whilst the stream is paused would presumably get stuck as no
data is being consumed on the DSP side.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Charles Keepax [Mon, 22 Jul 2019 09:24:35 +0000 (10:24 +0100)]
ALSA: compress: Don't allow paritial drain operations on capture streams
Partial drain and next track are intended for gapless playback and
don't really have an obvious interpretation for a capture stream, so
makes sense to not allow those operations on capture streams.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Charles Keepax [Mon, 22 Jul 2019 09:24:34 +0000 (10:24 +0100)]
ALSA: compress: Prevent bypasses of set_params
Currently, whilst in SNDRV_PCM_STATE_OPEN it is possible to call
snd_compr_stop, snd_compr_drain and snd_compr_partial_drain, which
allow a transition to SNDRV_PCM_STATE_SETUP. The stream should
only be able to move to the setup state once it has received a
SNDRV_COMPRESS_SET_PARAMS ioctl. Fix this issue by not allowing
those ioctls whilst in the open state.
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Charles Keepax [Mon, 22 Jul 2019 09:24:33 +0000 (10:24 +0100)]
ALSA: compress: Fix regression on compressed capture streams
A previous fix to the stop handling on compressed capture streams causes
some knock on issues. The previous fix updated snd_compr_drain_notify to
set the state back to PREPARED for capture streams. This causes some
issues however as the handling for snd_compr_poll differs between the
two states and some user-space applications were relying on the poll
failing after the stream had been stopped.
To correct this regression whilst still fixing the original problem the
patch was addressing, update the capture handling to skip the PREPARED
state rather than skipping the SETUP state as it has done until now.
Fixes: 4f2ab5e1d13d ("ALSA: compress: Fix stop handling on compressed capture streams")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Joerg Roedel [Tue, 23 Jul 2019 07:51:00 +0000 (09:51 +0200)]
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
The stub function for !CONFIG_IOMMU_IOVA needs to be
'static inline'.
Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no flush queue')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Wen Yang [Wed, 17 Jul 2019 03:55:04 +0000 (11:55 +0800)]
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
The cpu variable is still being used in the of_get_property() call
after the of_node_put() call, which may result in use-after-free.
Fixes: a9acc26b75f6 ("cpufreq/pasemi: fix possible object reference leak")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rafael J. Wysocki [Mon, 22 Jul 2019 10:22:57 +0000 (12:22 +0200)]
int340X/processor_thermal_device: Fix proc_thermal_rapl_remove()
Passing 0 to cpuhp_remove_state() triggers the BUG_ON() in
__cpuhp_remove_state_cpuslocked() and the argument passed to
powercap_unregister_control_type() is expected to be a valid
pointer, so avoid calling these functions with incorrect
arguments from proc_thermal_rapl_remove().
Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Yash Shah [Fri, 19 Jul 2019 11:10:31 +0000 (16:40 +0530)]
riscv: dts: Add DT node for SiFive FU540 Ethernet controller driver
DT node for SiFive FU540-C000 GEMGXL Ethernet controller driver added
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Reviewed-by: Sagar Kadam <sagar.kadam@sifive.com>
Cc: Andrew Lunn <andrew@lunn.ch>
[paul.walmsley@sifive.com: changed "phy1" to "phy0" at Andrew Lunn's
suggestion]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Wesley Terpstra [Mon, 20 May 2019 17:29:26 +0000 (10:29 -0700)]
riscv: include generic support for MSI irqdomains
Some RISC-V systems include PCIe host controllers that support PCIe
message-signaled interrupts. For this to work on Linux, we need to
enable PCI_MSI_IRQ_DOMAIN and define struct msi_alloc_info. Support
for the latter is enabled by including the architecture-generic msi.h
include.
Signed-off-by: Wesley Terpstra <wesley@sifive.com>
[paul.walmsley@sifive.com: split initial patch into one arch/riscv
patch and one drivers/pci patch]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Palmer Dabbelt [Fri, 28 Jun 2019 00:27:53 +0000 (17:27 -0700)]
MAINTAINERS: Add Paul as a RISC-V maintainer
The RISC-V port has grown significantly over the past year. Paul's been
helping out for a while ago. We agreed in person that he'd take over
collecting the patches and submitting the PRs, but it looks like I
forgot to make it official.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Linus Torvalds [Mon, 22 Jul 2019 16:30:34 +0000 (09:30 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull preemption Kconfig fix from Thomas Gleixner:
"The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined
PREEMPT outside of the menu and made it selectable by both PREEMPT_LL
and PREEMPT_RT.
Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which
obviously can't work anymore. oldconfig builds are affected as well,
but it's more obvious as the user gets asked. [old]defconfig silently
fixes it up and selects PREEMPT_NONE.
Unbreak it by undoing the rename and adding a intermediate config
symbol which is selected by both PREEMPT and PREEMPT_RT. That requires
to chase down a few #ifdefs, but it's better than tweaking 114
defconfigs and annoying users"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
Linus Torvalds [Mon, 22 Jul 2019 16:14:19 +0000 (09:14 -0700)]
Merge tag 'for-linus-
20190722' of git://git./linux/kernel/git/brauner/linux
Pull pidfd polling fix from Christian Brauner:
"A fix for pidfd polling. It ensures that the task's exit state is
visible to all waiters"
* tag 'for-linus-
20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
pidfd: fix a poll race when setting exit_state
Linus Torvalds [Mon, 22 Jul 2019 16:08:38 +0000 (09:08 -0700)]
Merge tag 'for-5.3-rc1-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fixes for leaks caused by recently merged patches
- one build fix
- a fix to prevent mixing of incompatible features
* tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't leak extent_map in btrfs_get_io_geometry()
btrfs: free checksum hash on in close_ctree
btrfs: Fix build error while LIBCRC32C is module
btrfs: inode: Don't compress if NODATASUM or NODATACOW set
Thomas Gleixner [Mon, 22 Jul 2019 15:59:19 +0000 (17:59 +0200)]
sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to
CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y
set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on
the preemption mode choice wich defaults to NONE. This also affects
oldconfig builds.
So rather than changing 114 defconfig files and being an annoyance to
users, revert the rename and select a new config symbol PREEMPTION. That
keeps everything working smoothly and the revelant ifdef's are going to be
fixed up step by step.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: a50a3f4b6a31 ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Mon, 22 Jul 2019 16:01:47 +0000 (09:01 -0700)]
Merge tag 'media/v5.3-2' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For two regressions in media core:
- v4l2-subdev: fix regression in check_pad()
- videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already
in use"
* tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use
media: v4l2-subdev: fix regression in check_pad()
Sai Praneeth Prakhya [Mon, 22 Jul 2019 00:22:07 +0000 (17:22 -0700)]
iommu/vt-d: Print pasid table entries MSB to LSB in debugfs
Commit
dd5142ca5d24 ("iommu/vt-d: Add debugfs support to show scalable mode
DMAR table internals") prints content of pasid table entries from LSB to
MSB where as other entries are printed MSB to LSB. So, to maintain
uniformity among all entries and to not confuse the user, print MSB first.
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Fixes: dd5142ca5d24 ("iommu/vt-d: Add debugfs support to show scalable mode DMAR table internals")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Chris Wilson [Sat, 20 Jul 2019 18:08:48 +0000 (19:08 +0100)]
iommu/iova: Remove stale cached32_node
Since the cached32_node is allowed to be advanced above dma_32bit_pfn
(to provide a shortcut into the limited range), we need to be careful to
remove the to be freed node if it is the cached32_node.
[ 48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
[ 48.477812] Read of size 8 at addr
ffff88870fc19020 by task kworker/u8:1/37
[ 48.477843]
[ 48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G U 5.2.0+ #735
[ 48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 48.478047] Workqueue: i915 __i915_gem_free_work [i915]
[ 48.478075] Call Trace:
[ 48.478111] dump_stack+0x5b/0x90
[ 48.478137] print_address_description+0x67/0x237
[ 48.478178] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478212] __kasan_report.cold.3+0x1c/0x38
[ 48.478240] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478280] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478308] __cached_rbnode_delete_update+0x68/0x110
[ 48.478344] private_free_iova+0x2b/0x60
[ 48.478378] iova_magazine_free_pfns+0x46/0xa0
[ 48.478403] free_iova_fast+0x277/0x340
[ 48.478443] fq_ring_free+0x15a/0x1a0
[ 48.478473] queue_iova+0x19c/0x1f0
[ 48.478597] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.478712] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.478826] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.478940] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.479053] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.479081] ? __sg_free_table+0x9e/0xf0
[ 48.479116] ? kfree+0x7f/0x150
[ 48.479234] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.479352] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.479465] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.479579] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.479607] process_one_work+0x495/0x710
[ 48.479642] worker_thread+0x4c7/0x6f0
[ 48.479687] ? process_one_work+0x710/0x710
[ 48.479724] kthread+0x1b2/0x1d0
[ 48.479774] ? kthread_create_worker_on_cpu+0xa0/0xa0
[ 48.479820] ret_from_fork+0x1f/0x30
[ 48.479864]
[ 48.479907] Allocated by task 631:
[ 48.479944] save_stack+0x19/0x80
[ 48.479994] __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 48.480038] kmem_cache_alloc+0x91/0xf0
[ 48.480082] alloc_iova+0x2b/0x1e0
[ 48.480125] alloc_iova_fast+0x58/0x376
[ 48.480166] intel_alloc_iova+0x90/0xc0
[ 48.480214] intel_map_sg+0xde/0x1f0
[ 48.480343] i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
[ 48.480465] huge_get_pages+0x232/0x2b0 [i915]
[ 48.480590] ____i915_gem_object_get_pages+0x40/0xb0 [i915]
[ 48.480712] __i915_gem_object_get_pages+0x90/0xa0 [i915]
[ 48.480834] i915_gem_object_prepare_write+0x2d6/0x330 [i915]
[ 48.480955] create_test_object.isra.54+0x1a9/0x3e0 [i915]
[ 48.481075] igt_shared_ctx_exec+0x365/0x3c0 [i915]
[ 48.481210] __i915_subtests.cold.4+0x30/0x92 [i915]
[ 48.481341] __run_selftests.cold.3+0xa9/0x119 [i915]
[ 48.481466] i915_live_selftests+0x3c/0x70 [i915]
[ 48.481583] i915_pci_probe+0xe7/0x220 [i915]
[ 48.481620] pci_device_probe+0xe0/0x180
[ 48.481665] really_probe+0x163/0x4e0
[ 48.481710] device_driver_attach+0x85/0x90
[ 48.481750] __driver_attach+0xa5/0x180
[ 48.481796] bus_for_each_dev+0xda/0x130
[ 48.481831] bus_add_driver+0x205/0x2e0
[ 48.481882] driver_register+0xca/0x140
[ 48.481927] do_one_initcall+0x6c/0x1af
[ 48.481970] do_init_module+0x106/0x350
[ 48.482010] load_module+0x3d2c/0x3ea0
[ 48.482058] __do_sys_finit_module+0x110/0x180
[ 48.482102] do_syscall_64+0x62/0x1f0
[ 48.482147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.482190]
[ 48.482224] Freed by task 37:
[ 48.482273] save_stack+0x19/0x80
[ 48.482318] __kasan_slab_free+0x12e/0x180
[ 48.482363] kmem_cache_free+0x70/0x140
[ 48.482406] __free_iova+0x1d/0x30
[ 48.482445] fq_ring_free+0x15a/0x1a0
[ 48.482490] queue_iova+0x19c/0x1f0
[ 48.482624] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.482749] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.482873] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.482999] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.483123] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.483250] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.483378] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.483500] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.483622] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.483659] process_one_work+0x495/0x710
[ 48.483704] worker_thread+0x4c7/0x6f0
[ 48.483748] kthread+0x1b2/0x1d0
[ 48.483787] ret_from_fork+0x1f/0x30
[ 48.483831]
[ 48.483868] The buggy address belongs to the object at
ffff88870fc19000
[ 48.483868] which belongs to the cache iommu_iova of size 40
[ 48.483920] The buggy address is located 32 bytes inside of
[ 48.483920] 40-byte region [
ffff88870fc19000,
ffff88870fc19028)
[ 48.483964] The buggy address belongs to the page:
[ 48.484006] page:
ffffea001c3f0600 refcount:1 mapcount:0 mapping:
ffff8888181a91c0 index:0x0 compound_mapcount: 0
[ 48.484045] flags: 0x8000000000010200(slab|head)
[ 48.484096] raw:
8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
[ 48.484141] raw:
0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
[ 48.484188] page dumped because: kasan: bad access detected
[ 48.484230]
[ 48.484265] Memory state around the buggy address:
[ 48.484314]
ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484361]
ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484406] >
ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[ 48.484451] ^
[ 48.484494]
ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484530]
ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
Fixes: e60aa7b53845 ("iommu/iova: Extend rbtree node caching")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Linus Torvalds [Mon, 22 Jul 2019 15:49:22 +0000 (08:49 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Several netfilter fixes including a nfnetlink deadlock fix from
Florian Westphal and fix for dropping VRF packets from Miaohe Lin.
2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore
proper block sharing.
3) Fix r8169 PHY init from Thomas Voegtle.
4) Fix memory leak in mac80211, from Lorenzo Bianconi.
5) Missing NULL check on object allocation in cxgb4, from Navid
Emamdoost.
6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn.
7) Check that there is actually an ip header to access in skb->data in
VRF, from Peter Kosyh.
8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang.
9) One more tweak the the TCP fragmentation memory limit changes, to be
less harmful to applications setting small SO_SNDBUF values. From
Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
tcp: be more careful in tcp_fragment()
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
vrf: make sure skb->data contains ip header to make routing
connector: remove redundant input callback from cn_dev
qed: Prefer pcie_capability_read_word()
igc: Prefer pcie_capability_read_word()
cxgb4: Prefer pcie_capability_read_word()
be2net: Synchronize be_update_queues with dev_watchdog
bnx2x: Prevent load reordering in tx completion processing
net: phy: sfp: hwmon: Fix scaling of RX power
net: sched: verify that q!=NULL before setting q->flags
chelsio: Fix a typo in a function name
allocate_flower_entry: should check for null deref
net: hns3: typo in the name of a constant
kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
tipc: Fix a typo
mac80211: don't warn about CW params when not using them
mac80211: fix possible memory leak in ieee80211_assign_beacon
nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN
nl80211: fix VENDOR_CMD_RAW_DATA
...
Dmitry Safonov [Tue, 16 Jul 2019 21:38:06 +0000 (22:38 +0100)]
iommu/vt-d: Check if domain->pgd was allocated
There is a couple of places where on domain_init() failure domain_exit()
is called. While currently domain_init() can fail only if
alloc_pgtable_page() has failed.
Make domain_exit() check if domain->pgd present, before calling
domain_unmap(), as it theoretically should crash on clearing pte entries
in dma_pte_clear_level().
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Dmitry Safonov [Tue, 16 Jul 2019 21:38:05 +0000 (22:38 +0100)]
iommu/vt-d: Don't queue_iova() if there is no flush queue
Intel VT-d driver was reworked to use common deferred flushing
implementation. Previously there was one global per-cpu flush queue,
afterwards - one per domain.
Before deferring a flush, the queue should be allocated and initialized.
Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush
queue. It's probably worth to init it for static or unmanaged domains
too, but it may be arguable - I'm leaving it to iommu folks.
Prevent queuing an iova flush if the domain doesn't have a queue.
The defensive check seems to be worth to keep even if queue would be
initialized for all kinds of domains. And is easy backportable.
On 4.19.43 stable kernel it has a user-visible effect: previously for
devices in si domain there were crashes, on sata devices:
BUG: spinlock bad magic on CPU#6, swapper/0/1
lock: 0xffff88844f582008, .magic:
00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1
Call Trace:
<IRQ>
dump_stack+0x61/0x7e
spin_bug+0x9d/0xa3
do_raw_spin_lock+0x22/0x8e
_raw_spin_lock_irqsave+0x32/0x3a
queue_iova+0x45/0x115
intel_unmap+0x107/0x113
intel_unmap_sg+0x6b/0x76
__ata_qc_complete+0x7f/0x103
ata_qc_complete+0x9b/0x26a
ata_qc_complete_multiple+0xd0/0xe3
ahci_handle_port_interrupt+0x3ee/0x48a
ahci_handle_port_intr+0x73/0xa9
ahci_single_level_irq_intr+0x40/0x60
__handle_irq_event_percpu+0x7f/0x19a
handle_irq_event_percpu+0x32/0x72
handle_irq_event+0x38/0x56
handle_edge_irq+0x102/0x121
handle_irq+0x147/0x15c
do_IRQ+0x66/0xf2
common_interrupt+0xf/0xf
RIP: 0010:__do_softirq+0x8c/0x2df
The same for usb devices that use ehci-pci:
BUG: spinlock bad magic on CPU#0, swapper/0/1
lock: 0xffff88844f402008, .magic:
00000000, .owner: <none>/-1, .owner_cpu: 0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4
Call Trace:
<IRQ>
dump_stack+0x61/0x7e
spin_bug+0x9d/0xa3
do_raw_spin_lock+0x22/0x8e
_raw_spin_lock_irqsave+0x32/0x3a
queue_iova+0x77/0x145
intel_unmap+0x107/0x113
intel_unmap_page+0xe/0x10
usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d
usb_hcd_unmap_urb_for_dma+0x17/0x100
unmap_urb_for_dma+0x22/0x24
__usb_hcd_giveback_urb+0x51/0xc3
usb_giveback_urb_bh+0x97/0xde
tasklet_action_common.isra.4+0x5f/0xa1
tasklet_action+0x2d/0x30
__do_softirq+0x138/0x2df
irq_exit+0x7d/0x8b
smp_apic_timer_interrupt+0x10f/0x151
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: <stable@vger.kernel.org> # 4.14+
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Tue, 9 Jul 2019 05:22:45 +0000 (13:22 +0800)]
iommu/vt-d: Avoid duplicated pci dma alias consideration
As we have abandoned the home-made lazy domain allocation
and delegated the DMA domain life cycle up to the default
domain mechanism defined in the generic iommu layer, we
needn't consider pci alias anymore when mapping/unmapping
the context entries. Without this fix, we see kernel NULL
pointer dereference during pci device hot-plug test.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Fixes: fa954e6831789 ("iommu/vt-d: Delegate the dma domain to upper layer")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reported-and-tested-by: Xu Pengfei <pengfei.xu@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Mon, 22 Jul 2019 14:21:05 +0000 (16:21 +0200)]
Revert "iommu/vt-d: Consolidate domain_init() to avoid duplication"
This reverts commit
123b2ffc376e1b3e9e015c75175b61e88a8b8518.
This commit reportedly caused boot failures on some systems
and needs to be reverted for now.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Wei Yongjun [Mon, 22 Jul 2019 14:12:36 +0000 (22:12 +0800)]
bcache: fix possible memory leak in bch_cached_dev_run()
memory malloced in bch_cached_dev_run() and should be freed before
leaving from the error handling cases, otherwise it will cause
memory leak.
Fixes: 0b13efecf5f2 ("bcache: add return value check to bch_cached_dev_run()")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Suren Baghdasaryan [Wed, 17 Jul 2019 17:21:00 +0000 (13:21 -0400)]
pidfd: fix a poll race when setting exit_state
There is a race between reading task->exit_state in pidfd_poll and
writing it after do_notify_parent calls do_notify_pidfd. Expected
sequence of events is:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
tsk->exit_state = EXIT_DEAD
pidfd_poll
if (tsk->exit_state)
However nothing prevents the following sequence:
CPU 0 CPU 1
------------------------------------------------
exit_notify
do_notify_parent
do_notify_pidfd
pidfd_poll
if (tsk->exit_state)
tsk->exit_state = EXIT_DEAD
This causes a polling task to wait forever, since poll blocks because
exit_state is 0 and the waiting task is not notified again. A stress
test continuously doing pidfd poll and process exits uncovered this bug.
To fix it, we make sure that the task's exit_state is always set before
calling do_notify_pidfd.
Fixes: b53b0b9d9a6 ("pidfd: add polling support")
Cc: kernel-team@android.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org
[christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie]
Signed-off-by: Christian Brauner <christian@brauner.io>
Vaibhav Jain [Sat, 29 Jun 2019 16:06:10 +0000 (21:36 +0530)]
powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
In some cases initial bind of scm memory for an lpar can fail if
previously it wasn't released using a scm-unbind hcall. This situation
can arise due to panic of the previous kernel or forced lpar
fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with EBUSY error. In case scm-bind operation again fails after the
forced scm-unbind then we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and indicate it as a EBUSY error back to the caller.
Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
Vaibhav Jain [Sat, 29 Jun 2019 16:06:09 +0000 (21:36 +0530)]
powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
The new hcall named H_SCM_UNBIND_ALL has been introduce that can
unbind all or specific scm memory assigned to an lpar. This is
more efficient than using H_SCM_UNBIND_MEM as currently we don't
support partial unbind of scm memory.
Hence this patch proposes following changes to drc_pmem_unbind():
* Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to
H_SCM_UNBIND_ALL.
* Update drc_pmem_unbind() to handles cases when PHYP asks the guest
kernel to wait for specific amount of time before retrying the
hcall via the 'LONG_BUSY' return value.
* Ensure appropriate error code is returned back from the function
in case of an error.
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
Vaibhav Jain [Sat, 29 Jun 2019 16:06:08 +0000 (21:36 +0530)]
powerpc/pseries: Update SCM hcall op-codes in hvcall.h
Update the hvcalls.h to include op-codes for new hcalls introduce to
manage SCM memory. Also update existing hcall definitions to reflect
current papr specification for SCM.
The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
transient proposals and there support was never implemented by
Power-VM nor they were used anywhere in Linux kernel. Hence we don't
expect anyone to be impacted by this change.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
Jan Kiszka [Sun, 21 Jul 2019 14:01:36 +0000 (16:01 +0200)]
KVM: nVMX: Set cached_vmcs12 and cached_shadow_vmcs12 NULL after free
Shall help finding use-after-free bugs earlier.
Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Mon, 22 Jul 2019 04:26:21 +0000 (12:26 +0800)]
KVM: X86: Dynamically allocate user_fpu
After reverting commit
240c35a3783a (kvm: x86: Use task structs fpu field
for user), struct kvm_vcpu is 19456 bytes on my server, PAGE_ALLOC_COSTLY_ORDER(3)
is the order at which allocations are deemed costly to service. In serveless
scenario, one host can service hundreds/thoudands firecracker/kata-container
instances, howerver, new instance will fail to launch after memory is too
fragmented to allocate kvm_vcpu struct on host, this was observed in some
cloud provider product environments.
This patch dynamically allocates user_fpu, kvm_vcpu is 15168 bytes now on my
Skylake server.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Mon, 22 Jul 2019 04:26:20 +0000 (12:26 +0800)]
KVM: X86: Fix fpu state crash in kvm guest
The idea before commit
240c35a37 (which has just been reverted)
was that we have the following FPU states:
userspace (QEMU) guest
---------------------------------------------------------------------------
processor vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
vcpu->arch.user_fpu processor
>>> preempt out
vcpu->arch.user_fpu current->thread.fpu
>>> preempt in
vcpu->arch.user_fpu processor
>>> back to userspace
>>> kvm_put_guest_fpu
processor vcpu->arch.guest_fpu
---------------------------------------------------------------------------
With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.
Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 22 Jul 2019 11:31:27 +0000 (13:31 +0200)]
Revert "kvm: x86: Use task structs fpu field for user"
This reverts commit
240c35a3783ab9b3a0afaba0dde7291295680a6b
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.
Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jan Kiszka [Sun, 21 Jul 2019 11:52:18 +0000 (13:52 +0200)]
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Letting this pend may cause nested_get_vmcs12_pages to run against an
invalid state, corrupting the effective vmcs of L1.
This was triggerable in QEMU after a guest corruption in L2, followed by
a L1 reset.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Fixes: 7f7f1ba33cf2 ("KVM: x86: do not load vmcs12 pages while still in SMM")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Zhang Rui [Fri, 19 Jul 2019 15:25:14 +0000 (23:25 +0800)]
powercap: Invoke powercap_init() and rapl_init() earlier
The MMIO RAPL interface driver depends on both powercap subsystem and
the intel_rapl_common code.
But when all of them are built-in, the MMIO RAPL interface driver can
be loaded before the other two and this breaks the system during boot.
Fix this by adjusting the init order of the powercap subsystem and the
intel_rapl_common code, so that it can be initialized first.
Fixes: 555c45fe0d04 ("int340X/processor_thermal_device: add support for MMIO RAPL")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Christophe JAILLET [Sun, 21 Jul 2019 10:25:58 +0000 (12:25 +0200)]
ALSA: line6: Fix a typo
s/Vairax/Variax/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Zhengyuan Liu [Mon, 22 Jul 2019 02:23:27 +0000 (10:23 +0800)]
io_uring: track io length in async_list based on bytes
We are using PAGE_SIZE as the unit to determine if the total len in
async_list has exceeded max_pages, it's not fair for smaller io sizes.
For example, if we are doing 1k-size io streams, we will never exceed
max_pages since len >>= PAGE_SHIFT always gets zero. So use original
bytes to make it more accurate.
Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Sat, 20 Jul 2019 14:37:31 +0000 (08:37 -0600)]
io_uring: don't use iov_iter_advance() for fixed buffers
Hrvoje reports that when a large fixed buffer is registered and IO is
being done to the latter pages of said buffer, the IO submission time
is much worse:
reading to the start of the buffer: 11238 ns
reading to the end of the buffer:
1039879 ns
In fact, it's worse by two orders of magnitude. The reason for that is
how io_uring figures out how to setup the iov_iter. We point the iter
at the first bvec, and then use iov_iter_advance() to fast-forward to
the offset within that buffer we need.
However, that is abysmally slow, as it entails iterating the bvecs
that we setup as part of buffer registration. There's really no need
to use this generic helper, as we know it's a BVEC type iterator, and
we also know that each bvec is PAGE_SIZE in size, apart from possibly
the first and last. Hence we can just use a shift on the offset to
find the right index, and then adjust the iov_iter appropriately.
After this fix, the timings are:
reading to the start of the buffer: 10135 ns
reading to the end of the buffer: 1377 ns
Or about an 755x improvement for the tail page.
Reported-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Tested-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 16 Jul 2019 19:56:42 +0000 (13:56 -0600)]
block: properly handle IOCB_NOWAIT for async O_DIRECT IO
A caller is supposed to pass in REQ_NOWAIT if we can't block for any
given operation, but O_DIRECT for block devices just ignore this. Hence
we'll block for various resource shortages on the block layer side,
like having to wait for requests.
Use the new REQ_NOWAIT_INLINE to ask for this error to be returned
inline, so we can handle it appropriately and return -EAGAIN to the
caller.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 16 Jul 2019 19:55:23 +0000 (13:55 -0600)]
blk-mq: allow REQ_NOWAIT to return an error inline
By default, if a caller sets REQ_NOWAIT and we need to block, we'll
return -EAGAIN through the bio->bi_end_io() callback. For some use
cases, this makes it hard to use.
Allow a caller to ask for inline return of errors related to
blocking by also setting REQ_NOWAIT_INLINE.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Eric Dumazet [Fri, 19 Jul 2019 18:52:33 +0000 (11:52 -0700)]
tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 19 Jul 2019 17:33:51 +0000 (17:33 +0000)]
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
There is an extra rcu_read_unlock left in netvsc_recv_callback(),
after a previous patch that removes RCU from this function.
This patch removes the extra RCU unlock.
Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Neuling [Fri, 19 Jul 2019 05:05:02 +0000 (15:05 +1000)]
powerpc/tm: Fix oops on sigreturn on systems without TM
On systems like P9 powernv where we have no TM (or P8 booted with
ppc_tm=off), userspace can construct a signal context which still has
the MSR TS bits set. The kernel tries to restore this context which
results in the following crash:
Unexpected TM Bad Thing exception at
c0000000000022fc (msr 0x8000000102a03031) tm_scratch=
800000020280f033
Oops: Unrecoverable exception, sig: 6 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1636 Comm: sigfuz Not tainted
5.2.0-11043-g0a8ad0ffa4 #69
NIP:
c0000000000022fc LR:
00007fffb2d67e48 CTR:
0000000000000000
REGS:
c00000003fffbd70 TRAP: 0700 Not tainted (
5.2.0-11045-g7142b497d8)
MSR:
8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR:
42004242 XER:
00000000
CFAR:
c0000000000022e0 IRQMASK: 0
GPR00:
0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
GPR04:
00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
GPR08:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12:
0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
GPR16:
00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
GPR20:
00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
GPR24:
00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
GPR28:
00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
NIP [
c0000000000022fc] rfi_flush_fallback+0x7c/0x80
LR [
00007fffb2d67e48] 0x7fffb2d67e48
Call Trace:
Instruction dump:
e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
e94d0c08 e96d0c10 e82d0c18 7db242a6 <
4c000024>
7db243a6 7db142a6 f82d0c18
The problem is the signal code assumes TM is enabled when
CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
with P9 powernv or if `ppc_tm=off` is used on P8.
This means any local user can crash the system.
Fix the problem by returning a bad stack frame to the user if they try
to set the MSR TS bits with sigreturn() on systems where TM is not
supported.
Found with sigfuz kernel selftest on P9.
This fixes CVE-2019-13648.
Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
Robert Karszniewicz [Sat, 20 Jul 2019 13:16:52 +0000 (15:16 +0200)]
hwmon: (k8temp) documentation: update URL of datasheet
The old URL is dead.
Signed-off-by: Robert Karszniewicz <avoidr@firemail.cc>
Link: https://lore.kernel.org/r/7139bc7707c24bd4dd7eb323e2da90105a3de9c1.1563522498.git.avoidr@firemail.cc
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Björn Gerhart [Mon, 15 Jul 2019 16:33:55 +0000 (18:33 +0200)]
hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
Fixed address of third NCT6106_REG_WEIGHT_DUTY_STEP, and
added missed NCT6106_REG_TOLERANCE_H.
Fixes: 6c009501ff200 ("hwmon: (nct6775) Add support for NCT6102D/6106D")
Signed-off-by: Bjoern Gerhart <gerhart@posteo.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Lei YU [Thu, 11 Jul 2019 02:44:48 +0000 (10:44 +0800)]
hwmon: (occ) Fix division by zero issue
The code in occ_get_powr_avg() invokes div64_u64() without checking the
divisor. In case the divisor is zero, kernel gets an "Division by zero
in kernel" error.
Check the divisor and make it return 0 if the divisor is 0.
Fixes: c10e753d43eb ("hwmon (occ): Add sensor types and versions")
Signed-off-by: Lei YU <mine260309@gmail.com>
Reviewed-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/1562813088-23708-1-git-send-email-mine260309@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Sun, 21 Jul 2019 21:05:38 +0000 (14:05 -0700)]
Linus 5.3-rc1
Peter Kosyh [Fri, 19 Jul 2019 08:11:47 +0000 (11:11 +0300)]
vrf: make sure skb->data contains ip header to make routing
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).
Case:
1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.
So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.
Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Thu, 18 Jul 2019 04:26:46 +0000 (07:26 +0300)]
connector: remove redundant input callback from cn_dev
A small cleanup: this callback is never used.
Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com>
for OpenVZ7 bug OVZ-6877
cc: stanislav.kinsburskiy@gmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:42 +0000 (21:07 -0500)]
qed: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:39 +0000 (21:07 -0500)]
igc: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frederick Lawler [Thu, 18 Jul 2019 02:07:36 +0000 (21:07 -0500)]
cxgb4: Prefer pcie_capability_read_word()
Commit
8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability")
added accessors for the PCI Express Capability so that drivers didn't
need to be aware of differences between v1 and v2 of the PCI
Express Capability.
Replace pci_read_config_word() and pci_write_config_word() calls with
pcie_capability_read_word() and pcie_capability_write_word().
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Thu, 18 Jul 2019 01:42:18 +0000 (10:42 +0900)]
be2net: Synchronize be_update_queues with dev_watchdog
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
ethtool set_channels operation is started. be_tx_timeout(), which dumps
some queue structures, is not written to run concurrently with
be_update_queues(), which frees/allocates those queues structures. Add some
synchronization between the two.
Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Mon, 15 Jul 2019 21:41:50 +0000 (16:41 -0500)]
bnx2x: Prevent load reordering in tx completion processing
This patch fixes an issue seen on Power systems with bnx2x which results
in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb
pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons
load in bnx2x_tx_int. Adding a read memory barrier resolves the issue.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sun, 21 Jul 2019 16:50:08 +0000 (18:50 +0200)]
net: phy: sfp: hwmon: Fix scaling of RX power
The RX power read from the SFP uses units of 0.1uW. This must be
scaled to units of uW for HWMON. This requires a divide by 10, not the
current 100.
With this change in place, sensors(1) and ethtool -m agree:
sff2-isa-0000
Adapter: ISA adapter
in0: +3.23 V
temp1: +33.1 C
power1: 270.00 uW
power2: 200.00 uW
curr1: +0.01 A
Laser output power : 0.2743 mW / -5.62 dBm
Receiver signal average optical power : 0.2014 mW / -6.96 dBm
Reported-by: chris.healy@zii.aero
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Sun, 21 Jul 2019 14:44:12 +0000 (17:44 +0300)]
net: sched: verify that q!=NULL before setting q->flags
In function int tc_new_tfilter() q pointer can be NULL when adding filter
on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after
filter creation, following NULL pointer dereference happens in case parent
block is shared:
[ 212.925060] BUG: kernel NULL pointer dereference, address:
0000000000000010
[ 212.925445] #PF: supervisor write access in kernel mode
[ 212.925709] #PF: error_code(0x0002) - not-present page
[ 212.925965] PGD
8000000827923067 P4D
8000000827923067 PUD
827924067 PMD 0
[ 212.926302] Oops: 0002 [#1] SMP KASAN PTI
[ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512
[ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40
[ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f
b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8
[ 212.928607] RSP: 0018:
ffff88884fd5f5d8 EFLAGS:
00010296
[ 212.928905] RAX:
0000000000000000 RBX:
0000000000000000 RCX:
dffffc0000000000
[ 212.929201] RDX:
0000000000000007 RSI:
0000000000000004 RDI:
0000000000000297
[ 212.929402] RBP:
ffff88886bedd600 R08:
ffffffffb91d4b51 R09:
fffffbfff7616e4d
[ 212.929609] R10:
fffffbfff7616e4c R11:
ffffffffbb0b7263 R12:
ffff88886bc61040
[ 212.929803] R13:
ffff88884fd5f950 R14:
ffffc900039c5000 R15:
ffff88835e927680
[ 212.929999] FS:
00007fe7c50b6480(0000) GS:
ffff88886f980000(0000) knlGS:
0000000000000000
[ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 212.930394] CR2:
0000000000000010 CR3:
000000085bd04002 CR4:
00000000001606e0
[ 212.930588] Call Trace:
[ 212.930682] ? tc_del_tfilter+0xa40/0xa40
[ 212.930811] ? __lock_acquire+0x5b5/0x2460
[ 212.930948] ? find_held_lock+0x85/0xa0
[ 212.931081] ? tc_del_tfilter+0xa40/0xa40
[ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0
[ 212.931332] ? rtnl_dellink+0x490/0x490
[ 212.931454] ? lockdep_hardirqs_on+0x260/0x260
[ 212.931589] ? netlink_deliver_tap+0xab/0x5a0
[ 212.931717] ? match_held_lock+0x1b/0x240
[ 212.931844] netlink_rcv_skb+0xd0/0x200
[ 212.931958] ? rtnl_dellink+0x490/0x490
[ 212.932079] ? netlink_ack+0x440/0x440
[ 212.932205] ? netlink_deliver_tap+0x161/0x5a0
[ 212.932335] ? lock_downgrade+0x360/0x360
[ 212.932457] ? lock_acquire+0xe5/0x210
[ 212.932579] netlink_unicast+0x296/0x350
[ 212.932705] ? netlink_attachskb+0x390/0x390
[ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0
[ 212.932976] netlink_sendmsg+0x394/0x600
[ 212.937998] ? netlink_unicast+0x350/0x350
[ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90
[ 212.948115] ? netlink_unicast+0x350/0x350
[ 212.953185] sock_sendmsg+0x96/0xa0
[ 212.958099] ___sys_sendmsg+0x482/0x520
[ 212.962881] ? match_held_lock+0x1b/0x240
[ 212.967618] ? copy_msghdr_from_user+0x250/0x250
[ 212.972337] ? lock_downgrade+0x360/0x360
[ 212.976973] ? rwlock_bug.part.0+0x60/0x60
[ 212.981548] ? __mod_node_page_state+0x1f/0xa0
[ 212.986060] ? match_held_lock+0x1b/0x240
[ 212.990567] ? find_held_lock+0x85/0xa0
[ 212.994989] ? do_user_addr_fault+0x349/0x5b0
[ 212.999387] ? lock_downgrade+0x360/0x360
[ 213.003713] ? find_held_lock+0x85/0xa0
[ 213.007972] ? __fget_light+0xa1/0xf0
[ 213.012143] ? sockfd_lookup_light+0x91/0xb0
[ 213.016165] __sys_sendmsg+0xba/0x130
[ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0
[ 213.023870] ? handle_mm_fault+0x337/0x470
[ 213.027592] ? page_fault+0x8/0x30
[ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100
[ 213.034999] ? mark_held_locks+0x24/0x90
[ 213.038671] ? do_syscall_64+0x1e/0xe0
[ 213.042297] do_syscall_64+0x74/0xe0
[ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 213.049354] RIP: 0033:0x7fe7c527c7b8
[ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f
0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54
[ 213.060269] RSP: 002b:
00007ffc3f7908a8 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
[ 213.064144] RAX:
ffffffffffffffda RBX:
000000005d34716f RCX:
00007fe7c527c7b8
[ 213.068094] RDX:
0000000000000000 RSI:
00007ffc3f790910 RDI:
0000000000000003
[ 213.072109] RBP:
0000000000000000 R08:
0000000000000001 R09:
00007fe7c5340cc0
[ 213.076113] R10:
0000000000404ec2 R11:
0000000000000246 R12:
0000000000000080
[ 213.080146] R13:
0000000000480640 R14:
0000000000000080 R15:
0000000000000000
[ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common
\e[<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc
lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi
_ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe
r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas
[ 213.112326] CR2:
0000000000000010
[ 213.117429] ---[ end trace
adb58eb0a4ee6283 ]---
Verify that q pointer is not NULL before setting the 'flags' field.
Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 21 Jul 2019 13:16:05 +0000 (15:16 +0200)]
chelsio: Fix a typo in a function name
It is likely that 'my3216_poll()' should be 'my3126_poll()'. (1 and 2
switched in 3126.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Navid Emamdoost [Sun, 21 Jul 2019 06:37:31 +0000 (01:37 -0500)]
allocate_flower_entry: should check for null deref
allocate_flower_entry does not check for allocation success, but tries
to deref the result. I only moved the spin_lock under null check, because
the caller is checking allocation's status at line 652.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 21 Jul 2019 13:08:31 +0000 (15:08 +0200)]
net: hns3: typo in the name of a constant
All constant in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except
'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched)
s/HLC/HCL/
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Sowden [Sun, 21 Jul 2019 11:31:05 +0000 (12:31 +0100)]
kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.
net/netfilter/nf_tables_offload.h includes net/netfilter/nf_tables.h
which is itself on the blacklist.
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe JAILLET [Sun, 21 Jul 2019 10:38:11 +0000 (12:38 +0200)]
tipc: Fix a typo
s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/
(r and s switched in topsrv)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Jul 2019 18:39:05 +0000 (11:39 -0700)]
Merge tag 'mac80211-for-davem-2019-07-20' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We have a handful of fixes:
* ignore bad CW parameters if we aren't using them,
instead of warning
* fix operation (and then build) with the new netlink vendor
command policy requirement
* fix a memory leak in an error path when setting beacons
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 21 Jul 2019 17:28:39 +0000 (10:28 -0700)]
Merge tag 'devicetree-fixes-for-5.3' of git://git./linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
"Fix several warnings/errors in validation of binding schemas"
* tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples
dt-bindings: iio: ad7124: Fix dtc warnings in example
dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example
dt-bindings: pinctrl: aspeed: Fix AST2500 example errors
dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors
dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes
dt-bindings: Ensure child nodes are of type 'object'
Linus Torvalds [Sun, 21 Jul 2019 17:09:43 +0000 (10:09 -0700)]
Merge branch 'work.misc' of git://git./linux/kernel/git/viro/vfs
Pull vfs documentation typo fix from Al Viro.
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
typo fix: it's d_make_root, not d_make_inode...
Linus Torvalds [Sun, 21 Jul 2019 17:01:17 +0000 (10:01 -0700)]
Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Two fixes for stable, one that had dependency on earlier patch in this
merge window and can now go in, and a perf improvement in SMB3 open"
* tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: flush before set-info if we have writeable handles
smb3: optimize open to not send query file internal info
cifs: copy_file_range needs to strip setuid bits and update timestamps
CIFS: fix deadlock in cached root handling
Qian Cai [Thu, 11 Jul 2019 16:17:45 +0000 (12:17 -0400)]
iommu/amd: fix a crash in iova_magazine_free_pfns
The commit
b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops
method") incorrectly changed the checking from dma_ops_alloc_iova() in
map_sg() causes a crash under memory pressure as dma_ops_alloc_iova()
never return DMA_MAPPING_ERROR on failure but 0, so the error handling
is all wrong.
kernel BUG at drivers/iommu/iova.c:801!
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0
Call Trace:
free_cpu_cached_iovas+0xbd/0x150
alloc_iova_fast+0x8c/0xba
dma_ops_alloc_iova.isra.6+0x65/0xa0
map_sg+0x8c/0x2a0
scsi_dma_map+0xc6/0x160
pqi_aio_submit_io+0x1f6/0x440 [smartpqi]
pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi]
scsi_queue_rq+0x79c/0x1200
blk_mq_dispatch_rq_list+0x4dc/0xb70
blk_mq_sched_dispatch_requests+0x249/0x310
__blk_mq_run_hw_queue+0x128/0x200
blk_mq_run_work_fn+0x27/0x30
process_one_work+0x522/0xa10
worker_thread+0x63/0x5b0
kthread+0x1d2/0x1f0
ret_from_fork+0x22/0x40
Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Tue, 30 Apr 2019 14:27:50 +0000 (17:27 +0300)]
hexagon: switch to generic version of pte allocation
The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(),
pte_free_kernel() and pte_free() is identical to the generic except of
lack of __GFP_ACCOUNT for the user PTEs allocation.
Switch hexagon to use generic version of these functions.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 21 Jul 2019 16:46:59 +0000 (09:46 -0700)]
Merge tag 'ntb-5.3' of git://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"New feature to add support for NTB virtual MSI interrupts, the ability
to test and use this feature in the NTB transport layer.
Also, bug fixes for the AMD and Switchtec drivers, as well as some
general patches"
* tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
NTB: Describe the ntb_msi_test client in the documentation.
NTB: Add MSI interrupt support to ntb_transport
NTB: Add ntb_msi_test support to ntb_test
NTB: Introduce NTB MSI Test Client
NTB: Introduce MSI library
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce functions to calculate multi-port resource index
NTB: Introduce helper functions to calculate logical port number
PCI/switchtec: Add module parameter to request more interrupts
PCI/MSI: Support allocating virtual MSI interrupts
ntb_hw_switchtec: Fix setup MW with failure bug
ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
NTB: correct ntb_dev_ops and ntb_dev comment typos
NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
NTB: ntb_hw_amd: set peer limit register
NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
...
Helge Deller [Sat, 20 Jul 2019 22:55:48 +0000 (00:55 +0200)]
parisc: Flush ITLB in flush_tlb_all_local() only on split TLB machines
flush_tlb_all_local() flushes the ITLB and DTLB of the CPU.
In case the machine does not have separate ITLBs and DTLBs, use the
alternative functionality to replace the code which flushes the ITLB
with nops while keeping the code which flushes the DTLB.
Signed-off-by: Helge Deller <deller@gmx.de>
Sven Schnelle [Sun, 21 Jul 2019 09:00:39 +0000 (11:00 +0200)]
parisc: add kprobe_fault_handler()
Add kprobe_fault_handler() to fix compilation for PA-RISC.
On PA-RISC we actually don't need that function as the recovery counter
is restored after interrupt. See the PA-RISC 2.0 Architecture Manual,
pg. 4-8, Figure 4-4: "Interruption Processing".
Fixes: b98cca444d28 ("mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()")
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Al Viro [Sun, 21 Jul 2019 03:17:30 +0000 (23:17 -0400)]
typo fix: it's d_make_root, not d_make_inode...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>