Xiao Ni [Sat, 20 Oct 2018 00:09:25 +0000 (08:09 +0800)]
MD: Memory leak when flush bio size is zero
flush_pool is leaked when flush bio size is zero
Fixes: 5a409b4f56d5 ("MD: fix lock contention for flush bios")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Jack Wang [Fri, 19 Oct 2018 14:21:31 +0000 (16:21 +0200)]
md: fix memleak for mempool
I noticed kmemleak report memory leak when run create/stop
md in a loop, backtrace:
[<
000000001ca975e7>] mempool_create_node+0x86/0xd0
[<
0000000095576bcd>] md_run+0x1057/0x1410 [md_mod]
[<
000000007b45c5fc>] do_md_run+0x15/0x130 [md_mod]
[<
000000001ede9ec0>] md_ioctl+0x1f49/0x25d0 [md_mod]
[<
000000004142cacf>] blkdev_ioctl+0x680/0xd00
The root cause is we alloc mddev->flush_pool and
mddev->flush_bio_pool in md_run, but from do_md_stop
will not call into md_stop but __md_stop, move the
mempool_destroy to __md_stop fixes the problem for me.
The bug was introduced in
5a409b4f56d5, the fixes should go to
4.18+
Fixes: 5a409b4f56d5 ("MD: fix lock contention for flush bios")
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:48 +0000 (16:37 +0800)]
md-cluster: remove suspend_info
Previously, we allow multiple nodes can resync device, but we
had changed it to only support one node can do resync at one
time, but suspend_info is still used.
Now, let's remove the structure and use suspend_lo/hi to record
the range.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:47 +0000 (16:37 +0800)]
md-cluster: send BITMAP_NEEDS_SYNC message if reshaping is interrupted
We need to continue the reshaping if it was interrupted in
original node. So original node should call resync_bitmap
in case reshaping is aborted.
Then BITMAP_NEEDS_SYNC message is broadcasted to other nodes,
node which continues the reshaping should restart reshape from
mddev->reshape_position instead of from the first beginning.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:46 +0000 (16:37 +0800)]
md-cluster/bitmap: don't call md_bitmap_sync_with_cluster during reshaping stage
When reshape is happening in one node, other nodes could receive
lots of RESYNCING messages, so md_bitmap_sync_with_cluster is called.
Since the resyncing window is typically small in these RESYNCING
messages, so WARN is always triggered, so we should not call the
func when reshape is happening.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:45 +0000 (16:37 +0800)]
md-cluster/raid10: don't call remove_and_add_spares during reshaping stage
remove_and_add_spares is not needed if reshape is
happening in another node, because raid10_add_disk
called inside raid10_start_reshape would handle the
role changes of disk. Plus, remove_and_add_spares
can't deal with the role change due to reshape.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:44 +0000 (16:37 +0800)]
md-cluster/raid10: call update_size in md_reap_sync_thread
We need to change the capacity in all nodes after one node
finishs reshape. And as we did before, we can't change the
capacity directly in md_do_sync, instead, the capacity should
be only changed in update_size or received CHANGE_CAPACITY
msg.
So master node calls update_size after completes reshape in
md_reap_sync_thread, but we need to skip ops->update_size if
MD_CLOSING is set since reshaping could not be finish.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:43 +0000 (16:37 +0800)]
md-cluster: introduce resync_info_get interface for sanity check
Since the resync region from suspend_info means one node
is reshaping this area, so the position of reshape_progress
should be included in the area.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:42 +0000 (16:37 +0800)]
md-cluster/raid10: support add disk under grow mode
For clustered raid10 scenario, we need to let all the nodes
know about that a new disk is added to the array, and the
reshape caused by add new member just need to be happened in
one node, but other nodes should know about the change.
Since reshape means read data from somewhere (which is already
used by array) and write data to unused region. Obviously, it
is awful if one node is reading data from address while another
node is writing to the same address. Considering we have
implemented suspend writes in the resyncing area, so we can
just broadcast the reading address to other nodes to avoid the
trouble.
For master node, it would call reshape_request then update sb
during the reshape period. To avoid above trouble, we call
resync_info_update to send RESYNC message in reshape_request.
Then from slave node's view, it receives two type messages:
1. RESYNCING message
Slave node add the address (where master node reading data from)
to suspend list.
2. METADATA_UPDATED message
Once slave nodes know the reshaping is started in master node,
it is time to update reshape position and call start_reshape to
follow master node's step. After reshape is done, only reshape
position is need to be updated, so the majority task of reshaping
is happened on the master node.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Guoqing Jiang [Thu, 18 Oct 2018 08:37:41 +0000 (16:37 +0800)]
md-cluster/raid10: resize all the bitmaps before start reshape
To support add disk under grow mode, we need to resize
all the bitmaps of each node before reshape, so that we
can ensure all nodes have the same view of the bitmap of
the clustered raid.
So after the master node resized the bitmap, it broadcast
a message to other slave nodes, and it checks the size of
each bitmap are same or not by compare pages. We can only
continue the reshaping after all nodes update the bitmap
to the same size (by checking the pages), otherwise revert
bitmap size to previous value.
The resize_bitmaps interface and BITMAP_RESIZE message are
introduced in md-cluster.c for the purpose.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Shaohua Li [Mon, 15 Oct 2018 00:05:07 +0000 (17:05 -0700)]
MD: fix invalid stored role for a disk - try2
Commit
d595567dc4f0 (MD: fix invalid stored role for a disk) broke linear
hotadd. Let's only fix the role for disks in raid1/10.
Based on Guoqing's original patch.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Jack Wang [Mon, 8 Oct 2018 15:24:03 +0000 (17:24 +0200)]
md/bitmap: use mddev_suspend/resume instead of ->quiesce()
After
9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.
Replace quiesce() with mddev_suspend/resume.
Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Colin Ian King [Mon, 8 Oct 2018 21:16:09 +0000 (22:16 +0100)]
md: remove redundant code that is no longer reachable
And earlier commit removed the error label to two statements that
are now never reachable. Since this code is now dead code, remove it.
Detected by CoverityScan, CID#
1462409 ("Structurally dead code")
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>
NeilBrown [Wed, 3 Oct 2018 05:04:41 +0000 (15:04 +1000)]
md: allow metadata updates while suspending an array - fix
Commit
35bfc52187f6 ("md: allow metadata update while suspending.")
added support for allowing md_check_recovery() to still perform
metadata updates while the array is entering the 'suspended' state.
This is needed to allow the processes of entering the state to
complete.
Unfortunately, the patch doesn't really work. The test for
"mddev->suspended" at the start of md_check_recovery() means that the
function doesn't try to do anything at all while entering suspend.
This patch moves the code of updating the metadata while suspending to
*before* the test on mddev->suspended.
Reported-by: Jeff Mahoney <jeffm@suse.com>
Fixes: 35bfc52187f6 ("md: allow metadata update while suspending.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Shaohua Li [Tue, 2 Oct 2018 01:36:36 +0000 (18:36 -0700)]
MD: fix invalid stored role for a disk
If we change the number of array's device after device is removed from array,
then add the device back to array, we can see that device is added as active
role instead of spare which we expected.
Please see the below link for details:
https://marc.info/?l=linux-raid&m=
153736982015076&w=2
This is caused by that we prefer to use device's previous role which is
recorded by saved_raid_disk, but we should respect the new number of
conf->raid_disks since it could be changed after device is removed.
Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Tested-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Alex Wu [Fri, 21 Sep 2018 08:05:03 +0000 (16:05 +0800)]
md/raid10: Fix raid10 replace hang when new added disk faulty
[Symptom]
Resync thread hang when new added disk faulty during replacing.
[Root Cause]
In raid10_sync_request(), we expect to issue a bio with callback
end_sync_read(), and a bio with callback end_sync_write().
In normal situation, we will add resyncing sectors into
mddev->recovery_active when raid10_sync_request() returned, and sub
resynced sectors from mddev->recovery_active when end_sync_write()
calls end_sync_request().
If new added disk, which are replacing the old disk, is set faulty,
there is a race condition:
1. In the first rcu protected section, resync thread did not detect
that mreplace is set faulty and pass the condition.
2. In the second rcu protected section, mreplace is set faulty.
3. But, resync thread will prepare the read object first, and then
check the write condition.
4. It will find that mreplace is set faulty and do not have to
prepare write object.
This cause we add resync sectors but never sub it.
[How to Reproduce]
This issue can be easily reproduced by the following steps:
mdadm -C /dev/md0 --assume-clean -l 10 -n 4 /dev/sd[abcd]
mdadm /dev/md0 -a /dev/sde
mdadm /dev/md0 --replace /dev/sdd
sleep 1
mdadm /dev/md0 -f /dev/sde
[How to Fix]
This issue can be fixed by using local variables to record the result
of test conditions. Once the conditions are satisfied, we can make sure
that we need to issue a bio for read and a bio for write.
Previous 'commit
24afd80d99f8 ("md/raid10: handle recovery of
replacement devices.")' will also check whether bio is NULL, but leave
the comment saying that it is a pointless test. So we remove this dummy
check.
Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Allen Peng <allenpeng@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Alex Wu <alexwu@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Mariusz Tkaczyk [Tue, 4 Sep 2018 13:08:30 +0000 (15:08 +0200)]
raid5: block failing device if raid will be failed
Currently there is an inconsistency for failing the member drives
for arrays with different RAID levels. For RAID456 - there is a possibility
to fail all of the devices. However - for other RAID levels - kernel blocks
removing the member drive, if the operation results in array's FAIL state
(EBUSY is returned). For example - removing last drive from RAID1 is not
possible.
This kind of blocker was never implemented for raid456 and we cannot see
the reason why.
We had tested following patch and did not observe any regression, so do you
have any comments/reasons for current approach, or we can send the proper
patch for this?
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Greg Kroah-Hartman [Fri, 28 Sep 2018 16:55:17 +0000 (18:55 +0200)]
Merge tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm
Dave writes:
"drm fixes for 4.19-rc6
Looks like a pretty normal week for graphics,
core: syncobj fix, panel link regression revert
amd: suspend/resume fixes, EDID emulation fix
mali-dp: NV12 writeback and vblank reset fixes
etnaviv: DMA setup fix"
* tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
drm/malidp: Fix writeback in NV12
drm: mali-dp: Call drm_crtc_vblank_reset on device init
drm/etnaviv: add DMA configuration for etnaviv platform device
Greg Kroah-Hartman [Fri, 28 Sep 2018 16:53:22 +0000 (18:53 +0200)]
Merge tag 'riscv-for-linus-4.19-rc6' of git://git./linux/kernel/git/palmer/riscv-linux
Palmer writes:
"A Single RISC-V Update for 4.19-rc6
The Debian guys have been pushing on our port and found some
unversioned symbols leaking into modules. This PR contains a single
fix for that issue."
* tag 'riscv-for-linus-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: include linux/ftrace.h in asm-prototypes.h
Greg Kroah-Hartman [Fri, 28 Sep 2018 16:20:41 +0000 (18:20 +0200)]
Merge tag 'pci-v4.19-fixes-2' of ssh://gitolite./linux/kernel/git/helgaas/pci
Bjorn writes:
"PCI fixes:
- Fix ACPI hotplug issue that causes black screen crash at boot (Mika
Westerberg)
- Fix DesignWare "scheduling while atomic" issues (Jisheng Zhang)
- Add PPC contacts to MAINTAINERS for PCI core error handling (Bjorn
Helgaas)
- Sort Mobiveil MAINTAINERS entry (Lorenzo Pieralisi)"
* tag 'pci-v4.19-fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
PCI: dwc: Fix scheduling while atomic issues
MAINTAINERS: Move mobiveil PCI driver entry where it belongs
MAINTAINERS: Update PPC contacts for PCI core error handling
Dave Airlie [Thu, 27 Sep 2018 23:30:11 +0000 (09:30 +1000)]
Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a few fixes for 4.19:
- Couple of suspend/resume fixes
- Fix EDID emulation with DC
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927155418.2813-1-alexander.deucher@amd.com
Dave Airlie [Thu, 27 Sep 2018 23:25:26 +0000 (09:25 +1000)]
Merge tag 'drm-misc-fixes-2018-09-27-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- Revert adding device-link to panels
- Don't leak fences in drm/syncobj
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927152712.GA53076@art_vandelay
Greg Kroah-Hartman [Thu, 27 Sep 2018 19:53:55 +0000 (21:53 +0200)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Jason writes:
"Second RDMA rc pull request
- Fix a long standing race bug when destroying comp_event file descriptors
- srp, hfi1, bnxt_re: Various driver crashes from missing validation
and other cases
- Fixes for regressions in patches merged this window in the gid
cache, devx, ucma and uapi."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Set right entry state before releasing reference
IB/mlx5: Destroy the DEVX object upon error flow
IB/uverbs: Free uapi on destroy
RDMA/bnxt_re: Fix system crash during RDMA resource initialization
IB/hfi1: Fix destroy_qp hang after a link down
IB/hfi1: Fix context recovery when PBC has an UnsupportedVL
IB/hfi1: Invalid user input can result in crash
IB/hfi1: Fix SL array bounds check
RDMA/uverbs: Fix validity check for modify QP
IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop
ucma: fix a use-after-free in ucma_resolve_ip()
RDMA/uverbs: Atomically flush and mark closed the comp event queue
cxgb4: fix abort_req_rss6 struct
Greg Kroah-Hartman [Thu, 27 Sep 2018 19:16:24 +0000 (21:16 +0200)]
Merge tag 'for_v4.19-rc6' of git://git./linux/kernel/git/jack/linux-fs
Jan writes:
"an ext2 patch fixing fsync(2) for DAX mounts."
* tag 'for_v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2, dax: set ext2_dax_aops for dax files
Bhawanpreet Lakha [Wed, 26 Sep 2018 17:42:10 +0000 (13:42 -0400)]
drm/amd/display: Fix Edid emulation for linux
[Why]
EDID emulation didn't work properly for linux, as we stop programming
if nothing is connected physically.
[How]
We get a flag from DRM when we want to do edid emulation. We check if
this flag is true and nothing is connected physically, if so we only
program the front end using VIRTUAL_SIGNAL.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Wed, 26 Sep 2018 17:42:16 +0000 (13:42 -0400)]
drm/amd/display: Fix Vega10 lightup on S3 resume
[Why]
There have been a few reports of Vega10 display remaining blank
after S3 resume. The regression is caused by workaround for mode
change on Vega10 - skip set_bandwidth if stream count is 0.
As a result we skipped dispclk reset on suspend, thus on resume
we may skip the clock update assuming it hasn't been changed.
On some systems it causes display blank or 'out of range'.
[How]
Revert "drm/amd/display: Fix Vega10 black screen after mode change"
Verified that it hadn't cause mode change regression.
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rex Zhu [Thu, 27 Sep 2018 12:48:39 +0000 (20:48 +0800)]
drm/amdgpu: Fix vce work queue was not cancelled when suspend
The vce cancel_delayed_work_sync never be called.
driver call the function in error path.
This caused the A+A suspend hang when runtime pm enebled.
As we will visit the smu in the idle queue. this will cause
smu hang because the dgpu has been suspend, and the dgpu also
will be waked up. As the smu has been hang, so the dgpu resume
will failed.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Linus Walleij [Thu, 27 Sep 2018 12:41:30 +0000 (14:41 +0200)]
Revert "drm/panel: Add device_link from panel device to DRM device"
This reverts commit
0c08754b59da5557532d946599854e6df28edc22.
commit
0c08754b59da
("drm/panel: Add device_link from panel device to DRM device")
creates a circular dependency under these circumstances:
1. The panel depends on dsi-host because it is MIPI-DSI child
device.
2. dsi-host depends on the drm parent device (connector->dev->dev)
this should be allowed.
3. drm parent dev (connector->dev->dev) depends on the panel
after this patch.
This makes the dependency circular and while it appears it
does not affect any in-tree drivers (they do not seem to have
dsi hosts depending on the same parent device) this does not
seem right.
As noted in a response from Andrzej Hajda, the intent is
likely to make the panel dependent on the DRM device
(connector->dev) not its parent. But we have no way of
doing that since the DRM device doesn't contain any
struct device on its own (arguably it should).
Revert this until a proper approach is figured out.
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org
Dave Airlie [Thu, 27 Sep 2018 00:49:44 +0000 (10:49 +1000)]
Merge branch 'for-upstream/malidp-fixes' of git://linux-arm.org/linux-ld into drm-fixes
Fix NV12 writeback and fix vblank reset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180921112354.GR936@e110455-lin.cambridge.arm.com
Dave Airlie [Thu, 27 Sep 2018 00:19:26 +0000 (10:19 +1000)]
Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes
one fix to get a proper DMA configuration in place for the etnaviv
virtual device. I'm sending this as a fix, as a dma-mapping change at
the ARC architecture side during the 4.19 cycle broke etnaviv on this
platform, which gets remedied with this patch, but it also enables
ARM64.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/ea1f712bf09bf9439c6b092bf2c2bde7bb01cf5e.camel@pengutronix.de
Mika Westerberg [Wed, 26 Sep 2018 20:39:28 +0000 (15:39 -0500)]
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
HP 6730b laptop has an ethernet NIC connected to one of the PCIe root
ports. The root ports themselves are native PCIe hotplug capable. Now,
during boot after PCI devices are scanned the BIOS triggers ACPI bus check
directly to the NIC:
ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event()
It is not clear why it is sending bus check but regardless the ACPI hotplug
notify handler calls enable_slot() directly (instead of going through
acpiphp_check_bridge() as there is no bridge), which ends up handling
special case for non-hotplug bridges with native PCIe hotplug. This
results a crash of some kind but the reporter only sees black screen so it
is hard to figure out the exact spot and what actually happens. Based on
a few fix proposals it was tracked to crash somewhere inside
pci_assign_unassigned_bridge_resources().
In any case we should not really be in that special branch at all because
the ACPI notify happened to a slot that is not a PCI bridge (it is just a
regular PCI device).
Fix this so that we only go to that special branch if we are calling
enable_slot() for a bridge (e.g., the ACPI notification was for the
bridge).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201127
Fixes: 84c8b58ed3ad ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
Reported-by: Peter Anemone <peter.anemone@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org # v4.18+
Jason Ekstrand [Wed, 26 Sep 2018 07:17:03 +0000 (02:17 -0500)]
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
We attempt to get fences earlier in the hopes that everything will
already have fences and no callbacks will be needed. If we do succeed
in getting a fence, getting one a second time will result in a duplicate
ref with no unref. This is causing memory leaks in Vulkan applications
that create a lot of fences; playing for a few hours can, apparently,
bring down the system.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107899
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926071703.15257-1-jason.ekstrand@intel.com
Greg Kroah-Hartman [Wed, 26 Sep 2018 11:08:53 +0000 (13:08 +0200)]
Merge tag 'iommu-fixes-v4.19-rc5' of git://git./linux/kernel/git/joro/iommu
Joerg writes:
"IOMMU Fixes for Linux v4.19-rc5
Three fixes queued up:
- Warning fix for Rockchip IOMMU where there were IRQ handlers
for offlined hardware.
- Fix for Intel VT-d because recent changes caused boot failures
on some machines because it tried to allocate to much
contiguous memory.
- Fix for AMD IOMMU to handle eMMC devices correctly that appear
as ACPI HID devices."
* tag 'iommu-fixes-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Return devid as alias for ACPI HID devices
iommu/vt-d: Handle memory shortage on pasid table allocation
iommu/rockchip: Free irqs in shutdown handler
Arindam Nath [Tue, 18 Sep 2018 10:10:58 +0000 (15:40 +0530)]
iommu/amd: Return devid as alias for ACPI HID devices
ACPI HID devices do not actually have an alias for
them in the IVRS. But dev_data->alias is still used
for indexing into the IOMMU device table for devices
being handled by the IOMMU. So for ACPI HID devices,
we simply return the corresponding devid as an alias,
as parsed from IVRS table.
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Fixes: 2bf9a0a12749 ('iommu/amd: Add iommu support for ACPI HID devices')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Parav Pandit [Tue, 25 Sep 2018 09:10:40 +0000 (12:10 +0300)]
RDMA/core: Set right entry state before releasing reference
Currently add_modify_gid() for IB link layer has followong issue
in cache update path.
When GID update event occurs, core releases reference to the GID
table without updating its state and/or entry pointer.
CPU-0 CPU-1
------ -----
ib_cache_update() IPoIB ULP
add_modify_gid() [..]
put_gid_entry()
refcnt = 0, but
state = valid,
entry is valid.
(work item is not yet executed).
ipoib_create_ah()
rdma_create_ah()
rdma_get_gid_attr() <--
Tries to acquire gid_attr
which has refcnt = 0.
This is incorrect.
GID entry state and entry pointer is provides the accurate GID enty
state. Such fields must be updated with rwlock to protect against
readers and, such fields must be in sane state before refcount can drop
to zero. Otherwise above race condition can happen leading to
use-after-free situation.
Following backtrace has been observed when cache update for an IB port
is triggered while IPoIB ULP is creating an AH.
Therefore, when updating GID entry, first mark a valid entry as invalid
through state and set the barrier so that no callers can acquired
the GID entry, followed by release reference to it.
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 4 PID: 29106 at lib/refcount.c:153 refcount_inc_checked+0x30/0x50
Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
RIP: 0010:refcount_inc_checked+0x30/0x50
RSP: 0018:
ffff8802ad36f600 EFLAGS:
00010082
RAX:
0000000000000000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000002 RSI:
0000000000000008 RDI:
ffffffff86710100
RBP:
ffff8802d6e60a30 R08:
ffffed005d67bf8b R09:
ffffed005d67bf8b
R10:
0000000000000001 R11:
ffffed005d67bf8a R12:
ffff88027620cee8
R13:
ffff8802d6e60988 R14:
ffff8802d6e60a78 R15:
0000000000000202
FS:
0000000000000000(0000) GS:
ffff8802eb200000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f3ab35e5c88 CR3:
00000002ce84a000 CR4:
00000000000006e0
IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
Call Trace:
rdma_get_gid_attr+0x220/0x310 [ib_core]
? lock_acquire+0x145/0x3a0
rdma_fill_sgid_attr+0x32c/0x470 [ib_core]
rdma_create_ah+0x89/0x160 [ib_core]
? rdma_fill_sgid_attr+0x470/0x470 [ib_core]
? ipoib_create_ah+0x52/0x260 [ib_ipoib]
ipoib_create_ah+0xf5/0x260 [ib_ipoib]
ipoib_mcast_join_complete+0xbbe/0x2540 [ib_ipoib]
Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Tue, 25 Sep 2018 09:11:12 +0000 (12:11 +0300)]
IB/mlx5: Destroy the DEVX object upon error flow
Upon DEVX object creation the object must be destroyed upon a follows
error flow.
Fixes: 7efce3691d33 ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mark Bloch [Tue, 25 Sep 2018 08:23:55 +0000 (11:23 +0300)]
IB/uverbs: Free uapi on destroy
Make sure we free struct uverbs_api once we clean the radix tree. It was
allocated by uverbs_alloc_api().
Fixes: 9ed3e5f44772 ("IB/uverbs: Build the specs into a radix tree at runtime")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Greg Kroah-Hartman [Tue, 25 Sep 2018 19:37:41 +0000 (21:37 +0200)]
erge tag 'libnvdimm-fixes-4.19-rc6' of git://git./linux/kernel/git/nvdimm/nvdimm
Dan writes:
"libnvdimm/dax for 4.19-rc6
* (2) fixes for the dax error handling updates that were merged for
v4.19-rc1. My mails to Al have been bouncing recently, so I do not have
his ack but the uaccess change is of the trivial / obviously correct
variety. The address_space_operations fixes a regression.
* A filesystem-dax fix to correct the zero page lookup to be compatible
with non-x86 (mips and s390) architectures."
* tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: Add missing address_space_operations
uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
filesystem-dax: Fix use of zero page
Greg Kroah-Hartman [Tue, 25 Sep 2018 16:14:14 +0000 (18:14 +0200)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
James writes:
"SCSI fixes on
20180925
Nine obvious bug fixes mostly in individual drivers. The target fix
is of particular importance because it's CVE related."
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: don't crash the host on invalid commands
scsi: ipr: System hung while dlpar adding primary ipr adapter back
scsi: target: iscsi: Use bin2hex instead of a re-implementation
scsi: target: iscsi: Use hex2bin instead of a re-implementation
scsi: lpfc: Synchronize access to remoteport via rport
scsi: ufs: Disable blk-mq for now
scsi: sd: Contribute to randomness when running rotational device
scsi: ibmvscsis: Ensure partition name is properly NUL terminated
scsi: ibmvscsis: Fix a stringop-overflow warning
Greg Kroah-Hartman [Tue, 25 Sep 2018 15:54:55 +0000 (17:54 +0200)]
Merge tag 'usb-4.19-rc6' of git://git./linux/kernel/git/gregkh/usb
I wrote:
"USB fixes for 4.19-rc6
Here are some small USB core and driver fixes for reported issues for
4.19-rc6.
The most visible is the oops fix for when the USB core is built into the
kernel that is present in 4.18. Turns out not many people actually do
that so it went unnoticed for a while. The rest is some tiny typec,
musb, and other core fixes.
All have been in linux-next with no reported issues."
* tag 'usb-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: mux: Take care of driver module reference counting
usb: core: safely deal with the dynamic quirk lists
usb: roles: Take care of driver module reference counting
USB: handle NULL config in usb_find_alt_setting()
USB: fix error handling in usb_driver_claim_interface()
USB: remove LPM management from usb_driver_claim_interface()
USB: usbdevfs: restore warning for nonsensical flags
USB: usbdevfs: sanitize flags more
Revert "usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()"
usb: musb: dsps: do not disable CPPI41 irq in driver teardown
Greg Kroah-Hartman [Tue, 25 Sep 2018 15:22:50 +0000 (17:22 +0200)]
Merge tag 'tty-4.19-rc6' of git://git./linux/kernel/git/gregkh/tty
I wrote:
"TTY/Serial driver fixes for 4.19-rc6
Here are a number of small tty and serial driver fixes for reported
issues for 4.19-rc6.
One should hopefully resolve a much-reported issue that syzbot has found
in the tty layer. Although there are still more issues there, getting
this fixed is nice to see finally happen.
All of these have been in linux-next for a while with no reported
issues."
* tag 'tty-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: imx: restore handshaking irq for imx1
tty: vt_ioctl: fix potential Spectre v1
tty: Drop tty->count on tty_reopen() failure
serial: cpm_uart: return immediately from console poll
tty: serial: lpuart: avoid leaking struct tty_struct
serial: mvebu-uart: Fix reporting of effective CSIZE to userspace
Greg Kroah-Hartman [Tue, 25 Sep 2018 15:01:52 +0000 (17:01 +0200)]
Merge tag 'char-misc-4.19-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Greg (well I), wrote:
"Char/Misc driver fixes for 4.19-rc6
Here are some soundwire and intel_th (tracing) driver fixes for some
reported issues.
All of these have been in linux-next for a week with no reported issues."
* tag 'char-misc-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
intel_th: pci: Add Ice Lake PCH support
intel_th: Fix resource handling for ACPI glue layer
intel_th: Fix device removal logic
soundwire: Fix acquiring bus lock twice during master release
soundwire: Fix incorrect exit after configuring stream
soundwire: Fix duplicate stream state assignment
Lu Baolu [Sat, 8 Sep 2018 01:42:53 +0000 (09:42 +0800)]
iommu/vt-d: Handle memory shortage on pasid table allocation
Pasid table memory allocation could return failure due to memory
shortage. Limit the pasid table size to 1MiB because current 8MiB
contiguous physical memory allocation can be hard to come by. W/o
a PASID table, the device could continue to work with only shared
virtual memory impacted. So, let's go ahead with context mapping
even the memory allocation for pasid table failed.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107783
Fixes: cc580e41260d ("iommu/vt-d: Per PCI device pasid table interfaces")
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Reported-and-tested-by: Pelton Kyle D <kyle.d.pelton@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lubomir Rintel [Mon, 24 Sep 2018 12:18:34 +0000 (13:18 +0100)]
Revert "uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name"
This changes UAPI, breaking iwd and libell:
ell/key.c: In function 'kernel_dh_compute':
ell/key.c:205:38: error: 'struct keyctl_dh_params' has no member named 'private'; did you mean 'dh_private'?
struct keyctl_dh_params params = { .private = private,
^~~~~~~
dh_private
This reverts commit
8a2336e549d385bb0b46880435b411df8d8200e8.
Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name")
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Randy Dunlap <rdunlap@infradead.org>
cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
cc: Stephan Mueller <smueller@chronox.de>
cc: James Morris <jmorris@namei.org>
cc: "Serge E. Hallyn" <serge@hallyn.com>
cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: <stable@vger.kernel.org>
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Tue, 25 Sep 2018 09:19:28 +0000 (11:19 +0200)]
Merge gitolite./pub/scm/linux/kernel/git/davem/net
Dave writes:
"Networking fixes:
1) Fix multiqueue handling of coalesce timer in stmmac, from Jose
Abreu.
2) Fix memory corruption in NFC, from Suren Baghdasaryan.
3) Don't write reserved bits in ravb driver, from Kazuya Mizuguchi.
4) SMC bug fixes from Karsten Graul, YueHaibing, and Ursula Braun.
5) Fix TX done race in mvpp2, from Antoine Tenart.
6) ipv6 metrics leak, from Wei Wang.
7) Adjust firmware version requirements in mlxsw, from Petr Machata.
8) Fix autonegotiation on resume in r8169, from Heiner Kallweit.
9) Fixed missing entries when dumping /proc/net/if_inet6, from Jeff
Barnhill.
10) Fix double free in devlink, from Dan Carpenter.
11) Fix ethtool regression from UFO feature removal, from Maciej
Żenczykowski.
12) Fix drivers that have a ndo_poll_controller() that captures the
cpu entirely on loaded hosts by trying to drain all rx and tx
queues, from Eric Dumazet.
13) Fix memory corruption with jumbo frames in aquantia driver, from
Friedemann Gerold."
* gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (79 commits)
net: mvneta: fix the remaining Rx descriptor unmapping issues
ip_tunnel: be careful when accessing the inner header
mpls: allow routes on ip6gre devices
net: aquantia: memory corruption on jumbo frames
tun: remove ndo_poll_controller
nfp: remove ndo_poll_controller
bnxt: remove ndo_poll_controller
bnx2x: remove ndo_poll_controller
mlx5: remove ndo_poll_controller
mlx4: remove ndo_poll_controller
i40evf: remove ndo_poll_controller
ice: remove ndo_poll_controller
igb: remove ndo_poll_controller
ixgb: remove ndo_poll_controller
fm10k: remove ndo_poll_controller
ixgbevf: remove ndo_poll_controller
ixgbe: remove ndo_poll_controller
bonding: use netpoll_poll_dev() helper
netpoll: make ndo_poll_controller() optional
rds: Fix build regression.
...
Heiko Stuebner [Mon, 27 Aug 2018 10:56:24 +0000 (12:56 +0200)]
iommu/rockchip: Free irqs in shutdown handler
In the iommu's shutdown handler we disable runtime-pm which could
result in the irq-handler running unclocked and since commit
3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework")
we warn about that fact.
This can cause warnings on shutdown on some Rockchip machines, so
free the irqs in the shutdown handler before we disable runtime-pm.
Reported-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Fixes: 3fc7c5c0cff3 ("iommu/rockchip: Handle errors returned from PM framework")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
James Cowgill [Thu, 6 Sep 2018 21:57:56 +0000 (22:57 +0100)]
RISC-V: include linux/ftrace.h in asm-prototypes.h
Building a riscv kernel with CONFIG_FUNCTION_TRACER and
CONFIG_MODVERSIONS enabled results in these two warnings:
MODPOST vmlinux.o
WARNING: EXPORT symbol "return_to_handler" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
When exporting symbols from an assembly file, the MODVERSIONS code
requires their prototypes to be defined in asm-prototypes.h (see
scripts/Makefile.build). Since both of these symbols have prototypes
defined in linux/ftrace.h, include this header from RISC-V's
asm-prototypes.h.
Reported-by: Karsten Merker <merker@debian.org>
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Antoine Tenart [Mon, 24 Sep 2018 14:56:13 +0000 (16:56 +0200)]
net: mvneta: fix the remaining Rx descriptor unmapping issues
With CONFIG_DMA_API_DEBUG enabled we get DMA unmapping warning in
various places of the mvneta driver, for example when putting down an
interface while traffic is passing through.
The issue is when using s/w buffer management, the Rx buffers are mapped
using dma_map_page but unmapped with dma_unmap_single. This patch fixes
this by using the right unmapping function.
Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 24 Sep 2018 13:48:19 +0000 (15:48 +0200)]
ip_tunnel: be careful when accessing the inner header
Cong noted that we need the same checks introduced by commit
76c0ddd8c3a6
("ip6_tunnel: be careful when accessing the inner header")
even for ipv4 tunnels.
Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saif Hasan [Fri, 21 Sep 2018 21:30:05 +0000 (14:30 -0700)]
mpls: allow routes on ip6gre devices
Summary:
This appears to be necessary and sufficient change to enable `MPLS` on
`ip6gre` tunnels (RFC4023).
This diff allows IP6GRE devices to be recognized by MPLS kernel module
and hence user can configure interface to accept packets with mpls
headers as well setup mpls routes on them.
Test Plan:
Test plan consists of multiple containers connected via GRE-V6 tunnel.
Then carrying out testing steps as below.
- Carry out necessary sysctl settings on all containers
```
sysctl -w net.mpls.platform_labels=65536
sysctl -w net.mpls.ip_ttl_propagate=1
sysctl -w net.mpls.conf.lo.input=1
```
- Establish IP6GRE tunnels
```
ip -6 tunnel add name if_1_2_1 mode ip6gre \
local 2401:db00:21:6048:feed:0::1 \
remote 2401:db00:21:6048:feed:0::2 key 1
ip link set dev if_1_2_1 up
sysctl -w net.mpls.conf.if_1_2_1.input=1
ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link
ip -6 tunnel add name if_1_3_1 mode ip6gre \
local 2401:db00:21:6048:feed:0::1 \
remote 2401:db00:21:6048:feed:0::3 key 1
ip link set dev if_1_3_1 up
sysctl -w net.mpls.conf.if_1_3_1.input=1
ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
```
- Install MPLS encap rules on node-1 towards node-2
```
ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
via inet 169.254.0.3 dev if_1_2_1
```
- Install MPLS forwarding rules on node-2 and node-3
```
// node2
ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1
// node3
ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
```
- Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing
towards 192.168.0.1 is via IP route directly towards node1 from node4)
```
ping 192.168.0.11
```
- tcpdump on interface to capture ping packets wrapped within MPLS
header which inturn wrapped within IP6GRE header
```
16:43:41.121073 IP6
2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2:
DSTOPT GREv0, key=0x1, length 100:
MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255)
IP 192.168.0.1 > 192.168.0.11:
ICMP echo request, id 1208, seq 45, length 64
0x0000: 6000 2cdb 006c 3c3f 2401 db00 0021 6048 `.,..l<?$....!`H
0x0010: feed 0000 0000 0001 2401 db00 0021 6048 ........$....!`H
0x0020: feed 0000 0000 0002 2f00 0401 0401 0100 ......../.......
0x0030: 2000 8847 0000 0001 0002 00ff 0004 01ff ...G............
0x0040: 4500 0054 3280 4000 ff01 c7cb c0a8 0001 E..T2.@.........
0x0050: c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b ...........--<.[
0x0060: 0000 0000 bcd8 0100 0000 0000 1011 1213 ................
0x0070: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0080: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0090: 3435 3637 4567
```
Signed-off-by: Saif Hasan <has@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Sep 2018 17:08:45 +0000 (10:08 -0700)]
Merge git://git./pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2018-09-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Several fixes for BPF sockmap to only allow sockets being attached in
ESTABLISHED state, from John.
2) Fix up the license to LGPL/BSD for the libc compat header which contains
fallback helpers that libbpf and bpftool is using, from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Selvin Xavier [Fri, 21 Sep 2018 05:33:00 +0000 (22:33 -0700)]
RDMA/bnxt_re: Fix system crash during RDMA resource initialization
bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses
the L2 driver.
The following sequence can trigger a crash
Acquires the rtnl_lock ->
Registers roce driver callback with L2 driver ->
release the rtnl lock
bnxt_re acquires the rtnl_lock ->
Request for MSIx vectors ->
release the rtnl_lock
Issue happens when bnxt_re proceeds with remaining part of initialization
and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic.
The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are
not initialized yet,
<snip>
[ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 3551.726656] IP: [<
ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.726674] PGD 0
[ 3551.726679] Oops: 0002 1 SMP
...
[ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014
[ 3551.726826] task:
ffff97e30eec5ee0 ti:
ffff97e3173bc000 task.ti:
ffff97e3173bc000
[ 3551.726829] RIP: 0010:[<
ffffffffc0840ee9>] [<
ffffffffc0840ee9>]
bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
...
[ 3551.726872] Call Trace:
[ 3551.726886] [<
ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re]
[ 3551.726899] [<
ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en]
[ 3551.726908] [<
ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en]
[ 3551.726917] [<
ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en]
[ 3551.726925] [<
ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
[ 3551.726934] [<
ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en]
[ 3551.726943] [<
ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en]
[ 3551.726954] [<
ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250
[ 3551.726966] [<
ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0
[ 3551.726972] [<
ffffffff890f72fa>] dcb_doit+0x13a/0x210
[ 3551.726981] [<
ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260
[ 3551.726989] [<
ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30
[ 3551.726996] [<
ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290
[ 3551.727002] [<
ffffffff890f7326>] ? dcb_doit+0x166/0x210
[ 3551.727007] [<
ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0
[ 3551.727012] [<
ffffffff89003f50>] ? rtnl_newlink+0x880/0x880
...
[ 3551.727104] [<
ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21
...
[ 3551.727164] RIP [<
ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
[ 3551.727175] RSP <
ffff97e3173bf788>
[ 3551.727177] CR2:
0000000000000000
Avoid this inconsistent state and system crash by acquiring
the rtnl lock for the entire duration of device initialization.
Re-factor the code to remove the rtnl lock from the individual function
and acquire and release it from the caller.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Fixes: 6e04b1035689 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Greg Kroah-Hartman [Mon, 24 Sep 2018 13:16:41 +0000 (15:16 +0200)]
Merge tag 'media/v4.19-2' of git://git./linux/kernel/git/mchehab/linux-media
Mauro briefly writes:
"media fixes for v4.19-rc5
some drivers and Kbuild fixes"
* tag 'media/v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: platform: fix cros-ec-cec build error
media: staging/media/mt9t031/Kconfig: remove bogus entry
media: i2c: mt9v111: Fix v4l2-ctrl error handling
media: camss: add missing includes
media: camss: Use managed memory allocations
media: camss: mark PM functions as __maybe_unused
media: af9035: prevent buffer overflow on write
media: video_function_calls.rst: drop obsolete video-set-attributes reference
Friedemann Gerold [Sat, 15 Sep 2018 15:03:39 +0000 (18:03 +0300)]
net: aquantia: memory corruption on jumbo frames
This patch fixes skb_shared area, which will be corrupted
upon reception of 4K jumbo packets.
Originally build_skb usage purpose was to reuse page for skb to eliminate
needs of extra fragments. But that logic does not take into account that
skb_shared_info should be reserved at the end of skb data area.
In case packet data consumes all the page (4K), skb_shinfo location
overflows the page. As a consequence, __build_skb zeroed shinfo data above
the allocated page, corrupting next page.
The issue is rarely seen in real life because jumbo are normally larger
than 4K and that causes another code path to trigger.
But it 100% reproducible with simple scapy packet, like:
sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
/ Raw(RandString(size=(4096-40))), iface="enp1s0")
Fixes: 018423e90bee ("net: ethernet: aquantia: Add ring support code")
Reported-by: Friedemann Gerold <f.gerold@b-c-s.de>
Reported-by: Michael Rauch <michael@rauch.be>
Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Sep 2018 04:55:26 +0000 (21:55 -0700)]
Merge branch 'netpoll-avoid-capture-effects-for-NAPI-drivers'
Eric Dumazet says:
====================
netpoll: avoid capture effects for NAPI drivers
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC).
This capture, showing one ksoftirqd eating all cycles
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller() :
Most NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.
This patch series take care of the first round, we will
handle other drivers in future rounds.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:52 +0000 (15:27 -0700)]
tun: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
tun uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:51 +0000 (15:27 -0700)]
nfp: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
nfp uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:50 +0000 (15:27 -0700)]
bnxt: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
bnxt uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:49 +0000 (15:27 -0700)]
bnx2x: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
bnx2x uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:48 +0000 (15:27 -0700)]
mlx5: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
mlx5 uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:47 +0000 (15:27 -0700)]
mlx4: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
mlx4 uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:46 +0000 (15:27 -0700)]
i40evf: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
i40evf uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:45 +0000 (15:27 -0700)]
ice: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ice uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:44 +0000 (15:27 -0700)]
igb: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
igb uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:43 +0000 (15:27 -0700)]
ixgb: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ixgb uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
This also removes a problematic use of disable_irq() in
a context it is forbidden, as explained in commit
af3e0fcf7887 ("8139too: Use disable_irq_nosync() in
rtl8139_poll_controller()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:42 +0000 (15:27 -0700)]
fm10k: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
lasts for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
fm10k uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:41 +0000 (15:27 -0700)]
ixgbevf: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ixgbevf uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:40 +0000 (15:27 -0700)]
ixgbe: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ixgbe uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:39 +0000 (15:27 -0700)]
bonding: use netpoll_poll_dev() helper
We want to allow NAPI drivers to no longer provide
ndo_poll_controller() method, as it has been proven problematic.
team driver must not look at its presence, but instead call
netpoll_poll_dev() which factorize the needed actions.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Sep 2018 22:27:38 +0000 (15:27 -0700)]
netpoll: make ndo_poll_controller() optional
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().
NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.
This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.
Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.
Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 23 Sep 2018 19:25:15 +0000 (12:25 -0700)]
rds: Fix build regression.
Use DECLARE_* not DEFINE_*
Fixes: 8360ed6745df ("RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats")
Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Kroah-Hartman [Sun, 23 Sep 2018 17:15:18 +0000 (19:15 +0200)]
Linux 4.19-rc5
Greg Kroah-Hartman [Sun, 23 Sep 2018 15:19:27 +0000 (17:19 +0200)]
Merge tag 'mfd-fixes-4.19' of git://git./linux/kernel/git/lee/mfd
Lee writes:
"MFD fixes for v4.19
- Fix Dialog DA9063 regulator constraints issue causing failure in
probe
- Fix OMAP Device Tree compatible strings to match DT"
* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: omap-usb-host: Fix dts probe of children
mfd: da9063: Fix DT probing with constraints
Greg Kroah-Hartman [Sun, 23 Sep 2018 11:32:19 +0000 (13:32 +0200)]
Merge tag 'for-linus-4.19d-rc5-tag' of git://git./linux/kernel/git/xen/tip
Juergen writes:
"xen:
Two small fixes for xen drivers."
* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: issue warning message when out of grant maptrack entries
xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
Greg Kroah-Hartman [Sun, 23 Sep 2018 06:33:28 +0000 (08:33 +0200)]
Merge tag 'for-linus-
20180922' of git://git.kernel.dk/linux-block
Jens writes:
"Just a single fix in this pull request, fixing a regression in
/proc/diskstats caused by the unification of timestamps."
* tag 'for-linus-
20180922' of git://git.kernel.dk/linux-block:
block: use nanosecond resolution for iostat
Greg Kroah-Hartman [Sun, 23 Sep 2018 06:10:12 +0000 (08:10 +0200)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Thomas writes:
"A set of fixes for x86:
- Resolve the kvmclock regression on AMD systems with memory
encryption enabled. The rework of the kvmclock memory allocation
during early boot results in encrypted storage, which is not
shareable with the hypervisor. Create a new section for this data
which is mapped unencrypted and take care that the later
allocations for shared kvmclock memory is unencrypted as well.
- Fix the build regression in the paravirt code introduced by the
recent spectre v2 updates.
- Ensure that the initial static page tables cover the fixmap space
correctly so early console always works. This worked so far by
chance, but recent modifications to the fixmap layout can -
depending on kernel configuration - move the relevant entries to a
different place which is not covered by the initial static page
tables.
- Address the regressions and issues which got introduced with the
recent extensions to the Intel Recource Director Technology code.
- Update maintainer entries to document reality"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Expand static page table for fixmap space
MAINTAINERS: Add X86 MM entry
x86/intel_rdt: Add Reinette as co-maintainer for RDT
MAINTAINERS: Add Borislav to the x86 maintainers
x86/paravirt: Fix some warning messages
x86/intel_rdt: Fix incorrect loop end condition
x86/intel_rdt: Fix exclusive mode handling of MBA resource
x86/intel_rdt: Fix incorrect loop end condition
x86/intel_rdt: Do not allow pseudo-locking of MBA resource
x86/intel_rdt: Fix unchecked MSR access
x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
x86/intel_rdt: Global closid helper to support future fixes
x86/intel_rdt: Fix size reporting of MBA resource
x86/intel_rdt: Fix data type in parsing callbacks
x86/kvm: Use __bss_decrypted attribute in shared variables
x86/mm: Add .bss..decrypted section to hold shared variables
Greg Kroah-Hartman [Sun, 23 Sep 2018 06:09:16 +0000 (08:09 +0200)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Thomas writes:
"- Provide a strerror_r wrapper so lib/bpf can be built on systems
without _GNU_SOURCE
- Unbreak the man page generator when building out of tree"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf Documentation: Fix out-of-tree asciidoctor man page generation
tools lib bpf: Provide wrapper for strerror_r to build in !_GNU_SOURCE systems
Greg Kroah-Hartman [Sun, 23 Sep 2018 06:06:54 +0000 (08:06 +0200)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Thomas writes:
"Make the EFI arm stub device tree loader default on to unbreak
existing EFI boot loaders which do not have DTB support."
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: default EFI_ARMSTUB_DTB_LOADER to y
Maciej Żenczykowski [Sat, 22 Sep 2018 08:34:01 +0000 (01:34 -0700)]
net-ethtool: ETHTOOL_GUFO did not and should not require CAP_NET_ADMIN
So it should not fail with EPERM even though it is no longer implemented...
This is a fix for:
(userns)$ egrep ^Cap /proc/self/status
CapInh:
0000003fffffffff
CapPrm:
0000003fffffffff
CapEff:
0000003fffffffff
CapBnd:
0000003fffffffff
CapAmb:
0000003fffffffff
(userns)$ tcpdump -i usb_rndis0
tcpdump: WARNING: usb_rndis0: SIOCETHTOOL(ETHTOOL_GUFO) ioctl failed: Operation not permitted
Warning: Kernel filter failed: Bad file descriptor
tcpdump: can't remove kernel filter: Bad file descriptor
With this change it returns EOPNOTSUPP instead of EPERM.
See also https://github.com/the-tcpdump-group/libpcap/issues/689
Fixes: 08a00fea6de2 "net: Remove references to NETIF_F_UFO from ethtool."
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Jiang [Mon, 10 Sep 2018 23:18:29 +0000 (16:18 -0700)]
device-dax: Add missing address_space_operations
With address_space_operations missing for device dax, namely the
.set_page_dirty, we hit a kernel warning when running destructive
ndctl unit test: make TESTS=device-dax check
WARNING: CPU: 3 PID: 7380 at fs/buffer.c:581 __set_page_dirty+0xb1/0xc0
Setting address_space_operations to noop_set_page_dirty and
noop_invalidatepage for device dax to prevent fallback to
__set_page_dirty_buffers() and block_invalidatepage() respectively.
Fixes: 2232c6382a ("device-dax: Enable page_mapping()")
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Nathan Chancellor [Fri, 21 Sep 2018 18:04:51 +0000 (11:04 -0700)]
RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats
Clang warns when two declarations' section attributes don't match.
net/rds/ib_stats.c:40:1: warning: section does not match previous
declaration [-Wsection]
DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:142:2: note: expanded from macro
'DEFINE_PER_CPU_SHARED_ALIGNED'
DEFINE_PER_CPU_SECTION(type, name,
PER_CPU_SHARED_ALIGNED_SECTION) \
^
./include/linux/percpu-defs.h:93:9: note: expanded from macro
'DEFINE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name;
\
^
./include/linux/percpu-defs.h:49:26: note: expanded from macro
'__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
\
^
net/rds/ib.h:446:1: note: previous attribute is here
DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
^
./include/linux/percpu-defs.h:111:2: note: expanded from macro
'DECLARE_PER_CPU'
DECLARE_PER_CPU_SECTION(type, name, "")
^
./include/linux/percpu-defs.h:87:9: note: expanded from macro
'DECLARE_PER_CPU_SECTION'
extern __PCPU_ATTRS(sec) __typeof__(type) name
^
./include/linux/percpu-defs.h:49:26: note: expanded from macro
'__PCPU_ATTRS'
__percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
\
^
1 warning generated.
The initial definition was added in commit
ec16227e1414 ("RDS/IB:
Infiniband transport") and the cache aligned definition was added in
commit
e6babe4cc4ce ("RDS/IB: Stats and sysctls") right after. The
definition probably should have been updated in net/rds/ib.h, which is
what this patch does.
Link: https://github.com/ClangBuiltLinux/linux/issues/114
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Omar Sandoval [Fri, 21 Sep 2018 23:44:34 +0000 (16:44 -0700)]
block: use nanosecond resolution for iostat
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.
Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.
Fixes: 522a777566f5 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nathan Chancellor [Fri, 21 Sep 2018 09:44:12 +0000 (02:44 -0700)]
net/mlx4: Use cpumask_available for eq->affinity_mask
Clang warns that the address of a pointer will always evaluated as true
in a boolean context:
drivers/net/ethernet/mellanox/mlx4/eq.c:243:11: warning: address of
array 'eq->affinity_mask' will always evaluate to 'true'
[-Wpointer-bool-conversion]
if (!eq->affinity_mask || cpumask_empty(eq->affinity_mask))
~~~~~^~~~~~~~~~~~~
1 warning generated.
Use cpumask_available, introduced in commit
f7e30f01a9e2 ("cpumask: Add
helper cpumask_available()"), which does the proper checking and avoids
this warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/86
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 21 Sep 2018 08:07:55 +0000 (11:07 +0300)]
devlink: double free in devlink_resource_fill()
Smatch reports that devlink_dpipe_send_and_alloc_skb() frees the skb
on error so this is a double free. We fixed a bunch of these bugs in
commit
7fe4d6dcbcb4 ("devlink: Remove redundant free on error path") but
we accidentally overlooked this one.
Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 03:46:37 +0000 (11:46 +0800)]
net: apple: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 03:44:05 +0000 (11:44 +0800)]
net: i825xx: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 03:35:11 +0000 (11:35 +0800)]
net: wiznet: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 03:05:50 +0000 (11:05 +0800)]
net: sgi: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 03:02:37 +0000 (11:02 +0800)]
net: cirrus: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Fri, 21 Sep 2018 02:53:47 +0000 (10:53 +0800)]
net: seeq: fix return type of ndo_start_xmit function
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Fri, 21 Sep 2018 02:53:17 +0000 (02:53 +0000)]
PCI: hv: Fix return value check in hv_pci_assign_slots()
In case of error, the function pci_create_slot() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().
Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Barnhill [Fri, 21 Sep 2018 00:45:27 +0000 (00:45 +0000)]
net/ipv6: Display all addresses in output of /proc/net/if_inet6
The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
handle starting/stopping the iteration. The problem is that at some point
during the iteration, an overflow is detected and the process is
subsequently stopped. The item being shown via seq_printf() when the
overflow occurs is not actually shown, though. When start() is
subsequently called to resume iterating, it returns the next item, and
thus the item that was being processed when the overflow occurred never
gets printed.
Alter the meaning of the private data member "offset". Currently, when it
is not 0 (which only happens at the very beginning), "offset" represents
the next hlist item to be printed. After this change, "offset" always
represents the current item.
This is also consistent with the private data member "bucket", which
represents the current bucket, and also the use of "pos" as defined in
seq_file.txt:
The pos passed to start() will always be either zero, or the most
recent pos used in the previous session.
Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Tranchetti [Thu, 20 Sep 2018 20:29:45 +0000 (14:29 -0600)]
netlabel: check for IPV4MASK in addrinfo_get
netlbl_unlabel_addrinfo_get() assumes that if it finds the
NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is
not necessarily the case as the current checks in
netlbl_unlabel_staticadd() and friends are not sufficent to
enforce this.
If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
these functions will all call netlbl_unlabel_addrinfo_get() which
will then attempt dereference NULL when fetching the non-existent
NLBL_UNLABEL_A_IPV4MASK attribute:
Unable to handle kernel NULL pointer dereference at virtual address 0
Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
Call trace:
netlbl_unlabel_addrinfo_get+0x44/0xd8
netlbl_unlabel_staticremovedef+0x98/0xe0
genl_rcv_msg+0x354/0x388
netlink_rcv_skb+0xac/0x118
genl_rcv+0x34/0x48
netlink_unicast+0x158/0x1f0
netlink_sendmsg+0x32c/0x338
sock_sendmsg+0x44/0x60
___sys_sendmsg+0x1d0/0x2a8
__sys_sendmsg+0x64/0xb4
SyS_sendmsg+0x34/0x4c
el0_svc_naked+0x34/0x38
Code:
51001149 7100113f 540000a0 f9401508 (
79400108)
---[ end trace
f6438a488e737143 ]---
Kernel panic - not syncing: Fatal exception
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 22 Sep 2018 00:46:42 +0000 (02:46 +0200)]
Merge branch 'bpf-sockmap-estab-fixes'
John Fastabend says:
====================
Eric noted that using the close callback is not sufficient
to catch all transitions from ESTABLISHED state to a LISTEN
state. So this series does two things. First, only allow
adding socks in ESTABLISH state and second use unhash callback
to catch tcp_disconnect() transitions.
v2: added check for ESTABLISH state in hash update sockmap as well
v3: Do not release lock from unhash in error path, no lock was
used in the first place. And drop not so useful code comments
v4: convert,
if (unhash()) return unhash(); return
to if (unhash()) unhash(); return;
Thanks for reviewing Yonghong I carried your ACKs forward.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Tue, 18 Sep 2018 16:01:54 +0000 (09:01 -0700)]
bpf: test_maps, only support ESTABLISHED socks
Ensure that sockets added to a sock{map|hash} that is not in the
ESTABLISHED state is rejected.
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Tue, 18 Sep 2018 16:01:49 +0000 (09:01 -0700)]
bpf: sockmap, fix transition through disconnect without close
It is possible (via shutdown()) for TCP socks to go trough TCP_CLOSE
state via tcp_disconnect() without actually calling tcp_close which
would then call our bpf_tcp_close() callback. Because of this a user
could disconnect a socket then put it in a LISTEN state which would
break our assumptions about sockets always being ESTABLISHED state.
To resolve this rely on the unhash hook, which is called in the
disconnect case, to remove the sock from the sockmap.
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Tue, 18 Sep 2018 16:01:44 +0000 (09:01 -0700)]
bpf: sockmap only allow ESTABLISHED sock state
After this patch we only allow socks that are in ESTABLISHED state or
are being added via a sock_ops event that is transitioning into an
ESTABLISHED state. By allowing sock_ops events we allow users to
manage sockmaps directly from sock ops programs. The two supported
sock_ops ops are BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB and
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB.
Similar to TLS ULP this ensures sk_user_data is correct.
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Greg Kroah-Hartman [Fri, 21 Sep 2018 18:01:16 +0000 (20:01 +0200)]
Merge tag 'pinctrl-v4.19-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Linus writes:
"Pin control fixes for v4.19:
- Two fixes for the Intel pin controllers than cause
problems on laptops."
* tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: Do pin translation in other GPIO operations as well
pinctrl: cannonlake: Fix gpio base for GPP-E
Johannes Thumshirn [Fri, 21 Sep 2018 07:01:01 +0000 (09:01 +0200)]
scsi: sd: don't crash the host on invalid commands
When sd_init_command() get's a command with a unknown req_op() it crashes the
system via BUG().
This makes debugging the actual reason for the broken request cmd_flags pretty
hard as the system is down before it's able to write out debugging data on the
serial console or the trace buffer.
Change the BUG() to a WARN_ON() and return BLKPREP_KILL to fail gracefully and
return an I/O error to the producer of the request.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wen Xiong [Fri, 21 Sep 2018 00:32:12 +0000 (19:32 -0500)]
scsi: ipr: System hung while dlpar adding primary ipr adapter back
While dlpar adding primary ipr adapter back, driver goes through adapter
initialization then schedule ipr_worker_thread to start te disk scan by
dropping the host lock, calling scsi_add_device. Then get the adapter reset
request again, so driver does scsi_block_requests, this will cause the
scsi_add_device get hung until we unblock. But we can't run ipr_worker_thread
to do the unblock because its stuck in scsi_add_device.
This patch fixes the issue.
[mkp: typo and whitespace fixes]
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>