Paolo Abeni [Thu, 22 Jun 2017 13:01:22 +0000 (15:01 +0200)]
udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb()
very similar to commit
dd99e425be23 ("udp: prefetch
rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache
miss when the BH is bottle-neck for UDP over ipv6 packet
processing, e.g. for small packets when a single RX NIC ingress
queue is in use.
Performances under flood when multiple NIC RX queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~8% performance improvement.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2017 17:42:57 +0000 (13:42 -0400)]
Merge branch 'net-mvpp2-misc-improvements'
Thomas Petazzoni says:
====================
net: mvpp2: misc improvements
Here are a few patches making various small improvements/refactoring
in the mvpp2 driver. They are based on today's net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Thu, 22 Jun 2017 12:23:20 +0000 (14:23 +0200)]
net: mvpp2: remove mvpp2_pool_refill()
When all a function does is calling another function with the exact same
arguments, in the exact same order, you know it's time to remove said
function. Which is exactly what this commit does.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Thu, 22 Jun 2017 12:23:19 +0000 (14:23 +0200)]
net: mvpp2: remove unused mvpp2_bm_cookie_pool_set() function
This function is not used in the driver, remove it.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni [Thu, 22 Jun 2017 12:23:18 +0000 (14:23 +0200)]
net: mvpp2: add comments about smp_processor_id() usage
A previous commit modified a number of smp_processor_id() used in
migration-enabled contexts into get_cpu/put_cpu sections. However, a few
smp_processor_id() calls remain in the driver, and this commit adds
comments explaining why they can be kept.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2017 17:39:58 +0000 (13:39 -0400)]
Merge branch 'stmmac-pci-Refactor-DMI-probing'
Jan Kiszka says:
====================
stmmac: pci: Refactor DMI probing
Some cleanups of the way we probe DMI platforms in the driver. Reduces
a bit of open-coding and makes the logic easier reusable for any
potential DMI platform != Quark.
Tested on IOT2000 and Galileo Gen2.
Changes in v5:
- fixed a remaining issue in patch 5
- dropped patch 6 for now
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Thu, 22 Jun 2017 06:18:01 +0000 (08:18 +0200)]
stmmac: pci: Use dmi_system_id table for retrieving PHY addresses
Avoids reimplementation of DMI matching in stmmac_pci_find_phy_addr.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Thu, 22 Jun 2017 06:18:00 +0000 (08:18 +0200)]
stmmac: pci: Select quark_pci_dmi_data from quark_default_data
No need to carry this reference in stmmac_pci_info - the Quark-specific
setup handler knows that it needs to use the Quark-specific DMI table.
This also allows to drop the stmmac_pci_info reference from the setup
handler parameter list.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Thu, 22 Jun 2017 06:17:59 +0000 (08:17 +0200)]
stmmac: pci: Make stmmac_pci_find_phy_addr truly generic
Move the special case for the early Galileo firmware into
quark_default_setup. This allows to use stmmac_pci_find_phy_addr for
non-quark cases.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Thu, 22 Jun 2017 06:17:58 +0000 (08:17 +0200)]
stmmac: pci: Use stmmac_pci_info for all devices
Make stmmac_default_data compatible with stmmac_pci_info.setup and use
an info structure for all devices. This allows to make the probing more
regular.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kiszka [Thu, 22 Jun 2017 06:17:57 +0000 (08:17 +0200)]
stmmac: pci: Make stmmac_pci_info structure constant
By removing the PCI device reference from the structure and passing it
as parameters to the interested functions, we can make quark_pci_info
const.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Wed, 21 Jun 2017 23:40:47 +0000 (16:40 -0700)]
hv_netvsc: Fix the carrier state error when data path is off
When the VF NIC is opened, the synthetic NIC's carrier state is set to
off. This tells the host to transitions data path to the VF device. But
if startup script or user manipulates the admin state of the netvsc
device directly for example:
# ifconfig eth0 down
# ifconfig eth0 up
Then the carrier state of the synthetic NIC would be on, even though the
data path was still over the VF NIC. This patch sets the carrier state
of synthetic NIC with consideration of the related VF state.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Wed, 21 Jun 2017 23:40:46 +0000 (16:40 -0700)]
hv_netvsc: Remove unnecessary var link_state from struct netvsc_device_info
We simply use rndis_device->link_state in the netdev_dbg. The variable,
link_state from struct netvsc_device_info, is not used anywhere else.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Wed, 21 Jun 2017 20:48:27 +0000 (13:48 -0700)]
samples/bpf: fix a build problem
tracex5_kern.c build failed with the following error message:
../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
#include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.
The fix is to add $(obj) into the clang compilation path.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2017 15:34:05 +0000 (11:34 -0400)]
Merge branch 'rds-tcp-fixes'
Sowmini Varadhan says:
====================
rds: tcp: fixes
Patch1 is a bug fix for correct reconnect when a connection
is restarted. Patch 2 accelerates cleanup by setting linger
to 1 and sending a RST to the peer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 21 Jun 2017 20:40:13 +0000 (13:40 -0700)]
rds: tcp: set linger to 1 when unloading a rds-tcp
If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 21 Jun 2017 20:40:12 +0000 (13:40 -0700)]
rds: tcp: send handshake ping-probe from passive endpoint
The RDS handshake ping probe added by commit
5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.
This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Fontenot [Wed, 21 Jun 2017 20:41:02 +0000 (15:41 -0500)]
ibmvnic: Correct return code checking for ibmvnic_init during probe
The update to ibmvnic_init to allow an EAGAIN return code broke
the calling of ibmvnic_init from ibmvnic_probe. The code now
will return from this point in the probe routine if anything
other than EAGAIN is returned. The check should be to see if rc
is non-zero and not equal to EAGAIN.
Without this fix, the vNIC driver can return 0 (success) from
its probe routine due to ibmvnic_init returning zero, but before
completing the probe process and registering with the netdev layer.
Fixes: 6a2fb0e99f9c (ibmvnic: driver initialization for kdump/kexec)
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Jun 2017 15:31:35 +0000 (11:31 -0400)]
Merge branch 'ibmvnic-Correct-long-term-mapped-buffer-error-handling'
Thomas Falcon says:
====================
ibmvnic: Correct long-term-mapped buffer error handling
This patch set fixes the error-handling of long-term-mapped buffers
during adapter initialization and reset. The first patch fixes a bug
in an incorrectly defined descriptor that was keeping the return
codes from the VIO server from being properly checked. The second patch
fixes and cleans up the error-handling implementation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 21 Jun 2017 19:53:01 +0000 (14:53 -0500)]
ibmvnic: Fix error handling when registering long-term-mapped buffers
The patch stores the return code of the REQUEST_MAP_RSP sub-CRQ command
in the private data structure, where it can be later checked during
device open or a reset.
In the case of a reset, the mapping request to the vNIC Server may fail,
especially in the case of a partition migration. The driver attempts to
handle this by re-allocating the buffer and re-sending the mapping request.
The original error handling implementation was removed. The separate
function handling the REQUEST_MAP response message was also removed,
since it is now simple enough to be handled in the ibmvnic_handle_crq
function.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 21 Jun 2017 19:53:00 +0000 (14:53 -0500)]
ibmvnic: Fix incorrectly defined ibmvnic_request_map_rsp structure
This reserved area should be eight bytes in length instead of four.
As a result, the return codes in the REQUEST_MAP_RSP descriptors
were not being properly handled.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chenbo Feng [Wed, 21 Jun 2017 02:06:40 +0000 (19:06 -0700)]
tcp: Add a tcp_filter hook before handle ack packet
Currently in both ipv4 and ipv6 code path, the ack packet received when
sk at TCP_NEW_SYN_RECV state is not filtered by socket filter or cgroup
filter since it is handled from tcp_child_process and never reaches the
tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hooks
here can make sure all the ingress tcp packet can be correctly filtered.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 20 Jun 2017 20:40:46 +0000 (22:40 +0200)]
net: phy: smsc: fix buffer overflow in memcpy
The memcpy annotation triggers for a fixed-length buffer copy:
In file included from /git/arm-soc/arch/arm64/include/asm/processor.h:30:0,
from /git/arm-soc/arch/arm64/include/asm/spinlock.h:21,
from /git/arm-soc/include/linux/spinlock.h:87,
from /git/arm-soc/include/linux/seqlock.h:35,
from /git/arm-soc/include/linux/time.h:5,
from /git/arm-soc/include/linux/stat.h:21,
from /git/arm-soc/include/linux/module.h:10,
from /git/arm-soc/drivers/net/phy/smsc.c:20:
In function 'memcpy',
inlined from 'smsc_get_strings' at /git/arm-soc/drivers/net/phy/smsc.c:166:3:
/git/arm-soc/include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
Using strncpy instead of memcpy should do the right thing here.
Fixes: 030a89028db0 ("net: phy: smsc: Implement PHY statistics")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Myron Stowe [Tue, 20 Jun 2017 17:21:26 +0000 (11:21 -0600)]
net/mlx5e: Use device ID defines
Use Mellanox device ID definitions in the driver's mlx5 ID table so tools
such as 'grep' and 'cscope' can be used to help find correlated material
(such as INTx Masking quirks:
d76d2fe05fd PCI: Convert Mellanox broken
INTx quirks to be for listed devices only).
No functional change intended.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Denys Vlasenko [Mon, 19 Jun 2017 19:50:52 +0000 (21:50 +0200)]
liquidio: stop using huge static buffer, save 4096k in .data
Only compile-tested - I don't have the hardware.
>From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
anyway.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Felix Manlunas <felix.manlunas@cavium.com>
CC: Prasad Kanneganti <prasad.kanneganti@cavium.com>
CC: Derek Chickles <derek.chickles@cavium.com>
CC: David Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Jun 2017 21:35:22 +0000 (17:35 -0400)]
Merge git://git./linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 21 Jun 2017 19:40:20 +0000 (12:40 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix refcounting wrt timers which hold onto inet6 address objects,
from Xin Long.
2) Fix an ancient bug in wireless wext ioctls, from Johannes Berg.
3) Firmware handling fixes in brcm80211 driver, from Arend Van Spriel.
4) Several mlx5 driver fixes (firmware readiness, timestamp cap
reporting, devlink command validity checking, tc offloading, etc.)
From Eli Cohen, Maor Dickman, Chris Mi, and Or Gerlitz.
5) Fix dst leak in IP/IP6 tunnels, from Haishuang Yan.
6) Fix dst refcount bug in decnet, from Wei Wang.
7) Netdev can be double freed in register_vlan_device(). Fix from Gao
Feng.
8) Don't allow object to be destroyed while it is being dumped in SCTP,
from Xin Long.
9) Fix dpaa_eth build when modular, from Madalin Bucur.
10) Fix throw route leaks, from Serhey Popovych.
11) IFLA_GROUP missing from if_nlmsg_size() and ifla_policy[] table,
also from Serhey Popovych.
12) Fix premature TX SKB free in stmmac, from Niklas Cassel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
igmp: add a missing spin_lock_init()
net: stmmac: free an skb first when there are no longer any descriptors using it
sfc: remove duplicate up_write on VF filter_sem
rtnetlink: add IFLA_GROUP to ifla_policy
ipv6: Do not leak throw route references
dt-bindings: net: sms911x: Add missing optional VDD regulators
dpaa_eth: reuse the dma_ops provided by the FMan MAC device
fsl/fman: propagate dma_ops
net/core: remove explicit do_softirq() from busy_poll_stop()
fib_rules: Resolve goto rules target on delete
sctp: ensure ep is not destroyed before doing the dump
net/hns:bugfix of ethtool -t phy self_test
net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
cxgb4: notify uP to route ctrlq compl to rdma rspq
ip6_tunnel: Correct tos value in collect_md mode
decnet: always not take dst->__refcnt when inserting dst into hash table
ip6_tunnel: fix potential issue in __ip6_tnl_rcv
ip_tunnel: fix potential issue in ip_tunnel_rcv
brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()
net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it
...
David S. Miller [Wed, 21 Jun 2017 19:33:00 +0000 (15:33 -0400)]
Merge branch 'qed-File-split-and-rename-towards-iWARP-support'
Michal Kalderon says:
====================
qed*: File split and rename towards iWARP support
This patch series makes a few more infrastructure changes towards adding
iWARP support. Hopefully this is the last infrastructure change
prior to the iWARP RFC.
Patch #1-3 take care of taking all the common iWARP/RoCE code out of
qed_roce.[ch] and placing it in qed_rdma.[ch]
Patch #4 renames the roce interface file as it is common for
RoCE and iWARP. This patch touches qedr as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalderon, Michal [Wed, 21 Jun 2017 13:22:46 +0000 (16:22 +0300)]
qed*: Rename qed_roce_if.h to qed_rdma_if.h
Rename the qed_roce_if file to qed_rdma_if as it
represents a common interface for RoCE and iWARP.
this commit affects RDMA/qedr as well.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalderon, Michal [Wed, 21 Jun 2017 13:22:45 +0000 (16:22 +0300)]
qed: Split rdma content between qed_rdma and qed_roce
This patch places common iWARP / RoCE code in qed_rdma
and roce specific code in qed_roce
There is one new function ( qed_roce_setup ) added, the rest
of the patch removes content from the files and removes some
static definitions.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalderon, Michal [Wed, 21 Jun 2017 13:22:44 +0000 (16:22 +0300)]
qed: Duplicate qed_roce.[ch] to qed_rdma.[ch]
This patch adds files that will contain common code for RoCE/iWARP.
The files are currently identical to qed_roce.c / qed_roce.h and
intentionally not added to the makefile. The next patch in the series
will modify the files so that roce specific code is left in qed_roce
and common roce/iwarp code will be placed in qed_rdma
This patch is the result of a simple
cp qed_rdma.c qed_roce.c
cp qed_rdma.h qed_roce.h
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalderon, Michal [Wed, 21 Jun 2017 13:22:43 +0000 (16:22 +0300)]
qed: Cleanup qed_roce before duplicating it
The next patch in the series will duplicate qed_roce as part
of code preprations for iWARP support. Do some cleanup before
duplicating
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 21 Jun 2017 19:16:12 +0000 (12:16 -0700)]
Merge tag 'pinctrl-v4.12-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull more pin control fixes from Linus Walleij:
"Some late arriving fixes. I should have sent earlier, just swamped
with work as usual. Thomas patch makes AMD systems usable despite
firmware bugs so it is fairly important.
- Make the AMD driver use a regular interrupt rather than a chained
one, so the system does not lock up.
- Fix a function call error deep inside the STM32 driver"
* tag 'pinctrl-v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: stm32: Fix bad function call
pinctrl/amd: Use regular interrupt instead of chained
Daniel Borkmann [Wed, 21 Jun 2017 18:16:11 +0000 (20:16 +0200)]
bpf: expose prog id for cls_bpf and act_bpf
In order to be able to retrieve the attached programs from cls_bpf
and act_bpf, we need to expose the prog ids via netlink so that
an application can later on get an fd based on the id through the
BPF_PROG_GET_FD_BY_ID command, and dump related prog info via
BPF_OBJ_GET_INFO_BY_FD command for bpf(2).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 21 Jun 2017 19:06:29 +0000 (12:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- revert of a commit to magicmouse driver that regressess certain
devices, from Daniel Stone
- quirk for a specific Dell mouse, from Sebastian Parschauer
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
Revert "HID: magicmouse: Set multi-touch keybits for Magic Mouse"
HID: Add quirk for Dell PIXART OEM mouse
Linus Torvalds [Wed, 21 Jun 2017 19:02:48 +0000 (12:02 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/livepatching
Pull livepatching fix from Jiri Kosina:
"Fix the way how livepatches are being stacked with respect to RCU,
from Petr Mladek"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix stacking of patches with respect to RCU
Linus Torvalds [Wed, 21 Jun 2017 18:30:52 +0000 (11:30 -0700)]
Merge branch 'ufs-fixes' of git://git./linux/kernel/git/viro/vfs
Pull more ufs fixes from Al Viro:
"More UFS fixes, unfortunately including build regression fix for the
64-bit s_dsize commit. Fixed in this pile:
- trivial bug in signedness of 32bit timestamps on ufs1
- ESTALE instead of ufs_error() when doing open-by-fhandle on
something deleted
- build regression on 32bit in ufs_new_fragments() - calculating that
many percents of u64 pulls libgcc stuff on some of those. Mea
culpa.
- fix hysteresis loop broken by typo in 2.4.14.7 (right next to the
location of previous bug).
- fix the insane limits of said hysteresis loop on filesystems with
very low percentage of reserved blocks. If it's 5% or less, just
use the OPTSPACE policy.
- calculate those limits once and mount time.
This tree does pass xfstests clean (both ufs1 and ufs2) and it _does_
survive cross-builds.
Again, my apologies for missing that, especially since I have noticed
a related percentage-of-64bit issue in earlier patches (when dealing
with amount of reserved blocks). Self-LART applied..."
* 'ufs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: fix the logics for tail relocation
ufs_iget(): fail with -ESTALE on deleted inode
fix signedness of timestamps on ufs1
Helge Deller [Mon, 19 Jun 2017 15:34:05 +0000 (17:34 +0200)]
Allow stack to grow up to address space limit
Fix expand_upwards() on architectures with an upward-growing stack (parisc,
metag and partly IA-64) to allow the stack to reliably grow exactly up to
the address space limit given by TASK_SIZE.
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 20 Jun 2017 09:10:44 +0000 (02:10 -0700)]
mm: fix new crash in unmapped_area_topdown()
Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown(). Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there. Fix that, and
the similar case in its alternative, unmapped_area().
Cc: stable@vger.kernel.org
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Abeni [Wed, 21 Jun 2017 09:45:31 +0000 (11:45 +0200)]
sock: avoid dirtying incoming_cpu if not needed
for connected socket, the incoming_cpu field in the sock struct
is not going to change frequently, but we are setting it
unconditionally for each packet.
Since sk_incoming_cpu and sk_flags share the same cacheline,
and the latter is access by udp_recvmsg(), this cause a cache
miss for each packet for UDP connected socket.
With this patch, we set the incoming cpu field only when the
ingress cpu really changes.
This gives a small but measurable performance improvement for
connected UDP socket.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Herrmann [Wed, 21 Jun 2017 08:47:15 +0000 (10:47 +0200)]
net: introduce SO_PEERGROUPS getsockopt
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.
While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.
Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.
Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.
So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.
Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 21 Jun 2017 08:24:40 +0000 (10:24 +0200)]
udp: prefetch rmem_alloc in udp_queue_rcv_skb()
On UDP packets processing, if the BH is the bottle-neck, it
always sees a cache miss while updating rmem_alloc; try to
avoid it prefetching the value as soon as we have the socket
available.
Performances under flood with multiple NIC rx queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~10% performance improvement.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chad Dupuis [Wed, 21 Jun 2017 05:26:34 +0000 (08:26 +0300)]
qede: Fix compilation without QED_RDMA
When CONFIG_QED_RDMA isn't defined, we'd hit the following:
/include/linux/qed/qede_rdma.h:84:19:
warning: ‘qede_rdma_dev_add’ used but never defined [enabled by default]
static inline int qede_rdma_dev_add(struct qede_dev *dev);
Fixes: bbfcd1e8e167 ("qed*: Set rdma generic functions prefix")
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Wed, 21 Jun 2017 03:25:18 +0000 (11:25 +0800)]
r8152: correct the definition
Replace VLAN_HLEN and CRC_SIZE with ETH_FCS_LEN.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Jun 2017 15:31:46 +0000 (11:31 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-06-20
This series contains updates to i40e and i40evf only.
Björn adds additional XDP support for i40e, by adding pass and drop actions
and XDP_TX action support.
Jake fixes a possible NULL pointer dereference in
i40evf_get_ethtool_stats() which could occur if the VF fails to recover
from a reset, and then a user requests statistics. Changed the use of
dev_info() to dev_dbg() for vf_capability client routine so that the
standard log is not spammed with this information which "might" cause
administrators to worry. Also added more code comments to help explain
why udp_port has be in host byte order and to avoid future changes which
may cause this to break. Fixed the holding of the RTNL lock for the
entire reset routine, reduced the scope so that the reset function will
handle its own lock, so that we do not have to wrap every reference
to i40e_do_reset() with RTNL lock/unlock.
Alice updates flags related to firmware interactions for WoL and admin
queue command address with the correct value.
Sudheer makes a fix to ensure that the array is not accessed past the
size of the array.
Greg fixes the parsing of firmware 4.33 admin queue commmand "Get CEE
DCBX PER CFG" because the firmware now creates the oper_prio_tc nibbles
reversed from those in the CDD Priority Group sub-TLV.
Carolyn adds a check and message to let users know that when in MFP mode,
changing RSS hash input set is not supported.
Shannon makes the partition bandwidth control more generic since it is not
in just one form of multi-function partitioning (MFP). Also fixes a bug
which was causing the firmware confusion in some reset sequences, when
we were disabling interrupts and we were clearing the whole register.
Instead we should only be clearing the CAUSE_ENA bit when disabling
interrupts.
Filip adds support for OEM firmware version, so that if a OEM specific
adapter is detected, ethtool reports the OEM product version in the
firmware version string instead of etrack id.
Alan fixes a bug where the driver was not correctly exiting overflow
promiscuous mode, which can happen if "too many" MAC filters are added,
putting the driver into overflow promiscuous mode, and the filters are
then removed. The bug occurs because the conditional for toggling
promiscuous mode was only be executed when enabled and not when it was
disabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Jun 2017 15:22:54 +0000 (11:22 -0400)]
Merge branch 'ipmr-ip6mr-add-Netlink-notifications-on-cache-reports'
Julien Gomes says:
====================
ipmr/ip6mr: add Netlink notifications on cache reports
Currently, all ipmr/ip6mr cache reports are sent through the
mroute/mroute6 socket only.
This forces the use of a single socket for mroute programming, cache
reports and, regarding ipmr, IGMP messages without Router Alert option
reception.
The present patches are aiming to send Netlink notifications in addition
to the existing igmpmsg/mrt6msg to give user programs a way to handle
cache reports in parallel with multiple sockets other than the
mroute/mroute6 socket.
Changes in v2:
- Changed attributes naming from {IPMRA,IP6MRA}_CACHEREPORTA_* to
{IPMRA,IP6MRA}_CREPORT_*
- Improved packet data copy to handle non-linear packets in
ipmr/ip6mr cache report Netlink notification creation
- Added two rtnetlink groups with restricted-binding
- Changed cache report notified groups from RTNL_{IPV4,IPV6}_MROUTE to
the new restricted groups in ipmr/ip6mr
Changes in v3:
- Put message size calculation for {igmp,mrt6}msg_netlink_event in separate
functions
- Increased vif id attributes size from u8 to u32
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Gomes [Tue, 20 Jun 2017 20:54:18 +0000 (13:54 -0700)]
ip6mr: add netlink notifications on mrt6msg cache reports
Add Netlink notifications on cache reports in ip6mr, in addition to the
existing mrt6msg sent to mroute6_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.
MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the mrt6msg header.
PKT attribute is the packet sent to mroute6_sk, without the added
mrt6msg header.
Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Gomes [Tue, 20 Jun 2017 20:54:17 +0000 (13:54 -0700)]
ipmr: add netlink notifications on igmpmsg cache reports
Add Netlink notifications on cache reports in ipmr, in addition to the
existing igmpmsg sent to mroute_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.
MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the igmpmsg header.
PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
header.
Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Gomes [Tue, 20 Jun 2017 20:54:16 +0000 (13:54 -0700)]
rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute
Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
NETLINK_ROUTE family.
Binding to these groups specifically requires CAP_NET_ADMIN to allow
multicast of sensitive messages (e.g. mroute cache reports).
Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julien Gomes [Tue, 20 Jun 2017 20:54:15 +0000 (13:54 -0700)]
rtnetlink: add NEWCACHEREPORT message type
New NEWCACHEREPORT message type to be used for cache reports sent
via Netlink, effectively allowing splitting cache report reception from
mroute programming.
Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 20 Jun 2017 20:11:21 +0000 (22:11 +0200)]
tcp: md5: hide unused variable
Changing from a memcpy to per-member comparison left the
size variable unused:
net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]
This does not show up when CONFIG_IPV6 is enabled, but the
variable can be removed either way, along with the now unused
assignment.
Fixes: 6797318e623d ("tcp: md5: add an address prefix for key lookup")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
yuan linyu [Wed, 21 Jun 2017 12:04:40 +0000 (20:04 +0800)]
idsn: fix wrong skb_put() used
in my commit
b952f4dff2751252db073c27c0f8a16a416a2ddc,
- *(u8 *)skb_put(skb_out, 1) = (u8)(accm >> 24); \
+ skb_put(skb_out, (u8)(accm >> 24)); \
it should skb_put_u8()
Fixes: b952f4dff275 ("net: manual clean code which call skb_put_[data:zero])")
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 7 Jun 2017 09:43:13 +0000 (05:43 -0400)]
i40e: don't hold RTNL lock for the entire reset
We recently refactored i40e_do_reset() and its friends to be able to
hold the RTNL lock only for the portions that actually need to be
protected. However, a separate refactoring added several new callers of
these functions during the PCIe error recovery and suspend/resume
cycles.
When merging the changes together, it was not noticed that we could
reduce the RTNL scope by letting the reset function handle the lock
itself, as previously it was not possible.
Fix this by replacing these call sites to indicate that the reset
function should handle its own lock. This enables multiple PFs to reset
or resume simultaneously without serializing the resets via the RTNL
lock. The end result is that on systems with lots of PFs and VFs the
resets don't stall waiting for each other to finish.
It is probable that we can also do the same for i40e_do_reset_safe, but
this author did not research that change carefully enough to be
confident.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Catherine Sullivan [Wed, 7 Jun 2017 09:43:12 +0000 (05:43 -0400)]
i40e: Handle PE_CRITERR properly with IWARP enabled
When IWARP is enabled, we weren't clearing the PE_CRITERR, just logging
it and removing it from the mask. We need to do a corer to reset the
PE_CRITERR register, so set the bit for that as we handle the
interrupt.
We should also be checking for the error against the PFINT_ICR0 register,
and only need to clear it in the value getting written to
PFINT_ICR0_ENA.
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 7 Jun 2017 09:43:11 +0000 (05:43 -0400)]
i40e: clear only cause_ena bit
When disabling interrupts, we should only be clearing the CAUSE_ENA bit,
not clearing the whole register. Clearing the whole register sets the
NEXTQ_IDX field to 0 instead of 0x7ff which can confuse the Firmware in
some reset sequences.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alan Brady [Wed, 7 Jun 2017 09:43:10 +0000 (05:43 -0400)]
i40e: fix disabling overflow promiscuous mode
There exists a bug in which the driver does not correctly exit overflow
promiscuous mode. This can occur if "too many" mac filters are added,
putting the driver into overflow promiscuous mode, and the filters are
then removed. When the failed filters are removed, the driver reports
exiting overflow promiscuous mode which is correct, however traffic
continues to be received as if in promiscuous mode still.
The bug occurs because the conditional for toggling promiscuous mode was
set to only execute when promiscuous mode was enabled and not when it
was disabled as well. This patch fixes the conditional to correctly
execute when promiscuous mode is toggled and not just enabled. Without
this patch, the driver is unable to correctly exit overflow promiscuous
mode.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Filip Sadowski [Wed, 7 Jun 2017 09:43:09 +0000 (05:43 -0400)]
i40e: Add support for OEM firmware version
This patch adds support for OEM firmware version. If OEM specific
adapter is detected ethtool reports OEM product version in firmware
version string instead of etrack id.
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 7 Jun 2017 09:43:08 +0000 (05:43 -0400)]
i40e: genericize the partition bandwidth control
Partition bandwidth control is not in just one form of MFP (multi-function
partitioning), so make the code more generic and be sure to nudge the Tx
scheduler for all MFP.
Copyright updated to 2017.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Wed, 7 Jun 2017 09:43:07 +0000 (05:43 -0400)]
i40e: Add message for unsupported MFP mode
This patch adds a check and message if the device is in
MFP mode as changing RSS input set is not supported in
MFP mode.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Bowers [Wed, 7 Jun 2017 09:43:06 +0000 (05:43 -0400)]
i40e: Support firmware CEE DCB UP to TC map re-definition
Changes parsing of FW 4.33 AQ command Get CEE DCBX OPER CFG (0x0A07).
Change is required because FW now creates the oper_prio_tc
nibbles reversed from those in the CEE Priority Group sub-TLV.
This change will only apply to FW 4.33 as future FW versions will use a
different function to parse the CEE data.
Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sudheer Mogilappagari [Wed, 7 Jun 2017 09:43:05 +0000 (05:43 -0400)]
i40e: Fix potential out of bound array access
This is a fix for the static code analysis issue where dcbcfg->numapps
could be greater than size of array (i.e dcbcfg->app[I40E_DCBX_MAX_APPS]).
The fix makes sure that the array is not accessed past the size of
of the array (i.e. I40E_DCBX_MAX_APPS).
Copyright updated to 2017.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 7 Jun 2017 09:43:04 +0000 (05:43 -0400)]
i40e: comment that udp_port must be in host byte order
The firmware expects the port number passed when setting up
the UDP tunnel configuration to be in Little Endian format.
The i40e_aq_add_udp_tunnel command byte swaps the value from
host order to Little Endian.
Since commit
fe0b0cd97b4f ("i40e: send correct port number to
AdminQ when enabling UDP tunnels") we've correctly
sent the value in host order.
Let's also add a comment to the function explaining that it must
be in host order, as the port numbers are commonly stored as Big
Endian values.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 7 Jun 2017 09:43:03 +0000 (05:43 -0400)]
i40e: use dev_dbg instead of dev_info when warning about missing routine
When searching for the vf_capability client routine, dev_info() was
used, instead of the normal dev_dbg(). This causes the message to be
displayed at standard log levels which can cause administrators to
worry. Avoid this by using dev_dbg instead.
Copyright updated to 2017.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alice Michael [Wed, 7 Jun 2017 09:43:02 +0000 (05:43 -0400)]
i40e/i40evf: update WOL and I40E_AQC_ADDR_VALID_MASK flags
Update a few flags related to FW interactions.
Copyright updated to 2017.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Wed, 7 Jun 2017 09:43:01 +0000 (05:43 -0400)]
i40evf: assign num_active_queues inside i40evf_alloc_queues
The variable num_active_queues represents the number of active queues we
have for the device. We assign this pretty early in i40evf_init_subtask.
Several code locations are written with loops over the tx_rings and
rx_rings structures, which don't get allocated until
i40evf_alloc_queues, and which get freed by i40evf_free_queues.
These call sites were written under the assumption that tx_rings and
rx_rings would always be allocated at least when num_active_queues is
non-zero.
Lets fix this by moving the assignment into the function where we
allocate queues. We'll use a temporary variable for storage so that we
don't assign the value in the adapter structure until after the rings
have been set up.
Finally, when we free the queues, we'll clear the value to ensure that
we do not loop over the rings memory that no longer exists.
This resolves a possible NULL pointer dereference in
i40evf_get_ethtool_stats which could occur if the VF fails to recover
from a reset, and then a user requests statistics.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Björn Töpel [Wed, 24 May 2017 05:55:35 +0000 (07:55 +0200)]
i40e: add support for XDP_TX action
This patch adds proper XDP_TX action support. For each Tx ring, an
additional XDP Tx ring is allocated and setup. This version does the
DMA mapping in the fast-path, which will penalize performance for
IOMMU enabled systems. Further, debugfs support is not wired up for
the XDP Tx rings.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Björn Töpel [Wed, 24 May 2017 05:55:34 +0000 (07:55 +0200)]
i40e: add XDP support for pass and drop actions
This commit adds basic XDP support for i40e derived NICs. All XDP
actions will end up in XDP_DROP.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
WANG Cong [Tue, 20 Jun 2017 17:46:27 +0000 (10:46 -0700)]
igmp: add a missing spin_lock_init()
Andrey reported a lockdep warning on non-initialized
spinlock:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 4099 Comm: a.out Not tainted 4.12.0-rc6+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
register_lock_class+0x717/0x1aa0 kernel/locking/lockdep.c:755
? 0xffffffffa0000000
__lock_acquire+0x269/0x3690 kernel/locking/lockdep.c:3255
lock_acquire+0x22d/0x560 kernel/locking/lockdep.c:3855
__raw_spin_lock_bh ./include/linux/spinlock_api_smp.h:135
_raw_spin_lock_bh+0x36/0x50 kernel/locking/spinlock.c:175
spin_lock_bh ./include/linux/spinlock.h:304
ip_mc_clear_src+0x27/0x1e0 net/ipv4/igmp.c:2076
igmpv3_clear_delrec+0xee/0x4f0 net/ipv4/igmp.c:1194
ip_mc_destroy_dev+0x4e/0x190 net/ipv4/igmp.c:1736
We miss a spin_lock_init() in igmpv3_add_delrec(), probably
because previously we never use it on this code path. Since
we already unlink it from the global mc_tomb list, it is
probably safe not to acquire this spinlock here. It does not
harm to have it although, to avoid conditional locking.
Fixes: c38b7d327aaf ("igmp: acquire pmc lock for ip_mc_clear_src()")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 19:47:15 +0000 (15:47 -0400)]
Merge tag 'mlx5-updates-2017-06-20' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-06-20 (mlx5 IPoIB updates)
This series includes updates to mlx5 IPoIB netdevice driver (mlx5i),
1. We move ipoib files into separate directory, to allow it to grow
separately in its own space
2. Remove HW update carrier logic from IPoIB and VF representors profiles.
3. Add basic ethtool support. (Rings options/statistics and driver info).
4. Change MTU support.
5. Xmit path statistics reporting.
6. add PTP support.
For the new ethtool ops, PTP (ioctl) and change_mtu ndos in IPoIB, we didn't add new
implementation or new logic, we only reused those callbacks from the already existing
mlx5e (ethernet netdevice profile) and exposed them in IPoIB netdevice/ethtool ops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 19:44:22 +0000 (15:44 -0400)]
Merge branch 's390-net-updates-part-2'
Julian Wiedmann says:
====================
s390/net updates, part 2 (v2)
thanks for the feedback. Here's an updated patchset that honours
the reverse christmas tree and drops the __packed attribute. Please apply.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 20 Jun 2017 14:00:34 +0000 (16:00 +0200)]
s390/qeth: use diag26c to get MAC address on L2
When a s390 guest runs on a z/VM host that's part of a SSI cluster,
it can be migrated to a different host. In this case, the MAC address
it originally obtained on the old host may be re-assigned to a new
guest. This would result in address conflicts between the two guests.
When running as z/VM guest, use the diag26c MAC Service to obtain
a hypervisor-managed MAC address. The MAC Service is SSI-aware, and
won't re-assign the address after the guest is migrated to a new host.
This patch adds support for the z/VM MAC Service on L2 devices.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 20 Jun 2017 14:00:33 +0000 (16:00 +0200)]
s390/diag: add diag26c support
Implement support for the hypervisor diagnose 0x26c
('Access Certain System Information').
It passes a request buffer and a subfunction code, and receives
a response buffer and a return code.
Also add the scaffolding for the 'MAC Services' subfunction.
It may be used by network devices to obtain a hypervisor-managed
MAC address.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 20 Jun 2017 14:00:32 +0000 (16:00 +0200)]
s390/qeth: fix packing buffer statistics
There's two spots in qeth_send_packet() where we don't accurately
account for transmitted packing buffers in qeth's performance
statistics:
1) when flushing the current buffer due to insufficient size,
and the next buffer is not EMPTY, we need to account for that
flushed buffer.
2) when synchronizing with the TX completion code, we reset
flush_count and thus forget to account for any previously
flushed buffers.
Reported-by: Nils Hoppmann <niho@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kittipon Meesompop [Tue, 20 Jun 2017 14:00:31 +0000 (16:00 +0200)]
s390/qeth: add ipa return codes for bridgeport
add ipa return codes for Bridgeport (HiperSockets and OSA) according to
system level design.
Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 19:41:56 +0000 (15:41 -0400)]
Merge tag 'wireless-drivers-for-davem-2017-06-20' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.12
Two important fixes for brcmfmac. The rest of the brcmfmac patches are
either code preparation and fixing a new build warning.
brcmfmac
* fix a NULL pointer dereference during resume
* fix a NULL pointer dereference with USB devices, a regression from
v4.12-rc1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Tue, 20 Jun 2017 12:32:41 +0000 (14:32 +0200)]
net: stmmac: free an skb first when there are no longer any descriptors using it
When having the skb pointer in the first descriptor, stmmac_tx_clean
can get called at a moment where the IP has only cleared the own bit
of the first descriptor, thus freeing the skb, even though there can
be several descriptors whose buffers point into the same skb.
By simply moving the skb pointer from the first descriptor to the last
descriptor, a skb will get freed only when the IP has cleared the
own bit of all the descriptors that are using that skb.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 20 Jun 2017 12:08:51 +0000 (13:08 +0100)]
sfc: remove duplicate up_write on VF filter_sem
Somehow two copies of the line 'up_write(&vf->efx->filter_sem);' got into
efx_ef10_sriov_set_vf_vlan(). This would put the mutex in a bad state and
cause all subsequent down attempts to hang.
Fixes: 671b53eec2ed ("sfc: Ensure down_write(&filter_sem) and up_write() are matched before calling efx_net_open()")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serhey Popovych [Tue, 20 Jun 2017 11:35:23 +0000 (14:35 +0300)]
rtnetlink: add IFLA_GROUP to ifla_policy
Network interface groups support added while ago, however
there is no IFLA_GROUP attribute description in policy
and netlink message size calculations until now.
Add IFLA_GROUP attribute to the policy.
Fixes: cbda10fa97d7 ("net_device: add support for network device groups")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Serhey Popovych [Tue, 20 Jun 2017 10:29:25 +0000 (13:29 +0300)]
ipv6: Do not leak throw route references
While commit
73ba57bfae4a ("ipv6: fix backtracking for throw routes")
does good job on error propagation to the fib_rules_lookup()
in fib rules core framework that also corrects throw routes
handling, it does not solve route reference leakage problem
happened when we return -EAGAIN to the fib_rules_lookup()
and leave routing table entry referenced in arg->result.
If rule with matched throw route isn't last matched in the
list we overwrite arg->result losing reference on throw
route stored previously forever.
We also partially revert commit
ab997ad40839 ("ipv6: fix the
incorrect return value of throw route") since we never return
routing table entry with dst.error == -EAGAIN when
CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point
to check for RTF_REJECT flag since it is always set throw
route.
Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes")
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 20 Jun 2017 08:05:11 +0000 (16:05 +0800)]
sctp: handle errors when updating asoc
It's a bad thing not to handle errors when updating asoc. The memory
allocation failure in any of the functions called in sctp_assoc_update()
would cause sctp to work unexpectedly.
This patch is to fix it by aborting the asoc and reporting the error when
any of these functions fails.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 20 Jun 2017 08:01:55 +0000 (16:01 +0800)]
sctp: uncork the old asoc before changing to the new one
local_cork is used to decide if it should uncork asoc outq after processing
some cmds, and it is set when replying or sending msgs. local_cork should
always have the same value with current asoc q->cork in some way.
The thing is when changing to a new asoc by cmd SET_ASOC, local_cork may
not be consistent with the current asoc any more. The cmd seqs can be:
SCTP_CMD_UPDATE_ASSOC (asoc)
SCTP_CMD_REPLY (asoc)
SCTP_CMD_SET_ASOC (new_asoc)
SCTP_CMD_DELETE_TCB (new_asoc)
SCTP_CMD_SET_ASOC (asoc)
SCTP_CMD_REPLY (asoc)
The 1st REPLY makes OLD asoc q->cork and local_cork both are 1, and the cmd
DELETE_TCB clears NEW asoc q->cork and local_cork. After asoc goes back to
OLD asoc, q->cork is still 1 while local_cork is 0. The 2nd REPLY will not
set local_cork because q->cork is already set and it can't be uncorked and
sent out because of this.
To keep local_cork consistent with the current asoc q->cork, this patch is
to uncork the old asoc if local_cork is set before changing to the new one.
Note that the above cmd seqs will be used in the next patch when updating
asoc and handling errors in it.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 20 Jun 2017 07:44:44 +0000 (15:44 +0800)]
dccp: call inet_add_protocol after register_pernet_subsys in dccp_v6_init
Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init"
fixed a null pointer dereference issue for dccp_ipv4 module.
The same fix is needed for dccp_ipv6 module.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 20 Jun 2017 07:42:38 +0000 (15:42 +0800)]
dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init
Now dccp_ipv4 works as a kernel module. During loading this module, if
one dccp packet is being recieved after inet_add_protocol but before
register_pernet_subsys in which v4_ctl_sk is initialized, a null pointer
dereference may be triggered because of init_net.dccp.v4_ctl_sk is 0x0.
Jianlin found this issue when the following call trace occurred:
[ 171.950177] BUG: unable to handle kernel NULL pointer dereference at
0000000000000110
[ 171.951007] IP: [<
ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
[...]
[ 171.984629] Call Trace:
[ 171.984859] <IRQ>
[ 171.985061]
[ 171.985213] [<
ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
[ 171.985711] [<
ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0
[ 171.986309] [<
ffffffff815ca339>] ip_local_deliver+0x59/0xd0
[ 171.986852] [<
ffffffff810cd7a4>] ? update_curr+0x104/0x190
[ 171.986956] [<
ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350
[ 171.986956] [<
ffffffff815ca666>] ip_rcv+0x2b6/0x410
[ 171.986956] [<
ffffffff810c83b4>] ? task_cputime+0x44/0x80
[ 171.986956] [<
ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
[ 171.986956] [<
ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0
[ 171.986956] [<
ffffffff81587188>] __netif_receive_skb+0x18/0x60
[ 171.986956] [<
ffffffff8158841e>] process_backlog+0xae/0x180
[ 171.986956] [<
ffffffff8158799d>] net_rx_action+0x16d/0x380
[ 171.986956] [<
ffffffff81090b7f>] __do_softirq+0xef/0x280
[ 171.986956] [<
ffffffff816b6a1c>] call_softirq+0x1c/0x30
This patch is to move inet_add_protocol after register_pernet_subsys in
dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
packets are processed.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Mon, 19 Jun 2017 23:28:44 +0000 (16:28 -0700)]
enic: Fix format truncation warning
With -Wformat-truncation, gcc throws the following warning.
Fix this by increasing the size of devname to accommodate 15 character
netdev interface name and description.
Remove length format precision for %s. We can fit entire name.
Also increment the version.
drivers/net/ethernet/cisco/enic/enic_main.c: In function ‘enic_open’:
drivers/net/ethernet/cisco/enic/enic_main.c:1740:15: warning: ‘%u’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 12 [-Wformat-truncation=]
"%.11s-rx-%u", netdev->name, i);
^~
drivers/net/ethernet/cisco/enic/enic_main.c:1740:5: note: directive argument in the range [0, 16]
"%.11s-rx-%u", netdev->name, i);
^~~~~~~~~~~~~
drivers/net/ethernet/cisco/enic/enic_main.c:1738:4: note: ‘snprintf’ output between 6 and 18 bytes into a destination of size 16
snprintf(enic->msix[intr].devname,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(enic->msix[intr].devname),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%.11s-rx-%u", netdev->name, i);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Mon, 19 Jun 2017 16:36:44 +0000 (18:36 +0200)]
net: stmmac: enable TSO for IPv6
There is nothing in the IP that prevents us from enabling TSO for IPv6.
Before patch:
ftp fe80::2aa:bbff:fecc:1336%eth0
ftp> get /dev/zero
882512708 bytes received in 00:14 (56.11 MiB/s)
After patch:
ftp fe80::2aa:bbff:fecc:1336%eth0
ftp> get /dev/zero
1203326784 bytes received in 00:12 (94.52 MiB/s)
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Mon, 19 Jun 2017 16:27:53 +0000 (11:27 -0500)]
ibmvnic: Return from ibmvnic_resume if not in VNIC_OPEN state
If the ibmvnic driver is not in the VNIC_OPEN state, return from
ibmvnic_resume callback. If we are not in the VNIC_OPEN state, interrupts
may not be initialized and directly calling the interrupt handler will
cause a crash.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Krzysztof Kozlowski [Mon, 19 Jun 2017 16:05:41 +0000 (18:05 +0200)]
dt-bindings: net: sms911x: Add missing optional VDD regulators
The lan911x family of devices require supplying from 3.3 V power
supplies (connected to VDD_IO, VDD_A and VREG_3.3 pins). The existing
driver however obtains only VDD_IO and VDD_A regulators in an optional
way so document this in bindings.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 17:46:54 +0000 (13:46 -0400)]
Merge branch 'net-fix-loadable-module-for-DPAA-Ethernet'
Madalin Bucur says:
====================
net: fix loadable module for DPAA Ethernet
The DPAA Ethernet makes use of a symbol that is not exported.
Address the issue by propagating the dma_ops rather than calling
arch_setup_dma_ops().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Mon, 19 Jun 2017 15:04:17 +0000 (18:04 +0300)]
dpaa_eth: reuse the dma_ops provided by the FMan MAC device
Remove the use of arch_setup_dma_ops() that was not exported
and was breaking loadable module compilation.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Mon, 19 Jun 2017 15:04:16 +0000 (18:04 +0300)]
fsl/fman: propagate dma_ops
Make sure dma_ops are set, to be later used by the Ethernet driver.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Bogendoerfer [Mon, 19 Jun 2017 14:00:22 +0000 (16:00 +0200)]
net: phy: lxt: Export link partner advertising
Provide link partner advertising information.
Removed testing for gigabit modes, which is useless for a fast ethernet phy.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 17:40:36 +0000 (13:40 -0400)]
Merge branch 'mediatek-various-performance-improvements'
John Crispin says:
====================
net-next: mediatek: various performance improvements
During development we mainly ran testing using iperf doing 1500 byte
tcp frames. It was pointed out recently, that the driver does not perform
very well when using 512 byte udp frames. The biggest problem was that
RPS was not working as no rx queue was being set. fixing this more than
doubled the throughput. Additionally the IRQ mask register is now locked
independently for RX and TX. RX IRQ aggregation is also added. With all
these patches applied we can almost triple the throughput.
While at it we also add PHY status change reporting for GMACs connecting
directly to a PHY.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Mon, 19 Jun 2017 13:37:06 +0000 (15:37 +0200)]
net-next: mediatek: set the rx_queue to 0
The get_rps_cpu() function will not do any RPS on the data flow when no
queue is setup and always use the current cpu where the IRQ was handled
to also handle the backlog. As we only have one physical queue we always
set this to 0 unconditionally.
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Mon, 19 Jun 2017 13:37:05 +0000 (15:37 +0200)]
net-next: mediatek: split IRQ register locking into TX and RX
Originally the driver only utilised the new QDMA engine. The current code
still assumes this is the case when locking the IRQ mask register. Since
RX now runs on the old style PDMA engine we can add a second lock. This
patch reduces the IRQ latency as the TX and RX path no longer need to wait
on each other under heavy load.
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Mon, 19 Jun 2017 13:37:04 +0000 (15:37 +0200)]
net-next: mediatek: add RX IRQ delay support
The PDMA engine used for RX allows IRQ aggregation. The patch sets up the
corresponding registers to aggregate 4 IRQs into one. Using aggregation
reduces the load on the core handling to a quarter thus reducing IRQ
latency and increasing RX performance by around 10%.
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Crispin [Mon, 19 Jun 2017 13:37:03 +0000 (15:37 +0200)]
net-next: mediatek: print phy status changes for non DSA GMACs
Currently PHY status changes are only printed for DSA ports. This patch
adds code to also print status changes for non-fixed links.
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 20 Jun 2017 17:37:04 +0000 (13:37 -0400)]
Merge branch 'vxlan-cleanup-and-IPv6-link-local-support'
Matthias Schiffer says:
====================
vxlan: cleanup and IPv6 link-local support
Running VXLANs over IPv6 link-local addresses allows to use them as a
drop-in replacement for VLANs, avoiding to allocate additional outer IP
addresses to run the VXLAN over.
Since v1, I have added a lot more consistency checks to the address
configuration, making sure address families and scopes match. To simplify
the implementation, I also did some general refactoring of the
configuration handling in the new first patch of the series.
The second patch is more cleanup; is slightly touches OVS code, so that
list is in CC this time, too.
As in v1, the last two patches actually make VXLAN over IPv6 link-local
work, and allow multiple VXLANs with the same VNI and port, as long as
link-local addresses on different interfaces are used. As suggested, I now
store in the flags field if the VXLAN uses link-local addresses or not.
v3 removes log messages as suggested by Roopa Prabhu (as it is very unusual
for errors in netlink requests to be printed to the kernel log.) The commit
message of patch 5 has been extended to add a note about IPv4.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Mon, 19 Jun 2017 08:04:00 +0000 (10:04 +0200)]
vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses
As link-local addresses are only valid for a single interface, we can allow
to use the same VNI for multiple independent VXLANs, as long as the used
interfaces are distinct. This way, VXLANs can always be used as a drop-in
replacement for VLANs with greater ID space.
This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Mon, 19 Jun 2017 08:03:59 +0000 (10:03 +0200)]
vxlan: fix snooping for link-local IPv6 addresses
If VXLAN is run over link-local IPv6 addresses, it is necessary to store
the ifindex in the FDB entries. Otherwise, the used interface is undefined
and unicast communication will most likely fail.
Support for link-local IPv4 addresses should be possible as well, but as
the semantics aren't as well defined as for IPv6, and there doesn't seem to
be much interest in having the support, it's not implemented for now.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Mon, 19 Jun 2017 08:03:58 +0000 (10:03 +0200)]
vxlan: check valid combinations of address scopes
* Multicast addresses are never valid as local address
* Link-local IPv6 unicast addresses may only be used as remote when the
local address is link-local as well
* Don't allow link-local IPv6 local/remote addresses without interface
We also store in the flags field if link-local addresses are used for the
follow-up patches that actually make VXLAN over link-local IPv6 work.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>