David S. Miller [Thu, 29 Jun 2017 16:30:16 +0000 (12:30 -0400)]
Merge tag 'mlx5-updates-2017-06-27' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-06-27 (Innova IPsec offload support)
This patchset adds support for the Innova IPSec network interface card.
About Innova device:
--------------------
Innova is a network card with a ConnectX chip and an FPGA chip as a
bump-on-the-wire.
                Internal
+----------+      Link      +-----------------+
|          +----------------+      FPGA       |  +------+
| ConnectX |                |      Shell      +--+ QSFP |
|          +----------------+   +-------+     |  | Port |
+----------+      I2C       |   |  SBU  |     |  +------+
                            |   +-------+     |
                            +--+----------+---+
                               |          |
                            +--+--+   +---+---+
                            | DDR |   | Flash |
                            +-----+   +-------+
The FPGA synthesized logic is loaded from dedicated flash storage and has
access to its own dedicated DDR RAM.
The ConnectX chip firmware programs the FPGA by accessing its configuration
space over either the slow internal I2C link or the high-speed internal link.
The FPGA logic is divided into a "Shell" and a "Sandbox Unit" (SBU).
mlx5_core driver (with CONFIG_MLX5_FPGA) handles all shell functionality,
while other components may handle the various SBU functionalities.
The driver opens high-speed reliable communication channels with the shell and
the SBU over the internal link.
These channels may be used for high-bandwidth configuration or for SBU-specific
out-of-band data paths.
About Innova IPSec device:
--------------------------
Innova IPSec is a network card that allows offloading IPSec cryptography operations
from the host CPU to the NIC. It is an Innova card with an IPSec SBU.
The hardware keeps the database of IPSec Security Associations (SADB) in the FPGA's
DDR memory.
                Internal
+----------+      Link      +-----------------+
|          +----------------+      FPGA       |  +------+
| ConnectX |                |      Shell      +--+ QSFP |
|          +----------------+   +-------+     |  | Port |
+----------+  Internal I2C  |   | IPSec |     |  +------+
                            |   |  SBU  |     |
                            |   +-------+     |
                            +--+----------+---+
                               |          |
                            +--+--+   +---+---+
                            | DDR |   |       |
                            |     |   | Flash |
                            |SADB |   |       |
                            +-----+   +-------+
Modes and ciphers:
Currently the following modes and ciphers are supported:
* IPv4 and IPv6
* ESP tunnel and transport modes
* AES 128 and 256 bit encryption, with GCM authentication (RFC4106)
* IV is generated using seqiv, in sync with Linux's geniv.
More modes and ciphers may be added later.
Notes:
In the future similar functionality will be included in a single-chip NIC.
About the driver:
-----------------
Patches 1-4 prepare some existing driver code for the new feature:
* Add support for reserved GIDs in the hardware GID table
* Allow multiple modules to enable hardware RoCE support independently
Patches 5-6 define structs and helper functions for QP work-queues.
Patches 7-11 add various FPGA-related features required for Innova IPSec.
Patch 12 adds an abstraction layer for Mellanox IPSec-offload capable devices.
Patches 13-16 add IPSec offload support to the mlx5 netdevice.
This driver services the new IPSec offload API introduced in commit
d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Configuration Path:
If an Innova IPSec device is detected, the mlx5e netdevice gets the new
NETIF_F_HW_ESP feature and the xdo callbacks, indicating ESP offload
capabilities, and also the matching TX checksum and GSO features.
The driver configures offloaded Security Associations (SAs) by sending
an ADD_SA or DEL_SA message to the IPSec SBU, which updates the SADB in DDR.
These messages and their responses are sent over a high-speed channel.
Counters for ethtool are retrieved by the driver from the SBU.
Data path:
On receive path, the SBU decrypts ESP packets which match the offloaded SADB,
but keeps them encapsulated.
The SBU injects metadata (Mellanox owned ethertype) indicating that crypto-offload
has taken place, the SA with which it was done, and the authentication result.
The ConnectX chip performs RX checksum offload on the packet, and RSS using the
ESP SPI value. The driver detects the special ethertype, and attaches a struct
secpath to the RX SKB, including flags to indicate that crypto offload took place,
the authentication result, and which xfrm_state was used for decryption, in the
olen and ovec members. The RX SKB may have useful CHECKSUM_COMPLETE. A separate
patchset will add support for that in the xfrm stack.
On transmit path, the stack encapsulates the packet but does not encrypt it, and
indicates in the SKB's secpath that crypto offload is to be performed and the SA
to use to do so.
The driver avoids performing crypto-offload for ESP fragments, and packets with
IP options, as the SBU cannot currently do that. For eligible packets, the driver
prepends a special ethertype with metadata instructing the hardware to perform crypto offload.
The stack builds regular (non-GSO) SKBs so that they contain a placeholder for the ESP trailer.
The driver trims it off, because the SBU automatically appends the trailer for offloaded packets.
The ConnectX chip performs TX checksum offload on inner UDP or TCP packets,
and GSO for TCP packets (duplicating the prepended metadata).
The segmented packets then undergo encryption in the SBU before going on the wire.
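The eligibility rule described above (no ESP fragments, no packets with IP options) can be sketched for IPv4 as follows. This is a minimal user-space illustration, not the driver's code: the function name and the way the header fields are passed in are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 header constants (RFC 791). */
#define IPV4_IHL_NO_OPTIONS 5       /* header length in 32-bit words */
#define IPV4_FLAG_MF        0x2000  /* "more fragments" flag */
#define IPV4_FRAG_OFFSET    0x1fff  /* fragment offset mask */

/*
 * Hypothetical helper: decide whether a packet may be handed to the
 * SBU for crypto offload. ihl is the IPv4 header length in words and
 * frag_off is the flags/fragment-offset field in host byte order.
 */
static bool ipsec_tx_offload_eligible(uint8_t ihl, uint16_t frag_off)
{
    if (ihl != IPV4_IHL_NO_OPTIONS)
        return false;   /* IP options present */
    if (frag_off & (IPV4_FLAG_MF | IPV4_FRAG_OFFSET))
        return false;   /* first, middle or last fragment */
    return true;
}
```

Packets that fail the check would simply take the regular software crypto path.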
Performance:
We measured a single TCP stream on an Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz.
Using AES-NI with ESP GSO we get constant 4.1 Gbps.
Using crypto offload we get constant 18 Gbps.
Note that these numbers require CHECKSUM_COMPLETE support in XFRM, which we submit separately.
- Ilan Tayari
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Jun 2017 16:28:57 +0000 (12:28 -0400)]
Merge branch 'net-fix-sw-timestamping'
Ivan Khoronzhuk says:
====================
net: fix sw timestamping for non PTP packets
This series contains several corrections connected with timestamping
for cpsw and netcp drivers based on same cpts module.
Based on net/next
====================
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 27 Jun 2017 13:58:53 +0000 (16:58 +0300)]
net: ethernet: ti: netcp_ethss: use cpts to check if packet needs timestamping
There is a cpts function to check whether a packet can be timestamped with cpts.
It seems that ptp_classify_raw covers all cases listed with "case".
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 27 Jun 2017 13:58:52 +0000 (16:58 +0300)]
net: ethernet: ti: cpsw: fix sw timestamping for non PTP packets
The cpts can timestamp only PTP packets at this moment, so the driver
cannot mark every packet as though it's going to be timestamped
just because h/w timestamping for the given skb is enabled with
SKBTX_HW_TSTAMP. Doing so prevents s/w timestamping, and as a result an
outgoing packet is not timestamped at all if it's not PTP and h/w
timestamping is enabled. Fix it by setting SKBTX_IN_PROGRESS
only for PTP packets.
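The rule can be sketched in plain C as follows. This is an illustration only: the flag values are stand-ins for the kernel's skb tx_flags bits, and mark_tx_timestamp() is a hypothetical name, not a function in the cpsw driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's skb tx_flags bits. */
#define TX_HW_TSTAMP   (1u << 0)  /* h/w timestamp requested */
#define TX_IN_PROGRESS (1u << 1)  /* h/w will deliver the timestamp */

/*
 * Return the tx flags to leave on the packet. Only a PTP packet with
 * h/w timestamping requested is marked in-progress, so every other
 * packet stays eligible for s/w timestamping.
 */
static uint32_t mark_tx_timestamp(uint32_t tx_flags, bool is_ptp)
{
    if ((tx_flags & TX_HW_TSTAMP) && is_ptp)
        tx_flags |= TX_IN_PROGRESS;
    return tx_flags;
}
```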
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 27 Jun 2017 13:58:51 +0000 (16:58 +0300)]
net: ethernet: ti: cpsw: move skb timestamp to packet_submit
Move sw timestamp function close to channel submit function.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Tue, 27 Jun 2017 10:56:54 +0000 (03:56 -0700)]
cavium: thunder: Remove duplicate "netdev->name" logging output
Using netdev_<level>(netdev, "%s: ...", netdev->name) duplicates the
name in the output. Remove those uses.
Miscellanea:
o Use the netif_<level> convenience macros at the same time
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 27 Jun 2017 10:36:49 +0000 (11:36 +0100)]
net/mlx4: fix spelling mistake: "enforcment" -> "enforcement"
Trivial fix to spelling mistake in mlx4_dbg debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 27 Jun 2017 09:51:22 +0000 (10:51 +0100)]
net: atl1c: fix spelling mistake: "droppted" -> "dropped"
Trivial fix to spelling mistake in netif_info message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:06 +0000 (11:28 +0200)]
arm: sun8i: orangepi-2: use internal phy-mode
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:05 +0000 (11:28 +0200)]
arm: sun8i: nanopi-neo: use internal phy-mode
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:04 +0000 (11:28 +0200)]
arm: sun8i: orangepi-one: use internal phy-mode
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:03 +0000 (11:28 +0200)]
arm: sun8i: orangepi-zero: use internal phy-mode
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:02 +0000 (11:28 +0200)]
arm: sun8i: orangepipc: use internal phy-mode
Since the PHY used is internal, simply set phy-mode as internal.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Tue, 27 Jun 2017 09:28:01 +0000 (11:28 +0200)]
net: stmmac: support future possible different internal phy mode
The current way to find out whether the PHY is internal is to compare the DT
phy-mode with emac_variant/internal_phy.
But this rules out a possible future SoC where an external PHY uses the
same phy-mode as the internal one.
Using phy-mode = "internal" allows an external PHY to use
the same mode as the internal one.
Reported-by: André Przywara <andre.przywara@arm.com>
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dilmore [Mon, 26 Jun 2017 15:49:46 +0000 (16:49 +0100)]
Bonding: Convert multiple netdev_info messages to netdev_dbg
The bond_options.c file contains multiple netdev_info statements that clutter kernel output.
This patch replaces all netdev_info with netdev_dbg and adds a netdev_dbg statement for the
packets per slave parameter. Also fixes misalignment at line 467.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael J Dilmore <michael.j.dilmore@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jun 2017 19:48:50 +0000 (15:48 -0400)]
Merge branch 'nfp-get_phys_port_name-for-representors-and-SR-IOV-reorder'
Jakub Kicinski says:
====================
nfp: get_phys_port_name for representors and SR-IOV reorder
This series starts by making the error message if FW cannot be located
easier to understand. Then I move some functions from PCI probe files
into library code (nfpcore) where they belong, and remove one function
which is never used.
Next few patches equip representors with nfp_port structure and make
their NDOs fully shared (not defined in apps), thanks to which we can
easily determine which netdevs are NFP's by comparing the NDO pointers.
10th patch makes use of the shared NDOs and nfp_ports to deliver
netdev-type independent .ndo_get_phys_port_name() implementation.
Patches 11 and 12 reorder the nfp_app SR-IOV callbacks with enabling
SR-IOV VFs. Unfortunately due to how PCI subsystem works we can't
guarantee being able to disable SR-IOV at exit or that it will be
disabled when we first probe... We must therefore make sure FW is
able to deal with being loaded while SR-IOV is already on.
Patch 13 fixes potential deadlock when enabling SR-IOV happens at
the same time as port state refresh. Note that this can't happen
at this point, since Flower doesn't refresh ports... but lockdep
doesn't know about such details and we will have to deal with this
sooner or later anyway.
Last but not least a new Kconfig is added to make sure those who
don't care about flower offloads have a way of not including the
code in their kernels. Thanks to nfp_app separation this costs us
a single ifdef and excluding flower files from the build.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:28 +0000 (00:50 -0700)]
nfp: flower: add Kconfig for flower app
Give users an option not to build the flower-offload related code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:27 +0000 (00:50 -0700)]
nfp: allocate a private workqueue for driver work
Since we grab pf->lock around pci_enable_sriov() we can no longer
safely queue work which may also grab that lock onto system workqueue.
pci_enable_sriov() will flush the system workqueue as part of waiting for VF
probing.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:26 +0000 (00:50 -0700)]
nfp: reorder SR-IOV config and nfp_app SR-IOV callbacks
We previously assumed that app callback can be guaranteed to be
executed before SR-IOV is actually enabled. Given that we can't
guarantee that SR-IOV will be disabled during probe or that we
will be able to disable it on remove, we should reorder the callbacks.
We should also call the app's sriov_enable if SR-IOV was enabled
during probe.
Application FW must be able to disable VFs internally and not depend
on them being removed at PCIe level.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:25 +0000 (00:50 -0700)]
nfp: handle SR-IOV already enabled when driver is probing
We assumed that the number of enabled VFs is 0 when we probe.
This doesn't have to be the case, for example if the previous driver left
SR-IOV enabled due to some VFs being assigned. Read the number of VFs
enabled. Fail probe if it's above the current FW's limit.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:24 +0000 (00:50 -0700)]
nfp: wire get_phys_port_name on representors
Make nfp_port_get_phys_port_name() support new port types and
wire it up to representors' struct net_device_ops.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:23 +0000 (00:50 -0700)]
nfp: allow converting representor's netdev into nfp_port
Based on struct net_device_ops figure out if netdev is a nfp_repr.
Use this knowledge to convert netdev directly to nfp_port.
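A minimal sketch of the idea, using simplified stand-ins for the kernel structures. The struct layouts and the helper names below are assumptions for illustration and do not match the real nfp definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct net_device_ops { int (*ndo_open)(void *dev); };
struct nfp_port { int id; };
struct net_device {
    const struct net_device_ops *netdev_ops;
    void *priv;   /* driver-private data */
};

/* The single ops table shared by all representors. */
static const struct net_device_ops repr_netdev_ops = { NULL };

/* A netdev is a representor iff it uses the shared ops table. */
static bool netdev_is_repr(const struct net_device *dev)
{
    return dev->netdev_ops == &repr_netdev_ops;
}

/* Return the port behind a representor, or NULL for foreign netdevs. */
static struct nfp_port *port_from_netdev(const struct net_device *dev)
{
    return netdev_is_repr(dev) ? dev->priv : NULL;
}
```

The pointer comparison only works because the ops table is defined once in shared code rather than per app.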
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:22 +0000 (00:50 -0700)]
nfp: move representors' struct net_device_ops to shared code
Apps shouldn't declare their own struct net_device_ops for
representors, this makes sharing code harder. Add necessary
nfp_app callbacks and move the definition of representors'
struct net_device_ops to common code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:21 +0000 (00:50 -0700)]
nfp: make the representor get stats app-independent
Thanks to the fact that all representors will now have an nfp_port,
we can depend on information there to provide an app-independent
.ndo_get_stats64().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:20 +0000 (00:50 -0700)]
nfp: spawn nfp_ports for PF and VF ports
nfp_port is an abstraction which is supposed to allow us to share
code between different netdev types (vNIC vs repr). Spawn ports
for PFs and VFs to enable this sharing.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:19 +0000 (00:50 -0700)]
nfp: add nfp_app cleanup callback and make flower use it
Add a cleanup callback for undoing what app init callback did.
Make flower allocate its private structure on init and free
it from the new callback.
While at it remember to set the app pointer to NULL on the
error path to avoid any races while probe path unwinds.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:18 +0000 (00:50 -0700)]
nfp: remove unused nfp_cpp_area_check_range()
Remove unused nfp_cpp_area_check_range() function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:17 +0000 (00:50 -0700)]
nfp: add helper for mapping runtime symbols
Move most of the helper for mapping RTsyms from nfp_net_main.c
to nfpcore. Use the new helper directly for mapping MAC statistics,
since they don't need to include the PCIe interface ID in the symbol
name.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:16 +0000 (00:50 -0700)]
nfp: move area mapping helper into nfpcore
nfp_net_map_area() is a helper for mapping areas of NFP memory
defined in nfp_net_main.c. Move it to nfpcore to allow reuse
and rename accordingly. Create an additional helper,
nfp_cpp_area_alloc_acquire(), the opposite of the already existing
nfp_cpp_area_release_free().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 27 Jun 2017 07:50:15 +0000 (00:50 -0700)]
nfp: explicitly check if application FW is loaded
We support application FW being either loaded automatically at
boot from flash or (more commonly) by the driver from disk.
If FW is not found on disk and nothing is preloaded users are
faced with this unintuitive error:
nfp 0000:04:00.0: nfp: Failed to find PF symbol _pf0_net_bar0
We can do better. Since we rely on symbol table being present -
check early if it could be correctly read out of the device
and if not print a more informative message.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jun 2017 19:43:57 +0000 (15:43 -0400)]
Merge branch 'udp-ipv6-use-scratch-helpers'
Paolo Abeni says:
====================
ipv6: udp: exploit dev_scratch helpers
When bringing in the recent cache optimization for the UDP protocol, I forgot
to leverage the newly introduced scratch area helpers in the UDPv6 code path.
As a result, the UDPv6 implementation suffers some unnecessary performance
penalty when compared to v4.
This series aims to bring UDPv6 back on an equal footing with v4.
The first patch moves the shared helpers to the common include files, while
the second uses them in the UDPv6 code.
This gives 5-8% performance improvement for a system under flood with small
UDPv6 packets. The performance delta is less than the one reported on the
original patch set because the UDPv6 code path already leveraged some of the
optimization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 26 Jun 2017 17:01:51 +0000 (19:01 +0200)]
ipv6: udp: leverage scratch area helpers
The commit
b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
leveraged the scratch area helpers for UDP v4 but I forgot to
update the IPv6 code path accordingly.
This change extends the scratch area usage to the IPv6 code, syncing
the two implementations and giving some performance benefit.
IPv6 is again almost on the same level as IPv4, performance-wise.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 26 Jun 2017 17:01:50 +0000 (19:01 +0200)]
udp: move scratch area helpers into the include file
So that they can be later used by the IPv6 code, too.
Also lift the comments a bit.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Mon, 26 Jun 2017 15:36:47 +0000 (08:36 -0700)]
tcp: fix null ptr deref in getsockopt(..., TCP_ULP, ...)
If icsk_ulp_ops is unset, it dereferences a null ptr.
Add a null ptr check.
BUG: KASAN: null-ptr-deref in copy_to_user include/linux/uaccess.h:168 [inline]
BUG: KASAN: null-ptr-deref in do_tcp_getsockopt.isra.33+0x24f/0x1e30 net/ipv4/tcp.c:3057
Read of size 4 at addr 0000000000000020 by task syz-executor1/15452
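The shape of the fix can be sketched in user-space C as follows. The structures and get_ulp_name() are hypothetical stand-ins, not the kernel's types: the point is only that the unset ops pointer is checked before its name field is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ULP_NAME_MAX 16

/* Hypothetical stand-ins for the socket and its ULP ops. */
struct ulp_ops { char name[ULP_NAME_MAX]; };
struct sock_state { const struct ulp_ops *ulp_ops; };  /* NULL: no ULP attached */

/*
 * Copy the ULP name into buf (at most len bytes) and return the number
 * of bytes written. With no ULP attached, report length 0 instead of
 * dereferencing the NULL ops pointer.
 */
static size_t get_ulp_name(const struct sock_state *sk, char *buf, size_t len)
{
    if (!sk->ulp_ops)
        return 0;
    if (len > ULP_NAME_MAX)
        len = ULP_NAME_MAX;
    memcpy(buf, sk->ulp_ops->name, len);
    return len;
}
```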
Signed-off-by: Dave Watson <davejwatson@fb.com>
Reported-by: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Tue, 27 Jun 2017 12:42:43 +0000 (14:42 +0200)]
vxlan: fix incorrect nlattr access in MTU check
The access to the wrong variable could lead to a NULL dereference and
possibly other invalid memory reads in vxlan newlink/changelink requests
with an IFLA_MTU attribute.
Fixes: a985343ba906 "vxlan: refactor verification and application of configuration"
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Bernat [Tue, 27 Jun 2017 13:42:57 +0000 (15:42 +0200)]
net: remove policy-routing.txt documentation
It dates back to 2.1.16 and has been obsolete since 2.1.68, when the current
rule system was introduced.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Thu, 22 Jun 2017 09:01:17 +0000 (12:01 +0300)]
net/mlx5e: IPSec, Add IPSec ethtool stats
Add Innova IPSec SBU counters to the ethtool -S stats.
Add IPSec offload error counters to the ethtool -S stats.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Tue, 18 Apr 2017 13:08:23 +0000 (16:08 +0300)]
net/mlx5e: IPSec, Add Innova IPSec offload TX data path
In the TX data path, prepend a special metadata ethertype which
instructs the hardware to perform cryptography.
In addition, fill Software-Parser segment in TX descriptor so
that the hardware may parse the ESP protocol, and perform TX
checksum offload on the inner payload.
Support GSO, by providing the inverse of gso_size in the metadata.
This allows the FPGA to update the ESP header (seqno and seqiv) on the
resulting packets, by calculating the packet number within the GSO
back from the TCP sequence number.
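One way to read "the inverse of gso_size": a division by the segment size can be replaced by a multiply and a shift with a precomputed fixed-point reciprocal, which is cheap to do in hardware. The sketch below shows that general technique only. The 32-bit fixed-point format and both function names are assumptions for illustration, not the metadata encoding used by the driver or the FPGA.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point reciprocal of the segment size: ceil(2^32 / gso_size). */
static uint64_t gso_size_inverse(uint32_t gso_size)
{
    assert(gso_size > 0);
    return ((1ULL << 32) + gso_size - 1) / gso_size;
}

/*
 * Index of a segment within its GSO packet, computed from the distance
 * between the segment's TCP sequence number and that of the first
 * segment. seq_delta is expected to be a multiple of gso_size.
 */
static uint32_t gso_segment_index(uint32_t seq_delta, uint64_t inverse)
{
    return (uint32_t)(((uint64_t)seq_delta * inverse) >> 32);
}
```

Rounding the reciprocal up rather than down keeps the result exact for sequence deltas that are whole multiples of the segment size, as long as the delta stays within a single GSO packet.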
Note that for GSO SKBs, the stack does not include an ESP trailer,
unlike the non-GSO case.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Mon, 19 Jun 2017 11:04:36 +0000 (14:04 +0300)]
net/mlx5e: IPSec, Add Innova IPSec offload RX data path
In RX data path, the hardware prepends a special metadata ethertype
which indicates that the packet underwent decryption, and the result of
the authentication check.
Communicate this to the stack in skb->sp.
Make wqe_size large enough to account for the injected metadata.
Support only Linked-list RQ type.
IPSec offload RX packets may have useful CHECKSUM_COMPLETE information,
which the stack may not be able to use yet.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Tue, 18 Apr 2017 13:04:28 +0000 (16:04 +0300)]
net/mlx5e: IPSec, Innova IPSec offload infrastructure
Add Innova IPSec ESP crypto offload configuration paths.
Detect Innova IPSec device and set the NETIF_F_HW_ESP flag.
Configure Security Associations using the API introduced in a previous
patch.
Add Software-parser hardware descriptor layout
Software-Parser (swp) is a hardware feature in ConnectX which allows the
host software to specify protocol header offsets in the TX path, thus
overriding the hardware parser.
This is useful for protocols that the ASIC may not be able to parse on
its own.
Note that due to inline metadata, XDP is not supported in Innova IPSec.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Tue, 25 Apr 2017 19:42:31 +0000 (22:42 +0300)]
net/mlx5: Accel, Add IPSec acceleration interface
Add routines for manipulating the hardware IPSec SA database (SADB).
In Innova IPSec, a Security Association (SA) is added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.
Add implementation for Innova IPSec (FPGA-based) hardware.
These routines will be used by the IPSec offload support in a later patch.
However they may also be used by others such as RDMA and RoCE IPSec.
mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs
to work directly with mlx5_core rather than Innova FPGA or other mlx5
acceleration providers.
In this patchset we add Innova IPSec support and mlx5/accel delegates
IPSec offloads to Innova routines.
In the future, when IPSec/TLS or any other acceleration gets integrated
into ConnectX chip, mlx5/accel layer will provide the integrated
acceleration, rather than the Innova one.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Tue, 18 Apr 2017 10:10:41 +0000 (13:10 +0300)]
net/mlx5: FPGA, Add SBU infrastructure
Add interface to initialize and interact with Innova FPGA SBU
connections.
A client driver may use these functions to set up a high-speed DMA
connection with its SBU hardware logic, and send/receive messages
over this connection.
A later patch in this patchset will make use of these functions for
Innova IPSec offload in mlx5 Ethernet driver.
Add commands to retrieve Innova FPGA SBU capabilities, and to
read/write Innova FPGA configuration space registers and memory,
over internal I2C.
At a high level, the FPGA configuration space is divided as follows:
0x00000000 - 0x007fffff is reserved for the SBU
0x00800000 - 0xffffffff is reserved for the Shell
0x400000000 - ... is DDR memory
A later patchset will add support for accessing FPGA CrSpace and memory
over a high-speed connection. This is the reason for the ACCESS_TYPE
enumeration, which currently only supports I2C.
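The address map above can be expressed as a small classifier. This is a sketch: the enum and function names are hypothetical, and the range between the end of the Shell region and the start of DDR is reported as unspecified because the text does not describe it.

```c
#include <assert.h>
#include <stdint.h>

enum fpga_region { FPGA_SBU, FPGA_SHELL, FPGA_DDR, FPGA_UNSPECIFIED };

/* Boundaries taken from the address map in the commit message. */
#define FPGA_SBU_END    0x007fffffULL
#define FPGA_SHELL_END  0xffffffffULL
#define FPGA_DDR_START  0x400000000ULL

/* Map an FPGA address to the region that owns it. */
static enum fpga_region fpga_addr_region(uint64_t addr)
{
    if (addr <= FPGA_SBU_END)
        return FPGA_SBU;
    if (addr <= FPGA_SHELL_END)
        return FPGA_SHELL;
    if (addr >= FPGA_DDR_START)
        return FPGA_DDR;
    return FPGA_UNSPECIFIED;  /* 0x100000000 - 0x3ffffffff: not described */
}
```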
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Tue, 18 Apr 2017 09:54:27 +0000 (12:54 +0300)]
net/mlx5: FPGA, Add SBU bypass and reset flows
The Innova FPGA includes shell hardware and Sandbox-Unit (SBU) hardware.
The shell hardware is handled by mlx5_core itself, while the SBU is
handled by a client driver.
Reset the SBU to a well-known initial state when initializing a new
device, and set the FPGA to bypass mode when uninitializing a device.
This allows the client driver to assume that its device has been
reset when a new device is detected.
During SBU reset, the FPGA is put into SBU-bypass mode. In this mode
packets do not pass through the SBU, so it cannot affect the network
data stream at all.
A factory-image does not have an SBU, so skip these flows.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Mon, 27 Mar 2017 11:48:38 +0000 (14:48 +0300)]
net/mlx5: FPGA, Add high-speed connection routines
An FPGA high-speed connection has two endpoints, an FPGA QP and a
ConnectX QP.
Add library routines to create and connect the endpoints of an
FPGA high-speed connection.
These routines allow creating and interacting with both types of
connections: Shell and Sandbox Unit (SBU).
Shell connection provides an interface to the FPGA's address space,
which includes the configuration space and the DDR.
Use of the shell connection will be introduced in a later patchset.
SBU connection provides a command and/or data interface to the
application-specific logic within the FPGA.
Use of the SBU connection will be introduced in a later patch in
this patchset.
Some struct definitions are added to a new header file sdk.h, which
will be extended in later patches in the patchset.
This header file will contain the in-kernel FPGA client driver API.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Mon, 27 Mar 2017 11:52:09 +0000 (14:52 +0300)]
net/mlx5: FPGA, Add FW commands for FPGA QPs
The FPGA QP is a high-bandwidth communication channel between the host
CPU and the FPGA device. It allows performing DMA operations between
host memory and the FPGA logic via the ConnectX chip.
Add ConnectX FW commands which create and manipulate FPGA QPs.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Wed, 14 Jun 2017 07:19:54 +0000 (10:19 +0300)]
net/mlx5: FPGA, Move FPGA init/cleanup to init_once
The FPGA init and cleanup routines should be called just once per
device.
Move them to the init_once and cleanup_once routines.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Sun, 26 Mar 2017 14:46:03 +0000 (17:46 +0300)]
net/mlx5: Add QP WQ support
A QP in ConnectX is a concatenation of RQ and SQ which share a QP-number
and work together.
Add support for allocating and managing the work-queue buffer for a QP, in
a similar way to how SQs and RQs are already supported.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Mon, 19 Jun 2017 09:53:25 +0000 (12:53 +0300)]
net/mlx5: Make get_cqe routine not ethernet-specific
Move mlx5e_get_cqe routine to wq.h and rename it to
mlx5_cqwq_get_cqe.
This allows it to be used by other CQ users outside of the
ethernet driver code.
A later patch in this patchset will make use of it from
FPGA code for the FPGA high-speed connection.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Sun, 14 May 2017 13:04:30 +0000 (16:04 +0300)]
IB/mlx5: Respect mlx5_core reserved GIDs
Reserved GIDs are taken by mlx5_core; report a smaller GID table
size to the IB core.
Set mlx5_query_roce_port's return value back to int. In case of
error, return an indication. This rolls back some of the change
in commit
50f22fd8ecf9 ("IB/mlx5: Set mlx5_query_roce_port's return value to void")
Change set_roce_addr to use gid_set function, instead of directly
sending the command.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Sun, 26 Mar 2017 14:23:42 +0000 (17:23 +0300)]
net/mlx5: Add support for multiple RoCE enable
Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
well.
Add support for counting number of enables, so that FPGA and IB can work
in parallel and independently.
Program the HW to enable RoCE on the first enable call, and program to
disable RoCE on the last disable call.
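The counting scheme described above can be sketched in plain C. This is a minimal user-space illustration under stated assumptions, not the driver's code: struct roce_state, hw_set_roce(), roce_enable() and roce_disable() are hypothetical names, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state; the real driver's layout may differ. */
struct roce_state {
	unsigned int enable_count; /* number of outstanding enable calls */
	bool hw_enabled;           /* stand-in for the hardware setting */
};

/* Stand-in for the firmware command that toggles RoCE on the port. */
static int hw_set_roce(struct roce_state *st, bool on)
{
	st->hw_enabled = on;
	return 0;
}

/* Program the hardware only on the first enable call. */
static int roce_enable(struct roce_state *st)
{
	int err = 0;

	if (st->enable_count == 0)
		err = hw_set_roce(st, true);
	if (!err)
		st->enable_count++;
	return err;
}

/* Program the hardware only on the last disable call. */
static int roce_disable(struct roce_state *st)
{
	int err = 0;

	assert(st->enable_count > 0); /* an unbalanced disable is a caller bug */
	if (st->enable_count == 1)
		err = hw_set_roce(st, false);
	if (!err)
		st->enable_count--;
	return err;
}
```

With two users (say, FPGA and IB), the first roce_enable() programs the hardware and the second only bumps the count; the hardware is turned off only when both have called roce_disable().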
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Sun, 26 Mar 2017 14:01:57 +0000 (17:01 +0300)]
net/mlx5: Add reserved-gids support
Reserved GIDs are entries in the GID table in use by the mlx5_core
and its submodules (e.g. FPGA, SRIOV, E-Switch, netdev).
The entries are reserved at the high indexes of the GID table.
A mlx5 submodule may reserve a certain amount of GIDs for its own use
during the load sequence by calling mlx5_core_reserve_gids, and must
also take care to un-reserve these GIDs when it closes.
Reservation is only allowed during the load sequence and before any
interfaces (e.g. mlx5_ib or mlx5_en) are up.
After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
free to allocate entries from the reserved GIDs pool.
Reserve a GID table entry for every supported FPGA QP.
A later patch in the patchset will remove them from being reported to
IB core.
Another such patch will make use of these for FPGA QPs in Innova NIC.
Add lib/mlx5.h to serve as a library for mlx5 submodules, and to
expose only the public mlx5 API. More mlx5 library files will be added in
future submissions.
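The reservation scheme can be sketched as follows. This is a simplified user-space model, assuming a fixed-size table; struct gid_table and the gid_* helpers are hypothetical names chosen for illustration and are not the mlx5_core functions named above.

```c
#include <assert.h>
#include <stdbool.h>

#define GID_TABLE_MAX 64 /* illustrative capacity, not a hardware value */

struct gid_table {
	unsigned int size;     /* total entries, must be <= GID_TABLE_MAX */
	unsigned int reserved; /* entries reserved at the high indexes */
	bool loaded;           /* true once interfaces are up */
	bool in_use[GID_TABLE_MAX];
};

/* Reserve `count` entries; only allowed before interfaces come up. */
static int gid_reserve(struct gid_table *t, unsigned int count)
{
	if (t->loaded)
		return -1;
	if (count > t->size - t->reserved)
		return -1;
	t->reserved += count;
	return 0;
}

/* Number of entries that remain visible to other users of the table. */
static unsigned int gid_reported_size(const struct gid_table *t)
{
	return t->size - t->reserved;
}

/* Allocate one entry from the reserved range; returns its index or -1. */
static int gid_reserved_alloc(struct gid_table *t)
{
	unsigned int i;

	for (i = t->size - t->reserved; i < t->size; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Return an entry to the reserved pool. */
static void gid_reserved_free(struct gid_table *t, int index)
{
	assert(index >= 0 &&
	       (unsigned int)index >= t->size - t->reserved &&
	       (unsigned int)index < t->size);
	t->in_use[index] = false;
}
```

In this model a 16-entry table with 2 reserved entries reports a size of 14, and reserved allocations return indexes 14 and 15.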
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ilan Tayari [Thu, 25 May 2017 05:42:07 +0000 (08:42 +0300)]
net/mlx5: Set interface flags before cleanup in unload_one
In load_one, the interface flags are changed from down to up,
only after initializing the interfaces.
In unload_one, the flags are changed from up to down before the
interface cleanup.
Change the cleanup order to be opposite to initialization order.
This fixes flag consistency between init and cleanup.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Colin Ian King [Mon, 26 Jun 2017 12:53:46 +0000 (13:53 +0100)]
net/mlx4: fix spelling mistake: "coalesing" -> "coalescing"
Trivial fix to spelling mistake in en_dbg debug message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Jun 2017 03:13:23 +0000 (23:13 -0400)]
Merge branch 'net-add-netlink_ext_ack-support-to-rtnl_link_ops'
Matthias Schiffer says:
====================
net: add netlink_ext_ack support to rtnl_link_ops
Same changes as http://patchwork.ozlabs.org/patch/780351/ , split into
separate patches for each rtnl_link_ops field as requested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Sun, 25 Jun 2017 21:56:03 +0000 (23:56 +0200)]
net: add netlink_ext_ack argument to rtnl_link_ops.slave_validate
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Sun, 25 Jun 2017 21:56:02 +0000 (23:56 +0200)]
net: add netlink_ext_ack argument to rtnl_link_ops.slave_changelink
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Sun, 25 Jun 2017 21:56:01 +0000 (23:56 +0200)]
net: add netlink_ext_ack argument to rtnl_link_ops.validate
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Sun, 25 Jun 2017 21:56:00 +0000 (23:56 +0200)]
net: add netlink_ext_ack argument to rtnl_link_ops.changelink
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matthias Schiffer [Sun, 25 Jun 2017 21:55:59 +0000 (23:55 +0200)]
net: add netlink_ext_ack argument to rtnl_link_ops.newlink
Add support for extended error reporting.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Grzeschik [Fri, 23 Jun 2017 14:54:10 +0000 (16:54 +0200)]
net: macb: add fixed-link node support
In case the MACB is directly connected to a
non-mdio PHY/device, it should be possible to provide
a fixed link configuration in the DT.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 25 Jun 2017 18:45:34 +0000 (14:45 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2017-06-25' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.13
New features and bug fixes to quite a few different drivers, but
nothing really special standing out.
What makes me happy is that we now have more vendors actively
contributing to upstream drivers. In this pull request we have patches
from Broadcom, Intel, Qualcomm, Realtek and Redpine Signals, and I
still have patches from Marvell and Quantenna pending in patchwork. Now
that's something compared to how things looked 11 years ago in Jeff
Garzik's "State of the Union: Wireless" email:
https://lkml.org/lkml/2006/1/5/671
Major changes:
wil6210
* add low level RF sector interface via nl80211 vendor commands
* add module parameter ftm_mode to load separate firmware for factory
testing
* support devices with different PCIe bar size
* add support for PCIe D3hot in system suspend
* remove ioctl interface which should not be in a wireless driver
ath10k
* go back to using dma_alloc_coherent() for firmware scratch memory
* add per chain RSSI reporting
brcmfmac
* add support for multi-scheduled scan
* add scheduled scan support for specified BSSIDs
* add support for brcm43430 revision 0
wlcore
* add wl1285 compatible
rsi
* add RS9113 USB support
iwlwifi
* FW API documentation improvements (for tools and htmldoc)
* continuing work for the new A000 family
* bump the maximum supported FW API to 31
* improve the differentiation between 8000, 9000 and A000 families
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 25 Jun 2017 18:43:53 +0000 (14:43 -0400)]
Merge branch 'sctp-RFC-4960-Errata-fixes'
Marcelo Ricardo Leitner says:
====================
sctp: RFC 4960 Errata fixes
This patchset contains fixes for 4 Errata topics from
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01
Namely, sections:
3.12. Order of Adjustments of partial_bytes_acked and cwnd
3.22. Increase of partial_bytes_acked in Congestion Avoidance
3.26. CWND Increase in Congestion Avoidance Phase
3.27. Refresh of cwnd and ssthresh after Idle Period
Tests performed with netperf using net namespaces, with drop rates at
0%, 0.5% and 1% by netem, IPv4 and IPv6, 10 runs for each combination.
I couldn't spot differences on the stats. With and without these patches
the results vary in a similar way in terms of throughput and
retransmissions.
Tests with 20ms delay, and with 20ms delay + drops at 0.5% and 1%, also
produced similar results, with no noticeable difference.
Looking at cwnd, it was possible to notice slightly lower values being
used while still sustaining same throughput profile.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 23 Jun 2017 22:59:36 +0000 (19:59 -0300)]
sctp: adjust ssthresh when transport is idle
RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd
because otherwise it could cause the transport to lock into congestion
avoidance phase, especially if ssthresh was previously reduced by some
packet drop, leading to poor performance.
The Errata says to adjust ssthresh to cwnd only once, though the same
goal is achieved by updating it every time we update cwnd too. The
caveat is that we could take longer to get back up to speed but that
should be compensated by the fact that we don't adjust on RTO basis (as
RFC says) but based on Heartbeats, which are usually way longer.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 23 Jun 2017 22:59:35 +0000 (19:59 -0300)]
sctp: adjust cwnd increase in Congestion Avoidance phase
RFC4960 Errata 3.26 identified that, while RFC4960 states that
cwnd should never grow by more than 1*MTU per RTT, Section 7.2.2 was
underspecified and as described could allow increasing cwnd by more than
that.
This patch updates it so partial_bytes_acked is capped at cwnd if
flight_size doesn't reach cwnd, protecting it from such a case.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 23 Jun 2017 22:59:34 +0000 (19:59 -0300)]
sctp: allow increasing cwnd regardless of ctsn moving or not
As per RFC4960 Errata 3.22, this condition is not needed anymore as it
could cause the partial_bytes_acked to not consider the TSNs acked in
the Gap Ack Blocks although they were received by the peer successfully.
This patch thus drops the check for new Cumulative TSN Ack Point,
leaving just the flight_size < cwnd one.
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 23 Jun 2017 22:59:33 +0000 (19:59 -0300)]
sctp: update order of adjustments of partial_bytes_acked and cwnd
RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
adjustments applied to partial_bytes_acked and cwnd in the congestion
avoidance phase, and that the actual order should be:
partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
is increased by MTU.
We were first increasing cwnd, and then subtracting the new value from pba,
which leads to a different result as pba is smaller than what it should be
and could cause cwnd to not grow as much.
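A small worked example makes the difference between the two orders concrete. The sketch below is a simplified user-space model, not the SCTP code itself: struct cc_state and both helper functions are hypothetical, and the clamp to zero in the old order is an assumption made to keep the unsigned arithmetic well defined.

```c
#include <assert.h>
#include <stdint.h>

struct cc_state {
	uint32_t cwnd;                /* congestion window, in bytes */
	uint32_t partial_bytes_acked; /* pba, in bytes */
};

/* Old order: grow cwnd first, then subtract the already-grown cwnd. */
static void ca_update_old(struct cc_state *s, uint32_t mtu)
{
	assert(s->partial_bytes_acked >= s->cwnd);
	s->cwnd += mtu;
	s->partial_bytes_acked = s->partial_bytes_acked > s->cwnd ?
				 s->partial_bytes_acked - s->cwnd : 0;
}

/* Errata 3.12 order: subtract the old cwnd from pba, then grow cwnd. */
static void ca_update_new(struct cc_state *s, uint32_t mtu)
{
	assert(s->partial_bytes_acked >= s->cwnd);
	s->partial_bytes_acked -= s->cwnd;
	s->cwnd += mtu;
}
```

Starting from cwnd = 10000, pba = 10500 and MTU = 1500, the old order leaves pba at 0 while the Errata order leaves pba at 500; in both cases cwnd becomes 11500. The 500 bytes carried over count towards the next cwnd increase.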
See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 25 Jun 2017 08:09:12 +0000 (11:09 +0300)]
net: Remove ndo_dfwd_start_xmit
Looks like commit
f663dd9aaf9e ("net: core: explicitly select a txq before doing l2 forwarding")
has removed the need for this dedicated xmit function [it even explicitly
states so in its commit log message] but it hasn't removed the definition
of the ndo.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Jason Wang <jasowang@redhat.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 25 Jun 2017 15:44:29 +0000 (11:44 -0400)]
Merge branch 'qcom-emac-various-minor-improvements'
Timur Tabi says:
====================
net: qcom/emac: various minor improvements
A collection of minor fixes and features to the Qualcomm Technologies
EMAC network driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Fri, 23 Jun 2017 19:33:30 +0000 (14:33 -0500)]
net: qcom/emac: add support for emulation systems
On emulation systems, the EMAC's internal PHY ("SGMII") is not present,
but is not needed for network functionality. So just display a warning
message and ignore the SGMII.
Tested-by: Philip Elcan <pelcan@codeaurora.org>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Fri, 23 Jun 2017 19:33:29 +0000 (14:33 -0500)]
net: qcom/emac: do not reset the EMAC during initialization
On ACPI systems, the driver depends on firmware pre-initializing the
EMAC because we don't have access to the clocks, and the EMAC has specific
clock programming requirements. Therefore, we don't want to reset the
EMAC while we are completing the initialization.
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Fri, 23 Jun 2017 19:33:28 +0000 (14:33 -0500)]
net: qcom/emac: add shutdown function
The shutdown function halts all DMA and interrupts, so that all
operations are discontinued when the system shuts down, e.g. via
kexec or a forced reboot.
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Jurczyk [Fri, 23 Jun 2017 17:32:28 +0000 (19:32 +0200)]
af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.
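The ordering of the two checks can be illustrated in user space. This is a generic sketch, not the af_iucv handler: check_sockaddr() and EXAMPLE_FAMILY are hypothetical, and AF_UNIX is only a placeholder family.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

#define EXAMPLE_FAMILY AF_UNIX /* placeholder family for this sketch */

/* Validate the length first, and only then look at sa_family. */
static int check_sockaddr(const struct sockaddr *addr, size_t addr_len)
{
	assert(addr != NULL); /* caller contract in this sketch */

	/* Reject buffers too short to contain the sa_family field. */
	if (addr_len < offsetof(struct sockaddr, sa_family) +
		       sizeof(addr->sa_family))
		return -EINVAL;

	/* Safe to read now: the field lies fully inside the buffer. */
	if (addr->sa_family != EXAMPLE_FAMILY)
		return -EINVAL;

	return 0;
}
```

A zero-byte or one-byte address is rejected by the length test, so sa_family is never read from memory the caller did not provide.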
Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in iucv_sock_bind()")
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
[jwi: removed unneeded null-check for addr]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans Wippel [Fri, 23 Jun 2017 17:32:27 +0000 (19:32 +0200)]
net/iucv: improve endianness handling
Use proper endianness conversion for an skb protocol assignment. Given
that IUCV is only available on big endian systems (s390), this simply
avoids an endianness warning reported by sparse.
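As a rough user-space illustration of why the explicit conversion is still preferred on a big endian system, the sketch below stores a 16-bit protocol number in network byte order. store_protocol() and EXAMPLE_PROTO are hypothetical; the kernel code operates on the skb protocol field instead.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_PROTO 0x0800 /* illustrative protocol number */

/* Write a host-order 16-bit value to a buffer in network byte order. */
static void store_protocol(uint8_t out[2], uint16_t host_value)
{
	uint16_t be = htons(host_value); /* a no-op on big endian hosts */

	assert(ntohs(be) == host_value); /* the conversion round-trips */
	memcpy(out, &be, sizeof(be));
}
```

The bytes written are the same on every host, and the conversion states the intended byte order explicitly, which is what a checker such as sparse looks for.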
Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 23 Jun 2017 15:17:04 +0000 (18:17 +0300)]
net: dsa: mv88e6xxx: fix error code in mv88e6390_serdes_power()
We're accidentally returning the wrong variable. "cmode" is
uninitialized at this point so it causes a static checker warning.
Fixes: 6335e9f2446b ("net: dsa: mv88e6xxx: mv88e6390X SERDES support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 25 Jun 2017 15:42:09 +0000 (11:42 -0400)]
Merge branch 'nfp-add-flower-app-with-representors'
Simon Horman says:
====================
nfp: add flower app with representors
this series adds a flower app to the NFP driver.
It initialises four types of netdevs:
* PF netdev - lower-device for communication of packets to device
* PF representor netdev
* VF representor netdevs
* Phys port representor netdevs
The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX,
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding to the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata
received from the firmware and forwards the skb to the representor after
removing the metadata.
Control queues are used to send and receive control messages which are
used to communicate configuration information with the firmware. These
are in a separate vNIC from the queues belonging to the PF netdev. The control
queues are not exposed to user-space via a netdev or any other means.
The first 9 patches of this series provide app-independent infrastructure
to instantiate representors and the remaining 3 patches provide an app
which uses this infrastructure.
As the name implies this app is targeted at providing offload of TC flower.
Flower offload - allowing classifiers to be attached to representor netdevs
- is intended to be provided by follow-up patches at which point it will
become the dominant feature of the app.
Minor changes since v2 noted in changelogs of individual patches.
Review comments on v1 and v2 of this patchset have been addressed either
through discussion on-list or changes in this patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:09 +0000 (22:12 +0200)]
nfp: add VF and PF representors to flower app
Initialise VF and PF representors in flower app.
Based in part on work by Benjamin LaHaise, Bert van Leeuwen and
Jakub Kicinski.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:08 +0000 (22:12 +0200)]
nfp: add flower app
Add app for flower offload. At this point the PF netdev and phys port
representor netdevs are initialised. Follow-up work will add support for
VF and PF representors and beyond that offloading the flower classifier.
Based in part on work by Benjamin LaHaise and Bert van Leeuwen.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:07 +0000 (22:12 +0200)]
nfp: add support for control messages for flower app
In preparation for adding a new flower app - targeted at offloading
the flower classifier - provide support for control message that it will
use to communicate with the NFP.
Based in part on work by Bert van Leeuwen.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:06 +0000 (22:12 +0200)]
nfp: add support for tx/rx with metadata portid
Allow tx/rx with metadata port id. This will be used for tx/rx of
representor netdevs acting as upper-devices while a pf netdev acts
as a lower-device.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:05 +0000 (22:12 +0200)]
nfp: provide nfp_port to of nfp_net_get_mac_addr()
Provide port rather than vNIC as parameter of nfp_net_get_mac_addr.
This is to allow this function to be used by representor netdevs where
a vNIC may have more than one physical port none of which are associated
with the vNIC.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:04 +0000 (22:12 +0200)]
nfp: app callbacks for SRIOV
Add app-callbacks for app-specific initialisation of SRIOV.
Disabling SRIOV is brought forward in nfp_pci_remove()
so that nfp_app_sriov_disable is called while the app still exists.
This is intended to be used to implement representor netdevs for virtual
ports.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:03 +0000 (22:12 +0200)]
nfp: add stats and xmit helpers for representors
Provide helpers for stats and xmit on representor netdevs.
Parts based on work by Bert van Leeuwen, Benjamin LaHaise and
Jakub Kicinski.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:02 +0000 (22:12 +0200)]
nfp: general representor implementation
Provide infrastructure to create and destroy representors of a given type.
Parts based on work by Bert van Leeuwen, Benjamin LaHaise,
and Jakub Kicinski.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 23 Jun 2017 20:12:01 +0000 (22:12 +0200)]
nfp: map mac_stats and vf_cfg BARs
If present, map mac_stats and vf_cfg BARs. These will be used by
representor netdevs to read statistics for phys port and vf representors.
Also provide defines describing the layout of the mac_stats area.
Similar defines are already present for the vf_cfg area.
Based in part on work by Jakub Kicinski.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 23 Jun 2017 20:12:00 +0000 (22:12 +0200)]
nfp: move physical port init into a helper
Move MAC/PHY port init into a helper to make it easier to reuse
it in the representor code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 23 Jun 2017 20:11:59 +0000 (22:11 +0200)]
nfp: devlink add support for getting eswitch mode
Add app callback for reporting eswitch mode. Non-SRIOV apps
should not implement this callback, nfp_app code will then
respond with -EOPNOTSUPP.
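The optional-callback pattern described above might look roughly like this. It is a stand-alone sketch with hypothetical names (struct app_ops, app_eswitch_mode_get(), the two example apps and the enum values), not the nfp_app code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mode values for this sketch. */
enum eswitch_mode { ESWITCH_MODE_LEGACY, ESWITCH_MODE_SWITCHDEV };

struct app_ops {
	const char *name;
	/* Optional: apps without SR-IOV support leave this NULL. */
	int (*eswitch_mode_get)(enum eswitch_mode *mode);
};

/* Common wrapper: a missing callback means "not supported". */
static int app_eswitch_mode_get(const struct app_ops *ops,
				enum eswitch_mode *mode)
{
	assert(ops != NULL && mode != NULL);
	if (!ops->eswitch_mode_get)
		return -EOPNOTSUPP;
	return ops->eswitch_mode_get(mode);
}

/* Example callback for an SR-IOV capable app. */
static int sriov_app_mode_get(enum eswitch_mode *mode)
{
	*mode = ESWITCH_MODE_SWITCHDEV;
	return 0;
}

static const struct app_ops sriov_app = { "sriov-app", sriov_app_mode_get };
static const struct app_ops plain_app = { "plain-app", NULL };
```

Keeping the NULL check in the common wrapper means each app only implements what it supports, and callers see a uniform error code.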
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 23 Jun 2017 20:11:58 +0000 (22:11 +0200)]
net: store port/representator id in metadata_dst
Switches and modern SR-IOV enabled NICs may multiplex traffic from port
representors and control messages over a single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.
Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because a single set of queues is used for many netdevs, stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.
This patch attempts to enable port/representor devs to attach metadata
to skbs which carry port id. This way representors can be queueless and
all queuing can be performed at the lower netdev in the usual way.
Traffic arriving on the port/representor interfaces will have
metadata attached and will subsequently be queued to the lower device for
transmission. The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.
Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.
This is mostly for SR-IOV devices since switches don't have lower netdevs
today.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Jun 2017 19:06:44 +0000 (15:06 -0400)]
Merge branch 'phy-internal'
Florian Fainelli says:
====================
net: phy: Support "internal" PHY interface
This makes the "internal" phy-mode property generally available and
documented and this allows us to remove some custom parsing code
we had for bcmgenet and bcm_sf2 which both used that specific value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 23 Jun 2017 17:33:16 +0000 (10:33 -0700)]
net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode
The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 23 Jun 2017 17:33:15 +0000 (10:33 -0700)]
net: bcmgenet: Remove special handling of "internal" phy-mode
The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 23 Jun 2017 17:33:14 +0000 (10:33 -0700)]
net: phy: Support "internal" PHY interface
Now that the Device Tree binding has been updated, update the PHY
library phy_interface_t and phy_modes to support the "internal" PHY
interface type.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 23 Jun 2017 17:33:13 +0000 (10:33 -0700)]
dt-bindings: Add "internal" as a valid 'phy-mode' property
A number of Ethernet MACs have internal Ethernet PHYs and the internal
wiring makes it so that this knowledge needs to be available using the
standard 'phy-mode' property.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Jun 2017 18:24:28 +0000 (14:24 -0400)]
Merge tag 'mlx5-updates-2017-06-23' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-06-23
This series provides some updates to the mlx5 core and netdevice drivers.
Three patches from Tariq introduce a page reuse mechanism in the non-Striding
RQ RX datapath: we allow the RX descriptor to reuse its allocated page
as much as it can, until the page is fully consumed. RX page reuse
reduces the stress on the page allocator and improves RX performance, especially
at high speeds (100Gb/s).
The next four patches of the series, from Or, allow offloading tc flower matching
on ttl/hoplimit and header re-write of hoplimit.
The rest of the series, from Yotam and Or, enhances mlx5 to support FW flashing
through the mlxfw module, in a manner similar to the mlxsw driver.
Currently, only ethtool based flashing is implemented, where both Eth and IB ports
are supported.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Fri, 23 Jun 2017 13:44:37 +0000 (19:14 +0530)]
cxgb4: Use Firmware params to get buffer-group map
Buffer group mappings can be obtained using the FW_PARAMs cmd on newer FW.
Since some of the bg_maps are obtained in atomic context, add a new
t4_query_params_ns() that won't sleep while awaiting mbox cmd completion.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Fri, 23 Jun 2017 13:44:36 +0000 (19:14 +0530)]
cxgb4: Update T6 Buffer Group and Channel Mappings
We were using t4_get_mps_bg_map() both for t4_get_port_stats(),
to determine which MPS Buffer Groups to report statistics on for a given
Port, and for t4_sge_alloc_rxq(), to provide a TP Ingress Channel
Congestion Map. For T4/T5 these are actually the same values (because they
are ~somewhat~ related), but for T6 they should return different values
(T6 has Port 0 associated with MPS Buffer Group 0 (with MPS Buffer Group 1
silently cascading off) and Port 1 is associated with MPS Buffer Group 2
(with 3 cascading off)).
Based on the original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 23 Jun 2017 10:15:44 +0000 (13:15 +0300)]
tls: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes remaining but we
want to return -EFAULT here.
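The pitfall can be reproduced in user space with a stand-in for copy_to_user(). Everything below is a hypothetical model: mock_copy_to_user() only imitates the "returns bytes not copied" convention, and the two get_option_* helpers are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space stand-in for copy_to_user(): copies at most `writable`
 * bytes and returns the number of bytes that could NOT be copied.
 */
static size_t mock_copy_to_user(void *dst, size_t writable,
				const void *src, size_t len)
{
	size_t n = len < writable ? len : writable;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return len - n;
}

/* Buggy pattern: passes the positive "bytes remaining" count upward. */
static long get_option_buggy(void *dst, size_t writable,
			     const void *src, size_t len)
{
	return (long)mock_copy_to_user(dst, writable, src, len);
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
static long get_option_fixed(void *dst, size_t writable,
			     const void *src, size_t len)
{
	if (mock_copy_to_user(dst, writable, src, len))
		return -EFAULT;
	return 0;
}
```

When only 3 of 8 bytes can be written, the buggy variant returns 5, which a caller checking for negative errors would treat as success; the fixed variant returns -EFAULT.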
Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Jun 2017 18:17:31 +0000 (14:17 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-06-23
1) Use memdup_user to simplify xfrm_user_policy.
From Geliang Tang.
2) Make xfrm_dev_register static to silence a sparse warning.
From Wei Yongjun.
3) Use crypto_memneq to check the ICV in the AH protocol.
From Sabrina Dubroca.
4) Remove some unused variables in esp6.
From Stephen Hemminger.
5) Extend XFRM MIGRATE to allow changing the UDP encapsulation port.
From Antony Antony.
6) Include the UDP encapsulation port in km_migrate announcements.
From Antony Antony.
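Regarding item 3: crypto_memneq is a comparison whose running time does not depend on where the buffers differ, which matters when checking an ICV. The idea can be sketched in portable C as below. ct_memneq() is a hypothetical illustration only; the kernel helper takes extra care against compiler optimizations, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the buffers differ and 0 if they are equal, always
 * touching all `len` bytes instead of stopping at the first mismatch.
 */
static int ct_memneq(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	assert(len == 0 || (a != NULL && b != NULL));
	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i]; /* accumulate any differing bits */

	return diff != 0;
}
```

Unlike memcmp(), the loop has no early exit, so the time taken does not reveal how many leading bytes of a forged ICV were correct.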
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Jun 2017 18:15:12 +0000 (14:15 -0400)]
Merge branch 'ena-new-features-and-improvements'
Netanel Belgazal says:
====================
net: update ena ethernet driver to version 1.2.0
This patchset contains some new features/improvements that were added
to the ENA driver to increase its robustness and are based on
experience of wide ENA deployment.
Change log:
V2:
* Remove patch that adds inline to C-file static functions (contradicts coding style).
* Remove patch that moves MTU parameter validation into ena_change_mtu() instead of
using the network stack.
* Use upper_32_bits()/lower_32_bits() instead of casting.
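The last item refers to splitting a 64-bit value (typically a DMA address) into two 32-bit halves. A user-space sketch follows, with macros that mirror the kernel helpers of the same name; struct dma_desc and the desc_* functions are hypothetical and are not taken from the ENA driver.

```c
#include <assert.h>
#include <stdint.h>

/* User-space equivalents of the kernel helpers of the same name. */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))

/* Hypothetical descriptor holding a 64-bit address as two words. */
struct dma_desc {
	uint32_t addr_lo;
	uint32_t addr_hi;
};

/* Reassemble the 64-bit address from the two halves. */
static uint64_t desc_get_addr(const struct dma_desc *d)
{
	return ((uint64_t)d->addr_hi << 32) | d->addr_lo;
}

/* Split a 64-bit address into the descriptor's two words. */
static void desc_set_addr(struct dma_desc *d, uint64_t dma_addr)
{
	d->addr_lo = lower_32_bits(dma_addr);
	d->addr_hi = upper_32_bits(dma_addr);
	assert(desc_get_addr(d) == dma_addr); /* nothing was lost */
}
```

The double 16-bit shift in upper_32_bits() keeps the macro well defined when the argument is only 32 bits wide, where a single shift by 32 would be undefined behaviour.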
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Fri, 23 Jun 2017 08:22:00 +0000 (11:22 +0300)]
net: ena: update ena driver to version 1.2.0
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Fri, 23 Jun 2017 08:21:59 +0000 (11:21 +0300)]
net: ena: update driver's rx drop statistics
The rx drop counter is reported by the device in the keep-alive
event.
Update the driver's counter with the device counter.
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>