git.openwrt.org Git - openwrt/staging/blogic.git/log

amd-xgbe: Improve SFP 100Mbps auto-negotiation

After changing speed to 100Mbps as a result of auto-negotiation (AN),
some 10/100/1000Mbps SFPs indicate a successful link (no faults or loss
of signal), but cannot successfully transmit or receive data. These
SFPs required an extra auto-negotiation (AN) after the speed change in
order to operate properly. Add a quirk for these SFPs so that if the
outcome of the AN actually results in changing to a new speed, re-initiate
AN at that new speed.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Update the BelFuse quirk to support SGMII

Instead of using a quirk to make the BelFuse 1GBT-SFP06 part look like
a 1000baseX part, program the SFP PHY to support SGMII and 10/100/1000
baseT.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Advertise FEC support with the KR re-driver

When a KR re-driver is present, indicate the FEC support is available
during auto-negotiation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Always attempt link training in KR mode

Link training is always attempted when in KR mode, but the code is
structured to check if link training has been enabled before attempting
to perform it. Since that check will always be true, simplify the code
to always enable and start link training during KR auto-negotiation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add ethtool show/set channels support

Add ethtool support to show and set the device channel configuration.
Changing the channel configuration will result in a device restart.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Prepare for ethtool set-channel support

In order to support being able to dynamically set/change the number of
Rx and Tx channels, update the code to:
- Move alloc and free of device memory into callable functions
- Move setting of the real number of Rx and Tx channels to device startup
- Move mapping of the RSS channels to device startup

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add ethtool show/set ring parameter support

Add ethtool support to show and set the number of the Rx and Tx ring
descriptors. Changing the ring configuration will result in a device
restart.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add ethtool support to retrieve SFP module info

Add support to get SFP module information using ethtool.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Remove field that indicates SFP diagnostic support

The driver currently sets an indication of whether the SFP supports, and
that the driver can obtain, diagnostics data. This isn't currently used
by the driver and the logic to set this indicator is flawed because the
field is cleared each time the SFP is checked and only set when a new SFP
is detected. Remove this field and the logic supporting it.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Remove use of comm_owned field

The comm_owned field can hide logic where double locking is attempted
and prevent multiple threads for the same device from accessing the
mutex properly. Remove the comm_owned field and use the mutex API
exclusively for gaining ownership. The current driver has been audited
and is obtaining communications ownership properly.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Read and save the port property registers during probe

Read and save the port property registers once during the device probe
and then use the saved values as they are needed.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix debug output of max channel counts

A debug output print statement uses the wrong variable to output the
maximum Rx channel count (cut and paste error, basically). Fix the
statement to use the proper variable.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-next'

Ursula Braun says:

====================
patches 2018-05-23

here are more smc-patches for net-next:

Patch 1 fixes an ioctl problem detected by syzbot.

Patch 2 improves smc_lgr_list locking in case of abnormal link
group termination. If you want to receive a version for the net-tree,
please let me know. It would look somewhat different, since the port
terminate code has been moved to smc_core.c on net-next.

Patch 3 enables SMC to deal with urgent data.

Patch 4 is a minor improvement to avoid out-of-sync linkgroups
between 2 peers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: longer delay when freeing client link groups

Client link group creation always follows the server linkgroup creation.
If peer creates a new server link group, client has to create a new
client link group. If peer reuses a server link group for a new
connection, client has to reuse its client link group as well. To
avoid out-of-sync conditions for link groups a longer delay for
for client link group removal is defined to make sure this link group
still exists, once the peer decides to reuse a server link group.

Currently the client link group delay time is just 10 jiffies larger
than the server link group delay time. This patch increases the delay
difference to 10 seconds to have a better protection against
out-of-sync link groups.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: urgent data support

Add support for out of band data send and receive.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: lock smc_lgr_list in port_terminate()

Currently, smc_port_terminate() is not holding the lock of the lgr list
while it is traversing the list. This patch adds locking to this
function and changes smc_lgr_terminate() accordingly.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: return 0 for ioctl calls in states INIT and CLOSED

A connected SMC-socket contains addresses of descriptors for the
send buffer and the rmb (receive buffer). Fields of these descriptors
are used to determine the answer for certain ioctl requests.
Add extra handling for unconnected SMC socket states without valid
buffer descriptor addresses.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Reported-by: syzbot+e6714328fda813fc670f@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: do L1 config when module is inserted

trigger an L1 configure operation when a transceiver module
is inserted in order to cause current "sticky" options like
Requested Forward Error Correction to be reapplied.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: change the port capability bits definition

MDI Port Capabilities bit definitions were inconsistent with
regard to the MDI enum values. 2 bits used to define MDI in
the port capabilities are not really separable, it's a 2-bit
field with 4 different values. Change the port capability bit
definitions to be "AUTO" and "STRAIGHT" in order to get them
to line up with the enum's.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2018-05-23' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

For this round, we have various things all over the place, notably
* a fix for a race in aggregation, which I want to let
bake for a bit longer before sending to stable
* some new statistics (ACK RSSI, TXQ)
* TXQ configuration
* preparations for HE, particularly radiotap
* replace confusing "country IE" by "country element" since it's
not referring to Ireland

Note that I merged net-next to get a fix from mac80211 that got
there via net, to apply one patch that would otherwise conflict.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qca8k-QCA8334-switch-support'

Michal Vokáč says:

====================
Add support for QCA8334 switch

This series basically adds support for a QCA8334 ethernet switch to the
qca8k driver. It is a four-port variant of the already supported seven
port QCA8337. Register map is the same for the whole familly and all chips
have the same device ID.

Major part of this series enhances the CPU port setting. Currently the CPU
port is not set to any sensible defaults compatible with the xGMII
interface. This series forces the CPU port to its maximum bandwidth and
also allows to adjust the new defaults using fixed-link device tree
sub-node.

Alongside these changes I fixed two checkpatch warnings regarding SPDX and
redundant parentheses.

Changes in v3:
- Rebased on latest net-next/master.
- Corrected fixed-link documentation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Remove redundant parentheses

Fix warning reported by checkpatch.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Replace GPL boilerplate by SPDX

Replace the GPLv2 license boilerplate with the SPDX license identifier.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Allow overwriting CPU port setting

Implement adjust_link function that allows to overwrite default CPU port
setting using fixed-link device tree subnode.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Force CPU port to its highest bandwidth

By default autonegotiation is enabled to configure MAC on all ports.
For the CPU port autonegotiation can not be used so we need to set
some sensible defaults manually.

This patch forces the default setting of the CPU port to 1000Mbps/full
duplex which is the chip maximum capability.

Also correct size of the bit field used to configure link speed.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Enable RXMAC when bringing up a port

When a port is brought up/down do not enable/disable only the TXMAC
but the RXMAC as well. This is essential for the CPU port to work.

Fixes: 6b93fb46480a ("net-next: dsa: add new driver for qca8xxx family")
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Add support for QCA8334 switch

Add support for the four-port variant of the Qualcomm QCA833x switch.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: Add QCA8334 binding documentation

Add support for the four-port variant of the Qualcomm QCA833x switch.

The CPU port default link settings can be reconfigured using
a fixed-link sub-node.

Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add new T6 device ids

Add 0x6088 and 0x6089 device ids for new T6 cards.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: uevent filtering

Recent discussions around uevent filtering (cf. net-next commit [1], [2],
and [3] and discussions in [4], [5], and [6]) have shown that the semantics
around uevent filtering where not well understood.
Now that we have settled - at least for the moment - how uevent filtering
should look like let's add some selftests to ensure we don't regress
anything in the future.
Note, the semantics of uevent filtering are described in detail in my
commit message to [2] so I won't repeat them here.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90d52d4fd82007005125d9a8d2d560a1ca059b9d
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a3498436b3a0f8ec289e6847e1de40b4123e1639
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=26045a7b14bc7a5455e411d820110f66557d6589
[4]: https://lkml.org/lkml/2018/4/4/739
[5]: https://lkml.org/lkml/2018/4/26/767
[6]: https://lkml.org/lkml/2018/4/26/738

Signed-off-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fib-rule-selftest'

Roopa Prabhu says:

====================
fib rule selftest

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get,
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: net: initial fib rule tests

This adds a first set of tests for fib rule match/action for
ipv4 and ipv6. Initial tests only cover action lookup table.
can be extended to cover other actions in the future.
Uses ip route get to validate the rule lookup.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: support sport, dport and ip_proto in RTM_GETROUTE

This is a followup to fib6 rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: support sport, dport and ip_proto in RTM_GETROUTE

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Add handlers for ethtool get/set msg level

The handlers for ethtool get/set msg level are missing from netvsc.
This patch adds them.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vxge: fix spelling mistake in macro VXGE_HW_ERR_PRIVILAGED_OPEARATION

Rename VXGE_HW_ERR_PRIVILAGED_OPEARATION to VXGE_HW_ERR_PRIVILEGED_OPERATION
to fix spelling mistake.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'udp-gso-fixes'

Willem de Bruijn says:

====================
udp gso fixes

A few small fixes:
- disallow segmentation with XFRM
- do not leak gso packets into the ingress path

Changes
  v1 -> v2
  - fix build failure in team.c
  - drop scatter-gather fix:
      this is now fixed by commit 113f99c33585 ("net: test tailroom
      before appending to linear skb"). After this patch gso skbs are
      built non-linear regardless of NETIF_F_SG and skb_segment builds
      linear segs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

gso: limit udp gso to egress-only virtual devices

Until the udp receive stack supports large packets (UDP GRO), GSO
packets must not loop from the egress to the ingress path.

Revert the change that added NETIF_F_GSO_UDP_L4 to various virtual
devices through NETIF_F_GSO_ENCAP_ALL as this included devices that
may loop packets, such as veth and macvlan.

Instead add it to specific devices that forward to another device's
egress path, bonding and team.

Fixes: 83aa025f535f ("udp: add gso support to virtual devices")
CC: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: exclude gso from xfrm paths

UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sfp-small-improvements'

Antoine Tenart says:

====================
net: sfp: small improvements

A small series of patches improving the SFP support by adding a warning
when no Tx disable pin is available, and making the i2c-bus property
mandatory.

Thanks!
Antoine

Since v1:
  - Removed the patch fixing the sfp driver when no i2c bus was described.
  - Made two new patches to make the i2c-bus property mandatory for sfp modules.

Since the phylink series:
  - s/-EOPNOTSUPP/-ENODEV/ in patch 1/2.
  - I added the acked-by tag in patch 2/2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation/bindings: net: the sfp i2c-bus property is now mandatory

The i2c-bus property for sfp modules was made mandatory. Update the
documentation to keep it in sync with the driver's behaviour.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: sfp: make the i2c-bus dt property mandatory

This patch makes the i2c-bus property mandatory when using a device
tree. If the sfp i2c bus isn't described it's impossible to guess the
protocol to use for a given module, and the sfp module would then not
work in most cases.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: sfp: warn the user when no tx_disable pin is available

In case no Tx disable pin is available the SFP modules will always be
emitting. This could be an issue when using modules using laser as their
light source as we would have no way to disable it when the fiber is
removed. This patch adds a warning when registering an SFP cage which do
not have its tx_disable pin wired or available.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-abm-add-basic-support-for-advanced-buffering-NIC'

Jakub Kicinski says:

====================
nfp: abm: add basic support for advanced buffering NIC

This series lays groundwork for advanced buffer management NIC feature.
It makes necessary NFP core changes, spawns representors and adds devlink
glue. Following series will add the actual buffering configuration (patch
series size limit).

First three patches add support for configuring NFP buffer pools via a
mailbox. The existing devlink APIs are used for the purpose.

Third patch allows us to perform small reads from the NFP memory.

The rest of the patch set adds eswitch mode change support and makes
the driver spawn appropriate representors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: assign vNIC id as phys_port_name of vNICs which are not ports

When NFP is modelled as a switch we assign phys_port_name to respective
port(representor )s:

vNIC0 - | - PF port (pf%d) MAC/PHY (p%d[s%d]) - |E==

In most cases there is only one vNIC for communication with the switch.
If there is more than one we need to be able to identify them. Use %d
as phys_port_name of the vNICs.

We don't have to pass ID to nfp_net_debugfs_vnic_add() separately any
more.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: use split in naming of PCIe PF ports

PCI PFs can host more than one logical endpoint.  In NFP terms
this means having more than one vNIC for PCIe PF.  The vNICs
are usually corresponding 1:1 to Ethernet ports.  In core NIC
we use the legacy idea of vNIC *being* the Ethernet port,
hence netdevs put pX(sY) in their phys_port_name, like Ethernet
ports would.  When ASIC ports are fully represented we need to
be able to name different PCIe PF ports, too.  Use a scheme
similar to Ethernet ports - pfXsY, for PCIe PF number X,
sub-port Y.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: abm: force Ethternet port up

Current control firmware does not cater too well to multi-host
applications. There is no way to check which hosts are up or
otherwise negotiate what the state of the external port (the
Ethernet port) should be. Make sure the link is up when driver
loads, and don't take it down when Ethernet port netdev is
closed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: abm: spawn port netdevs

To configure buffering points we need full set of netdevs:

ASIC

user netdev -- | -- PCIe port MAC port -- | --

Configuring egrees qdiscs on user netdev configures standard
Linux TC software qdiscs, configuring PCIe port qdiscs will
provide a way of setting ASIC queuing parameters for PCIe block.
MAC port netdev egress qdiscs correspond to ASIC MAC Traffic
Manager block.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add devlink_eswitch_mode_set callback

Our previous apps all assumed to use only one eswitch mode (legacy
or switchdev) without the ability to change it. ABM NIC will
want to support the switch so plumb devlink_eswitch_mode_set through.
The devlink_eswitch_mode_set is expected to spawn representors and
potentially devlink ports so it's called under big devlink lock and
pf->lock.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: don't take instance lock around eswitch mode set

Changing switch mode may want to register and unregister devlink
ports. Therefore similarly to DEVLINK_CMD_PORT_SPLIT/UNSPLIT it
should not take the instance lock. Drivers don't depend on existing
locking since it's a very recent addition.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add app pointer to port representors

nfp_apps can currently associate their structures with vNICs but
not representors. Add app priv pointer to representors as well.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: abm: create project-specific vNIC structure

ABM NIC requires more complex vNIC handling, allocate
per-vNIC structure. Find out RX queue base and PCI PF id.
There will be multiple PFs sharing the same MAC port, therefore
the MAC address assigned to the vNIC must be looked up in the
HWInfo database.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: abm: add initial active buffer management NIC skeleton

Add a very rudimentary active buffer management NIC support.
For now it's like a core NIC without SR-IOV support. Next
commits will extend its functionality.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: core: allow 4-byte aligned accesses to Memory Units

Current code doesn't enforce length requirements on 32bit accesses
with action NFP_CPP_ACTION_RW to memory units, but if the access
is only aligned to 4 bytes as well we will fall into the explicit
access case and error out. Such accesses are correct, allow them
by lowering the width earlier.

While at it use a switch statement to improve readability.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add shared buffer configuration

Allow app FW to advertise its shared buffer pool information.
Use the per-PF mailbox to configure them from devlink.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for per-PCI PF mailbox

When working with devlink-related functionality for locking reasons
it's easier to create a new mailbox per-PCI PF device than try to
use one of the netdev/vNIC mailboxes.

Define new mailbox structure and resolve its symbol during probe.
For forward compatibility allow silent truncation of mailbox command
data.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move rtsym helpers to pf code

nfp_net_pf_rtsym_read_optional() and nfp_net_pf_map_rtsym() are not
really related to networking code. Move them to the PF code and
remove the net from their names. They will soon be needed by code
outside of nfp_net_main.c anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpfilter'

Alexei Starovoitov says:

====================
bpfilter

v2->v3:
- followed Luis's suggestion and significantly simplied first patch
  with shmem_kernel_file_setup+kernel_write. Added kdoc for new helper
- fixed typos and race to access pipes with mutex
- tested with bpfilter being 'builtin'. CONFIG_BPFILTER_UMH=y|m both work.
  Interesting to see a usermode executable being embedded inside vmlinux.
- it doesn't hurt to enable bpfilter in .config.
  ip_setsockopt commands sent to usermode via pipes and -ENOPROTOOPT is
  returned from userspace, so kernel falls back to original iptables code

v1->v2:
this patch set is almost a full rewrite of the earlier umh modules approach
The v1 of patches and follow up discussion was covered by LWN:
https://lwn.net/Articles/749108/

I believe the v2 addresses all issues brought up by Andy and others.
Mainly there are zero changes to kernel/module.c
Instead of teaching module loading logic to recognize special
umh module, let normal kernel modules execute part of its own
.init.rodata as a new user space process (Andy's idea)
Patch 1 introduces this new helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
Input:
  data + len == executable file
Output:
  struct umh_info {
       struct file *pipe_to_umh;
       struct file *pipe_from_umh;
       pid_t pid;
  };

Advantages vs v1:
- the embedded user mode executable is stored as .init.rodata inside
  normal kernel module. These pages are freed when .ko finishes loading
- the elf file is copied into tmpfs file. The user mode process is swappable.
- the communication between user mode process and 'parent' kernel module
  is done via two unix pipes, hence protocol is not exposed to
  user space
- impossible to launch umh on its own (that was the main issue of v1)
  and impossible to be man-in-the-middle due to pipes
- bpfilter.ko consists of tiny kernel part that passes the data
  between kernel and umh via pipes and much bigger umh part that
  doing all the work
- 'lsmod' shows bpfilter.ko as usual.
  'rmmod bpfilter' removes kernel module and kills corresponding umh
- signed bpfilter.ko covers the whole image including umh code

Few issues:
- the user can still attach to the process and debug it with
  'gdb /proc/pid/exe pid', but 'gdb -p pid' doesn't work.
  (a bit worse comparing to v1)
- tinyconfig will notice a small increase in .text
  +766 | TEXT | 7c8b94806bec umh: introduce fork_usermode_blob() helper
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skeleton of bpfilter kernel module

bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
and user mode helper code that is embedded into bpfilter.ko

The steps to build bpfilter.ko are the following:
- main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
- with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
  is converted into bpfilter_umh.o object file
  with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
  Example:
  $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
  0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
  0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
  0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
- bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko

bpfilter_kern.c is a normal kernel module code that calls
the fork_usermode_blob() helper to execute part of its own data
as a user mode process.

Notice that _binary_net_bpfilter_bpfilter_umh_start - end
is placed into .init.rodata section, so it's freed as soon as __init
function of bpfilter.ko is finished.
As part of __init the bpfilter.ko does first request/reply action
via two unix pipe provided by fork_usermode_blob() helper to
make sure that umh is healthy. If not it will kill it via pid.

Later bpfilter_process_sockopt() will be called from bpfilter hooks
in get/setsockopt() to pass iptable commands into umh via bpfilter.ko

If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
kill umh as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

umh: introduce fork_usermode_blob() helper

Introduce helper:
int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
struct umh_info {
       struct file *pipe_to_umh;
       struct file *pipe_from_umh;
       pid_t pid;
};

that GPLed kernel modules (signed or unsigned) can use it to execute part
of its own data as swappable user mode process.

The kernel will do:
- allocate a unique file in tmpfs
- populate that file with [data, data + len] bytes
- user-mode-helper code will do_execve that file and, before the process
  starts, the kernel will create two unix pipes for bidirectional
  communication between kernel module and umh
- close tmpfs file, effectively deleting it
- the fork_usermode_blob will return zero on success and populate
  'struct umh_info' with two unix pipes and the pid of the user process

As the first step in the development of the bpfilter project
the fork_usermode_blob() helper is introduced to allow user mode code
to be invoked from a kernel module. The idea is that user mode code plus
normal kernel module code are built as part of the kernel build
and installed as traditional kernel module into distro specified location,
such that from a distribution point of view, there is
no difference between regular kernel modules and kernel modules + umh code.
Such modules can be signed, modprobed, rmmod, etc. The use of this new helper
by a kernel module doesn't make it any special from kernel and user space
tooling point of view.

Such approach enables kernel to delegate functionality traditionally done
by the kernel modules into the user space processes (either root or !root) and
reduces security attack surface of the new code. The buggy umh code would crash
the user process, but not the kernel. Another advantage is that umh code
of the kernel module can be debugged and tested out of user space
(e.g. opening the possibility to run clang sanitizers, fuzzers or
user space test suites on the umh code).
In case of the bpfilter project such architecture allows complex control plane
to be done in the user space while bpf based data plane stays in the kernel.

Since umh can crash, can be oom-ed by the kernel, killed by the admin,
the kernel module that uses them (like bpfilter) needs to manage life
time of umh on its own via two unix pipes and the pid of umh.

The exit code of such kernel module should kill the umh it started,
so that rmmod of the kernel module will cleanup the corresponding umh.
Just like if the kernel module does kmalloc() it should kfree() it
in the exit code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

nl80211: Reject disconnect commands except from conn_owner

Reject NL80211_CMD_DISCONNECT, NL80211_CMD_DISASSOCIATE,
NL80211_CMD_DEAUTHENTICATE and NL80211_CMD_ASSOCIATE commands
from clients other than the connection owner set in the connect,
authenticate or associate commands, if it was set.

The main point of this check is to prevent chaos when two processes
try to use nl80211 at the same time, it's not a security measure.
The same thing should possibly be done for JOIN_IBSS/LEAVE_IBSS and
START_AP/STOP_AP.

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

rfkill: Create rfkill-none LED trigger

Creates a new trigger rfkill-none, as a complement to rfkill-any, which
drives LEDs when any radio is enabled. The new trigger is meant to turn
a LED ON whenever all radios are OFF, and turn it OFF otherwise.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

rfkill: Rename rfkill_any_led_trigger* functions

Rename these functions to rfkill_global_led_trigger*, as they are going
to be extended to handle another global rfkill led trigger.

This commit does not change any functionality.

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

nl80211: Update ERP info using NL80211_CMD_UPDATE_CONNECT_PARAMS

Use NL80211_CMD_UPDATE_CONNECT_PARAMS to update new ERP information,
Association IEs and the Authentication type to driver / firmware which
will be used in subsequent roamings.

Signed-off-by: Vidyullatha Kanchanapally <vidyullatha@codeaurora.org>
[arend: extended fils-sk kernel doc and added check in wiphy_register()]
Reviewed-by: Jithu Jance <jithu.jance@broadcom.com>
Reviewed-by: Eylon Pedinovsky <eylon.pedinovsky@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

nl80211: add FILS related parameters to ROAM event

In case of FILS shared key offload the parameters can change
upon roaming of which user-space needs to be notified.

Reviewed-by: Jithu Jance <jithu.jance@broadcom.com>
Reviewed-by: Eylon Pedinovsky <eylon.pedinovsky@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: use separate struct for FILS parameters

Put FILS related parameters into their own struct definition so
it can be reused for roam events in subsequent change.

Reviewed-by: Jithu Jance <jithu.jance@broadcom.com>
Reviewed-by: Eylon Pedinovsky <eylon.pedinovsky@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

nl80211: Optimize cfg80211_bss_expire invocations

Only invoke cfg80211_bss_expire on the first nl80211_dump_scan
invocation to avoid (likely) redundant processing.

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: Support adding duration for prepare_tx() callback

There are specific cases, such as SAE authentication exchange, that
might require long duration to complete. For such cases, add support
for indicating to the driver the required duration of the prepare_tx()
operation, so the driver would still be able to complete the frame
exchange.

Currently, indicate the duration only for SAE authentication exchange,
as SAE authentication can take up to 2000 msec (as defined in IEEE
P802.11-REVmd D1.0 p. 3504).

As the patch modified the prepare_tx() callback API, also modify
the relevant code in iwlwifi.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge remote-tracking branch 'net-next/master' into mac80211-next

Bring in net-next which had pulled in net, so I have the changes
from mac80211 and can apply a patch that would otherwise conflict.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge branch 'qed-firmware-TLV'

Sudarsana Reddy Kalluru says:

====================
qed*: Add support for management firmware TLV request.

Management firmware (MFW) requires config and state information from
the driver. It queries this via TLV (type-length-value) request wherein
mfw specificies the list of required TLVs. Driver fills the TLV data
and responds back to MFW.
This patch series adds qed/qede/qedf/qedi driver implementation for
supporting the TLV queries from MFW.

Changes from previous versions:
-------------------------------
v2: Split patch (2) into multiple simpler patches.
v2: Update qed_tlv_parsed_buf->p_val datatype to void pointer to avoid
bunch of unnecessary typecasts.

Please consider applying this series to "net-next".
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qedi: Add get_generic_tlv_data handler.

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qedi: Add support for populating ethernet TLVs.

This patch adds callbacks for providing the ethernet protocol driver TLVs.

Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qedf: Add get_generic_tlv_data handler.

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qedf: Add support for populating ethernet TLVs.

This patch adds callbacks for providing the ethernet protocol driver TLVs.

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Add support for populating ethernet TLVs.

This patch adds callbacks for providing the ethernet protocol driver TLVs.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add driver infrastucture for handling mfw requests.

MFW requests the TLVs in interrupt context. Extracting of the required
data from upper layers and populating of the TLVs require process context.
The patch adds work-queues for processing the tlv requests. It also adds
the implementation for requesting the tlv values from appropriate protocol
driver.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for processing iscsi tlv request.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for processing fcoe tlv request.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add support for tlv request processing.

The patch adds driver support for processing TLV requests/repsonses
from the mfw and upper driver layers respectively. The implementation
reads the requested TLVs from the shared memory, requests the values
from upper layer drivers, populates this info (TLVs) shared memory and
notifies MFW about the TLV values.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add MFW interfaces for TLV request support.

The patch adds required management firmware (MFW) interfaces such as
mailbox commands, TLV types etc.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-05-22

This series contains updates to i40e only.

Jake provides all the changes in this series starting with making it
consistent in how we approach the bit lock.  Fixed the reporting of the
VEB statistics and the queue statistics to always return every queue
even if it is not currently in use.  Use WARN_ONCE() so that the first
time we end up with an incorrect size we will dump a stack trace and a
message to help highlight the issue early in testing.  Folded the fixed
string prefix into the stat string definition.  Instead of using a
separate char *p pointer when copying strings, use the data pointer
directly.  Added code comments for several of the statistic functions to
better explain the number and ordering of statistics.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-ECN-quickack'

Eric Dumazet says:

====================
tcp: reduce quickack pressure for ECN

Small patch series changing TCP behavior vs quickack and ECN

First patch is a refactoring, adding parameter to tcp_incr_quickack()
and tcp_enter_quickack_mode() helpers.

Second patch implements the change, lowering number of ACK packets
sent after an ECN event.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not aggressively quick ack after ECN events

ECN signals currently forces TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode

We want to add finer control of the number of ACK packets sent after
ECN events.

This patch is not changing current behavior, it only enables following
change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: don't disable bh when accessing action idr

Initial net_device implementation used ingress_lock spinlock to synchronize
ingress path of device. This lock was used in both process and bh context.
In some code paths action map lock was obtained while holding ingress_lock.
Commit e1e992e52faa ("[NET_SCHED] protect action config/dump from irqs")
modified actions to always disable bh, while using action map lock, in
order to prevent deadlock on ingress_lock in softirq. This lock was removed
from net_device, so disabling bh, while accessing action map, is no longer
necessary.

Replace all action idr spinlock usage with regular calls that do not
disable bh.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ipv6-Fix-route-append-and-replace-use-cases'

David Ahern says:

====================
net/ipv6: Fix route append and replace use cases

This patch set fixes a few append and replace uses cases for IPv6 and
adds test cases that codifies the expectations of how append and replace
are expected to work. In paricular it allows a multipath route to have
a dev-only nexthop, something Thomas tried to accomplish with commit
edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6") which had to be
reverted because of breakage, and to replace an existing FIB entry
with a reject route.

There are a number of inconsistent and surprising aspects to the Linux
API for adding, deleting, replacing and changing FIB entries. For example,
with IPv4 NLM_F_APPEND means insert the route after any existing entries
with the same key (prefix + priority + TOS for IPv4) and NLM_F_CREATE
without the append flag inserts the new route before any existing entries.

IPv6 on the other hand attempts to guess whether a new route should be
appended to an existing one, possibly creating a multipath route, or to
add a new entry after any existing ones. This applies to both the 'append'
(NLM_F_CREATE + NLM_F_APPEND) and 'prepend' (NLM_F_CREATE only) cases
meaning for IPv6 the NLM_F_APPEND is basically ignored. This guessing
whether the route should be added to a multipath route (gateway routes)
or inserted after existing entries (non-gateway based routes) means a
multipath route can not have a dev only nexthop (potentially required in
some cases - tunnels or VRF route leaking for example) and route 'replace'
is a bit adhoc treating gateway based routes and dev-only / reject routes
differently.

This has led to frustration with developers working on routing suites
such as FRR where workarounds such as delete and add are used instead of
replace.

After this patch set there are 2 differences between IPv4 and IPv6:
1. 'ip ro prepend' = NLM_F_CREATE only
    IPv4 adds the new route before any existing ones
    IPv6 adds new route after any existing ones

2. 'ip ro append' = NLM_F_CREATE|NLM_F_APPEND
   IPv4 adds the new route after any existing ones
   IPv6 adds the nexthop to existing routes converting to multipath

For the former, there are cases where we want same prefix routes added
after existing ones (e.g., multicast, prefix routes for macvlan when used
for virtual router redundancy). Requiring the APPEND flag to add a new
route to an existing one helps here but is a slight change in behavior
since prepend with gateway routes now create a separate entry.

For the latter IPv6 behavior is preferred - appending a route for the same
prefix and metric to make a multipath route, so really IPv4 not allowing an
existing route to be updated is the limiter. This will be fixed when
nexthops become separate objects - a future patch set.

Thank you to Thomas and Ido for testing earlier versions of this set, and
to Ido for providing an update to the mlxsw driver.

Changes since RFC
- cleanup wording in test script; add comments about expected failures
  and why
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add ipv4 route add append replace tests

Add IPv4 route tests covering add, append and replace permutations.
Assumes the ability to add a basic single path route works; this is
required for example when adding an address to an interface.

$ fib_tests.sh -t ipv4_rt

IPv4 route add / append tests
    TEST: Attempt to add duplicate route - gw                           [ OK ]
    TEST: Attempt to add duplicate route - dev only                     [ OK ]
    TEST: Attempt to add duplicate route - reject route                 [ OK ]
    TEST: Add new nexthop for existing prefix                           [ OK ]
    TEST: Append nexthop to existing route - gw                         [ OK ]
    TEST: Append nexthop to existing route - dev only                   [ OK ]
    TEST: Append nexthop to existing route - reject route               [ OK ]
    TEST: Append nexthop to existing reject route - gw                  [ OK ]
    TEST: Append nexthop to existing reject route - dev only            [ OK ]
    TEST: add multipath route                                           [ OK ]
    TEST: Attempt to add duplicate multipath route                      [ OK ]
    TEST: Route add with different metrics                              [ OK ]
    TEST: Route delete with metric                                      [ OK ]

IPv4 route replace tests
    TEST: Single path with single path                                  [ OK ]
    TEST: Single path with multipath                                    [ OK ]
    TEST: Single path with reject route                                 [ OK ]
    TEST: Single path with single path via multipath attribute          [ OK ]
    TEST: Invalid nexthop                                               [ OK ]
    TEST: Single path - replace of non-existent route                   [ OK ]
    TEST: Multipath with multipath                                      [ OK ]
    TEST: Multipath with single path                                    [ OK ]
    TEST: Multipath with single path via multipath attribute            [ OK ]
    TEST: Multipath with reject route                                   [ OK ]
    TEST: Multipath - invalid first nexthop                             [ OK ]
    TEST: Multipath - invalid second nexthop                            [ OK ]
    TEST: Multipath - replace of non-existent route                     [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add ipv6 route add append replace tests

Add IPv6 route tests covering add, append and replace permutations.
Assumes the ability to add a basic single path route works; this is
required for example when adding an address to an interface.

$ fib_tests.sh -t ipv6_rt

IPv6 route add / append tests
    TEST: Attempt to add duplicate route - gw                           [ OK ]
    TEST: Attempt to add duplicate route - dev only                     [ OK ]
    TEST: Attempt to add duplicate route - reject route                 [ OK ]
    TEST: Add new route for existing prefix (w/o NLM_F_EXCL)            [ OK ]
    TEST: Append nexthop to existing route - gw                         [ OK ]
    TEST: Append nexthop to existing route - dev only                   [ OK ]
    TEST: Append nexthop to existing route - reject route               [ OK ]
    TEST: Append nexthop to existing reject route - gw                  [ OK ]
    TEST: Append nexthop to existing reject route - dev only            [ OK ]
    TEST: Add multipath route                                           [ OK ]
    TEST: Attempt to add duplicate multipath route                      [ OK ]
    TEST: Route add with different metrics                              [ OK ]
    TEST: Route delete with metric                                      [ OK ]

IPv6 route replace tests
    TEST: Single path with single path                                  [ OK ]
    TEST: Single path with multipath                                    [ OK ]
    TEST: Single path with reject route                                 [ OK ]
    TEST: Single path with single path via multipath attribute          [ OK ]
    TEST: Invalid nexthop                                               [ OK ]
    TEST: Single path - replace of non-existent route                   [ OK ]
    TEST: Multipath with multipath                                      [ OK ]
    TEST: Multipath with single path                                    [ OK ]
    TEST: Multipath with single path via multipath attribute            [ OK ]
    TEST: Multipath with reject route                                   [ OK ]
    TEST: Multipath - invalid first nexthop                             [ OK ]
    TEST: Multipath - invalid second nexthop                            [ OK ]
    TEST: Multipath - replace of non-existent route                     [ OK ]

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add option to pause after each test

Add option to pause after each test before cleanup is done. Allows
user to do manual inspection or more ad-hoc testing after each test
with the setup in tact.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add command line options

Add command line options for controlling pause on fail, controlling
specific tests to run and verbose mode rather than relying on environment
variables.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: fib_tests: Add success-fail counts

As more tests are added, it is convenient to have a tally at the end.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Simplify route replace and appending into multipath route

Bring consistency to ipv6 route replace and append semantics.

Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
1. can not replace a route with a reject route. Existing code appends
   a new route instead of replacing the existing one.

2. can not have a multipath route where a leg uses a dev only nexthop

Existing use cases affected by this change:
1. adding a route with existing prefix and metric using NLM_F_CREATE
   without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
   'prepend'). Existing code auto-determines that the new nexthop can
   be appended to an existing route to create a multipath route. This
   change breaks that by requiring the APPEND flag for the new route
   to be added to an existing one. Instead the prepend just adds another
   route entry.

2. route replace. Existing code replaces first matching multipath route
   if new route is multipath capable and fallback to first matching
   non-ECMP route (reject or dev only route) in case one isn't available.
   New behavior replaces first matching route. (Thanks to Ido for spotting
   this one)

Note: Newer iproute2 is needed to display multipath routes with a dev-only
      nexthop. This is due to a bug in iproute2 and parsing nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add support for route append

Handle append for gateway based routes. Dev-only multipath routes will
be handled by a follow on patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: use the more traditional 'i' loop variable

Since we no longer use i as an array index for the data variable,
replace the use of 'j' with 'i' so that we match the general loop
variable name.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add function doc headers for ethtool stats functions

Add documentation for the i40e_get_stats_count, i40e_get_stat_strings
and i40e_get_ethtool_stats explaining that the number and ordering of
statistics must remain constant for a given netdevice.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: update data pointer directly when copying to the buffer

A future patch is going to add a helper function i40e_add_ethtool_stats
that will help lower the amount of boiler plate code in the
i40e_get_ethtool_stats function.

This conversion will take place over many patches, and the helper
function will work by directly updating a reference to the data pointer.

Since this would not work combined with the current method of accessing
data like an array, update all the code that copies stats into the data
buffer to use direct updates to the pointer instead of array accesses.

This will prevent incorrect stat updates for patches in between the
conversion.

Similarly, when copying strings, we used a separate char *p pointer.
Instead, use the data pointer directly as it's already a (u8 *) type
which is the same size.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fold prefix strings directly into stat names

We always prefix these stats with a fixed string, so just fold this
prefix into the stat string definition. This preparatory work will make
it easier to implement a helper function to copy stats and strings into
the supplied buffers in a future patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use WARN_ONCE to replace the commented BUG_ON size check

We don't really want to use BUG_ON here since that would completely
crash the kernel, thus the reason we commented it out. We *can't* use
BUILD_BUG_ON because at least now (a) the sizes aren't constant (we are
fixing this) and (b) not all compilers are smart enough to understand
that "p - data" is a constant.

Instead, just use a WARN_ONCE so that the first time we end up with an
incorrect size we will dump a stack trace and a message, hopefully
highlighting the issues early in testing.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: split i40e_get_strings() into smaller functions

Split the statistic strings and private flags strings into their own
separate functions to aid code readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: always return all queue stat strings

The ethtool API for obtaining device statistics is not intended to allow
runtime changes in the number of statistics reported. It may *appear*
this way, as there is an ability to request the number of stats using
ethtool_get_set_count(). However, it is expected that this must always
return the same value for invocations of the same device.

If we don't satisfy this contract, and allow the number of stats to
change during run time, we could cause invalid memory accesses or report
the stat strings incorrectly. This is because the API for obtaining
stats is to (1) get the size, (2) get the strings and finally (3) get
the stats. Since these are each separate ethtool op commands, it is not
possible to maintain consistency by holding the RTNL lock over the whole
operation. This results in the potential for a race condition to occur
where the size changed between any of the 3 calls.

Avoid this issue by requiring that we always return the same value for
a given device. We can check any values which remain constant for the
life of the device, but must not report different sizes depending on
runtime attributes.

This patch specifically fixes the queue statistics to always return
every queue even if it's not currently in use.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>