git.openwrt.org Git - openwrt/staging/blogic.git/log

nfp: lock area cache earlier

We shouldn't access area_cache_list without its lock even
to check if it's empty.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: document expected locking in the core

Document which fields of nfp_cpp are protected by which locks.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move mutex code out of nfp_cppcore.c

After mutex cache removal we can put the mutex code in a separate
source file. This makes it clear it doesn't play with internals
of struct nfp_cpp any more.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: remove cpp mutex cache

CPP mutex cache was introduced to work around the fact that the
same host could successfully acquire a lock multiple times. It
used to collapse multiple users to the same struct nfp_cpp_mutex
and track use count. Unfortunately it's racy. Since we now force
all nfp_mutex_lock() callers within the host to actually succeed
at acquiring the lock we no longer need the cache, let's remove it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: fail graciously when someone tries to grab global lock

The global device lock is acquired to search the resource table.
The lock is actually itself part of the table (entry 0).
Therefore if someone asks for resource 0 we would deadlock since
double locking is no longer allowed.

Currently the driver doesn't try to lock that resource so let's
simply make sure we fail graciously and not add special handling
of this case until really need. Hide the relevant defines in
the source file.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: disallow sharing mutexes on the same machine

NFP can be connected to multiple machines via PCI or other buses.
Access to hardware resources is arbitrated using locks residing
in device memory. Currently nfpcore only respects the mutexes
when it comes to inter-host locking, but if we try to acquire
the same lock again, on one host - it will simply return success
because owner of the lock is already set to that host.

This makes the locks useless for arbitration within one host
and unfair because whichever host grabbed the lock will have
a chance to reacquire it without others getting a shot.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dwc-xlgmac: fix an error code in xlgmac_alloc_pages()

The dma_mapping_error() returns true if there is an error but we want
to return -ENOMEM and not 1.

Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Add dump all for netconf

Use the rtnl_dump_all to dump all netconf handlers that have been
registered. Allows userspace to send a dump request for PF_UNSPEC
and get all families.

Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-mmd-cleanup'

Russell King says:

====================
Clean up PHY MMD accessors

This series cleans up phylib's MMD accessors, so that we have a common
way of accessing the Clause 45 register set.

The current situation is far from ideal - we have phy_(read|write)_mmd()
which accesses Clause 45 registers over Clause 45 accesses, and we have
phy_(read|write)_mmd_indirect(), which accesses Clause 45 registers via
Clause 22 register 13/14.

Generic code uses the indirect methods to access standard Clause 45
features, and when we come to add Clause 45 PHY support to phylib, we
would need to make these conditional upon the PHY type, or duplicate
these functions.

An alternative solution is to merge these accessors together, and select
the appropriate access method depending upon the 802.3 clause that the
PHY conforms with.  The result is that we have a single set of
phy_(read|write)_mmd() accessors.

For cases which require special handling, we still allow PHY drivers to
override all MMD accesses - except rather than just overriding the
indirect accesses.  This keeps existing overrides working.

Combining the two also has another beneficial side effect - we get rid
of similar functions that take arguments in different orders.  The
old direct accessors took the phy structure, devad and register number,
whereas the indirect accessors took the phy structure, register number
and devad in that order.  Care must be taken when updating future
drivers that the argument order is correct, and the function name is
not merely replaced.

This patch set is against net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: clean up mmd_phy_indirect()

Make mmd_phy_indirect() use the same terminology as the rest of the
code, making clear what each address is - phy address, devad, and
register number.

While here, remove the "inline" from this static function, leaving
it to the compiler to decide whether to inline this function, and
get rid of unnecessary parens.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: remove the indirect MMD read/write methods

Remove the indirect MMD read/write methods which are now no longer
necessary.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: convert micrel to new read_mmd/write_mmd driver methods

Convert micrel to the new read_mmd/write_mmd driver methods. This
Clause 22 PHY does not support any MMD access method.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: switch remaining users to phy_(read|write)_mmd()

Switch everyone over to using phy_read_mmd() and phy_write_mmd() now
that they are able to handle both Clause 22 indirect addressing and
Clause 45 direct addressing methods to the MMD registers.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan78xx: update for phy_(read|write)_mmd_indirect() removal

lan78xx appears to use phylib in a rather weird way, accessing the PHY
partly through phylib, and partly by making direct accesses to it,
including to the Clause 45 registers. As the indirect MMD accessors are
going away, update this driver to use the plain phy_(read|write)_mmd()
accessors instead.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: make phy_(read|write)_mmd() generic MMD accessors

Make phy_(read|write)_mmd() generic 802.3 clause 45 register accessors
for both Clause 22 and Clause 45 PHYs, using either the direct register
reading for Clause 45, or the indirect method for Clause 22 PHYs.
Allow this behaviour to be overriden by PHY drivers where necessary.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: move phy MMD accessors to phy-core.c

Move the phy_(read|write)__mmd() helpers out of line, they will become
our main MMD accessor functions, and so will be a little more complex.
This complexity doesn't belong in an inline function. Also move the
_indirect variants as well to keep like functionality together.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Use AVB mode by default

Prior to the recent multi-queue changes the driver would configure the
queues to use the AVB mode, but the mode then got switched to DCB. The
hardware still works fine in DCB mode, but my testing capabilities are
limited, so it's safer to revert to the prior setting anyway.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Restore DT backwards-compatibility

Recent changes to support multiple queues in the device tree bindings
resulted in the number of RX and TX queues to be initialized to zero for
device trees not adhering to the new bindings.

Restore backwards-compatibility with those device trees by falling back
to a single RX and TX queues each.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Always enable MAC RX queues

The MAC RX queues always need to be enabled in order to receive network
packets. Remove the condition that this only needs to be done for multi-
queue configurations.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert sk_filter.refcnt from atomic_t to refcount_t

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: greth: Utilize of_get_mac_address()

Do not open code getting the MAC address exclusively from the
"local-mac-address" property, but instead use of_get_mac_address() which
looks up the MAC address using the 3 typical property names.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: fix Coverity scan errors

Fix Coverity scan errors by not dereferencing lio->glists_dma_base pointer
if it's NULL.

See http://marc.info/?l=linux-netdev&m=149002294305614&w=2

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tcp: Permit user set TCP_MAXSEG to default value

When user_mss is zero, it means use the default value. But the current
codes don't permit user set TCP_MAXSEG to the default value.
It would return the -EINVAL when val is zero.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ovs-sample-action-optimization'

Andy Zhou says:

====================
net-next sample action optimization v4

The sample action can be used for translating Openflow 'clone' action.
However its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.

Patch 3 commit message has more details on the specific optimizations
implemented.

---
v3->v4: Enhance patch 4.
        Fix two bugs pointed out by Pravin,
        Remove 'is_sample' variable.

v2->v3: Enhance patch 4, Rafctor to move more common logic to clone_execute().

v1->v2: Address Pravin's comment, Refactor recirc and sample
        to share more common code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Openvswitch: Refactor sample and recirc actions implementation

Added clone_execute() that both the sample and the recirc
action implementation can use.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Optimize sample action for the clone use cases

With the introduction of open flow 'clone' action, the OVS user space
can now translate the 'clone' action into kernel datapath 'sample'
action, with 100% probability, to ensure that the clone semantics,
which is that the packet seen by the clone action is the same as the
packet seen by the action after clone, is faithfully carried out
in the datapath.

While the sample action in the datpath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitation: First, there is a 3 level of
nesting restriction, enforced at the flow downloading time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoid recursive call only if the sample
action list has a single userspace action.

The main optimization implemented in this series removes the static
nesting limit check, instead, implement the run time recursion limit
check, and recursion avoidance similar to that of the 'recirc' action.
This optimization solve both #1 and #2 issues above.

One related optimization attempts to avoid copying flow key as
long as the actions enclosed does not change the flow key. The
detection is performed only once at the flow downloading time.

Another related optimization is to rewrite the action list
at flow downloading time in order to save the fast path from parsing
the sample action list in its original form repeatedly.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Refactor recirc key allocation.

The logic of allocating and copy key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
as well. Refactor the logic into its own function clone_key().

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Deferred fifo API change.

add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both 'sample' and 'recirc'
actions happens to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary, future
patch will need to pass in action lists that are not fully encoded
by themselves.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vrf-perf'

David Ahern says:

====================
net: vrf: performance improvements

Device based features for VRF such as qdisc, netfilter and packet
captures are implemented by switching the dst on skbuffs to its per-VRF
dst. This has the effect of controlling the output function which points
a function in the VRF driver. [1] The skb proceeds down the stack with
dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and
network taps are evaluated based on this device. Finally, the skb makes
it to the vrf_xmit function which resets the dst based on a FIB lookup.

The feature comes at cost - between 5 and 10% depending on test (TCP vs
UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB
lookup in the VRF driver for each packet sent through it. The FIB lookup
is required because the real dst gets dropped so that the skb can
traverse the stack with dst->dev set to the VRF device.

All of that is really driven by the qdisc and not replicating the
processing of __dev_queue_xmit if a qdisc is set up on the device. But,
VRF devices by default do not have a qdisc and really have no need for
multiple Tx queues. This means the performance overhead is inflicted upon
all users for the potential use case of a qdisc being configured.

The overhead can be avoided by checking if the default configuration
applies to a specific VRF device before switching the dst. If a device
does not have a qdisc, the pass through netfilter hooks and packet taps
can be done inline without dropping the dst and thus avoiding the
performance penalty. With this change performance overhead of VRF drops
to neglible (difference with run-over-run variance) to 3% depending on
test type.

netperf performance comparison for 3 cases:
1. L3_MASTER_DEVICE compiled out
2. VRF with this patch set
3. current VRF code

IPv4
----
           no-l3mdev     new-vrf     old-vrf
TCP_RR       28778        28938*       27169
TCP_CRR      10706        10490         9770
UDP_RR       30750        29813        29256

* Although higher in the final run used for submitting this patch set, I
  think what this really represents is a neglible performance overhead for
  VRF with this change (i.e, within the +-1% variance of runs). Most
  notably the FIB lookups in the Tx path are avoided for TCP_RR.

IPv6
----
           no-l3mdev     new-vrf     old-vrf
TCP_RR       29495        29432       27794
TCP_CRR      10520        10338        9870
UDP_RR       26137        27019*      26511

* UDP is consistently better with VRF for two reasons:
  1. Source address selection with L3 domains is considering fewer
     addresses since only addresses on interfaces in the domain are
     considered for the selection. Specifically, perf-top shows
     shows ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
     running much lower with vrf than without.

  2. The VRF table contains all routes (i.e, there are no separate local
     and main tables per VRF). That means ip6_pol_route_output only has 1
     lookup for VRF where it does 2 without it (1 in the local table and 1
     in the main table).

[1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: vrf: performance improvements for IPv6

The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv6 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) from a +3% improvement for UDP which has a lookup
per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR
which connects a socket for each request-response.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vrf: performance improvements for IPv4

The VRF driver allows users to implement device based features for an
entire domain. For example, a qdisc or netfilter rules can be attached
to a VRF device or tcpdump can be used to view packets for all devices
in the L3 domain.

The device-based features come with a performance penalty, most
notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
to switch the dst on an skb to its private dst. This allows the skb
to traverse the xmit stack with the device set to the VRF device
which in turn enables the netfilter and qdisc features. The VRF
driver then performs the FIB lookup again and reinserts the packet.

This patch avoids the redirect for IPv4 packets if a qdisc has not
been attached to a VRF device which is the default config. In this
case the netfilter hooks and network taps are directly traversed in
the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
then the redirect using the vrf dst is done.

Additional overhead is removed by only checking packet taps if a
socket is open on the device (vrf_dev->ptype_all list is not empty).
Packet sockets bound to any device will still get a copy of the
packet via the real ingress or egress interface.

The end result of this change is a decrease in the overhead of VRF
for the default, baseline case (ie., no netfilter rules, no packet
sockets, no qdisc) to ~3% for UDP which has a lookup per packet and
< 1% overhead for connected sockets that leverage early demux and
avoid FIB lookups.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: introduce SO_MEMINFO getsockopt

Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.

Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().

Suggested by Eric Dumazet.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: fix swapped order of arguments packets and bytes

The arguments packets and bytes to call mlxsw_sp_acl_rule_get_stats are
in the wrong order. Fix this by swapping them.

Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")

Fixes: 7c1b8eb175b69add8ea ("mlxsw: spectrum: Add support for TC flower offload statistics")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update IngPad and IngPack values

We are using the smallest padding boundary (8 bytes), which isn't
smaller than the Memory Controller Read/Write Size

We get best performance in 100G when the Packing Boundary is a multiple
of the Maximum Payload Size. Its related to inefficient chopping of DMA
packets by PCIe, that causes more overhead on bus. So driver is helping
by making the starting address alignment to be MPS size.

We will try to determine PCIE MaxPayloadSize capabiltiy and set
IngPackBoundary based on this value. If cache line size is greater than
MPS or determinig MPS fails, we will use cache line size to determine
IngPackBoundary(as before).

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dwc-xlgmac: add module license

When building the driver as a module, we get a warning about the
lack of a license:

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
see include/linux/module.h for more information

Curiously the text in the .c files only mentions GPLv2+, while the license
tag in the PCI driver contains both GPL and BSD. I picked the license text
as the more definite reference here and put a GPL tag in there.

Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dwc-xlgmac: include dcbnl.h

Without this header, we can run into a build error:

drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,

Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbour: fix nlmsg_pid in notifications

neigh notifications today carry pid 0 for nlmsg_pid
in all cases. This patch fixes it to carry calling process
pid when available. Applications (eg. quagga) rely on
nlmsg_pid to ignore notifications generated by their own
netlink operations. This patch follows the routing subsystem
which already sets this correctly.

Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2017-03-21

This series contains updates to e1000, e1000e, igb, igbvf and ixgb.

This finishes up the work Philippe Reynes did to update the Intel drivers
to the new API for ethtool (get|set)_link_ksettings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-IOV-cleanups'

Yuval Mintz says:

====================
qed: IOV related clenaups

This patch series targets IOV functionality [on both PF and VF].

Patches #2, #3 and #5 fix flows relating to malicious VFs, either by
upgrading and aligning current safe-guards or by correcing racy flows.

Patches #1 and #8 make some malicious/dysnfunctional VFs logging appear
by default in logs.

The rest of the patches either cleanup the existing code or else correct
some possible [yet fairly insignicant] issues in VF behavior.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Always publish VF link from leading hwfn

The link information exists only on the leading hwfn,
but some of its derivatives [e.g., min/max rate] need to
be configured for each hwfn.
When re-basing the VF link view, use the leading hwfn
information as basis for all existing hwfns to allow
said configurations to stick.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Raise verbosity of Malicious VF indications

Malicious VF existance should be interesting enough for the
hyperuser. Change the PF indication that one of its child VF
became malicious to appear by default.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Make qed_iov_mark_vf_flr() return bool

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Deprecate VF multiple queue-stop

The PF<->VF interface allows for the VF to request
multiple queues closure via a single message, but this has
never been used by any official driver.

We now deprecate this option, forcing each queue close
to arrive via a different command; This would be required
for future TLVs that are going to extend the queue TLVs with
additional information on a per-queue basis.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Uniform IOV queue validation

PF needs to validate the status of VF queues before asking firmware
to configure anything for them, but that validation is done in various
different forms - sometimes inadequate.

Add auxillary functions that can be used for testing of the queue
state and convert the various flows to use those instead of current
existing flows; Also, add missing validations where needed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Correct default VF coalescing configuration

When starting the VF's vport, the PF would first configure
the status blocks of the VF and then reset them.
That would cause some of the configured information to be lost -
specifically it would mean that all the VFs queues would use
the Rx coalescing state-machine of the status block.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Set HW-channel to ready before ACKing VF

When PF responds to the VF requests it also cleans the HW-channel
indication in firmware to allow further VF messages to arrive,
but the order currently applied is wrong -
The PF is copying by DMAE the response the VF is polling on for
completion, and only afterwards sets the HW-channel to ready state.

This creates a race condition where the VF would be able to send
an additional message to the PF before the channel would get ready
again, causing the firmware to consider the VF as malicious.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Clean VF malicious indication when disabling IOV

When a VF is considered malicious, driver handling of the VF
FLR flow would clean said indication - but not if the FLR is
part of an sriov-disable flow.
That leads to further issues, as PF wouldn't re-enable the
previously malicious VF when sriov is re-enabled.

No reason for that - simply clean malicious indications in
the sriov-disable flow as well.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Increase verbosity of VF -> PF errors

VFs are currently logging errors when communicating
with their PFs in a too-low verbosity that wouldn't
be shown by default. As timeouts and failed commands
are crucial for VF operability, make them appear by
default.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: Add rhashtable_lookup_get_insert_fast

Add rhashtable_lookup_get_insert_fast for fixed keys, similar to
rhashtable_lookup_get_insert_key for explicit keys.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: fix for vf mac addr command sent to nic firmware

Change to support host<->firmware command return value.
Fix for vf mac addr state command.
1. Added support for firmware commands to return a value:
   - previously, the returned code overlapped with host codes, thus
     commands were only returning 0 (success) or -1 (interpreted as
     timeout)
   - per 'response_manager.h', the error codes are split into two fields
     (major/minor) now, firmware commands are grouped into their own
     'major' group, separate from the host's 'major' group, which allow f/w
     commands to return any 16-bit value
2. The command to set vf mac addr was logging a success message even if
   command failed.  Now command uses a callback function to log the status
   message.
3. The command to set vf mac addr was not logging a message when set via
   the host 'ip' command.  Now, the callback function will log an
   appropriate message.

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: pegasus: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Petko Manolov <petkan@nucleusys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ibmvnic-init'

John Allen says:

====================
ibmvnic: Initialization fixes and improvements

These patches resolve issues with the ibmvnic initialization process.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Correct ibmvnic handling of device open/close

When closing the ibmvnic device we need to release the resources used
in communicating to the virtual I/O server. These need to be
re-negotiated with the server at open time.

This patch moves the releasing of resources a separate routine
and updates the open and close handlers to release all resources at
close and re-negotiate and allocate these resources at open.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Move ibmvnic adapter intialization to its own routine

The intialization of the ibmvnic driver with respect to the virtual
server it connects to should be moved to its own routine. This will
alolow the driver to initiate this process from places outside of
the drivers probe routine.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Move login to its own routine

Move the code that handles login and renegotiation of ibmvnic
capabilities to its own routine.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Move login and queue negotiation into ibmvnic_open

VNIC server expects LINK_STATE_UP to be sent within 30s of the login. If we
exceed the timeout, VNIC server will attempt to fail over. Since time
between probe and open of the device is indeterminate, move login and queue
negotiation into ibmvnic open so we can guarantee that login and sending
LINK_STATE_UP occur within the 30s window.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sun: sungem: rix a possible null dereference

The function gem_begin_auto_negotiation dereference
the pointer ep before testing if it's null. This
patch add a check on ep before dereferencing it.

Fixes: 92552fdda557 ("net: sun: sungem: use new api
ethtool_{get|set}_link_ksettings")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: add debug error messages to report command timeout

Add timeout error message in lio_process_ordered_list(). Add host failure
status in existing error message in if_cfg_callback().

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: remove duplicate code

Remove code duplicated in PF and VF; define that code once only in a common
header file included by PF and VF.

Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-mq-part3'

Joao Pinto says:

====================
net: stmmac: adding multiple buffers and routing

As agreed with David Miller, this patch-set is the third and last to enable
multiple queues in stmmac.

This third one focuses on:

a) Enable multiple buffering to the driver and queue independent data
b) Configuration of RX and TX queues' priority
c) Configuration of RX queues' routing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: RX queue routing configuration

This patch adds the configuration of RX queues' routing.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: TX and RX queue priority configuration

This patch adds the configuration of RX and TX queues' priority.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: enable multiple buffers

This patch creates 2 new structures (stmmac_tx_queue and stmmac_rx_queue)
in include/linux/stmmac.h, enabling that each RX and TX queue has its
own buffers and data.

Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethoc: Use ether_addr_copy()

Use ether_addr_copy() instead of memcpy() to set netdev->dev_addr (which
is 2-byte aligned).

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-cleanups'

Jiri Pirko says:

====================
mlxsw: small driver update

Contains two cleanup patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Align the matchall default case returned value

Align the default case for matchall offload with what's there
for flower.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Cosmetic naming change

Currently the struct representing router interface "mlxsw_sp_rif"
is reffered as "r" in various places in the driver. Furthermore it
contains a member which specify the index which is called "rif".
This patch change "r" to "rif" and "rif" to "rif_index".

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: check hw version first

Check hw version first in probe(). Do nothing if the driver doesn't
support the chip.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'usbnet-ksettings'

Philippe Reynes says:

====================
net: usbnet: move to new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated. On usbnet, it
was often implemented with usbnet_{get|set}_settings.

In this series, I add usbnet_{get|set}_link_ksettings
in the first patch, then I update all the driver to
use this new api, and in the last patch I remove the
old api usbnet_{get|set}_settings.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: usb: remove old api ethtool_{get|set}_settings

The function usbnet_{get|set}_settings is no longer used,
so we remove it.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: sr9700: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: smsc75xx: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: sierra_net: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: mcs7830: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: poma <poma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: dm9601: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: cdc_ncm: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: sr9800: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: smsc95xx: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: usbnet: add new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We add the new api {get|set}_link_ksettings to this driver.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgb: use new API ethtool_{get|set}_link_ksettings

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igbvf: use new API ethtool_{get|set}_link_ksettings

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: use new API ethtool_{get|set}_link_ksettings

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: use new API ethtool_{get|set}_link_ksettings

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: bcmgenet: Track per TX/RX rings statistics

__bcmgenet_tx_reclaim() is currently summing TX bytes/packets in a way
that is not SMP friendly, mutliples CPUs could run
__bcmgenet_tx_reclaim() independently and still update stats->tx_bytes
and stats->tx_packets, cloberring the other CPUs statistics.

Fix this by tracking per RX and TX rings the number of bytes, packets,
dropped and errors statistics, and provide a bcmgenet_get_stats()
function which aggregates everything and returns a consistent output.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: use new API ethtool_{get|set}_link_ksettings

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone may test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: ipv4: add support for ECMP hash policy choice

This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
0 - layer 3 (default)
1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/8021q: create device with all possible features in wanted_features

wanted_features is a set of features which have to be enabled if a
hardware allows that.

Currently when a vlan device is created, its wanted_features is set to
current features of its base device.

The problem is that the base device can get new features and they are
not propagated to vlan-s of this device.

If we look at bonding devices, they doesn't have this problem and this
patch suggests to fix this issue by the same way how it works for bonding
devices.

We meet this problem, when we try to create a vlan device over a bonding
device. When a system are booting, real devices require time to be
initialized, so bonding devices created without slaves, then vlan
devices are created and only then ethernet devices are added to the
bonding device. As a result we have vlan devices with disabled
scatter-gather.

* create a bonding device
  $ ip link add bond0 type bond
  $ ethtool -k bond0 | grep scatter
  scatter-gather: off
tx-scatter-gather: off [requested on]
tx-scatter-gather-fraglist: off [requested on]

* create a vlan device
  $ ip link add link bond0 name bond0.10 type vlan id 10
  $ ethtool -k bond0.10 | grep scatter
  scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off

* Add a slave device to bond0
  $ ip link set dev eth0 master bond0

And now we can see that the bond0 device has got the scatter-gather
feature, but the bond0.10 hasn't got it.
[root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
[root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off

With this patch the vlan device will get all new features from the
bonding device.

Here is a call trace how features which are set in this patch reach
dev->wanted_features.

register_netdevice
   vlan_dev_init
...
dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
       NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
       NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
       NETIF_F_ALL_FCOE;

dev->features |= dev->hw_features;
...
    dev->wanted_features = dev->features & dev->hw_features;
    __netdev_update_features(dev);
        vlan_dev_fix_features
   ...

Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Cleanup some warning from timestamping code.

Following checkpatch.pl recommendations (which include
replacing with <linux/io.h> the <asm/io.h>, since linux/io.h includes
it).

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Enable tx timestamping on loopback and dummy

This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE
by using localhost addresses (without needing to send packets outside),
as well as enabling unit and functional testing of TX timestamping code
without needing hardware support or network access.

It also fulfills the expectation of software network devices supporting
software-based timestamping.

Tested on qemu using txtimestamping.c from the kernel selftests, and
ethtool -T.

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-03-20

This series contains updates to i40e and i40evf only.

Philippe Reynes updates i40e and i40evf to use the new ethtool API for
{get|set}_link_ksettings.

Jake provides the remaining patches in the series, starting with a fix
for i40e where the firmware expected the port numbers for the offloaded
UDP tunnels in Little Endian format and we were sending them in Big Endian
format which put the wrong port number to be put in the UDP tunnel list.
Changed the driver to use __be32 values instead of arrays for
(src|dst)_ip.  Refactored the exit flow of i40e_add_fdir_ethtool() which
removes the dependency on having a non-zero return value.  Fixed a memory
leak by running kfree() and returning immediately when we fail to add
flow director filter.  Fixed a potential issue where could update the
filter count without actually succeeding in adding a filter, by moving
the ATR exit check to after we have sent the TCP/IPv4 filter to the ring
successfully.  Ensures that the fd_tcp_rule count is reset to 0, before
we reprogram the filters so that we do not end up with a stale count
which does not correctly reflect the number of programmed filters.  Added
a check whether we have TCP/IPv4 filters before re-enabling ATR after
flushing and replaying FDIR filters.  Added counters for each filter
type in preparation for adding code to properly check the mask value.
Fixed potential issues by explicitly checking the flow type at the
start of i40e_add_fdir_ethtool().  To avoid possible memory leaks,
we now unconditionally delete the old filter, even if it is identical to
the new filter and ensures will always update the filters as expected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your
net-next tree. A couple of new features for nf_tables, and unsorted
cleanups and incremental updates for the Netfilter tree. More
specifically, they are:

1) Allow to check for TCP option presence via nft_exthdr, patch
   from Phil Sutter.

2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana.

3) Use pr_cont() in ebt_log, from Joe Perches.

4) Remove some dead code in arp_tables reported via static analysis
   tool, from Colin Ian King.

5) Consolidate nf_tables expression validation, from Liping Zhang.

6) Consolidate set lookup via nft_set_lookup().

7) Remove unnecessary rcu read lock side in bridge netfilter, from
   Florian Westphal.

8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo.

9) Pass nft_ctx struct to object initialization indirections, from
   Florian Westphal.

10) Add code to integrate conntrack helper into nf_tables, also from
    Florian.

11) Allow to check if interface index or name exists via
    NFTA_FIB_F_PRESENT, from Phil Sutter.

12) Simplify resolve_normal_ct(), from Florian.

13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang.

14) Use rwlock in nft_set_rbtree set, also from Liping Zhang.

15) One patch to remove a useless printk at netns init path in ipvs,
    and several patches to document IPVS knobs.

16) Use refcount_t for reference counter in the Netfilter/IPVS code,
    from Elena Reshetova.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-03-17

This series contains updates to mainly igb, with one fix for ixgbe.

Alex does all the changes in the series, starting with adding support
for DMA_ATTR_WEAK_ORDERING to improve performance on some platforms.
Modified igb to use the length of the packet instead of the DD status
bit to determine if a new descriptor is ready to be processed.  Modified
the driver to only go through the region in the receive ring that was
designated to be cleaned up, instead of going through the entire ring
on cleanup.  Cleaned up the transmit side, by clearing the transmit
buffer_info only when resetting the rings.  Added a new upper limit for
receive, which is based on the size of a 2K buffer minus padding, which
will allow us to support build_skb going forward.  Fixed ethtool testing
to only sync on the size of the frame that is being tested, instead of
the entire receive buffer.  Updated the handling of page addresses to
always use a void pointer with the consistent name of "va" to indicate
that we are working with a virtual address.  Added a "chicken bit" so
that we can turn off the new receive allocation feature, in the case
where we need to fallback to the legacy receive path.  Added support for
using 3K buffers in order 1 pages the same way we were using 2K buffers
in 4K pages.  Added support for padding packet, since we limit the size
of the frame, we are able to write to an offset within the buffer instead
of having to write at the very start of the buffer.  This allows us to
leaving padding room for things like supporting XDP in the future.
Refactored the receive buffer page management, since there are 2-3 paths
that can be taken depending on what receive modes are enabled, so to
improve maintainability, break out the common bits into their own
functions.  Add support for build_skb, again.  Lastly, fixed a typo in
igb and ixgbe code comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: always remove old filter when adding new FDir filter

The previous code relied on i40e_match_fdir_input_set to determine when
determining whether to free the old filter. Change this code so that we
simply unconditionally delete the old filter, even if it's identical to
the new filter. This ensures that we don't leak any memory, and that we
always update the filters as expected.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: explicitly fail on extended MAC field for ethtool_rx_flow_spec

Although we will fail the filter later due to checking flow_type which
will have a bogus invalid type, it is possible future refactoring will
remove this hidden failure case. Avoid a possible issue in the future by
explicitly checking the flow type at the start.

Change-Id: Ia98eb26f7b93ccbe38c7141e8f203ef496fc6598
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add counters for UDP/IPv4 and IPv4 filters

In preparation for adding code to properly check the mask values, we
will need to know the number of active filters for each type. Add
counters for each filter type. Rename the already existing fd_tcp_rule
to fd_tcp4_filter_cnt to match the style of other names. To avoid style
warnings, avoid assigning multiple parameters at once, and fix up one
other case where we did so previously.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: don't re-enable ATR when flushing filters if SB has TCP4/IPv4 rules

When flushing and replaying FDIR filters, it is possible we would
disable ATR, and then re-enable it even though we should have kept
it disabled due to existing TCP/IPv4 filters. Fix this by checking
whether we have TCP4/IPv4 filters before re-enabling.

Alternatively, we could instead restore ATR and then replay filters,
however, this would cause us to rapidly enable and then disable ATR in
some cases.

Change-ID: I076e4cc1e4409bce7f98f3c213295433a4ff43d8
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: reset fd_tcp_rule count when restoring filters

Since we're about to reprogram the filters, we need to ensure that the
fd_tcp_rule count is correctly reset to 0. Otherwise, we will keep
a stale count that does not accurately reflect the number of programmed
TCPv4 filters.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove redundant check for fd_tcp_rule when restoring filters

i40e_fdir_filter_restore re-adds all existing filters, which already
checks when adding a TCPv4 filter to disable ATR. We don't need to make
the check twice, so remove this redundant code.

Change-ID: Ia0b0690e23523915199d601494557def135c9d7f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: exit ATR mode only when adding TCP/IPv4 filter succeeds

Move ATR exit check after we have sent the TCP/IPv4 filter to the ring
successfully. This avoids an issue where we potentially update the
filter count without actually succeeding in adding the filter. Now, we
only increment the fd_tcp_rule after we've succeeded. Additionally, we
will re-enable ATR mode only after deletion of the filter is actually
posted to the FDIR ring.

Change-ID: If5c1dea422081cc5e2de65618b01b4c3bf6bd586
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>