openwrt/staging/blogic.git
5 years ago  net: dsa: Deal with non-existing PHY/fixed-link
Florian Fainelli [Mon, 10 Jun 2019 19:31:49 +0000 (12:31 -0700)]
net: dsa: Deal with non-existing PHY/fixed-link

We need to specifically deal with phylink_of_phy_connect() returning
-ENODEV, because this can happen when a CPU/DSA port neither connects
to a PHY nor has a fixed-link property. This is a valid use
case that is permitted by the binding and indicates to the switch:
auto-configure port with maximum capabilities.
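
The handling described above can be sketched in plain userspace C. This is
only an illustration, not the DSA code: struct port_link, fake_phy_connect()
and port_setup_link() are hypothetical names, and the only detail taken from
the commit message is that -ENODEV means "no PHY and no fixed-link" and must
not be treated as a failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of what the binding says about a port. */
struct port_link {
	bool has_phy;        /* a PHY is described for this port */
	bool has_fixed_link; /* a fixed-link property is present */
};

/* Stand-in for a connect helper: reports -ENODEV when nothing is described. */
int fake_phy_connect(const struct port_link *p)
{
	if (!p->has_phy && !p->has_fixed_link)
		return -ENODEV;
	return 0;
}

/*
 * Returns 0 when the port is usable, a negative errno otherwise.
 * -ENODEV is mapped to success and flagged, so the caller can leave the
 * port to auto-configure with its maximum capabilities.
 */
int port_setup_link(const struct port_link *p, bool *autoconfigured)
{
	int err = fake_phy_connect(p);

	*autoconfigured = false;
	if (err == -ENODEV) {
		*autoconfigured = true;
		return 0;
	}
	return err;
}
```

Any other negative return value is still propagated to the caller unchanged.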

Fixes: 0e27921816ad ("net: dsa: Use PHYLINK for the CPU/DSA ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: dsa: mv88e6xxx: lock mutex in port_fdb_dump
Vivien Didelot [Wed, 12 Jun 2019 16:42:47 +0000 (12:42 -0400)]
net: dsa: mv88e6xxx: lock mutex in port_fdb_dump

During a port FDB dump operation, the mutex protecting the concurrent
access to the switch registers is currently held by the internal
mv88e6xxx_port_db_dump and mv88e6xxx_port_db_dump_fid helpers.

It must be held at the higher level in mv88e6xxx_port_fdb_dump which
is called directly by DSA through ds->ops->port_fdb_dump. Fix this.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  dt-bindings: net: wiznet: add w5x00 support
Nicolas Saenz Julienne [Wed, 12 Jun 2019 12:25:27 +0000 (14:25 +0200)]
dt-bindings: net: wiznet: add w5x00 support

Add bindings for Wiznet's w5x00 series of SPI interfaced Ethernet chips.

Based on the bindings for microchip,enc28j60.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: ethernet: wiznet: w5X00 add device tree support
Nicolas Saenz Julienne [Wed, 12 Jun 2019 12:25:25 +0000 (14:25 +0200)]
net: ethernet: wiznet: w5X00 add device tree support

The w5X00 chip provides an SPI to Ethernet interface. This patch allows
platform devices to be defined through the device tree.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: sched: ingress: set 'unlocked' flag for Qdisc ops
Vlad Buslov [Wed, 12 Jun 2019 07:14:35 +0000 (10:14 +0300)]
net: sched: ingress: set 'unlocked' flag for Qdisc ops

To remove rtnl lock dependency in tc filter update API when using ingress
Qdisc, set QDISC_CLASS_OPS_DOIT_UNLOCKED flag in ingress Qdisc_class_ops.

Ingress Qdisc ops don't require any modifications to be used without rtnl
lock on tc filter update path. Ingress implementation never changes its
q->block and only releases it when Qdisc is being destroyed. This means it
is enough for RTM_{NEWTFILTER|DELTFILTER|GETTFILTER} message handlers to
hold ingress Qdisc reference while using it without relying on rtnl lock
protection. Unlocked Qdisc ops support is already implemented in filter
update path by unlocked cls API patch set.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'tls-add-support-for-kernel-driven-resync-and-nfp-RX-offload'
David S. Miller [Tue, 11 Jun 2019 19:22:27 +0000 (12:22 -0700)]
Merge branch 'tls-add-support-for-kernel-driven-resync-and-nfp-RX-offload'

Jakub Kicinski says:

====================
tls: add support for kernel-driven resync and nfp RX offload

This series adds TLS RX offload for NFP and completes the offload
by providing resync strategies.  When a TLS data stream loses segments
or experiences reordering, the NIC can no longer perform inline offload.
Resyncs provide information about placement of records in the
stream so that offload can resume.

Existing TLS resync mechanisms are not a great fit for the NFP.
In particular the TX resync is hard to implement for packet-centric
NICs.  This patchset adds an ability to perform TX resync in a way
similar to the way initial sync is done - by calling down to the
driver when new record is created after driver indicated sync had
been lost.

Similarly on the RX side, we try to wait for a gap in the stream
and send record information for the next record.  This works very
well for RPC workloads which are the primary focus at this time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: tls: make use of kernel-driven TX resync
Jakub Kicinski [Tue, 11 Jun 2019 04:40:10 +0000 (21:40 -0700)]
nfp: tls: make use of kernel-driven TX resync

When TCP stream gets out of sync (driver stops receiving skbs
with expected TCP sequence numbers) request a TX resync from
the kernel.

We try to distinguish retransmissions from missed transmissions
by comparing the sequence number to expected - if it's further
than the expected one - we probably missed packets.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: add kernel-driven resync mechanism for TX
Jakub Kicinski [Tue, 11 Jun 2019 04:40:09 +0000 (21:40 -0700)]
net/tls: add kernel-driven resync mechanism for TX

TLS offload drivers keep track of TCP seq numbers to make sure
the packets are fed into the HW in order.

When packets get dropped on the way through the stack, the driver
will get out of sync and have to use fallback encryption, but unless
TCP seq number is resynced it will never match the packets correctly
(or even worse - use incorrect record sequence number after TCP seq
wraps).

Existing drivers (mlx5) feed the entire record on every out-of-order
event, allowing FW/HW to always be in sync.

This patch adds an alternative, more akin to the RX resync.  When
driver sees a frame which is past its expected sequence number the
stream must have gotten out of order (if the sequence number is
smaller than expected it's likely a retransmission which doesn't
require resync).  Driver will ask the stack to perform TX sync
before it submits the next full record, and fall back to software
crypto until stack has performed the sync.
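
The comparison described above can be sketched as follows. This is an
illustrative userspace example, not the driver or TLS core code: the enum
and classify_tx_seq() are hypothetical names. The signed difference of two
32-bit sequence numbers is used so that the check keeps working across TCP
sequence wrap-around.

```c
#include <stdint.h>

/* Hypothetical verdicts for a frame handed to the driver. */
enum tx_seq_verdict {
	TX_SEQ_IN_ORDER,      /* frame starts exactly where expected */
	TX_SEQ_RETRANSMIT,    /* frame is behind: likely a retransmission */
	TX_SEQ_RESYNC_NEEDED, /* frame is ahead: packets were probably missed */
};

enum tx_seq_verdict classify_tx_seq(uint32_t expected, uint32_t seq)
{
	/*
	 * Unsigned subtraction wraps modulo 2^32; converting the result to a
	 * signed value gives the distance with a sign (two's complement
	 * conversion assumed, as on all common compilers).
	 */
	int32_t delta = (int32_t)(seq - expected);

	if (delta == 0)
		return TX_SEQ_IN_ORDER;
	if (delta < 0)
		return TX_SEQ_RETRANSMIT;
	return TX_SEQ_RESYNC_NEEDED;
}
```

In this sketch only TX_SEQ_RESYNC_NEEDED would lead the driver to ask the
stack for a TX sync and fall back to software crypto in the meantime.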

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: generalize the resync callback
Jakub Kicinski [Tue, 11 Jun 2019 04:40:08 +0000 (21:40 -0700)]
net/tls: generalize the resync callback

Currently only RX direction is ever resynced, however, TX may
also get out of sequence if packets get dropped on the way to
the driver.  Rename the resync callback and add a direction
parameter.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: tls: enable TLS RX offload
Jakub Kicinski [Tue, 11 Jun 2019 04:40:07 +0000 (21:40 -0700)]
nfp: tls: enable TLS RX offload

Set ethtool TLS RX feature based on NIC capabilities, and enable
TLS RX when connections are added for decryption.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: tls: implement RX TLS resync
Dirk van der Merwe [Tue, 11 Jun 2019 04:40:06 +0000 (21:40 -0700)]
nfp: tls: implement RX TLS resync

Enable kernel-controlled RX resync and propagate TLS connection
RX resync from kernel TLS to firmware.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: add async version of mailbox communication
Jakub Kicinski [Tue, 11 Jun 2019 04:40:05 +0000 (21:40 -0700)]
nfp: add async version of mailbox communication

Some control messages must be sent from atomic context.  The mailbox
takes sleeping locks and uses a waitqueue so add a "posted" version
of communication.

Trylock the semaphore and if that's successful kick off the device
communication.  The device communication will be completed from
a workqueue, which will also release the semaphore.

If locks are taken, queue the message and return.  Schedule a
different workqueue to take the semaphore and run the communication.
Note that there are currently no atomic users which would actually
need the return value, so all replies to posted messages are just
freed.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: rename nfp_ccm_mbox_alloc()
Jakub Kicinski [Tue, 11 Jun 2019 04:40:04 +0000 (21:40 -0700)]
nfp: rename nfp_ccm_mbox_alloc()

We need the name nfp_ccm_mbox_alloc() for allocating the mailbox
communication channel itself.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nfp: tls: set skb decrypted flag
Dirk van der Merwe [Tue, 11 Jun 2019 04:40:03 +0000 (21:40 -0700)]
nfp: tls: set skb decrypted flag

Firmware indicates when a packet has been decrypted by reusing the
currently unused BPF flag.  Transfer this information into the skb
and provide a statistic of all decrypted segments.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: add kernel-driven TLS RX resync
Jakub Kicinski [Tue, 11 Jun 2019 04:40:02 +0000 (21:40 -0700)]
net/tls: add kernel-driven TLS RX resync

TLS offload device may lose sync with the TCP stream if packets
arrive out of order.  Drivers can currently request a resync at
a specific TCP sequence number.  When a record is found starting
at that sequence number kernel will inform the device of the
corresponding record number.

This requires the device to constantly scan the stream for a
known pattern (constant bytes of the header) after sync is lost.

This patch adds an alternative approach which is entirely under
the control of the kernel.  Kernel tracks records it had to fully
decrypt, even though TLS socket is in TLS_HW mode.  If multiple
records did not have any decrypted parts - it's a pretty strong
indication that the device is out of sync.

We choose the min number of fully encrypted records to be 2,
which should hopefully be more than will get retransmitted at
a time.

After kernel decides the device is out of sync it schedules a
resync request.  If the TCP socket is empty the resync gets
performed immediately.  If socket is not empty we leave the
record parser to resync when next record comes.

Before resync in message parser we peek at the TCP socket and
don't attempt the sync if the socket already has some of the
next record queued.

On resync failure (encrypted data continues to flow in) we
retry with exponential backoff, up to once every 128 records
(with a 16k record that's at most once every 2M of data).
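
A rough sketch of the retry policy described above, in userspace C. It is
not the kernel's implementation: the struct, the function names and the
exact doubling scheme are assumptions made for illustration. Only the
minimum of 2 fully encrypted records and the cap of one attempt per 128
records come from the text.

```c
#include <stdint.h>

#define RESYNC_MIN_INTERVAL 2u   /* from the text: 2 fully encrypted records */
#define RESYNC_MAX_INTERVAL 128u /* from the text: at most once per 128 records */

/* Hypothetical per-connection state. */
struct resync_state {
	uint32_t since_attempt; /* encrypted records seen since the last attempt */
	uint32_t interval;      /* records to wait before the next attempt */
};

void resync_state_init(struct resync_state *s)
{
	s->since_attempt = 0;
	s->interval = RESYNC_MIN_INTERVAL;
}

/*
 * Called once per parsed record.  device_decrypted is non-zero when the
 * device decrypted at least part of the record.  Returns 1 when a resync
 * request should be scheduled for this record, 0 otherwise.
 */
int resync_on_record(struct resync_state *s, int device_decrypted)
{
	if (device_decrypted) {
		/* Device is in sync again: forget the backoff. */
		resync_state_init(s);
		return 0;
	}

	s->since_attempt++;
	if (s->since_attempt < s->interval)
		return 0;

	/* Attempt a resync and double the wait, up to the cap. */
	s->since_attempt = 0;
	if (s->interval < RESYNC_MAX_INTERVAL)
		s->interval *= 2;
	return 1;
}
```

With 16 KiB records the cap works out to 128 * 16384 = 2097152 bytes, which
is the "2M of data" figure quoted above.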

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: rename handle_device_resync()
Jakub Kicinski [Tue, 11 Jun 2019 04:40:01 +0000 (21:40 -0700)]
net/tls: rename handle_device_resync()

handle_device_resync() doesn't describe the function very well.
The function checks if resync should be issued upon parsing of
a new record.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: pass record number as a byte array
Jakub Kicinski [Tue, 11 Jun 2019 04:40:00 +0000 (21:40 -0700)]
net/tls: pass record number as a byte array

TLS offload code casts record number to a u64.  The buffer
should be aligned to 8 bytes, but it's actually a __be64, and
the rest of the TLS code treats it as big int.  Make the
offload callbacks take a byte array, drivers can make the
choice to do the ugly cast if they want to.

Prepare for copying the record number onto the stack by
defining a constant for max size of the byte array.
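
As an illustration of treating the record number as a big-endian byte array
rather than casting it, here is a small userspace sketch. The constant
REC_SEQ_SIZE and both helper names are hypothetical; the 8-byte size is an
assumption based on the __be64 mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define REC_SEQ_SIZE 8 /* assumed size of the record number in bytes */

/* Assemble the value byte by byte; no alignment or endianness assumptions. */
uint64_t rec_seq_to_u64(const unsigned char rec_seq[REC_SEQ_SIZE])
{
	uint64_t v = 0;

	for (size_t i = 0; i < REC_SEQ_SIZE; i++)
		v = (v << 8) | rec_seq[i];
	return v;
}

/* Increment a big-endian big integer in place, carrying from the last byte. */
void rec_seq_increment(unsigned char *rec_seq, size_t len)
{
	for (size_t i = len; i-- > 0; ) {
		if (++rec_seq[i] != 0)
			break; /* no carry out of this byte */
	}
}
```

A driver that wants a native integer can call a helper like
rec_seq_to_u64() instead of dereferencing a possibly unaligned pointer.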

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net/tls: simplify seq calculation in handle_device_resync()
Jakub Kicinski [Tue, 11 Jun 2019 04:39:59 +0000 (21:39 -0700)]
net/tls: simplify seq calculation in handle_device_resync()

We subtract "TLS_HEADER_SIZE - 1" from req_seq, then if they
match we add the same constant to seq.  Just add it to seq,
and we don't have to touch req_seq.
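
The equivalence behind this simplification can be checked with a small
sketch. The two functions below are hypothetical and only model the
arithmetic described above; DEMO_TLS_HEADER_SIZE is assumed to be 5, the
size of a TLS record header.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TLS_HEADER_SIZE 5u /* assumed value, for illustration only */

/* Old shape: adjust req_seq, compare, then adjust seq on a match. */
bool seq_match_old(uint32_t seq, uint32_t req_seq, uint32_t *out)
{
	req_seq -= DEMO_TLS_HEADER_SIZE - 1;
	if (seq != req_seq)
		return false;
	*out = seq + DEMO_TLS_HEADER_SIZE - 1;
	return true;
}

/* New shape: adjust seq once, compare, and use it as is. */
bool seq_match_new(uint32_t seq, uint32_t req_seq, uint32_t *out)
{
	seq += DEMO_TLS_HEADER_SIZE - 1;
	if (seq != req_seq)
		return false;
	*out = seq;
	return true;
}
```

Because unsigned 32-bit arithmetic wraps, both forms agree even when the
sequence numbers cross the 2^32 boundary.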

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  packet: remove unused variable 'status' in __packet_lookup_frame_in_block
Mao Wenan [Tue, 11 Jun 2019 01:32:13 +0000 (09:32 +0800)]
packet: remove unused variable 'status' in __packet_lookup_frame_in_block

The variable 'status' in  __packet_lookup_frame_in_block() is never used since
introduction in commit f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer
implementation."), we can remove it.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: openvswitch: remove unnecessary ASSERT_OVSL in ovs_vport_del()
Taehee Yoo [Sun, 9 Jun 2019 17:19:06 +0000 (02:19 +0900)]
net: openvswitch: remove unnecessary ASSERT_OVSL in ovs_vport_del()

ASSERT_OVSL() in ovs_vport_del() is unnecessary because
ovs_vport_del() is only called by ovs_dp_detach_port() and
ovs_dp_detach_port() calls ASSERT_OVSL() too.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  net: netlink: make netlink_walk_start() void return type
Taehee Yoo [Sun, 9 Jun 2019 17:05:30 +0000 (02:05 +0900)]
net: netlink: make netlink_walk_start() void return type

netlink_walk_start() needed to return an error code because of
rhashtable_walk_init(). But that was converted to rhashtable_walk_enter(),
which is a void type function. So now netlink_walk_start() doesn't need
any return value.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: pmtu: Introduce list_flush_ipv6_exception test case
Stefano Brivio [Thu, 6 Jun 2019 20:15:09 +0000 (22:15 +0200)]
selftests: pmtu: Introduce list_flush_ipv6_exception test case

This test checks that route exceptions can be successfully listed and
flushed using ip -6 route {list,flush} cache.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'net-Enable-nexthop-objects-with-IPv4-and-IPv6-routes'
David S. Miller [Mon, 10 Jun 2019 17:44:57 +0000 (10:44 -0700)]
Merge branch 'net-Enable-nexthop-objects-with-IPv4-and-IPv6-routes'

David Ahern says:

====================
net: Enable nexthop objects with IPv4 and IPv6 routes

This is the final set of the initial nexthop object work. When I
started this idea almost 2 years ago, it took 18 seconds to inject
700k+ IPv4 routes with 1 hop and about 28 seconds for 4-paths. Some
of that time was due to inefficiencies in 'ip', but most of it was
kernel side with excessive synchronize_rcu calls in ipv4, and redundant
processing validating a nexthop spec (device, gateway, encap). Worse,
the time increased dramatically as the number of legs in the routes
increased; for example, taking over 72 seconds for 16-path routes.

After this set, with increased dirty memory limits (fib_sync_mem sysctl),
an improved ip, and nexthop objects, a full internet fib (743,799 routes
based on a pull in January 2019) can be pushed to the kernel in 4.3
seconds. Even better, the time to insert is "almost" constant with
increasing number of paths. The 'almost constant' time is due to
expanding the nexthop definitions when generating notifications. A
follow on patch will be sent adding a sysctl that allows an admin to
avoid the nexthop expansion and truly get constant route insert time
regardless of the number of paths in a route! (Useful once all programs
used for a deployment that care about routes understand nexthop objects).

To be clear, 'ip' is used for benchmarking for no other reason than
'ip -batch' is trivial to use for the tests. FRR, for example, better
manages nexthops and route changes and the way those are pushed to the
kernel and thus will have less userspace processing times than 'ip -batch'.

Patches 1-10 iterate over each fib6_nh in a nexthop and invoke a processing
function per fib6_nh. Prior to nexthop objects, a fib6_info referenced
a single fib6_nh. Multipath routes were added as separate fib6_info for
each leg of the route and linked as siblings:

    f6i -> sibling -> sibling ... -> sibling
     |                                   |
     +--------- multipath route ---------+

With nexthop objects a single fib6_info references an external
nexthop which may have a series of fib6_nh:

     f6i ---> nexthop ---> fib6_nh
                           ...
                           fib6_nh

making IPv6 routes similar to IPv4. The side effect is that a single
fib6_info now indirectly references a series of fib6_nh so the code
needs to walk each entry and call the local, per-fib6_nh processing
function.

Patches 11 and 13 wire up use of nexthops with fib entries for IPv4
and IPv6. With these commits you can actually use nexthops with routes.

Patch 12 is an optimization for IPv4 when using nexthops in the most
predominant use case (no metrics).

Patch 14 handles replace of a nexthop config.

Patches 15-18 update the pmtu and redirect tests to use both old and
new routing.

Patches 19 and 20 add new tests for the nexthop infrastructure. The first
covers a single nexthop used by multiple prefixes to communicate with remote
hosts. This is on top of the functional tests already committed. The
second verifies multipath selection.

v4
- changed return to 'goto out' in patch 9 since the rcu_read_lock is
  held (noticed by Wei)

v3
- removed found arg in patch 7 and changed rt6_nh_remove_exception_rt
  to return 1 when a match is found for an exception

v2
- changed ++i to i++ in patches 1 and 14 as noticed by DaveM
- improved commit message for patch 14 (nexthop replace)
- removed the skip_fib argument to remove_nexthop; vestige of an
  older design
====================

Reviewed-By: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: Add version of router_multipath.sh using nexthop objects
David Ahern [Sat, 8 Jun 2019 21:53:41 +0000 (14:53 -0700)]
selftests: Add version of router_multipath.sh using nexthop objects

Add a version of router_multipath.sh that uses nexthop objects for
routes.

Ido requested a version that does not cause regressions with mlxsw
testing since it does not support nexthop objects yet.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: Add test with multiple prefixes using single nexthop
David Ahern [Sat, 8 Jun 2019 21:53:40 +0000 (14:53 -0700)]
selftests: Add test with multiple prefixes using single nexthop

Add tests where multiple FIB entries use the same nexthop object. Generate
per-cpu cached routes for each by running ping on each cpu, and then
generate exceptions unique to each prefix (remote host) with different
mtus.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: icmp_redirect: Add support for routing via nexthop objects
David Ahern [Sat, 8 Jun 2019 21:53:39 +0000 (14:53 -0700)]
selftests: icmp_redirect: Add support for routing via nexthop objects

Add a second pass to icmp_redirect.sh to use nexthop objects for
routes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: pmtu: Add support for routing via nexthop objects
David Ahern [Sat, 8 Jun 2019 21:53:38 +0000 (14:53 -0700)]
selftests: pmtu: Add support for routing via nexthop objects

Add routing setup using nexthop objects and repeat tests with
old and new routing.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: pmtu: Move route installs to a new function
David Ahern [Sat, 8 Jun 2019 21:53:37 +0000 (14:53 -0700)]
selftests: pmtu: Move route installs to a new function

Move the route add commands to a new function called setup_routing_old.
The '_old' refers to the classic way of installing routes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  selftests: pmtu: Move running of test into a new function
David Ahern [Sat, 8 Jun 2019 21:53:36 +0000 (14:53 -0700)]
selftests: pmtu: Move running of test into a new function

Move the block of code that runs a test and prints the verdict to a
new function, run_test.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nexthops: add support for replace
David Ahern [Sat, 8 Jun 2019 21:53:35 +0000 (14:53 -0700)]
nexthops: add support for replace

Add support for atomically updating a nexthop config.

When updating a nexthop, walk the lists of associated fib entries and
verify the new config is valid. Replace is done by swapping nh_info
for single nexthops - new config is applied to old nexthop struct, and
old config is moved to new nexthop struct. For nexthop groups the same
applies but for nh_group. In addition for groups the nh_parent reference
needs to be updated. The old config is released by calling __remove_nexthop
on the 'new' nexthop which now has the old config. This is done to avoid
messing around with the list_heads that track which fib entries are
using the nexthop.

After the swap of config data, bump the sequence counters for FIB entries
to invalidate any dst entries and send notifications to userspace. The
notifications include the new nexthop spec as well as any fib entries
using the updated nexthop struct.
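
The swap described above can be pictured with a minimal sketch. The types
and the function name are hypothetical and much simpler than the kernel
structures; the point is only that the long-lived object keeps its identity
(and whatever lists hang off it) while the configuration pointers trade
places.

```c
/* Hypothetical, simplified stand-ins for the nexthop structures. */
struct swap_nh_info {
	int gateway; /* placeholder for the real configuration */
};

struct swap_nexthop {
	unsigned int id;           /* identity that fib entries refer to */
	struct swap_nh_info *info; /* current configuration */
};

/*
 * Apply the new configuration to the old object by swapping info pointers.
 * Afterwards replacement holds the old configuration, so releasing
 * replacement releases the old config.
 */
void swap_nexthop_replace(struct swap_nexthop *old,
			  struct swap_nexthop *replacement)
{
	struct swap_nh_info *tmp = old->info;

	old->info = replacement->info;
	replacement->info = tmp;
}
```

Nothing that tracks users of the old object has to be touched, which is the
motivation given above for swapping instead of relinking.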

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Allow routes to use nexthop objects
David Ahern [Sat, 8 Jun 2019 21:53:34 +0000 (14:53 -0700)]
ipv6: Allow routes to use nexthop objects

Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib6_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.
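
The mutual-exclusion rule above can be sketched as a small validation
function. This is illustrative only: struct demo_route_cfg and
demo_validate_route_cfg() are hypothetical, and both the use of 0 for "no
nexthop id given" and the -EINVAL return value are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of a route request. */
struct demo_route_cfg {
	uint32_t nh_id;     /* 0 is assumed to mean "no nexthop id given" */
	bool has_gateway;
	bool has_device;
	bool has_encap;
	bool has_multipath;
};

/* Returns 0 if the combination is acceptable, -EINVAL otherwise. */
int demo_validate_route_cfg(const struct demo_route_cfg *cfg)
{
	if (cfg->nh_id == 0)
		return 0; /* classic route: inline attributes are allowed */

	/* With a nexthop id, the nexthop spec must come from the object. */
	if (cfg->has_gateway || cfg->has_device ||
	    cfg->has_encap || cfg->has_multipath)
		return -EINVAL;
	return 0;
}
```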

Update ip6_route_del to check metric and protocol before nexthop
specs. If fc_nh_id is set, then it must match the id in the route
entry. Since IPv6 allows delete of a cached entry (an exception),
add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in
a fib entry if it is using a nexthop.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv4: Optimization for fib_info lookup with nexthops
David Ahern [Sat, 8 Jun 2019 21:53:33 +0000 (14:53 -0700)]
ipv4: Optimization for fib_info lookup with nexthops

Be optimistic about re-using a fib_info when nexthop id is given and
the route does not use metrics. Avoids a memory allocation which in
most cases is expected to be freed anyways.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv4: Allow routes to use nexthop objects
David Ahern [Sat, 8 Jun 2019 21:53:32 +0000 (14:53 -0700)]
ipv4: Allow routes to use nexthop objects

Add support for RTA_NH_ID attribute to allow a user to specify a
nexthop id to use with a route. fc_nh_id is added to fib_config to
hold the value passed in the RTA_NH_ID attribute. If a nexthop id
is given, the gateway, device, encap and multipath attributes can
not be set.

Update fib_nh_match to check ids on a route delete.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in mtu updates
David Ahern [Sat, 8 Jun 2019 21:53:31 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in mtu updates

Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each
fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu,
we need to find the nexthop that correlates to the device and gateway
in the rt6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in rt6_do_redirect
David Ahern [Sat, 8 Jun 2019 21:53:30 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_do_redirect

Use nexthop_for_each_fib6_nh and fib6_nh_find_match to find the
fib6_nh in a nexthop that correlates to the device and gateway
in the rt6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirect
David Ahern [Sat, 8 Jun 2019 21:53:29 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirect

Add a hook in __ip6_route_redirect to handle a nexthop struct in a
fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match
to call ip6_redirect_nh_match for each fib6_nh looking for a match.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in exception handling
David Ahern [Sat, 8 Jun 2019 21:53:28 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in exception handling

Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt,
rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle
nexthop struct in a fib6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in fib6_info_uses_dev
David Ahern [Sat, 8 Jun 2019 21:53:27 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in fib6_info_uses_dev

Add a hook in fib6_info_uses_dev to handle nexthop struct in a fib6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in rt6_nlmsg_size
David Ahern [Sat, 8 Jun 2019 21:53:26 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_nlmsg_size

Add a hook in rt6_nlmsg_size to handle nexthop struct in a fib6_info.
rt6_nh_nlmsg_size is used to sum the space needed for all nexthops in
the fib entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in __find_rr_leaf
David Ahern [Sat, 8 Jun 2019 21:53:25 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __find_rr_leaf

Add a hook in __find_rr_leaf to handle nexthop struct in a fib6_info.
nexthop_for_each_fib6_nh is used to walk each fib6_nh in a nexthop and
call find_match. On a match, use the fib6_nh saved in the callback arg
to setup fib6_result.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in rt6_device_match
David Ahern [Sat, 8 Jun 2019 21:53:24 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_device_match

Add a hook in rt6_device_match to handle nexthop struct in a fib6_info.
The new rt6_nh_dev_match uses nexthop_for_each_fib6_nh to walk each
fib6_nh in a nexthop and call __rt6_device_match. On match,
rt6_nh_dev_match returns the fib6_nh and rt6_device_match uses it to
setup fib6_result.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  ipv6: Handle all fib6_nh in a nexthop in fib6_drop_pcpu_from
David Ahern [Sat, 8 Jun 2019 21:53:23 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in fib6_drop_pcpu_from

Use nexthop_for_each_fib6_nh to walk all fib6_nh in a nexthop when
dropping 'from' reference in pcpu routes.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  nexthops: Add ipv6 helper to walk all fib6_nh in a nexthop struct
David Ahern [Sat, 8 Jun 2019 21:53:22 +0000 (14:53 -0700)]
nexthops: Add ipv6 helper to walk all fib6_nh in a nexthop struct

IPv6 has traditionally had a single fib6_nh per fib6_info. With
nexthops we can have multiple fib6_nh associated with a fib6_info.
Add a nexthop helper to invoke a callback for each fib6_nh in a
'struct nexthop'. If the callback returns non-0, the loop is
stopped and the return value passed to the caller.
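
The callback contract described above can be sketched in userspace C. The
types and names below (demo_nh, demo_nexthop, demo_for_each_nh,
demo_match_dev) are hypothetical and do not reproduce the kernel helper's
signature; they only show the "stop on the first non-zero return and pass
it up" behaviour, plus the common pattern of saving the matching entry in
the callback argument.

```c
#include <stddef.h>

/* Hypothetical stand-ins for fib6_nh and struct nexthop. */
struct demo_nh {
	int ifindex;
};

struct demo_nexthop {
	struct demo_nh *entries;
	size_t count;
};

typedef int (*demo_nh_cb)(struct demo_nh *nh, void *arg);

/* Invoke cb for each entry; stop and return its value once it is non-zero. */
int demo_for_each_nh(struct demo_nexthop *nhp, demo_nh_cb cb, void *arg)
{
	for (size_t i = 0; i < nhp->count; i++) {
		int rc = cb(&nhp->entries[i], arg);

		if (rc)
			return rc;
	}
	return 0;
}

/* Example callback argument: what to look for and where to store the hit. */
struct demo_match_arg {
	int ifindex;
	struct demo_nh *found;
};

/* Returns 1 and records the entry when the device matches, 0 otherwise. */
int demo_match_dev(struct demo_nh *nh, void *arg)
{
	struct demo_match_arg *m = arg;

	if (nh->ifindex != m->ifindex)
		return 0;
	m->found = nh;
	return 1;
}
```

A caller checks the return value of the walk and, on a match, reads the
saved entry from the argument struct.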

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  tcp: Make tcp_fastopen_alloc_ctx static
YueHaibing [Mon, 10 Jun 2019 15:19:08 +0000 (23:19 +0800)]
tcp: Make tcp_fastopen_alloc_ctx static

Fix sparse warning:

net/ipv4/tcp_fastopen.c:75:29: warning:
 symbol 'tcp_fastopen_alloc_ctx' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  Merge branch 'r8169-improve-handling-of-chip-specific-configuration'
David S. Miller [Mon, 10 Jun 2019 17:37:34 +0000 (10:37 -0700)]
Merge branch 'r8169-improve-handling-of-chip-specific-configuration'

Heiner Kallweit says:

====================
r8169: improve handling of chip-specific configuration

This series improves and simplifies handling of chip-specific
configuration.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  r8169: remove struct rtl_cfg_info
Heiner Kallweit [Mon, 10 Jun 2019 16:25:29 +0000 (18:25 +0200)]
r8169: remove struct rtl_cfg_info

Simplify the code by removing struct rtl_cfg_info. Only info we need
per PCI ID is whether it supports GBit or not.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  r8169: remove member coalesce_info from struct rtl_cfg_info
Heiner Kallweit [Mon, 10 Jun 2019 16:24:25 +0000 (18:24 +0200)]
r8169: remove member coalesce_info from struct rtl_cfg_info

To prepare removal of struct rtl_cfg_info, set the coalesce
config based on the chip version number.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  r8169: remove callback hw_start from struct rtl_cfg_info
Heiner Kallweit [Mon, 10 Jun 2019 16:23:30 +0000 (18:23 +0200)]
r8169: remove callback hw_start from struct rtl_cfg_info

After the latest changes we don't need separate functions
rtl_hw_start_8168 and rtl_hw_start_8101 any longer. This allows us to
simplify the code. For this change we need to move rtl_hw_start() and
rtl_hw_start_8169(). rtl_hw_start_8169() is unchanged.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago  r8169: rename CPCMD_QUIRK_MASK and apply it on all chip versions
Heiner Kallweit [Mon, 10 Jun 2019 16:22:33 +0000 (18:22 +0200)]
r8169: rename CPCMD_QUIRK_MASK and apply it on all chip versions

CPCMD_QUIRK_MASK isn't specific to certain chip versions. The vendor
driver applies this mask to all 8168 versions. Therefore remove QUIRK
from the mask name and apply it on all chip versions.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago r8169: improve setting interrupt mask
Heiner Kallweit [Mon, 10 Jun 2019 16:21:50 +0000 (18:21 +0200)]
r8169: improve setting interrupt mask

So far several places in the code deal with setting the interrupt mask
for the respective chip versions. Improve this by having one function
for this only. In addition don't set RxFIFOOver for all 8101 chip
versions like in the vendor driver.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago cxgb4/libcxgb/cxgb4i/cxgbit: enable eDRAM page pods for iSCSI
Varun Prakash [Mon, 10 Jun 2019 13:06:34 +0000 (18:36 +0530)]
cxgb4/libcxgb/cxgb4i/cxgbit: enable eDRAM page pods for iSCSI

Page pods are used for direct data placement, this patch
enables eDRAM page pods if firmware supports this feature.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'mvpp2-stats'
David S. Miller [Mon, 10 Jun 2019 16:12:53 +0000 (09:12 -0700)]
Merge branch 'mvpp2-stats'

Maxime Chevallier says:

====================
net: mvpp2: Add extra ethtool stats

This series adds support for more ethtool counters in PPv2 :
 - Per port counters, including one indicating the classifier drops
 - Per RXQ and per TXQ counters

The first 2 patches perform some light rework and renaming, and the 3rd
adds the extra counters.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mvpp2: Add support for more ethtool counters
Maxime Chevallier [Mon, 10 Jun 2019 08:55:29 +0000 (10:55 +0200)]
net: mvpp2: Add support for more ethtool counters

Besides the MIB counters, some other useful counters can be exposed to
the user. This commit adds support for :

 - Per-port counters, that indicate FIFO drops and classifier drops,
 - Per-rxq counters,
 - Per-txq counters

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mvpp2: Rename mvpp2_ethtool_counters to mvpp2_ethtool_mib_counters
Maxime Chevallier [Mon, 10 Jun 2019 08:55:28 +0000 (10:55 +0200)]
net: mvpp2: Rename mvpp2_ethtool_counters to mvpp2_ethtool_mib_counters

Since we'll be adding support for other kind of internal counters, make
clear that the currently supported counters are the MIB counters.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mvpp2: Only clear the stat counters at port init
Maxime Chevallier [Mon, 10 Jun 2019 08:55:27 +0000 (10:55 +0200)]
net: mvpp2: Only clear the stat counters at port init

When first configuring a port on PPv2, we want to clear the internal
counters so that we don't get values from previous boot stages.

However, we can't really clear these counters when resetting the MAC,
since there are valid reasons to do so while the port is being used,
such as when reconfiguring the interface mode with the PHY.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago ocelot: remove unused variable 'rc' in vcap_cmd()
Mao Wenan [Sun, 9 Jun 2019 07:11:26 +0000 (15:11 +0800)]
ocelot: remove unused variable 'rc' in vcap_cmd()

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/mscc/ocelot_ace.c: In function ‘vcap_cmd’:
drivers/net/ethernet/mscc/ocelot_ace.c:108:6: warning: variable ‘rc’ set
but not used [-Wunused-but-set-variable]
  int rc;
      ^
It's never used since introduction in commit b596229448dd ("net: mscc:
ocelot: Add support for tcam")

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
Eric Dumazet [Sun, 9 Jun 2019 00:58:51 +0000 (17:58 -0700)]
ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state

In case autoflowlabel is in action, skb_get_hash_flowi6()
derives a non zero skb->hash to the flowlabel.

If skb->hash is zero, a flow dissection is performed.

Since all TCP skbs sent from ESTABLISH state inherit their
skb->hash from sk->sk_txhash, we better keep a copy
of sk->sk_txhash into the TIME_WAIT socket.

After this patch, ACK or RST packets sent on behalf of
a TIME_WAIT socket have the flowlabel that was previously
used by the flow.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'RGMII-delays-for-SJA1105-DSA-driver'
David S. Miller [Mon, 10 Jun 2019 03:06:54 +0000 (20:06 -0700)]
Merge branch 'RGMII-delays-for-SJA1105-DSA-driver'

Vladimir Oltean says:

====================
RGMII delays for SJA1105 DSA driver

This patchset configures the Tunable Delay Lines of the SJA1105 P/Q/R/S
switches. These add a programmable phase offset on the RGMII RX and TX
clock signals and get used by the driver for fixed-link interfaces that
use the rgmii-id, rgmii-txid or rgmii-rxid phy-modes.

Tested on a board where RGMII delays were already set up, by adding
MAC-side delays on the RGMII interface towards a BCM5464R PHY and
noticing that the MAC now reports SFD, preamble, FCS etc. errors.

Conflicts trivially in drivers/net/dsa/sja1105/sja1105_spi.c with
https://patchwork.ozlabs.org/project/netdev/list/?series=112614&state=*
which must be applied first.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Add RGMII delay support for P/Q/R/S chips
Vladimir Oltean [Sat, 8 Jun 2019 16:12:28 +0000 (19:12 +0300)]
net: dsa: sja1105: Add RGMII delay support for P/Q/R/S chips

As per the DT phy-mode specification, RGMII delays are applied by the
MAC when there is no PHY present on the link.
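
A minimal sketch of how a phy-mode string could be mapped to MAC-side
delays, based on the three delay modes named in the cover letter. The
struct and function names are hypothetical and are not taken from the
SJA1105 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: which RGMII clock lines the MAC should delay. */
struct rgmii_delays {
	bool rx;
	bool tx;
};

/* Map a DT phy-mode string to MAC-side delays (fixed-link case only). */
static struct rgmii_delays rgmii_delays_for_mode(const char *phy_mode)
{
	struct rgmii_delays d = { false, false };

	assert(phy_mode != NULL);
	if (strcmp(phy_mode, "rgmii-id") == 0) {
		d.rx = true;
		d.tx = true;
	} else if (strcmp(phy_mode, "rgmii-rxid") == 0) {
		d.rx = true;
	} else if (strcmp(phy_mode, "rgmii-txid") == 0) {
		d.tx = true;
	}
	/* Plain "rgmii" and non-RGMII modes get no MAC-side delay. */
	return d;
}
```

In this sketch an unknown mode simply yields no delay; a real driver
would more likely reject it.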

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Remove duplicate rgmii_pad_mii_tx from regs
Vladimir Oltean [Sat, 8 Jun 2019 16:12:27 +0000 (19:12 +0300)]
net: dsa: sja1105: Remove duplicate rgmii_pad_mii_tx from regs

The pad_mii_tx registers point to the same memory region but were
unused. So convert to using these for RGMII I/O cell configuration, as
they bear a shorter name.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: phy: broadcom: Add genphy_suspend and genphy_resume for BCM5464
Vladimir Oltean [Sat, 8 Jun 2019 13:53:56 +0000 (16:53 +0300)]
net: phy: broadcom: Add genphy_suspend and genphy_resume for BCM5464

This puts the quad PHY ports in power-down mode when the PHY transitions
to the PHY_HALTED state.  It is likely that all the other PHYs support
the BMCR_PDOWN bit, but I only have the BCM5464R to test.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'Rethink-PHYLINK-callbacks-for-SJA1105-DSA'
David S. Miller [Mon, 10 Jun 2019 02:58:59 +0000 (19:58 -0700)]
Merge branch 'Rethink-PHYLINK-callbacks-for-SJA1105-DSA'

Vladimir Oltean says:

====================
Rethink PHYLINK callbacks for SJA1105 DSA

This patchset implements phylink_mac_link_up and phylink_mac_link_down,
while also removing the code that was modifying the EGRESS and INGRESS
MAC settings for STP and replacing them with the "inhibit TX"
functionality.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Rethink the PHYLINK callbacks
Vladimir Oltean [Sat, 8 Jun 2019 13:03:44 +0000 (16:03 +0300)]
net: dsa: sja1105: Rethink the PHYLINK callbacks

The first fact that needs to be stated is that the per-MAC settings in
SJA1105 called EGRESS and INGRESS do *not* disable egress and ingress on
the MAC. They only prevent non-link-local traffic from being
sent/received on this port.

So instead of having .phylink_mac_config essentially mess with the STP
state and force it to DISABLED/BLOCKING (which also brings useless
complications in sja1105_static_config_reload), simply add the
.phylink_mac_link_down and .phylink_mac_link_up callbacks which inhibit
TX at the MAC level, while leaving RX essentially enabled.

Also stop from trying to put the link down in .phylink_mac_config, which
is incorrect.
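
The shape of the two callbacks can be sketched in userspace as below.
This is only a model under stated assumptions: the state struct, the
bitmask layout and the function names are hypothetical, and the real
driver programs the switch over SPI instead of flipping a bit in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_PORTS 5	/* assumption for this sketch */

/* Hypothetical per-switch state: one "TX inhibited" bit per port. */
struct switch_model {
	uint32_t tx_inhibit_mask;
};

static void model_inhibit_tx(struct switch_model *sw, int port, bool inhibit)
{
	assert(port >= 0 && port < MODEL_NUM_PORTS);
	if (inhibit)
		sw->tx_inhibit_mask |= UINT32_C(1) << port;
	else
		sw->tx_inhibit_mask &= ~(UINT32_C(1) << port);
}

/* Link went down: stop transmitting; RX is deliberately left alone. */
static void model_mac_link_down(struct switch_model *sw, int port)
{
	model_inhibit_tx(sw, port, true);
}

/* Link came up: allow transmission again. */
static void model_mac_link_up(struct switch_model *sw, int port)
{
	model_inhibit_tx(sw, port, false);
}
```

Neither callback touches the STP state, which is the point of the
change: link handling and STP handling stay independent.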

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Export the sja1105_inhibit_tx function
Vladimir Oltean [Sat, 8 Jun 2019 13:03:43 +0000 (16:03 +0300)]
net: dsa: sja1105: Export the sja1105_inhibit_tx function

This will be used to stop egress traffic in .phylink_mac_link_up.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Update some comments about PHYLIB
Vladimir Oltean [Sat, 8 Jun 2019 13:03:42 +0000 (16:03 +0300)]
net: dsa: sja1105: Update some comments about PHYLIB

Since the driver is now using PHYLINK exclusively, it makes sense to
remove all references to PHYLIB and replace them with PHYLINK.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: sja1105: Use SPEED_{10, 100, 1000, UNKNOWN} macros
Vladimir Oltean [Sat, 8 Jun 2019 13:03:41 +0000 (16:03 +0300)]
net: dsa: sja1105: Use SPEED_{10, 100, 1000, UNKNOWN} macros

This is a cosmetic patch that replaces the link speed numbers used in
the driver with the corresponding ethtool macros.
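
For illustration, a sketch of what such a replacement looks like. The
SPEED_* values match the ethtool definitions but are redefined here so
the example builds without kernel headers; the hw_speed enum and the
function name are hypothetical and do not reflect the SJA1105 register
encoding.

```c
#include <assert.h>

/* Same values as the ethtool macros, redefined for a standalone build. */
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#define SPEED_UNKNOWN	-1

/* Hypothetical hardware speed codes, not the real register values. */
enum hw_speed { HW_SPEED_AUTO, HW_SPEED_10, HW_SPEED_100, HW_SPEED_1000 };

static enum hw_speed hw_speed_from_link(int speed_mbps)
{
	switch (speed_mbps) {
	case SPEED_UNKNOWN:
		return HW_SPEED_AUTO;
	case SPEED_10:
		return HW_SPEED_10;
	case SPEED_100:
		return HW_SPEED_100;
	case SPEED_1000:
		return HW_SPEED_1000;
	default:
		/* The sketch treats any other speed as a programming error. */
		assert(!"unsupported link speed");
		return HW_SPEED_AUTO;
	}
}
```

Compared with bare literals, the named cases make it obvious that -1
means "unknown" and not an error code.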

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago af_key: make use of BUG_ON macro
Hariprasad Kelam [Sat, 8 Jun 2019 09:00:50 +0000 (14:30 +0530)]
af_key: make use of BUG_ON macro

fix below warnings reported by coccicheck

net/key/af_key.c:932:2-5: WARNING: Use BUG_ON instead of if condition
followed by BUG.
net/key/af_key.c:948:2-5: WARNING: Use BUG_ON instead of if condition
followed by BUG.

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
Eric Dumazet [Fri, 7 Jun 2019 19:23:48 +0000 (12:23 -0700)]
ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()

syzbot found a crash in tcp_v6_send_reset() caused by my latest
change.

Problem is that if an skb has been queued to socket prequeue,
skb_dst(skb)->dev can no longer point to the device.

Fortunately in this case the socket pointer is not NULL.

A similar issue has been fixed in commit 0f85feae6b71 ("tcp: fix
more NULL deref after prequeue changes"), I should have known better.

Fixes: 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'Avoid-local_irq_save-and-use-napi_alloc_frag-where-possible'
David S. Miller [Mon, 10 Jun 2019 02:40:10 +0000 (19:40 -0700)]
Merge branch 'Avoid-local_irq_save-and-use-napi_alloc_frag-where-possible'

Sebastian Andrzej says:

====================
Avoid local_irq_save() and use napi_alloc_frag() where possible

The first two patches remove local_irq_save() around
`netdev_alloc_cache' which does not work on -RT. Besides helping -RT it
would benefit the users of the function since they can avoid disabling
interrupts and save a few cycles.
The remaining patches are from a time when I tried to remove
`netdev_alloc_cache' but then noticed that we still have non-NAPI
drivers using netdev_alloc_skb() and I dropped that idea. Using
napi_alloc_frag() over netdev_alloc_frag() would skip the not required
local_bh_disable() around the allocation.

v1…v2:
  - 1/7 + 2/7 now use "(in_irq() || irqs_disabled())" instead of just
    "irqs_disabled()" to align with __dev_kfree_skb_any(). Pointed out
    by Eric Dumazet.

  - 6/7 has one typo fewer. Pointed out by Sergei Shtylyov.

  - 3/7 + 4/7 added acks from Ioana Radulescu.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hwbm: Make the hwbm_pool lock a mutex
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:40 +0000 (21:20 +0200)]
net: hwbm: Make the hwbm_pool lock a mutex

Based on review, `lock' is only acquired in hwbm_pool_add() which is
invoked via ->probe(), ->resume() and ->ndo_change_mtu(). Based on this
the lock can become a mutex and there is no need to disable interrupts
during the procedure.
Now that the lock is a mutex, hwbm_pool_add() no longer invokes
hwbm_pool_refill() in an atomic context so we can pass GFP_KERNEL to
hwbm_pool_refill() and remove the `gfp' argument from hwbm_pool_add().

Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago tg3: Use napi_alloc_frag()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:39 +0000 (21:20 +0200)]
tg3: Use napi_alloc_frag()

tg3_alloc_rx_data() uses netdev_alloc_frag() for skb allocation. All
callers of tg3_alloc_rx_data() either hold tp->lock (which is held with
BH disabled) or run in NAPI context.

Use napi_alloc_frag() for skb allocations.

Cc: Siva Reddy Kallam <siva.kallam@broadcom.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bnx2x: Use napi_alloc_frag()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:38 +0000 (21:20 +0200)]
bnx2x: Use napi_alloc_frag()

SKB allocation via bnx2x_frag_alloc() is always performed in NAPI
context. Preemptible context passes GFP_KERNEL and bnx2x_frag_alloc()
then uses __get_free_page() for the allocation.

Use napi_alloc_frag() for memory allocation.

Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago dpaa2-eth: Use napi_alloc_frag()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:37 +0000 (21:20 +0200)]
dpaa2-eth: Use napi_alloc_frag()

The driver is using netdev_alloc_frag() for allocation in the
->ndo_start_xmit() path. That one is always invoked in a BH disabled
region so we could also use napi_alloc_frag().

Use napi_alloc_frag() for skb allocation.

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Acked-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago dpaa2-eth: Remove preempt_disable() from seed_pool()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:36 +0000 (21:20 +0200)]
dpaa2-eth: Remove preempt_disable() from seed_pool()

According to the comment, the preempt_disable() statement is required
due to synchronisation in napi_alloc_frag(). The awful truth is that
local_bh_disable() is required because otherwise the NAPI poll callback
can be invoked while the open function sets up buffers. This isn't
unlikely since the dpaa2 provides multiple devices.

The usage of napi_alloc_frag() has been removed in commit

 27c874867c4e9 ("dpaa2-eth: Use a single page per Rx buffer")

which means that the comment is not accurate and the preempt_disable()
statement is not required.

Remove the outdated comment and the no longer required
preempt_disable().

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Acked-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: Don't disable interrupts in __netdev_alloc_skb()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:35 +0000 (21:20 +0200)]
net: Don't disable interrupts in __netdev_alloc_skb()

__netdev_alloc_skb() can be used from any context and is used by NAPI
and non-NAPI drivers. Non-NAPI drivers use it in interrupt context and
NAPI drivers use it during initial allocation (->ndo_open() or
->ndo_change_mtu()). Some NAPI drivers share the same function for the
initial allocation and the allocation in their NAPI callback.

The interrupts are disabled in order to ensure locked access from every
context to `netdev_alloc_cache'.

Let __netdev_alloc_skb() check if interrupts are disabled. If they are, use
`netdev_alloc_cache'. Otherwise disable BH and use `napi_alloc_cache.page'.
The IRQ check is cheaper compared to disabling & enabling interrupts and
memory allocation with disabled interrupts does not work on -RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: Don't disable interrupts in napi_alloc_frag()
Sebastian Andrzej Siewior [Fri, 7 Jun 2019 19:20:34 +0000 (21:20 +0200)]
net: Don't disable interrupts in napi_alloc_frag()

netdev_alloc_frag() can be used from any context and is used by NAPI
and non-NAPI drivers. Non-NAPI drivers use it in interrupt context
and NAPI drivers use it during initial allocation (->ndo_open() or
->ndo_change_mtu()). Some NAPI drivers share the same function for the
initial allocation and the allocation in their NAPI callback.

The interrupts are disabled in order to ensure locked access from every
context to `netdev_alloc_cache'.

Let netdev_alloc_frag() check if interrupts are disabled. If they are,
use `netdev_alloc_cache' otherwise disable BH and invoke
__napi_alloc_frag() for the allocation. The IRQ check is cheaper
compared to disabling & enabling interrupts and memory allocation with
disabled interrupts does not work on -RT.
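
The decision described above can be modelled in userspace roughly as
follows. The context struct, the enum and the function name are
hypothetical stand-ins; the condition itself is the
"(in_irq() || irqs_disabled())" check mentioned in the cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two per-CPU fragment caches. */
enum frag_cache { NETDEV_ALLOC_CACHE, NAPI_ALLOC_CACHE };

/* Hypothetical snapshot of the execution context. */
struct exec_ctx {
	bool in_hardirq;	/* models in_irq() */
	bool irqs_disabled;	/* models irqs_disabled() */
	int bh_disable_depth;	/* models the BH-disable nesting count */
};

static enum frag_cache pick_frag_cache(struct exec_ctx *ctx)
{
	enum frag_cache cache;

	if (ctx->in_hardirq || ctx->irqs_disabled) {
		/* Interrupts cannot race with us here: use the netdev cache. */
		return NETDEV_ALLOC_CACHE;
	}

	/* Otherwise bottom halves are disabled around the NAPI cache. */
	ctx->bh_disable_depth++;	/* models local_bh_disable() */
	cache = NAPI_ALLOC_CACHE;
	ctx->bh_disable_depth--;	/* models local_bh_enable() */
	assert(ctx->bh_disable_depth >= 0);
	return cache;
}
```

The model only shows which cache is picked in which context; no
allocation takes place.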

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'SFP-polling-fixes'
David S. Miller [Mon, 10 Jun 2019 02:25:59 +0000 (19:25 -0700)]
Merge branch 'SFP-polling-fixes'

Robert Hancock says:

====================
SFP polling fixes

This has an updated version of an earlier patch to ensure that SFP
operations are stopped during shutdown, and another patch suggested by
Russell King to address a potential concurrency issue with SFP state
checks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: sfp: add mutex to prevent concurrent state checks
Robert Hancock [Fri, 7 Jun 2019 16:42:36 +0000 (10:42 -0600)]
net: sfp: add mutex to prevent concurrent state checks

sfp_check_state can potentially be called by both a threaded IRQ handler
and delayed work. If it is concurrently called, it could result in
incorrect state management. Add a st_mutex to protect the state - this
lock gets taken outside of code that checks and handles state changes, and
the existing sm_mutex nests inside of it.
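
The lock nesting can be sketched with POSIX mutexes as below. This is a
userspace model, not the SFP driver: only the names st_mutex and
sm_mutex come from the text above, everything else is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of the two locks and the state they protect. */
struct sfp_model {
	pthread_mutex_t st_mutex;	/* outer lock: state checks */
	pthread_mutex_t sm_mutex;	/* inner lock: state machine */
	unsigned int state;		/* last observed signal state */
	unsigned int sm_events;		/* events fed to the state machine */
};

static void sfp_model_init(struct sfp_model *sfp)
{
	int rc;

	rc = pthread_mutex_init(&sfp->st_mutex, NULL);
	assert(rc == 0);
	rc = pthread_mutex_init(&sfp->sm_mutex, NULL);
	assert(rc == 0);
	(void)rc;
	sfp->state = 0;
	sfp->sm_events = 0;
}

/* May be called from both the IRQ thread and the poll work in this model. */
static void sfp_model_check_state(struct sfp_model *sfp, unsigned int new_state)
{
	pthread_mutex_lock(&sfp->st_mutex);
	if (new_state != sfp->state) {
		sfp->state = new_state;
		/* sm_mutex always nests inside st_mutex, never the reverse. */
		pthread_mutex_lock(&sfp->sm_mutex);
		sfp->sm_events++;
		pthread_mutex_unlock(&sfp->sm_mutex);
	}
	pthread_mutex_unlock(&sfp->st_mutex);
}
```

Keeping a single fixed order (st_mutex first, sm_mutex second) is what
rules out a lock-order deadlock between the two callers.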

Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: sfp: Stop SFP polling and interrupt handling during shutdown
Robert Hancock [Fri, 7 Jun 2019 16:42:35 +0000 (10:42 -0600)]
net: sfp: Stop SFP polling and interrupt handling during shutdown

SFP device polling can cause problems during the shutdown process if the
parent devices of the network controller have been shut down already.
This problem was seen on the iMX6 platform with PCIe devices, where
accessing the device after the bus is shut down causes a hang.

Free any acquired GPIO interrupts and stop all delayed work in the SFP
driver during the shutdown process, so that we ensure that no pending
operations are still occurring after the SFP shutdown completes.

Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago nexthop: off by one in nexthop_mpath_select()
Dan Carpenter [Fri, 7 Jun 2019 15:31:07 +0000 (18:31 +0300)]
nexthop: off by one in nexthop_mpath_select()

The nhg->nh_entries[] array is allocated in nexthop_grp_alloc() and it
has nhg->num_nh elements so this check should be >= instead of >.
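
A standalone sketch of the corrected bounds check. The struct and
function below are simplified, hypothetical stand-ins for the nexthop
group code; only the >= comparison mirrors the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a nexthop group with num_nh entries. */
struct nh_group_model {
	int num_nh;
	const int *entries;
};

/* Returns NULL when the selector is out of range. */
static const int *nh_select(const struct nh_group_model *g, int sel)
{
	assert(g != NULL);
	/*
	 * Valid indexes are 0 .. num_nh - 1, so num_nh itself must be
	 * rejected: the comparison has to be >=, not >.
	 */
	if (sel < 0 || sel >= g->num_nh)
		return NULL;
	return &g->entries[sel];
}
```

With the old > comparison, a selector equal to num_nh would have read
one element past the end of the array.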

Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'bonding-clean-up-and-standarize-logging-printks'
David S. Miller [Sun, 9 Jun 2019 20:36:01 +0000 (13:36 -0700)]
Merge branch 'bonding-clean-up-and-standarize-logging-printks'

Jarod Wilson says:

====================
bonding: clean up and standarize logging printks

This set improves a few somewhat terse bonding debug messages, fixes some
errors in others, and then standardizes the majority of them, using new
slave_* printk macros that wrap around netdev_* to ensure both master
and slave information is provided consistently, where relevant. This set
proves very useful in debugging issues on hosts with multiple bonds.

I've run an array of LNST tests over this set, creating and destroying
quite a few different bonds over the course of testing, fixed the little
gotchas here and there, and everything looks stable and reasonable to me,
but I can't guarantee I've tested every possible message and scenario to
catch every possible "slave could be NULL" case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding/options: convert to using slave printk macros
Jarod Wilson [Fri, 7 Jun 2019 14:59:32 +0000 (10:59 -0400)]
bonding/options: convert to using slave printk macros

All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.

Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding/alb: convert to using slave printk macros
Jarod Wilson [Fri, 7 Jun 2019 14:59:31 +0000 (10:59 -0400)]
bonding/alb: convert to using slave printk macros

All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.

Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding/802.3ad: convert to using slave printk macros
Jarod Wilson [Fri, 7 Jun 2019 14:59:30 +0000 (10:59 -0400)]
bonding/802.3ad: convert to using slave printk macros

All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.

Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding/main: convert to using slave printk macros
Jarod Wilson [Fri, 7 Jun 2019 14:59:29 +0000 (10:59 -0400)]
bonding/main: convert to using slave printk macros

All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.

Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding: add slave_foo printk macros
Jarod Wilson [Fri, 7 Jun 2019 14:59:28 +0000 (10:59 -0400)]
bonding: add slave_foo printk macros

Where possible, we generally want both the bond master and the relevant slave
information in message output. Standardize the format using new slave_*
printk macros.
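
As a rough userspace analogue of the idea, the helper below always
prefixes a message with both device names. The function, the struct and
the exact prefix layout are assumptions made for this sketch; the kernel
macros wrap netdev_* and are not reproduced here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct net_device: only the name matters. */
struct netdev_model {
	const char *name;
};

/* Format "<bond>: (slave <slave>): <message>" into buf; returns the length. */
static int slave_msg(char *buf, size_t len,
		     const struct netdev_model *bond,
		     const struct netdev_model *slave,
		     const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL && bond != NULL && slave != NULL && fmt != NULL);
	n = snprintf(buf, len, "%s: (slave %s): ", bond->name, slave->name);
	if (n < 0 || (size_t)n >= len)
		return n;	/* error, or the prefix alone was truncated */

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

Because the prefix is produced in one place, callers cannot forget the
bond name or pass the wrong device by accident.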

Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding: fix error messages in bond_do_fail_over_mac
Jarod Wilson [Fri, 7 Jun 2019 14:59:27 +0000 (10:59 -0400)]
bonding: fix error messages in bond_do_fail_over_mac

Passing the bond name again to debug output when referencing slave is wrong.
We're trying to set the bond's MAC to that of the new_active slave, so adjust
the error message slightly and pass in the slave's name, not the bond's.
Then we're trying to set the MAC on the old active slave, but putting the
new active slave's name in the output. While we're at it, clarify the
error messages so you know which one actually triggered.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago bonding: improve event debug usability
Jarod Wilson [Fri, 7 Jun 2019 14:59:26 +0000 (10:59 -0400)]
bonding: improve event debug usability

Seeing bonding debug log data along the lines of "event: 5" is a bit spartan,
and often requires a lookup table if you don't remember what every event is.
Make use of netdev_cmd_to_name for an improved debugging experience, so for
the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER"
instead (both are prefixed with the device for which the event pertains).
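
The kind of lookup that netdev_cmd_to_name performs can be sketched as a
small table. The helper below is hypothetical and covers only the first
few events; the numbering assumes the kernel's notifier enum starts with
NETDEV_UP at 1, which agrees with event 5 being NETDEV_REGISTER above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: translate an event number into a readable name. */
static const char *event_name(unsigned long event)
{
	/* Index equals event number; slot 0 is unused. */
	static const char *const names[] = {
		NULL,
		"NETDEV_UP",
		"NETDEV_DOWN",
		"NETDEV_REBOOT",
		"NETDEV_CHANGE",
		"NETDEV_REGISTER",
		"NETDEV_UNREGISTER",
	};
	const size_t count = sizeof(names) / sizeof(names[0]);

	assert(count == 7);
	if (event < count && names[event] != NULL)
		return names[event];
	return "UNKNOWN_NETDEV_EVENT";	/* fallback chosen for this sketch */
}
```

A debug line can then print event_name(event) instead of the raw
number.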

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: fec_main: Use dev_err() instead of pr_err()
Fabio Estevam [Fri, 7 Jun 2019 12:14:18 +0000 (09:14 -0300)]
net: fec_main: Use dev_err() instead of pr_err()

dev_err() is more appropriate for printing error messages inside
drivers, so switch to dev_err().

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago cxgb4: Set initial IRQ affinity hints
Nirranjan Kirubaharan [Fri, 7 Jun 2019 11:56:45 +0000 (04:56 -0700)]
cxgb4: Set initial IRQ affinity hints

Spread initial IRQ affinity hints across the device node CPUs,
for nic queue and uld queue IRQs, to load balance and avoid
all interrupts on CPU0.
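
One simple way to spread hints is round-robin over the CPUs of the
device's node, sketched below. This policy and all names are
assumptions for illustration; the commit text does not say how the
driver picks the CPU for each queue.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Give queue i the CPU at position (i mod n_cpus) in the node's CPU list,
 * so consecutive queues land on different CPUs.
 */
static void spread_irq_hints(const int *node_cpus, size_t n_cpus,
			     int *hint, size_t n_queues)
{
	size_t i;

	assert(node_cpus != NULL && hint != NULL);
	assert(n_cpus > 0);
	for (i = 0; i < n_queues; i++)
		hint[i] = node_cpus[i % n_cpus];
}
```

With three node CPUs and five queues, the fourth queue wraps around to
the first CPU again.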

Signed-off-by: Nirranjan Kirubaharan <nirranjan@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'hns3-next'
David S. Miller [Sun, 9 Jun 2019 20:20:59 +0000 (13:20 -0700)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
net: hns3: some code optimizations & cleanups & bugfixes

This patch-set includes code optimizations, cleanups and bugfixes for
the HNS3 ethernet controller driver.

[patch 1/12] logs more detailed error info for ROCE RAS errors.

[patch 2/12] fixes a wrong size issue for mailbox responding.

[patch 3/12] makes HW GRO handling compliant with SW one.

[patch 4/12] refactors hns3_get_new_int_gl.

[patch 5/12] adds handling for VF's over_8bd_nfe_err.

[patch 6/12 - 12/12] adds some code optimizations and cleanups to
make the code more readable and compliant with some static code
analysis tools. These modifications do not change the logic of
the code.

Change log:
V1->V2: fixes comment from David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: fix some coding style issues
Weihang Li [Fri, 7 Jun 2019 02:03:13 +0000 (10:03 +0800)]
net: hns3: fix some coding style issues

This patch fixes some coding style issues reported by static code
analysis tools and code review, such as modifying some comments, renaming
some variables, logging some errors in detail, and fixing some alignment
errors.

BTW, these cleanups do not change the logic of code.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: HuiSong Li <lihuisong@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: some modifications to simplify and optimize code
Yufeng Mo [Fri, 7 Jun 2019 02:03:12 +0000 (10:03 +0800)]
net: hns3: some modifications to simplify and optimize code

This patch deletes some redundant code and refactors some bloated
functions.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: refactor PF/VF RSS hash key configuration
Yufeng Mo [Fri, 7 Jun 2019 02:03:11 +0000 (10:03 +0800)]
net: hns3: refactor PF/VF RSS hash key configuration

In order to make it more readable, this patch modifies PF/VF's
RSS hash key configuring function.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: use macros instead of magic numbers
Yufeng Mo [Fri, 7 Jun 2019 02:03:10 +0000 (10:03 +0800)]
net: hns3: use macros instead of magic numbers

This patch adds some macros instead of magic numbers in several places.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: small changes for magic numbers
Jian Shen [Fri, 7 Jun 2019 02:03:09 +0000 (10:03 +0800)]
net: hns3: small changes for magic numbers

In order to improve readability, this patch uses macros to
replace some magic numbers, and adds some comments for some
others.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: delete the redundant user NIC codes
Yonglong Liu [Fri, 7 Jun 2019 02:03:08 +0000 (10:03 +0800)]
net: hns3: delete the redundant user NIC codes

Since HNAE3_CLIENT_UNIC and HNAE3_DEV_UNIC are not used any more,
this patch removes the redundant code.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: trigger VF reset if a VF has an over_8bd_nfe_err
Weihang Li [Fri, 7 Jun 2019 02:03:07 +0000 (10:03 +0800)]
net: hns3: trigger VF reset if a VF has an over_8bd_nfe_err

Previously, we triggered a PF reset whenever the NIC RAS error named
over_8bd_nfe_err occurred. But it is possible that a VF causes that
error, in which case it is reasonable to trigger a VF reset instead of
a PF reset. This patch adds detection of vf_id when an over_8bd_nfe_err
occurs: if vf_id is 0, we trigger a PF reset. Otherwise, we trigger a
VF reset on the VF with the error.
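
The decision reduces to a small branch on vf_id, sketched here with
hypothetical types and names; only the rule "0 means PF, anything else
means that VF" comes from the text above.

```c
#include <assert.h>

enum reset_kind { RESET_PF, RESET_VF };

/* Hypothetical description of the reset that should be requested. */
struct reset_request {
	enum reset_kind kind;
	unsigned int vf_id;	/* meaningful only for RESET_VF */
};

static struct reset_request reset_for_over_8bd_nfe_err(unsigned int vf_id)
{
	struct reset_request req;

	req.kind = (vf_id == 0) ? RESET_PF : RESET_VF;
	req.vf_id = vf_id;
	/* A VF reset must always name a non-zero VF. */
	assert(req.kind == RESET_PF || req.vf_id != 0);
	return req;
}
```

Resetting only the offending VF leaves the PF and the other VFs
running.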

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: refactor hns3_get_new_int_gl function
Yunsheng Lin [Fri, 7 Jun 2019 02:03:06 +0000 (10:03 +0800)]
net: hns3: refactor hns3_get_new_int_gl function

This patch adds a new hns3_get_new_flow_lvl function to calculate
the packet flow level, which is used to decide the interrupt
coalescence parameter, in order to make the flow level calculation
code more readable and make future calculation adjustments easier.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: replace numa_node_id with numa_mem_id for buffer reusing
Yunsheng Lin [Fri, 7 Jun 2019 02:03:05 +0000 (10:03 +0800)]
net: hns3: replace numa_node_id with numa_mem_id for buffer reusing

This patch replaces numa_node_id with numa_mem_id when checking whether
a buffer can be reused, because the buffer can still be reused when it
comes from the nearest node and the local node has no memory
attached.
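
The difference between the two checks shows up on a memoryless node, as
in the model below. The struct and both helpers are hypothetical; the
two fields stand for what numa_node_id() and numa_mem_id() would report
for the current CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the current CPU's NUMA placement. */
struct cpu_model {
	int node_id;		/* node the CPU belongs to */
	int mem_node_id;	/* nearest node that actually has memory */
};

/* Old check: reuse only if the page sits on the CPU's own node. */
static bool page_reusable_old(int page_node, const struct cpu_model *cpu)
{
	assert(cpu != NULL);
	return page_node == cpu->node_id;
}

/* New check: reuse if the page sits on the nearest node with memory. */
static bool page_reusable(int page_node, const struct cpu_model *cpu)
{
	assert(cpu != NULL);
	return page_node == cpu->mem_node_id;
}
```

For a CPU on memoryless node 1 whose nearest memory is on node 0, a page
from node 0 fails the old check every time but passes the new one.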

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>