git.openwrt.org Git - openwrt/staging/blogic.git/log

net: ll_temac: Fix typo bug for 32-bit

Fixes: d84aec42151b ("net: ll_temac: Fix support for 64-bit platforms")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-mvpp2-cls-Add-classification'

Maxime Chevallier says:

====================
net: mvpp2: cls: Add classification

This series is a rework of the previously standalone patch adding
classification support for mvpp2:

https://lore.kernel.org/netdev/20190423075031.26074-1-maxime.chevallier@bootlin.com/

This patch has been reworked according to Saeed's review, to make sure
that the location of the rule is always respected and serves as a way to
prioritize rules between each other. This is the 3rd iteration of this
submission, but since it's now a series, I reset the revision numbering.

This series implements that in a limited configuration for now, since we
limit the total number of rules per port to 4.

The main factors behind this limitation are:
- We share the classification tables between all ports (4 max, although
   one is only used for internal loopback), hence we have to perform a
   logical separation between rules, which is done today by dedicated
   ranges for each port in each table

- The "Flow table", which dictates which lookup operations are
   performed for an ingress packet, is subdivided into 22 "sub flows",
   each corresponding to a traffic type based on the L3 proto, L4
   proto, the presence or not of a VLAN tag and the L3 fragmentation.

   As a result, when adding a rule, it has to be added into each
   of these subflows, introducing duplications of entries and limiting
   our max number of entries.

These limitations can be overcome in several ways, but for readability's
sake, I'd rather submit basic classification offload support for now,
and improve it gradually.

This series also adds a small cosmetic cleanup patch (1), and also adds
support for the "Drop" action compared to the first submission of this
feature. It is simple enough to be added with this basic support.

Compared to the first submissions, the NETIF_F_NTUPLE flag was also
removed, following Saeed's comment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: cls: Allow dropping packets with classification offload

This commit introduces support for the "Drop" action in classification
offload. This corresponds to the "-1" action with ethtool -N.

This is achieved using the color marking actions available in the C2
engine, which associate a color to a packet. These colors can be either
Green, Yellow or Red, Red meaning that the packet should be dropped.

Green and Yellow colors are interpreted by the Policer, which isn't
supported yet.

This method of dropping using the Classifier is different from the
already existing early-drop features, such as VLAN filtering and MAC
UC/MC filtering, which are performed during the Parsing step, and
therefore take precedence over classification actions.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: cls: Add Classification offload support

This commit introduces basic classification offloading support for the
PPv2 controller.

The PPv2 classifier has many classification engines; for now we only use
the C2 TCAM match engine.

This engine performs ternary lookups on 64-bit keys (called
Header Extracted Key), which are built by extracting fields from the packet
header and concatenating them. At most 4 fields can be extracted for a
single lookup.

This basic implementation builds the HEK from the following
fields:
- L4 source and destination ports (for UDP and TCP)

More fields are to be added in the future.
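
As an illustration of the ternary lookup described above (not taken from
the driver), here is a minimal sketch. The key layout, the entry fields and
the action strings are all assumptions made for the example; the real HEK
layout and C2 entry format are defined by the hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TcamEntry:
    value: int   # bits to compare against
    mask: int    # 1 = bit must match, 0 = "don't care"
    action: str  # hypothetical label for the rule's action


def build_hek(sport: int, dport: int) -> int:
    """Concatenate two 16-bit L4 ports into one key (assumed layout)."""
    return ((sport & 0xFFFF) << 16) | (dport & 0xFFFF)


def lookup(entries: List[TcamEntry], key: int) -> Optional[str]:
    """Return the action of the first entry whose masked bits match."""
    for entry in entries:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None


table = [
    # Match destination port 80, any source port.
    TcamEntry(value=build_hek(0, 80), mask=0x0000FFFF, action="rxq 2"),
    # Match source port 1234, any destination port.
    TcamEntry(value=build_hek(1234, 0), mask=0xFFFF0000, action="rxq 3"),
]
```

Because the first matching entry wins, the position of an entry in the
table acts as its priority in this sketch.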

Classification flows are added through the ethtool interface, using the
newly introduced flow_rule infrastructure as an internal rule
representation, which makes it easier to implement tc flower rules if
need be.

The internal design for now allocates one range of 4 rules per port
due to the internal design of the flow table, which uses 22 sub-flows.

When inserting a classification rule, the rule is created in every
relevant sub-flow.

This low rule-count is a very simple design which quickly reaches the
limitations of the flow table ordering, but guarantees that the rule
ordering will always be respected.

This commit only introduces support for the "steer to rxq" action.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: cls: Use a bitfield to represent the flow_type

As of today, the classification code is used only for RSS. We split the
incoming traffic into multiple flows, that correspond to the ethtool
flow_type parameter.

We don't want to use the ethtool flow definitions such as TCP_V4_FLOW,
for several reasons:

- We want to decorrelate the driver code from ethtool as much as
possible, so that we can easily use other interfaces such as tc flower,

- We want the flow_type to be a bitfield, so that we can match flows
embedded into each other, such as TCP4 which is a subset of IP4.

This commit does the conversion to the newer type.
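
To illustrate why a bitfield helps here, a small sketch follows. The flag
names and bit positions are hypothetical and do not reflect the driver's
actual definitions; the point is only that a composite type such as TCP4
contains the bits of IP4, so subset matching becomes a mask test.

```python
from enum import IntFlag


class FlowType(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    IP4 = 1 << 0
    IP6 = 1 << 1
    TCP = 1 << 2
    UDP = 1 << 3
    TCP4 = IP4 | TCP
    UDP4 = IP4 | UDP
    TCP6 = IP6 | TCP


def flow_matches(flow: FlowType, wanted: FlowType) -> bool:
    """True if 'flow' carries every bit of 'wanted' (flow is a subset of wanted's class)."""
    return (flow & wanted) == wanted
```

With this encoding, a rule written for IP4 also covers TCP4 and UDP4
traffic, while a rule written for TCP4 does not cover plain IP4.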

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: cls: Remove extra whitespace in mvpp2_cls_flow_write

Cosmetic patch removing extra whitespaces when writing the flow_table
entries

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ll_temac-x86_64-support'

Esben Haabendal says:

====================
net: ll_temac: x86_64 support

This patch series adds support for use of ll_temac driver with
platform_data configuration and fixes endianness and 64-bit problems so
that it can be used on x86_64 platform.

A few bugfixes are also included.

Changes since v2:
  - Fixed lp->indirect_mutex initialization regression for OF
    platforms introduced in v2

Changes since v1:
  - Make indirect_mutex specification mandatory when using platform_data
  - Move header to include/linux/platform_data
  - Enable COMPILE_TEST for XILINX_LL_TEMAC
  - Rebased to v5.1-rc7
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Enable DMA when ready, not before

As soon as TAILDESCR_PTR is written, DMA transfers might start.
Let's ensure we are ready to receive DMA IRQs before doing that.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Allow configuration of IRQ coalescing

This allows custom setup of IRQ coalescing for platforms using legacy
platform_device. The irq timeout and count parameters can be used for
tuning cpu load vs. latency.

I have maintained the 0x00000400 bit in TX_CHNL_CTRL. It is specified as
unused in the documentation I have available. It does not make any
difference in the hardware I have available, so it is left in to not risk
breaking other platforms where it might be used.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Replace bad usage of msleep() with usleep_range()

Use usleep_range() to avoid problems with msleep() actually sleeping
much longer than expected.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Fix bug causing buffer descriptor overrun

As we are actually using a BD for both the skb and each frag contained in
it, the oldest TX BD would be overwritten when there was exactly one BD
less than needed.
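
The off-by-one can be sketched as follows. This is a simplified model with
made-up function names, not the driver code: it only shows that counting
the frags without the skb's own BD accepts a packet when exactly one BD is
missing.

```python
def bds_needed(nr_frags: int) -> int:
    """One BD for the skb's linear data plus one per fragment."""
    return 1 + nr_frags


def has_room_buggy(free_bds: int, nr_frags: int) -> bool:
    # Only counts the fragments, forgetting the BD for the skb itself.
    return free_bds >= nr_frags


def has_room_fixed(free_bds: int, nr_frags: int) -> bool:
    return free_bds >= bds_needed(nr_frags)
```

With 3 free BDs and a packet with 3 frags (4 BDs needed), the buggy check
passes and the oldest TX BD would be overwritten; the fixed check rejects
the packet until another BD is free.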

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Fix iommu/swiotlb leak

Unmap the actual buffer length, not the amount of data received, avoiding
resource exhaustion of swiotlb (seen on x86_64 platform).

Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Support indirect_mutex share within TEMAC IP

Indirect register access goes through a DCR bus bridge, which
allows only one outstanding transaction.  And to make matters
worse, each TEMAC IP block contains two Ethernet interfaces, and
although they seem to have separate registers for indirect access,
they actually share the registers.  Or to be more specific, MSW, LSW
and CTL registers are physically shared between Ethernet interfaces
in same TEMAC IP, with RDY register being (almost) specific to
the Ethernet interface.  The 0x10000 bit in RDY reflects combined
bus ready state though.

So we need to take care to synchronize not only within a single
device, but also between devices in same TEMAC IP.

This commit makes that possible with legacy platform devices.

For OF devices, the xlnx,compound parent of the temac node should be
used to find siblings, and set up a shared indirect_mutex between them.
I will leave this work to somebody else, as I don't have hardware to
test that.  No regression is introduced by that, as before this commit
using two Ethernet interfaces in same TEMAC block is simply broken.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Allow use on x86 platforms

With little-endian and 64-bit support in place, the ll_temac driver can
now be used on x86 and x86_64 platforms.

And while at it, enable COMPILE_TEST also.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Fix support for little-endian platforms

Both TEMAC and SDMA are big-endian, so make sure that all values in SDMA
buffer descriptors (cdmac_bd) are handled as big-endian, independent of the
host endianness. With all currently supported platforms being big-endian,
this change makes no difference for any of them.

Note, when using app3 and app4 for piggybacking skb pointers there is no
need to care about endianness, as neither TEMAC nor SDMA access app3 and
app4 in TX buffer descriptors.
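
A small sketch of what "handled as big-endian, independent of the host
endianness" means in practice. The helper names are hypothetical; the
sketch only shows a 32-bit descriptor word being written in big-endian
byte order and what a little-endian reader would see if it skipped the
conversion.

```python
import struct


def pack_bd_word(value: int) -> bytes:
    """Encode a 32-bit descriptor word in big-endian byte order."""
    return struct.pack(">I", value & 0xFFFFFFFF)


def unpack_bd_word(raw: bytes) -> int:
    """Decode a big-endian 32-bit descriptor word."""
    return struct.unpack(">I", raw)[0]


raw = pack_bd_word(0x12345678)
# What a little-endian host would read without any byte swapping.
misread = int.from_bytes(raw, "little")
```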

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Add support for non-native register endianness

Replace the powerpc specific MMIO register access functions with the
generic big-endian mmio access functions, and add support for
little-endian access depending on configuration.

Big-endian access is maintained as the default, but little-endian can
be configured in device-tree binding or in platform data.

The temac_ior()/temac_iow() functions are replaced with macro wrappers
to avoid modifying existing code more than necessary.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Fix support for 64-bit platforms

The use of buffer descriptor APP4 field (32-bit) for storing skb pointer
obviously does not work on 64-bit platforms.
As APP3 is also unused, we can use that to store the other half of 64-bit
pointer values.

Contrary to what is hinted at in commit message of commit 15bfe05c8d63
("net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit")
there are no other pointers stored in cdmac_bd.
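
The idea of spreading a 64-bit pointer over two 32-bit fields can be
sketched like this. Which field holds the upper half is an assumption made
for the example, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_ptr(ptr: int):
    """Split a 64-bit value into (upper, lower) 32-bit halves."""
    return (ptr >> 32) & MASK32, ptr & MASK32


def join_ptr(upper: int, lower: int) -> int:
    """Rebuild the 64-bit value from its two halves."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


# Assumption for the sketch: app3 carries the upper half, app4 the lower.
app3, app4 = split_ptr(0xFFFF888012345678)
```

On a 32-bit platform the upper half is simply zero, so the same scheme
covers both cases.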

Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Extend support to non-device-tree platforms

Support initialization with platdata, so the driver can be used on
non-device-tree platforms.

For currently supported device-tree platforms, the driver should behave
as before.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ll_temac: Fix and simplify error handling by using devres functions

As a side effect, a few error cases are fixed.

If of_iomap() of sdma_regs failed, no error code was returned. Fixed to
return -ENOMEM similar to of_iomap() fail of regs.

If sysfs_create_group() or register_netdev() failed, lp->phy_node was not
released.

Finally, the order in remove function is corrected to be reverse order
of what is done in probe, i.e. calling temac_mdio_teardown() last, so we
unregister the netdev that most likely is using the mdio_bus first.

Signed-off-by: Esben Haabendal <esben@geanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw: Fix inconsistent IS_ERR and PTR_ERR in cpsw_probe()

Fix inconsistent IS_ERR and PTR_ERR in cpsw_probe.
The proper pointer to use is clk instead of mode.

This issue was detected with the help of Coccinelle.

Fixes: 83a8471ba255 ("net: ethernet: ti: cpsw: refactor probe to group common hw initialization")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-taprio-change-schedules'

Vinicius Costa Gomes says:

====================
net/sched: taprio change schedules

Changes from RFC:
- Removed the patches for taprio offloading, because of the lack of
   in-tree users;
- Updated the links to point to the PATCH version of this series;

Original cover letter:

Overview
--------

This RFC has two objectives, it adds support for changing the running
schedules during "runtime", explained in more detail later, and
proposes an interface between taprio and the drivers for hardware
offloading.

These two different features are presented together so it's clear what
the "final state" would look like. But after the RFC stage, they can
be proposed (and reviewed) separately.

Changing the schedules without disrupting traffic is important for
handling dynamic use cases, for example, when streams are
added/removed and when the network configuration changes.

Hardware offloading support allows schedules to be more precise and
have lower resource usage.

Changing schedules
------------------

The same as the other interfaces we proposed, we try to use the same
concepts as the IEEE 802.1Q-2018 specification. So, for changing
schedules, there are an "oper" (operational) and an "admin" schedule.
The "admin" schedule is mutable and not in use, the "oper" schedule is
immutable and is in use.

That is, when the user first adds a schedule it is in the "admin"
state, and it becomes "oper" when its base-time (basically when it
starts) is reached.

What this means is that now it's possible to create taprio with a schedule:

$ tc qdisc add dev IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time 10000000 \
      sched-entry S 03 300000 \
      sched-entry S 02 300000 \
      sched-entry S 06 400000 \
      clockid CLOCK_TAI

And then, later, after the previous schedule is "promoted" to "oper",
add a new ("admin") schedule to be used some time later:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time 1553121866000000000 \
      sched-entry S 02 500000 \
      sched-entry S 0f 400000 \
      clockid CLOCK_TAI

When enabling the ability to change schedules, it makes sense to add
two more defined knobs to schedules: "cycle-time" truncates a
cycle to some value, so it repeats after a well-defined value;
"cycle-time-extension" controls how much an entry can be extended if
it's the last one before the change of schedules, the reason being to
avoid a very small cycle when transitioning from one schedule to
another.

With these, taprio in the software mode should provide a fairly
complete implementation of what's defined in the Enhancements for
Scheduled Traffic parts of the specification.

Hardware offload
----------------

Some workloads require better guarantees from their schedules than
what's provided by the software implementation. This series proposes
an interface for configuring schedules into compatible network
controllers.

This part is proposed together with the support for changing
schedules, because it raises questions like, should the "qdisc" side
be responsible for providing visibility into the schedules or should it
be the driver?

In this proposal, the driver is called passing the new schedule as
soon as it is validated, and the "core" qdisc takes care of displaying
(".dump()") the correct schedules at all times. It means that some
logic would need to be duplicated in the driver, if the hardware
doesn't have support for multiple schedules. But as taprio doesn't
have enough information about the underlying controller to know how
much in advance a schedule needs to be informed to the hardware, it
feels like a fair compromise.

The hardware offloading part of this proposal also tries to define an
interface for frame-preemption and how it interacts with the
scheduling of traffic, see Section 8.6.8.4 of IEEE 802.1Q-2018 for
more information.

One important difference between the qdisc interface and the
qdisc-driver interface, is that the "gate mask" on the qdisc side
references traffic classes, that is bit 0 of the gate mask means
Traffic Class 0, and in the driver interface, it specifies the queues,
that is bit 0 means queue 0. That is to say that taprio converts the
references to traffic classes to references to queues before sending
the offloading request to the driver.
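
The conversion from traffic classes to queues can be sketched as below,
using the "queues 1@0 1@1 2@2" mapping (count@offset per traffic class)
from the tc example earlier in this cover letter. The function name is
hypothetical and this is not the code from the series.

```python
from typing import List, Tuple


def tc_mask_to_queue_mask(gate_mask: int,
                          queues: List[Tuple[int, int]]) -> int:
    """Translate a per-traffic-class gate mask into a per-queue mask.

    'queues' holds one (count, offset) pair per traffic class.
    """
    queue_mask = 0
    for tc, (count, offset) in enumerate(queues):
        if gate_mask & (1 << tc):
            for queue in range(offset, offset + count):
                queue_mask |= 1 << queue
    return queue_mask


# "queues 1@0 1@1 2@2": TC0 -> queue 0, TC1 -> queue 1, TC2 -> queues 2-3
queues = [(1, 0), (1, 1), (2, 2)]
```

For instance, gate mask 06 (traffic classes 1 and 2) becomes queue mask 0e
(queues 1, 2 and 3) under this mapping.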

Request for help
----------------

I would like interested driver maintainers to take a look at
the proposed interface and see if it's going to be too awkward for any
particular device. Also, pointers to available documentation would be
appreciated. The idea here is to start a discussion so we can have an
interface that would work for multiple vendors.

Links
-----

kernel patches:
https://github.com/vcgomes/net-next/tree/taprio-add-support-for-change-v3

iproute2 patches:
https://github.com/vcgomes/iproute2/tree/taprio-add-support-for-change-v3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Add support for cycle-time-extension

IEEE 802.1Q-2018 defines the concept of a cycle-time-extension, so the
last entry of a schedule before the start of a new schedule can be
extended, so "too-short" entries can be avoided.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Add support for setting the cycle-time manually

IEEE 802.1Q-2018 defines that the cycle-time of a schedule may be
overridden, so the schedule is truncated to a determined "width".
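
One way to picture the truncation is the sketch below. It is a simplified
model with a hypothetical function name, and it covers only the case where
the cycle-time is not larger than the sum of the entry intervals; it is
not the qdisc's implementation.

```python
from typing import List


def truncate_cycle(intervals: List[int], cycle_time: int) -> List[int]:
    """Cut a list of entry intervals (ns) so they sum to cycle_time."""
    result = []
    elapsed = 0
    for interval in intervals:
        if elapsed >= cycle_time:
            break  # entries past the cycle boundary never run
        kept = min(interval, cycle_time - elapsed)
        result.append(kept)
        elapsed += kept
    return result
```

With entries of 300000, 300000 and 400000 ns and a cycle-time of 800000
ns, the last entry only runs for 200000 ns before the cycle restarts.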

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Add support adding an admin schedule

The IEEE 802.1Q-2018 defines two "types" of schedules, the "Oper" (from
operational?) and "Admin" ones. Up until now, 'taprio' only had
support for the "Oper" one, added when the qdisc is created. This adds
support for the "Admin" one, which allows the .change() operation to
be supported.

Just for clarification, some quick (and dirty) definitions, the "Oper"
schedule is the currently (as in this instant) running one, and it's
read-only. The "Admin" one is the one that the system configurator has
installed, it can be changed, and it will be "promoted" to "Oper" when
its 'base-time' is reached.

The idea behind this patch is that calling something like the below,
(after taprio is already configured with an initial schedule):

$ tc qdisc change taprio dev IFACE parent root      \
         base-time X                        \
         sched-entry <CMD> <GATES> <INTERVAL>      \
   ...

Will cause a new admin schedule to be created and programmed to be
"promoted" to "Oper" at instant X. If an "Admin" schedule already
exists, it will be overwritten with the new parameters.
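
The "Admin"/"Oper" life cycle described above can be modelled roughly as
follows. The class and method names are hypothetical, time is a plain
integer, and locking and RCU are ignored; this is a sketch of the
semantics, not of the qdisc code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Schedule:
    base_time: int
    entries: List[Tuple[str, int, int]]  # (command, gate mask, interval ns)


class TaprioModel:
    def __init__(self, initial: Schedule) -> None:
        # A newly added schedule starts out as "admin".
        self.oper: Optional[Schedule] = None
        self.admin: Optional[Schedule] = initial

    def change(self, new: Schedule) -> None:
        """Install a new admin schedule, overwriting any pending one."""
        self.admin = new

    def advance(self, now: int) -> None:
        """Promote the admin schedule once its base-time is reached."""
        if self.admin is not None and now >= self.admin.base_time:
            self.oper, self.admin = self.admin, None


first = Schedule(100, [("S", 0x03, 300000)])
second = Schedule(500, [("S", 0x02, 500000)])
third = Schedule(400, [("S", 0x0F, 400000)])
```

Calling change() twice before the base-time is reached leaves only the
last schedule pending, which matches the "overwritten" behaviour described
above.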

Up until now, there was some code that was added to ease the support
of changing a single entry of a schedule, but was ultimately unused.
Now that we have support for "change" with better-thought-out
semantics, updating a single entry seems to be less useful.

So we remove what is in practice dead code, and return a "not
supported" error if the user tries to use it. If changing a single
entry would make the user's life easier we may resurrect this idea,
but at this point, removing it simplifies the code.

For now, only the schedule specific bits are allowed to be added for a
new schedule, that means that 'clockid', 'num_tc', 'map' and 'queues'
cannot be modified.

Example:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
      base-time $BASE_TIME \
      sched-entry S 00 500000 \
      sched-entry S 0f 500000 \
      clockid CLOCK_TAI

The only change in the netlink API introduced by this change is the
introduction of an "admin" type in the response to a dump request,
that type allows userspace to separate the "oper" schedule from the
"admin" schedule. If userspace doesn't support the "admin" type, it
will only display the "oper" schedule.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

taprio: Fix potencial use of invalid memory during dequeue()

Right now, this isn't a problem, but the next commit allows schedules
to be added during runtime. When a new schedule transitions from the
inactive to the active state ("admin" -> "oper") the previous one can
be freed, if it's freed just after the RCU read lock is released, we
may access an invalid entry.

So, we should take care to protect the dequeue() flow, so all the
places that access the entries are protected by the RCU read lock.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-undo-congestion'

Yuchung Cheng says:

====================
undo congestion window on spurious SYN or SYNACK timeout

Linux TCP currently uses an initial congestion window of 1 packet
after multiple SYN or SYNACK timeouts, per RFC6298. However such
timeouts are often spurious on wireless or cellular networks that
experience high delay variances (e.g. ramping up dormant radios or
local link retransmission). Another case is when the underlying
path is longer than the default SYN timeout (e.g. 1 second). In
these cases starting the transfer with a minimal congestion window
is detrimental to the performance for short flows.

One naive approach is to simply ignore SYN or SYNACK timeouts and
always use a larger or default initial window. This approach however
risks pouring gas on the fire when the network is already highly
congested. This is particularly true in data centers, where an
application could start thousands to millions of connections over a
single or multiple hosts resulting in high SYN drops (e.g. incast).

This patch-set detects spurious SYN and SYNACK timeouts upon
completing the handshake via the widely-supported TCP timestamp
options. Upon such events the sender reverts to the default
initial window to start the data transfer so it gets best of both
worlds. This patch-set supports this feature for both active and
passive as well as Fast Open or regular connections.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refactor setting the initial congestion window

Relocate the congestion window initialization from tcp_init_metrics()
to tcp_init_transfer() to improve code readability.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refactor to consolidate TFO passive open code

Use a helper to consolidate two identical code blocks for passive TFO.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: undo cwnd on Fast Open spurious SYNACK retransmit

This patch makes passive Fast Open revert the cwnd to the default
initial cwnd (10 packets) if the SYNACK timeout is spurious.

Passive Fast Open uses a full socket during handshake so it can
use the existing undo logic to detect spurious retransmission
by recording the first SYNACK timeout in the key state variable
retrans_stamp. Upon receiving the ACK of the SYNACK, if the socket
has sent some data before the timeout, the spurious timeout
is detected by tcp_try_undo_recovery() in tcp_process_loss()
in tcp_ack().

But if the socket has not sent any data yet, tcp_ack() does not
execute the undo code since no data is acknowledged. The fix is to
check for this case explicitly after tcp_ack() during the ACK processing
in SYN_RECV state. In addition this is checked in FIN_WAIT_1 state
in case the server closes the socket before handshake completes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: lower congestion window on Fast Open SYNACK timeout

TCP sender would use congestion window of 1 packet on the second SYN
and SYNACK timeout except passive TCP Fast Open. This makes passive
TFO too aggressive and unfair during congestion at handshake. This
patch fixes this issue so TCP (fast open or not, passive or active)
always conforms to the RFC6298.

Note that tcp_enter_loss() is called only once during recurring
timeouts. This is because during handshake, high_seq and snd_una
are the same so tcp_enter_loss() would incorrectly set the undo state
variables multiple times.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: undo init congestion window on false SYNACK timeout

Linux implements RFC6298 and uses an initial congestion window
of 1 upon establishing the connection if the SYNACK packet is
retransmitted 2 or more times. In cellular networks SYNACK timeouts
are often spurious if the wireless radio was dormant or idle. Also
some network paths are longer than the default SYNACK timeout. In
both cases falsely starting with a minimal cwnd is detrimental
to performance.

This patch avoids doing so when the final ACK's TCP timestamp
indicates the original SYNACK was delivered. It remembers the
original SYNACK timestamp when SYNACK timeout has occurred and
re-uses the function to detect spurious SYN timeout conveniently.

Note that a server may receive multiple SYNs from the client and
immediately retransmit SYNACKs without any SYNACK timeout. This often
happens when the client SYNs have timed out due to the wireless delay
above. In this case the server will still use the default
initial congestion window (e.g. 10) because tp->undo_marker is reset in
tcp_init_metrics(). This is an intentional design because packets
are not lost but delayed.

This patch only covers regular TCP passive open. Fast Open is
supported in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: better SYNACK sent timestamp

Detecting spurious SYNACK timeout using timestamp option requires
recording the exact SYNACK skb timestamp. Previously the SYNACK
sent timestamp was stamped slightly earlier before the skb
was transmitted. This patch uses the SYNACK skb transmission
timestamp directly.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: undo initial congestion window on false SYN timeout

Linux implements RFC6298 and uses an initial congestion window of 1
upon establishing the connection if the SYN packet is retransmitted 2
or more times. In cellular networks SYN timeouts are often spurious
if the wireless radio was dormant or idle. Also some network paths
are longer than the default SYN timeout. Having a minimal cwnd in
both cases is detrimental to TCP startup performance.

This patch extends TCP undo feature (RFC3522 aka TCP Eifel) to detect
spurious SYN timeout via TCP timestamps. Since tp->retrans_stamp
records the initial SYN timestamp instead of first retransmission, we
additionally have to implement different undo code. The detection
also must happen before tcp_ack() as retrans_stamp is reset when
SYN is acknowledged.
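
The detection idea can be sketched as follows. This is a simplified model
with hypothetical names, not the kernel code: it assumes the sender
remembers the timestamp value carried by the original SYN and compares it
with the timestamp echoed in the SYNACK. The window sizes 1 and 10 are the
values quoted in this series.

```python
from typing import Optional


def syn_timeout_was_spurious(syn_retransmitted: bool,
                             first_syn_tsval: int,
                             synack_tsecr: Optional[int]) -> bool:
    """True if the SYNACK echoes the timestamp of the original SYN.

    An echo of the original SYN's timestamp means that SYN was
    delivered, so the timeout that triggered a retransmit was spurious.
    """
    if not syn_retransmitted or synack_tsecr is None:
        return False  # nothing to undo, or no timestamp option seen
    return synack_tsecr == first_syn_tsval


def initial_cwnd(syn_retrans_count: int, spurious: bool) -> int:
    """Pick the initial window: 1 packet after 2+ genuine retransmits."""
    if syn_retrans_count >= 2 and not spurious:
        return 1
    return 10
```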

Note this patch covers both active regular and fast open.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: avoid unconditional congestion window undo on SYN retransmit

Previously if an active TCP open had a SYN timeout, it always undid the
cwnd upon receiving the SYNACK. This is because tcp_clean_rtx_queue
would reset tp->retrans_stamp when SYN is acked, which then fools
tcp_try_undo_loss and tcp_packet_delayed. Addressing this issue is
required to properly support undo for spurious SYN timeout.

Fixing this is tricky -- for active TCP open tp->retrans_stamp
records the time when the handshake starts, not the first
retransmission time as the name may suggest. The simplest fix is
for tcp_packet_delayed to ensure it is valid before comparing with
other timestamp.

One side effect of this change affects active TCP Fast Open that
incurred a SYN timeout. Upon receiving a SYN-ACK that only acknowledged
the SYN, it would immediately retransmit unacknowledged data in
tcp_ack() because the data is marked lost after SYN timeout. But
the retransmission would have an incorrect ack sequence number since
rcv_nxt has not been updated yet in tcp_rcv_synsent_state_process(), so
the retransmission needs to be properly handled by
tcp_rcv_fastopen_synack() like before.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevsim: fix fall-through annotation

Replace "pass through" with a proper "fall through" annotation
in order to fix the following warning:

drivers/net/netdevsim/bus.c: In function ‘new_device_store’:
drivers/net/netdevsim/bus.c:170:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
   port_count = 1;
   ~~~~~~~~~~~^~~
drivers/net/netdevsim/bus.c:172:2: note: here
  case 2:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This fix is part of the ongoing efforts to enable
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: mcdi_port: Mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/sfc/mcdi_port.c: In function ‘efx_mcdi_phy_decode_link’:
./include/linux/compiler.h:77:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
                      ^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:125:2: note: in expansion of macro ‘unlikely’
  unlikely(__ret_warn_on);     \
  ^~~~~~~~
drivers/net/ethernet/sfc/mcdi_port.c:344:3: note: in expansion of macro ‘WARN_ON’
   WARN_ON(1);
   ^~~~~~~
drivers/net/ethernet/sfc/mcdi_port.c:345:2: note: here
  case MC_CMD_FCNTL_OFF:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Change devlink health locking mechanism

The devlink health reporters create/destroy and user commands currently
use the devlink->lock as a locking mechanism. Different reporters have
different rules in the driver and are being created/destroyed during
different stages of driver load/unload/running. So during execution of a
reporter recover the flow can go through another reporter's destroy and
create. Such flow leads to deadlock trying to lock a mutex already
held.

With the new locking mechanism the different reporters share mutex lock
only to protect access to shared reporters list.
Added refcount per reporter, to protect the reporters from destroy while
being used.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
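
Illustration (not part of the patch): a user-space sketch of the scheme
described above, assuming a fixed-size table in place of the kernel list
and pthread/C11 atomics in place of kernel primitives. All names here are
hypothetical. The shared lock is held only while the table is searched or
modified; a per-reporter reference count keeps a reporter alive while it
is in use.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MAX_REPORTERS 8

/* Hypothetical, simplified stand-in for a health reporter. */
struct reporter {
	const char *name;
	atomic_int refcount;	/* 1 while listed, +1 per active user */
	int in_use;		/* slot is occupied and can be looked up */
};

static struct reporter reporters[MAX_REPORTERS];
static pthread_mutex_t reporters_lock = PTHREAD_MUTEX_INITIALIZER;

struct reporter *reporter_create(const char *name)
{
	struct reporter *r = NULL;

	pthread_mutex_lock(&reporters_lock);
	for (size_t i = 0; i < MAX_REPORTERS; i++) {
		if (!reporters[i].in_use) {
			r = &reporters[i];
			r->name = name;
			r->in_use = 1;
			atomic_store(&r->refcount, 1);
			break;
		}
	}
	pthread_mutex_unlock(&reporters_lock);
	return r;
}

/* Look up and pin a reporter. The lock is dropped before returning, so a
 * long-running recover does not block create/destroy of other reporters. */
struct reporter *reporter_get(const char *name)
{
	struct reporter *r = NULL;

	pthread_mutex_lock(&reporters_lock);
	for (size_t i = 0; i < MAX_REPORTERS; i++) {
		if (reporters[i].in_use && strcmp(reporters[i].name, name) == 0) {
			r = &reporters[i];
			atomic_fetch_add(&r->refcount, 1);
			break;
		}
	}
	pthread_mutex_unlock(&reporters_lock);
	return r;
}

void reporter_put(struct reporter *r)
{
	atomic_fetch_sub(&r->refcount, 1);
}

/* Unlink under the lock and return the number of users still holding a
 * reference. Real code would wait for that number to reach zero before
 * freeing the object. */
int reporter_destroy(struct reporter *r)
{
	pthread_mutex_lock(&reporters_lock);
	r->in_use = 0;		/* new lookups can no longer find it */
	pthread_mutex_unlock(&reporters_lock);
	return atomic_load(&r->refcount) - 1;
}
```

The point of the sketch is that no code path holds reporters_lock while
running reporter-specific work, so a recover that triggers another
reporter's destroy and create cannot self-deadlock on the shared lock.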

Merge branch 'aquantia-next'

Igor Russkikh says:

====================
net: atlantic: Aquantia driver updates 2019-04

This patchset contains various improvements:

- Work targeting link up speedups: link interrupt introduced, some other
  logic changes to improve this.
- FW operations securing with mutex
- Counters and statistics logic improved by Dmitry
- read out of chip temperature via hwmon interface implemented by
  Yana and Nikita.

v4 changes:
- remove drvinfo_exit noop
- 64bit stats should be read out sequentially (lsw, then msw);
  declare 64bit read ops for that

v3 changes:
- temp ops renamed to phy_temp ops
- mutex commits squashed for better structure

v2 changes:
- use threaded irq for link state handling
- rework hwmon via devm_hwmon_device_register_with_info
Extra comments on review from Andrew:
- direct device name pointer is used in hwmon registration.
  This causes hwmon device to derive possible interface name changes
- Will consider sanity checks for firmware mutex lock separately.
  Right now there is no single point where such a check could
  be easily added.
- There is no way now to fetch and configure min/max/crit temperatures
  via FW. Will investigate this separately.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: remove outdated device ids

Some device ids were never released and do not exist.
Clean these up.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fixups on 64bit dma counters

DMA counters are 64 bit wide and we can fetch all 64 bits to reduce
counter overflow, especially on byte counters.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
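
Illustration (not part of the patch): a sketch of reading a 64-bit counter
that is exposed as two 32-bit registers, low word first, as the cover
letter's v4 note asks for. The register layout (high word at reg + 4), the
accessor type and the fake register file are assumptions made for this
example only.

```c
#include <stdint.h>

/* Hypothetical accessor type: returns the 32-bit value at byte offset 'reg'. */
typedef uint32_t (*reg_read32_fn)(uint32_t reg);

/* Read the least significant word first, then the most significant word,
 * and combine them. Assumes the high word lives at 'reg + 4'. */
uint64_t read_counter64(reg_read32_fn rd, uint32_t reg)
{
	uint64_t lsw = rd(reg);
	uint64_t msw = rd(reg + 4);

	return (msw << 32) | lsw;
}

/* Fake register file so the sketch can be exercised without hardware:
 * a byte counter that has already wrapped 32 bits twice. */
static uint32_t fake_regs[2] = { 0xfffffff0u, 0x00000002u };

uint32_t fake_read32(uint32_t reg)
{
	return fake_regs[reg / 4];
}
```

A 32-bit read of the same counter would report only 0xfffffff0 and lose
the two wraps recorded in the high word.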

net: aquantia: get total counters from DMA block

aq_nic_update_ndev_stats pushes statistics to ndev->stats from
system interface. This is not always good because it counts packets/bytes
before any of rx filters (including mac filter).

It's better to report the packet/byte statistics from the DMA
counters, which give the actual amount of data transferred over PCI.
System level stats are still available via ethtool.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: fetch up to date statistics on ethtool request

This improves ethtool -S usage: stats are now up to date
on each request. Previously, stats were only updated at the service
timer period.

Tested-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: extract timer cb into work job

The service timer callback fetches statistics from FW, and that may cause
a long delay in error cases. We also now need to use the fw mutex
to prevent concurrent access to FW, so extract that logic
from the timer callback into a job on a separate work queue.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: introduce fwreq mutex

Some FW operations could be invoked simultaneously,
e.g. from ethtool context and from the service activity work.
Here we introduce a fw mutex to secure and serialize access
to FW logic.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: user correct MSI irq type

Typo in msi code. Not much impact though.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: use macros for better visibility

Improve for better readability

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: improve ifup link detection

The original code detected the link only 1 sec after the interface came up.
Here we replace this with a direct service callback which updates
the link status immediately.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: link status irq handling

Here we define and request an extra interrupt line,
assign it to the link isr handler and restructure the aq_pci code a bit
to better support that.

We also remove the logic for using different timer intervals
depending on link state, since that's now useless.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: create global service workqueue

We need this to schedule link interrupt handling and
various service tasks.

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: link interrupt handling function

Define link interrupt handler

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: add link interrupt fields

Declare macros and nic fields to support link interrupt
handling

Signed-off-by: Nikita Danilov <ndanilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: implement hwmon api for chip temperature

Added support for hwmon api to fetch out chip temperature

Signed-off-by: Yana Esina <yana.esina@aquantia.com>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: aquantia: add infrastructure to readout chip temperature

Ability to read the chip temperature from memory
via hwmon interface

Signed-off-by: Yana Esina <yana.esina@aquantia.com>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove manual autoneg restart workaround

According to Neil who reported the issue leading to this
workaround, the workaround is no longer needed since
version 5.0. So let's remove it.

This was the bug report leading to the workaround:
https://bugzilla.kernel.org/show_bug.cgi?id=201081

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Neil MacLeod <neil@nmacleod.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-improve-eri-function-handling'

Heiner Kallweit says:

====================
r8169: improve eri function handling

This series aims at improving and simplifying the eri functions.
No functional change intended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add rtl_reset_packet_filter

Fortunately in one place there's a comment explaining what toggling
this bit does. So let's create a helper for it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add helpers rtl_eri_set/clear_bits

Add helpers rtl_eri_set_bits and rtl_eri_clear_bits to improve
readability of the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
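
Illustration (not part of the patch): set/clear helpers of this kind are
typically thin wrappers around one read-modify-write primitive. The sketch
below uses a fake in-memory register file and hypothetical names; it does
not reproduce the driver's actual signatures.

```c
#include <stdint.h>

#define ERI_REGS 64

static uint32_t fake_eri[ERI_REGS];	/* hypothetical register file */

uint32_t eri_read(unsigned int addr)
{
	return fake_eri[addr];
}

void eri_write(unsigned int addr, uint32_t val)
{
	fake_eri[addr] = val;
}

/* Read-modify-write: clear the bits in 'clear', then set the bits in 'set'. */
static void eri_modify(unsigned int addr, uint32_t set, uint32_t clear)
{
	uint32_t val = eri_read(addr);

	eri_write(addr, (val & ~clear) | set);
}

/* The two helpers only name the intent; callers no longer spell out masks
 * for the half of the operation they do not need. */
void eri_set_bits(unsigned int addr, uint32_t mask)
{
	eri_modify(addr, mask, 0);
}

void eri_clear_bits(unsigned int addr, uint32_t mask)
{
	eri_modify(addr, 0, mask);
}
```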

r8169: make ERIAR_EXGMAC the default in eri functions

In basically all eri function calls the type argument is ERIAR_EXGMAC.
Therefore make it the default.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Convert-mv88e6060-to-mdio-device'

Andrew Lunn says:

====================
Convert mv88e6060 to mdio device

This patchset builds upon the previous patches to mv88e6060. It adds
support for probing the switch as an MDIO device and then removes the
legacy probe method. Since this is the last device supporting legacy
probe, this allows legacy probe to be removed, originally planned to
be removed in 4.17, but took a bit longer.

This change to the mv88e6060 is more risky than the previous
patchset. Some attempts to test it have been made, by hacking the
driver to match on an mv88e6352 so that it probes. These changes are
all about probe, so it is a reasonable test. But testing on a real
mv88e6060 would be great.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: DSA: Remove legacy binding

Now that the code to support the legacy binding has been removed,
remove the documentation for it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Remove legacy probing support

Now that all drivers can be probed using more traditional methods,
remove the legacy probe code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6060: Remove support for legacy probing

Now that the driver can be probed as an mdio device, remove the legacy
DSA platform device probing.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6060: Support probing as an mdio device

Probing DSA devices as platform devices has been superseded by using
normal bus drivers. Add support for probing the mv88e6060 device as an
mdio device.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-core-vlan'

Vladimir Oltean says:

====================
Improvements to DSA core VLAN manipulation

In preparation of submitting the NXP SJA1105 driver, the Broadcom b53
and Mediatek mt7530 drivers have been found to apply some VLAN
workarounds that are needed in the new driver as well.

Therefore this patchset is mostly simply promoting the DSA driver
workarounds for VLAN to the generic code.

The b53 driver was applying a few workarounds in order to convince DSA
that its vlan_filtering setting is not really per-port. This is now
simply set by the driver via a DSA variable at probe time. The sja1105
driver will be a second user of this.

The mt7530 was also keeping track of when the .port_vlan_filtering
callback was being called. Remove the kept state from this driver
and simplify dealing with vlan_filtering in the generic case.

TODO:

Find the best way to deal generically with the situation described below
(discussion at https://lkml.org/lkml/2019/4/16/1355):

> > +Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
> > +all bridges should have the same level of VLAN awareness (either both have
> > +``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact
> > +that VLAN awareness is global at the switch level is that once a bridge with
> > +``vlan_filtering`` enslaves at least one switch port, the other un-bridged
> > +ports are no longer available for standalone traffic termination.
>
> That is quite a limitation that I don't think I had fully grasped until
> reading your different patches. Since enslaving ports into a bridge
> comes after the network device was already made available for use, maybe
> you should force the carrier down or something along those lines as soon
> as a port is enslaved into a bridge with vlan_filtering=1 to make this
> more predictable for the user?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add more convenient functions for installing port VLANs

This hides the need to perform a two-phase transaction and construct a
switchdev_obj_port_vlan struct.

Call graph (including a function that will be introduced in a follow-up
patch) looks like this now (same for the *_vlan_del function):

dsa_slave_vlan_rx_add_vid   dsa_port_setup_8021q_tagging
            |                        |
            |                        |
            |          +-------------+
            |          |
            v          v
           dsa_port_vid_add      dsa_slave_port_obj_add
                  |                         |
                  +-------+         +-------+
                          |         |
                          v         v
                       dsa_port_vlan_add

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
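
Illustration (not part of the patch): a user-space sketch of a convenience
wrapper that hides a two-phase (prepare, then commit) operation from its
callers. The struct and function names are simplified, hypothetical
stand-ins for the switchdev object, the transaction and the functions in
the call graph above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_VID 4094

struct vlan_obj { uint16_t vid_begin, vid_end, flags; };
struct trans { bool ph_prepare; };

static bool installed[MAX_VID + 1];

/* Two-phase operation: the prepare phase only validates the request; the
 * commit phase applies it and is expected not to fail. */
int port_vlan_add(const struct vlan_obj *v, const struct trans *t)
{
	if (t->ph_prepare) {
		if (v->vid_begin == 0 || v->vid_end > MAX_VID ||
		    v->vid_begin > v->vid_end)
			return -1;
		return 0;
	}
	for (unsigned int vid = v->vid_begin; vid <= v->vid_end; vid++)
		installed[vid] = true;
	return 0;
}

/* Convenience wrapper: builds the object for a single VID and runs both
 * phases, so callers pass just a VID and flags. */
int port_vid_add(uint16_t vid, uint16_t flags)
{
	struct vlan_obj v = { .vid_begin = vid, .vid_end = vid, .flags = flags };
	struct trans t = { .ph_prepare = true };
	int err = port_vlan_add(&v, &t);

	if (err)
		return err;

	t.ph_prepare = false;
	return port_vlan_add(&v, &t);
}

bool vid_is_installed(uint16_t vid)
{
	return vid <= MAX_VID && installed[vid];
}
```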

net: dsa: b53: Use vlan_filtering property from dsa_switch

While possible (and safe) to use the newly introduced
dsa_port_is_vlan_filtering helper, fabricating a dsa_port pointer is a
bit awkward, so simply retrieve this from the dsa_switch structure.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Let DSA call .port_vlan_filtering only when necessary

Since DSA has recently learned to deal better with drivers that set
vlan_filtering_is_global, doing this is no longer required.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Skip calling .port_vlan_filtering on no change

Even if VLAN filtering is global, DSA will call this callback once per
each port. Drivers should not have to compare the global state with the
requested change. So let DSA do it.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
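
Illustration (not part of the patch): a sketch of the "compare in the
core, not in the driver" idea for a switch whose setting is global. The
callback, the cached state and the port count are hypothetical; the sketch
only counts how often the driver would be called.

```c
#include <stdbool.h>

#define NUM_PORTS 4

static bool global_vlan_filtering;	/* state cached by the core */
static unsigned int driver_calls;	/* how often the driver was invoked */

/* Hypothetical driver callback: would reprogram the global hardware setting. */
static int driver_port_vlan_filtering(int port, bool enable)
{
	(void)port;
	(void)enable;
	driver_calls++;
	return 0;
}

/* Core-side wrapper: skip the driver when the request changes nothing. */
int core_port_vlan_filtering(int port, bool enable)
{
	int err;

	if (global_vlan_filtering == enable)
		return 0;

	err = driver_port_vlan_filtering(port, enable);
	if (err)
		return err;

	global_vlan_filtering = enable;
	return 0;
}

/* Mimic the once-per-port invocation and return the total number of
 * driver calls made so far. */
unsigned int apply_to_all_ports(bool enable)
{
	for (int port = 0; port < NUM_PORTS; port++)
		core_port_vlan_filtering(port, enable);
	return driver_calls;
}
```

Four per-port requests for the same value reach the driver once; the
remaining three are filtered out by the core.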

net: dsa: mt7530: Use the DSA vlan_filtering helper function

This was recently introduced, so keeping state inside the driver is no
longer necessary.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Add helper function to retrieve VLAN awareness setting

Since different types of hardware may or may not support this setting
per-port, DSA keeps it either in dsa_switch or in dsa_port.

While drivers may know the characteristics of their hardware and
retrieve it from the correct place without the need of helpers, it is
cumbersome to find out an unambiguous answer from generic DSA code.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
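
Illustration (not part of the patch): a sketch of such a helper using
simplified, hypothetical structs. It picks the switch-wide value when the
hardware setting is global and the per-port value otherwise, so generic
code does not need to know which kind of hardware it is dealing with.

```c
#include <stdbool.h>

/* Simplified stand-ins for the switch and port structures. */
struct sw {
	bool vlan_filtering_is_global;
	bool vlan_filtering;		/* valid when the setting is global */
};

struct port {
	const struct sw *ds;
	bool vlan_filtering;		/* valid when the setting is per-port */
};

bool port_is_vlan_filtering(const struct port *dp)
{
	if (dp->ds->vlan_filtering_is_global)
		return dp->ds->vlan_filtering;

	return dp->vlan_filtering;
}
```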

net: dsa: Keep the vlan_filtering setting in dsa_switch if it's global

The current behavior is not as obvious as one would assume (which is
that, if the driver set vlan_filtering_is_global = 1, then checking any
dp->vlan_filtering would yield the same result). Only the ports which
are actively enslaved into a bridge would have vlan_filtering set.

This makes it tricky for drivers to check what the global state is.
So fix this and make the struct dsa_switch hold this global setting.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: Let DSA handle the unsetting of vlan_filtering

The driver, recognizing that the .port_vlan_filtering callback was never
coming after the port left its parent bridge, decided to take that duty
in its own hands. DSA now takes care of this condition, so fix that.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Unset vlan_filtering when ports leave the bridge

When ports are standalone (after they left the bridge), they should have
no VLAN filtering semantics (they should pass all traffic to the CPU).
Currently this is not true for switchdev drivers, because the bridge
"forgets" to unset that.

Normally one would think that doing this at the bridge layer would be a
better idea, i.e. call br_vlan_filter_toggle() from br_del_if(), similar
to how nbp_vlan_init() is called from br_add_if().

However what complicates that approach, and makes this one preferable,
is the fact that for the bridge core, vlan_filtering is a per-bridge
setting, whereas for switchdev/DSA it is per-port. Also there are
switches where the setting is per the entire device, and unsetting
vlan_filtering one by one, for each leaving port, would not be possible
from the bridge core without a certain level of awareness. So do this in
DSA and let drivers be unaware of it.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: Let DSA handle mismatched VLAN filtering settings

The DSA core is now able to do this check prior to calling the
.port_vlan_filtering callback, so tell it that VLAN filtering is global
for this particular hardware.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Be aware of switches where VLAN filtering is a global setting

On some switches, the action of whether to parse VLAN frame headers and use
that information for ingress admission is configurable, but not per
port. Such is the case for the Broadcom BCM53xx and the NXP SJA1105
families, for example. In that case, DSA can prevent the bridge core
from trying to apply different VLAN filtering settings on net devices
that belong to the same switch.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
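
Illustration (not part of the patch): a sketch of the kind of check the
core can make when the setting is global. A request is refused if any
other bridge spanning ports of the same switch has a different
vlan_filtering value. Structures and names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct bridge { bool vlan_filtering; };

struct sw_port { const struct bridge *bridge; };	/* NULL when standalone */

struct sw {
	bool vlan_filtering_is_global;
	size_t num_ports;
	struct sw_port ports[4];
};

bool can_apply_vlan_filtering(const struct sw *ds, size_t port, bool enable)
{
	/* Per-port hardware: every port may have its own setting. */
	if (!ds->vlan_filtering_is_global)
		return true;

	for (size_t i = 0; i < ds->num_ports; i++) {
		const struct bridge *other = ds->ports[i].bridge;

		/* Standalone ports and ports in the same bridge cannot conflict. */
		if (!other || other == ds->ports[port].bridge)
			continue;

		if (other->vlan_filtering != enable)
			return false;
	}
	return true;
}
```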

net: dsa: Store vlan_filtering as a property of dsa_port

This allows drivers to query the VLAN setting imposed by the bridge
driver directly from DSA, instead of keeping their own state based on
the .port_vlan_filtering callback.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Fix pharse -> phase typo

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2019-04-30

1) A lot of work to remove indirections from the xfrm code.
   From Florian Westphal.

2) Support ESP offload in combination with gso partial.
   From Boris Pismenny.

3) Remove some duplicated code from vti4.
   From Jeremy Sowden.

Please note that there is a merge conflict

between commit:

8742dc86d0c7 ("xfrm4: Fix uninitialized memory read in _decode_session4")

from the ipsec tree and commit:

c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")

from the ipsec-next tree. The merge conflict will appear
when those trees get merged during the merge window.
The conflict can be solved as it is done in linux-next:

https://lkml.org/lkml/2019/4/25/1207

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: make sure the factory test bit is cleared

The KSZ8081 PHY has a factory test mode which is set at the de-assertion
of the reset line based on the RXER (KSZ8081RNA/RND) or TXC
(KSZ8081MNX/RNB) pin. If a pull-down is missing, or if the pin has a
pull-up, the factory test mode should be cleared by manually writing a 0
(according to the datasheet). This patch makes sure this factory test
bit is cleared in config_init().

Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: avoid unneeded MDIO reads in genphy_read_status

Considering that in polling mode each link drop will be latched,
settings can't have changed if link was up and is up.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
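
Illustration (not part of the patch): a sketch of the shortcut with a fake
PHY that counts register reads. It assumes autonegotiation is enabled and
that any link drop is latched, so "was up, is up" means nothing can have
changed. The structure and the accessors are hypothetical.

```c
#include <stdbool.h>

struct fake_phy {
	bool hw_link;		/* what the "hardware" currently reports */
	int hw_speed;
	bool autoneg;
	bool link;		/* cached software state */
	int speed;
	unsigned int mdio_reads;
};

static bool mdio_read_link(struct fake_phy *p)
{
	p->mdio_reads++;
	return p->hw_link;
}

static int mdio_read_speed(struct fake_phy *p)
{
	p->mdio_reads++;
	return p->hw_speed;
}

void read_status(struct fake_phy *p)
{
	bool old_link = p->link;

	p->link = mdio_read_link(p);

	/* Link was up and still is: skip re-reading the negotiated settings. */
	if (p->autoneg && old_link && p->link)
		return;

	p->speed = mdio_read_speed(p);
}
```

In steady state each poll costs one read instead of two; the settings are
read again only after the link has been seen down.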

Merge branch 'dsa-tag-modules'

Andrew Lunn says:

====================
Make DSA tag drivers kernel modules

Historically, DSA tag drivers have been compiled into the kernel as
part of the DSA core. With the growing number of tag drivers, it makes
sense to allow this driver code to be compiled as a module, and loaded
on demand.

v2
--
Move name to end of structure, keeping the hot entries at the beginning.
Move tag protocol to end of structure to keep hot members at the beginning.
Fix indent of #endif
Rewrite to move list pointer into a new structure
void functions, since there cannot be errors
Fix fall-through comment
Reorder patch for unused symbols to before tag drivers can be modules
tab/space cleanup
Help text wording
NET_DSA_TAG_BRCM_COMMON and NET_DSA_TAG_KSZ_COMMON hidden

v3
--
boilerplate: Move kdoc next to macro
boilerplate: Fix THIS_MODULE indentation
Kconfig: More tabification
Kconfig: Punctuation

v4
--
Cover note {H}istorically
Kconfig: trailer
====================

Tested-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Allow tag drivers to be built as modules

Make the CONFIG symbols tristate and add help text.

The Broadcom and Microchip KSZ tag drivers support two different
tagging protocols in one driver. Add a configuration option for the
drivers, and then options to select the protocol.

Create a submenu for the tagging drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
tab/space cleanup
Help text wording
NET_DSA_TAG_BRCM_COMMON and NET_DSA_TAG_KSZ_COMMON hidden

v3:
More tabification
Punctuation

v4:
trailler->trailer

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: tag_brcm: Avoid unused symbols

It is possible that the driver is compiled with both
CONFIG_NET_DSA_TAG_BRCM and CONFIG_NET_DSA_TAG_BRCM_PREPEND disabled.
This results in warnings about unused symbols. Add some conditional
compilation to avoid this.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Reorder patch to before tag drivers can be modules

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Cleanup unneeded table and make tag structures static

Now that tag drivers dynamically register, we don't need the static
table. Remove it. This also means the tag driver structures can be
made static.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Make use of the list of tag drivers

Implement the _get and _put functions to make use of the list of tag
drivers. Also, trigger the loading of the module, based on the alias
information. The _get function takes a reference on the tag driver, so
it cannot be unloaded, and the _put function releases the reference.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2:
Make tag_driver_register void

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add stub tag driver put method

When a DSA switch driver is unloaded, the lock on the tag driver
should be released so the module can be unloaded. Add the needed calls,
but leave the actual release code as a stub.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Rename dsa_resolve_tag_protocol() to _get ready for locking

dsa_resolve_tag_protocol() is used to find the tagging driver needed
by a switch driver. When the tagging drivers become modules, it will
be necessary to take a reference on the module to prevent it being
unloaded. So rename this function to _get() to indicate it has some
locking properties.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Register the none tagger ops

The none tagger is special in that it does not live in a tag_*.c file,
but is within the core. Register/unregister when DSA is
loaded/unloaded.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Keep link list of tag drivers

Let the tag drivers register themselves with the DSA core, keeping
them in a linked list.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add boilerplate helper to register DSA tag driver modules

A DSA tag driver module will need to register the tag protocols it
implements with the DSA core. Add macros containing this boilerplate.

The registration/unregistration code is currently just a stub. A later
patch will add the real implementation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Fix indent of #endif
Rewrite to move list pointer into a new structure
v3
Move kdoc next to macro
Fix THIS_MODULE indentation

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add TAG protocol to tag ops

In order that we can match the tagging protocol a switch driver
requests to the tagger, we need to know what protocol the tagger
supports. Add this information to the ops structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Move tag protocol to end of structure to keep hot members at the beginning.

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add MODULE_LICENSE to tag drivers

All the tag drivers are some variant of GPL. Add a MODULE_LICENSE()
indicating this, so the drivers can later be compiled as modules.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add MODULE_ALIAS to taggers in preparation to become modules

When the tag drivers become modules, we will need to dynamically load
them based on what the switch drivers need. Add aliases to map between
the TAG protocol and the driver.

In order to do this, we need the tag protocol number as something
which the C pre-processor can stringify. Only the compiler knows the
value of an enum, CPP cannot use them. So add #defines.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
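
Illustration (not part of the patch): a small sketch of why #defines are
needed. The preprocessor expands macros before stringifying, but an enum
constant is not a macro, so its name is stringified instead of its value.
The macro names, the enum and the "dsa_tag-" prefix are assumptions made
for this example.

```c
#include <string.h>

/* Two-level stringification: the outer macro expands its argument first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

enum tag_protocol {
	TAG_PROTO_NONE = 0,
	TAG_PROTO_EXAMPLE = 1,
};

/* Values duplicated as macros so the preprocessor can see them. */
#define TAG_PROTO_NONE_VALUE 0
#define TAG_PROTO_EXAMPLE_VALUE 1

#define TAG_DRIVER_ALIAS "dsa_tag-"
#define TAG_ALIAS(proto) TAG_DRIVER_ALIAS STRINGIFY(proto##_VALUE)

/* Built from the #define: the number ends up in the string. */
const char *alias_from_define = TAG_ALIAS(TAG_PROTO_EXAMPLE);

/* Built from the enum constant: only the identifier can be stringified. */
const char *alias_from_enum = TAG_DRIVER_ALIAS STRINGIFY(TAG_PROTO_EXAMPLE);
```

Here alias_from_define is "dsa_tag-1", while alias_from_enum is
"dsa_tag-TAG_PROTO_EXAMPLE", which is useless for matching a protocol
number at module load time.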

dsa: Move tagger name into its ops structure

Rather than keep a list to map a tagger ops to a name, place the name
into the ops structure. This removes the hard coded list, a step
towards making the taggers more dynamic.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2:
Move name to end of structure, keeping the hot entries at the beginning.

Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Add SPDX header to tag drivers.

Add an SPDX header, and remove the license boilerplate text.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Introduce BPF socket local storage map so that BPF programs can store
   private data they associate with a socket (instead of e.g. separate hash
   table), from Martin.

2) Add support for bpftool to dump BTF types. This is done through a new
   `bpftool btf dump` sub-command, from Andrii.

3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
   was currently not supported since skb was used to lookup netns, from Stanislav.

4) Add an opt-in interface for tracepoints to expose a writable context
   for attached BPF programs, used here for NBD sockets, from Matt.

5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.

6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
   support tunnels such as sit. Add selftests as well, from Willem.

7) Various smaller misc fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Delete all hash and TCAM filters before resource cleanup

During driver unload, hash/TCAM filter deletion doesn't wait for
completion. This patch deletes all the filters with completion before
clearing the resources.

Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Remove legacy probe support

Remove the legacy method of probing the mv88e6xxx driver, now that all
the mainline boards have been converted to use mdio based probing for
a number of cycles.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6060-cleanups'

Andrew Lunn says:

====================
mv88e6060 cleanups

This patchset performs some cleanups of the mv88e6060 DSA driver, as a
step towards making it an MDIO device, rather than use the old probing
method. The changes here are all pretty mechanical and only compile
tested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6060: Replace REG_READ macro

The REG_READ macro contains a return statement, making it not very
safe. Remove it by inlining the code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
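
Illustration (not part of the patch): a sketch of the hazard of a macro
that hides a return statement. The macro, the fake register read and the
lock counter are hypothetical; the counter stands in for any cleanup the
caller must perform before returning.

```c
static int lock_depth;	/* stand-in for a held lock */

/* Fake register read: negative register numbers fail. */
int reg_read(int reg)
{
	return reg < 0 ? -1 : reg * 2;
}

/* Macro with a hidden exit from the calling function. */
#define REG_READ_OR_RETURN(dst, reg)		\
	do {					\
		(dst) = reg_read(reg);		\
		if ((dst) < 0)			\
			return (dst);		\
	} while (0)

int read_with_macro(int reg)
{
	int val;

	lock_depth++;
	REG_READ_OR_RETURN(val, reg);	/* on error: returns with the "lock" held */
	lock_depth--;
	return val;
}

/* Inlined variant: every exit path is visible, so cleanup is easy to audit. */
int read_inlined(int reg)
{
	int val;

	lock_depth++;
	val = reg_read(reg);
	lock_depth--;
	return val;
}

int get_lock_depth(void)
{
	return lock_depth;
}
```

After a failed read_with_macro() call the counter stays raised, which is
the class of bug that open-coding the error check makes easier to spot.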