git.openwrt.org Git - openwrt/staging/blogic.git/log

Jacob Keller [Fri, 26 Aug 2016 07:14:34 +0000 (00:14 -0700)]

fm10k: rework vxlan_port offload before adding geneve support

In preparation for adding Geneve Rx offload support, refactor the
current VXLAN offload flow to be a bit more generic so that it will be
easier to add the new Geneve code. The fm10k hardware supports one VXLAN
and one Geneve tunnel, so we will eventually treat the VXLAN and Geneve
tunnels identically. To this end, factor out the code that handles the
current list so that we can use the generic flow for both tunnels in the
next patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jacob Keller [Thu, 23 Jun 2016 20:54:01 +0000 (13:54 -0700)]

fm10k: don't try to stop queues if we've lost hw_addr

In the event of a surprise remove, we expect the driver to go down,
which includes calling .stop_hw(). However, this function will return an
error because the queues won't appear to cleanly disable. Prevent this
and avoid the unnecessary checks by just returning when
FM10K_REMOVED(hw->hw_addr) is true.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
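
The early return described in this commit can be sketched in userspace as follows. This is a minimal model, not the driver's code: `struct hw_model`, `stop_hw_model()` and `REMOVED()` are hypothetical stand-ins, and the sketch assumes a lost mapping shows up as a NULL `hw_addr`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's removed-device check. */
#define REMOVED(addr) ((addr) == NULL)

struct hw_model {
    volatile uint32_t *hw_addr; /* NULL once the device is gone */
    int queues_disabled;        /* set when the queue-disable path ran */
};

/* Returns 0 on success. When the device has been removed there is
 * nothing left to disable, so the queue handling is skipped entirely. */
static int stop_hw_model(struct hw_model *hw)
{
    if (REMOVED(hw->hw_addr))
        return 0;

    hw->queues_disabled = 1; /* stands in for the real disable sequence */
    return 0;
}
```

With this shape, the caller no longer sees a spurious error from queues that can never report a clean disable.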

Jacob Keller [Thu, 23 Jun 2016 20:31:01 +0000 (13:31 -0700)]

fm10k: don't continue probe if PCI device not in normal IO state

In the event of an uncorrectable AER error occurring when the driver has
not loaded, the recovery routines are not run. This is because
future loads of the driver may not be aware of the IO state and may not
be able to recover at all. In this case, when we next load the driver it
fails due to what appears to be a surprise remove event. Instead, add
a check to ensure that the device is in the normal IO state before
continuing to probe. This allows us to give a more descriptive message
of what is wrong.

Without this change, the driver will attempt to probe up to our first
call of .reset_hw() which will be unable to read registers and act as if
a surprise remove event occurred.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
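
A minimal sketch of the probe-time check described in this commit, written as a userspace model. The enum, `struct pdev_model` and `probe_model()` are hypothetical names, and `-EIO` is an assumed error code; the commit text does not say which value is returned.

```c
#include <errno.h>

/* Hypothetical model of the PCI channel states a probe routine may see. */
enum channel_state_model {
    CHANNEL_IO_NORMAL,
    CHANNEL_IO_FROZEN,
    CHANNEL_IO_PERM_FAILURE,
};

struct pdev_model {
    enum channel_state_model error_state;
    int hw_reset_attempted; /* set when probe reached the reset step */
};

/* Bail out before touching registers unless the channel is healthy. */
static int probe_model(struct pdev_model *pdev)
{
    if (pdev->error_state != CHANNEL_IO_NORMAL)
        return -EIO; /* assumed code; a real driver would also log why */

    pdev->hw_reset_attempted = 1; /* stands in for the first .reset_hw() */
    return 0;
}
```

The point of failing this early is that the reset step is never reached with unreadable registers, so the failure is not misreported as a surprise remove.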

Jacob Keller [Thu, 23 Jun 2016 20:31:00 +0000 (13:31 -0700)]

fm10k: print error code when pci_enable_device_mem fails during probe

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jacob Keller [Mon, 20 Jun 2016 17:39:32 +0000 (10:39 -0700)]

fm10k: NAPI polling routine must return actual work done

When fm10k_poll fully cleans rings it returns 0. This is incorrect as it
messes up the budget accounting in the core NAPI code. Fix this by
returning actual work done, capped at budget - 1 since the core doesn't
expect a return of the full budget when the driver modifies the NAPI
status.

Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
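
The return-value rule from this commit can be modelled as a small pure function. `poll_return_value()` is a hypothetical helper, and the "rings not fully cleaned" branch returning the whole budget is an assumption based on the usual NAPI convention rather than something stated in the commit text.

```c
#include <stdbool.h>

/* Value a poll routine reports back to the NAPI core (model).
 * clean_complete: every ring was fully cleaned within the budget. */
static int poll_return_value(int work_done, int budget, bool clean_complete)
{
    /* Work remains: report the full budget so the core polls again
     * (assumed convention, see the note above). */
    if (!clean_complete)
        return budget;

    /* The driver completes NAPI itself on this path, so it must not
     * report the full budget; cap the reported work at budget - 1. */
    return work_done < budget - 1 ? work_done : budget - 1;
}
```

For example, with a budget of 64, a fully cleaned poll that processed 10 packets reports 10, while one that processed exactly 64 reports 63.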

Jacob Keller [Fri, 17 Jun 2016 23:21:11 +0000 (16:21 -0700)]

fm10k: prefer READ_ONCE instead of ACCESS_ONCE

While technically not needed, as all our uses of ACCESS_ONCE are scalar
types, we already use READ_ONCE in a few places, and for code
readability we can swap all the uses of the older ACCESS_ONCE into
READ_ONCE.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
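
For readers unfamiliar with the idiom, a simplified userspace stand-in for a read-once access is sketched below. `READ_ONCE_MODEL` is not the kernel macro (which is more involved); it relies on the GCC/Clang `__typeof__` extension, and `struct ring_model` is a hypothetical ring descriptor.

```c
/* Simplified stand-in: force a single load through a volatile access.
 * Relies on the GCC/Clang __typeof__ extension. */
#define READ_ONCE_MODEL(x) (*(const volatile __typeof__(x) *)&(x))

struct ring_model {
    unsigned int count;         /* number of descriptors in the ring */
    unsigned int next_to_clean; /* may be updated by another context */
    unsigned int next_to_use;   /* may be updated by another context */
};

/* Descriptors currently in flight, computed from one snapshot of each
 * index so both uses of ntc and ntu see the same value. */
static unsigned int in_flight_model(struct ring_model *ring)
{
    unsigned int ntc = READ_ONCE_MODEL(ring->next_to_clean);
    unsigned int ntu = READ_ONCE_MODEL(ring->next_to_use);

    return (ntu >= ntc ? 0 : ring->count) + ntu - ntc;
}
```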

Jacob Keller [Fri, 17 Jun 2016 21:36:45 +0000 (14:36 -0700)]

fm10k: remove fm10k_get_reta_size from namespace

The function is only used in fm10k_ethtool.c, so make it static.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jacob Keller [Thu, 9 Jun 2016 22:42:36 +0000 (15:42 -0700)]

fm10k: use variadic form of alloc_workqueue

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jacob Keller [Thu, 9 Jun 2016 21:56:05 +0000 (14:56 -0700)]

fm10k: use software values when checking for Tx hangs in hot path

A previous patch added support to check for hardware Tx pending in the
fm10k_down routine. This support was intended to ensure that we
accurately check what the hardware state is. However, checking for Tx
hangs in this manner during the hotpath results in a large performance
hit. Avoid this by making the hotpath check use the SW counters instead.

Fixes: a0f53cf49cb0 ("fm10k: use actual hardware registers when checking for pending Tx", 2016-06-08)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jacob Keller [Thu, 9 Jun 2016 19:02:03 +0000 (12:02 -0700)]

fm10k: fix PCI device enable_cnt leak in .io_slot_reset

A previous patch removed the pci_disable_device() call in
.io_error_detected. This call corresponded to a pci_enable_device_mem()
call within .io_slot_reset handler. Change the call here to
a pci_reenable_device() so that it does not increment and leak the
enable_cnt reference count for the device. Without this change, VF
devices may fail during an unbind/bind, and we'll never zero the
reference counter for the pci_dev structure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
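
The reference-count leak described in this commit can be illustrated with a small model. `struct pci_dev_model` and the three helpers are hypothetical; they only mimic the counting behaviour attributed above to `pci_enable_device_mem()`, `pci_reenable_device()` and `pci_disable_device()`.

```c
struct pci_dev_model {
    int enable_cnt; /* models the per-device enable reference count */
    int enabled;    /* models the device actually being enabled */
};

/* Like an enable call: takes a reference. */
static void model_enable(struct pci_dev_model *d)
{
    if (d->enable_cnt++ == 0)
        d->enabled = 1;
}

/* Like a re-enable call: restores the device, count unchanged. */
static void model_reenable(struct pci_dev_model *d)
{
    if (d->enable_cnt > 0)
        d->enabled = 1;
}

/* Like a disable call: drops a reference, disables on the last one. */
static void model_disable(struct pci_dev_model *d)
{
    if (d->enable_cnt > 0 && --d->enable_cnt == 0)
        d->enabled = 0;
}
```

In this model, probe followed by a slot reset that calls `model_enable()` leaves the count at 2, so the single disable at remove time never brings it back to zero; using `model_reenable()` in the slot reset keeps the count balanced.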

Arnd Bergmann [Fri, 26 Aug 2016 15:25:45 +0000 (17:25 +0200)]

net_sched: fix use of uninitialized ethertype variable in cls_flower

The addition of VLAN support caused a possible use of uninitialized
data if we encounter a zero TCA_FLOWER_KEY_ETH_TYPE key, as pointed
out by "gcc -Wmaybe-uninitialized":

net/sched/cls_flower.c: In function 'fl_change':
net/sched/cls_flower.c:366:22: error: 'ethertype' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the code to only set the ethertype field if it
was nonzero, as before the patch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Arnd Bergmann [Fri, 26 Aug 2016 15:25:46 +0000 (17:25 +0200)]

net/xgene: fix error handling during reset

The newly added reset logic uses helper functions for the MMIO that
may fail. However, when the read operation fails, we end up writing
back uninitialized data to the register, as gcc warns:

drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c: In function 'xgene_enet_link_state':
drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:213:2: error: 'data' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:209:6: note: 'data' was declared here
u32 data;

We already print a warning to the console log if that happens. The best
alternative that I can see is to skip the rest of the reset sequence if
the register value cannot be read: most likely the write would fail as
well, and if it succeeded, worse things could happen.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 3eb7cb9dc946 ("drivers: net: xgene: XFI PCS reset when link is down")
Cc: Fushen Chen <fchen@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
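
The control flow chosen in this commit (stop the sequence when the read fails, rather than writing back an uninitialized value) can be sketched as a userspace model. `struct mmio_model`, the helpers and `RESET_BIT_MODEL` are all hypothetical and do not reflect the driver's real register layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_BIT_MODEL 0x8000u /* hypothetical reset bit */

struct mmio_model {
    uint32_t reg;    /* the modelled register */
    bool read_fails; /* simulate a failing read helper */
    int writes;      /* number of writes performed */
};

/* Read helper that may fail; *val is left untouched on failure. */
static bool model_read(struct mmio_model *m, uint32_t *val)
{
    if (m->read_fails)
        return false;
    *val = m->reg;
    return true;
}

static void model_write(struct mmio_model *m, uint32_t val)
{
    m->reg = val;
    m->writes++;
}

/* Set and then clear the reset bit, but only if the read succeeded. */
static void pcs_reset_model(struct mmio_model *m)
{
    uint32_t data;

    if (!model_read(m, &data))
        return; /* never write back an uninitialized value */

    model_write(m, data | RESET_BIT_MODEL);
    model_write(m, data & ~RESET_BIT_MODEL);
}
```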

Vidya Sagar Ravipati [Fri, 26 Aug 2016 08:25:50 +0000 (01:25 -0700)]

net: ethtool: add support for 1000BaseX and missing 10G link modes

This patch enhances the ethtool link mode bitmap to include
missing interface modes for 1G/10G speeds.

Changes:
1000baseX is the mode introduced to cover all 1G Fiber cases.
All modes under 1000BaseX i.e. 1000BASE-SX, 1000BASE-LX, 1000BASE-LX10
and 1000BASE-BX10 are not explicitly defined at this moment.
10G CR, SR, LR and ER link modes are included for 10G speed.

Issue:
ethtool on a 1G/10G SFP port reports Base-T modes even though
the port supports 1000baseX and 10G CR, SR and LR modes.

root@tor-02$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000000 (0)
        Link detected: yes

After fix:
root@tor-02$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseX/Full
                                10000baseCR/Full
                                10000baseSR/Full
                                10000baseLR/Full
                                10000baseER/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000000 (0)
        Link detected: yes

Signed-off-by: Vidya Sagar Ravipati <vidya@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

James Morse [Fri, 26 Aug 2016 08:21:23 +0000 (09:21 +0100)]

amd-xgbe: Reset running devices after resume from hibernate

After resume from hibernate on arm64, any amd-xgbe devices that were
running when we hibernated are reported as down, even when they are not.

Re-plugging the cables does not cause the interface to come back; the
link must be marked as down then up via 'ip link set' using the serial
console.

This happens because the device has been power-cycled and possibly
re-initialised by firmware, whereas the driver's memory structures have
been restored from the hibernate image and the two do not agree.

Schedule a restart of the device after powerup in case the world changed
while we were asleep.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Eric Dumazet [Sat, 27 Aug 2016 14:37:54 +0000 (07:37 -0700)]

tcp: add tcp_add_backlog()

When TCP operates in lossy environments (between 1 and 10 % packet
losses), many SACK blocks can be exchanged, and I noticed we could
drop them on busy senders, if these SACK blocks have to be queued
into the socket backlog.

While the main cause is the poor performance of RACK/SACK processing,
we can try to avoid these drops of valuable information that can lead to
spurious timeouts and retransmits.

Cause of the drops is the skb->truesize overestimation caused by:

- drivers allocating ~2048 (or more) bytes as a fragment to hold an
  Ethernet frame.

- various pskb_may_pull() calls bringing the headers into skb->head
  might have pulled all the frame content, but skb->truesize could
  not be lowered, as the stack has no idea of each fragment truesize.

The backlog drops are also more visible on bidirectional flows, since
their sk_rmem_alloc can be quite big.

Let's add some room for the backlog, as only the socket owner
can selectively take action to lower memory needs, like collapsing
receive queues or partial ofo pruning.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
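
The idea of giving the backlog extra room can be sketched as below. This is a userspace model under stated assumptions: the commit text does not give the limit formula, so the sum of receive and send buffer sizes plus a 64 KB headroom is an assumption, and `struct sock_model` and `add_backlog_model()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed headroom; the value is not stated in the commit text. */
#define BACKLOG_HEADROOM_MODEL (64u * 1024u)

struct sock_model {
    uint32_t rcvbuf;      /* receive buffer size */
    uint32_t sndbuf;      /* send buffer size */
    uint32_t rmem_alloc;  /* memory already charged to the receive queue */
    uint32_t backlog_len; /* memory currently queued in the backlog */
};

/* Returns true if the packet was queued, false if it was dropped. */
static bool add_backlog_model(struct sock_model *sk, uint32_t truesize)
{
    uint32_t limit = sk->rcvbuf + sk->sndbuf + BACKLOG_HEADROOM_MODEL;

    /* Drop only once queued memory already exceeds the relaxed limit. */
    if (sk->backlog_len + sk->rmem_alloc > limit)
        return false;

    sk->backlog_len += truesize;
    return true;
}
```

Because `truesize` overestimates the real payload, a limit tied only to the receive buffer would drop small SACK-carrying packets early; the added headroom is what lets them through in this model.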

Colin Ian King [Sun, 28 Aug 2016 11:07:02 +0000 (12:07 +0100)]

cxgb4/cxgb4vf: fix spelling mistake "provissioned" -> "provisioned"

Trivial fix to spelling mistake in dev_warn message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Colin Ian King [Sun, 28 Aug 2016 11:03:27 +0000 (12:03 +0100)]

net: ucc_geth: fix spelling mistake "propperty" -> "property"

Trivial fix to spelling mistake in dev_warn message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Colin Ian King [Sun, 28 Aug 2016 10:40:41 +0000 (11:40 +0100)]

wan/fsl_ucc_hdlc: fix spelling mistake "prameter" -> "parameter"

Trivial fix to spelling mistake in dev_err message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Mon, 29 Aug 2016 03:32:58 +0000 (23:32 -0400)]

Merge branch 'strp-generalization'

Tom Herbert says:

====================
strp: Generalize stream parser to work with other socket types

Add a read_sock protocol operation function that allows something like
tcp_read_sock to be called for other protocol types.

Specific changes in this patch set:
  - Add read_sock function to proto_ops. This has the same signature as
    tcp_read_sock. sk_read_actor_t is also defined in net.h.
  - Set peek_len and read_sock proto_op functions for TCPv4 and TCPv6
    stream ops.
  - Remove references to tcp in strparser.
  - Call peek_len and read_sock operations from strparser instead of
    calling TCP specific functions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Tom Herbert [Sun, 28 Aug 2016 21:43:19 +0000 (14:43 -0700)]

kcm: Remove TCP specific references from kcm and strparser

kcm and strparser need to work with any type of stream socket, not just
TCP. Eliminate references to TCP and call the generic proto_ops functions
read_sock and peek_len. Also, in strp_init, check that the socket supports
the proto_ops read_sock and peek_len.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
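
The capability check described in this commit can be modelled as a lookup on an operations table. All names below (`proto_ops_model`, `strp_init_model()`, the stub callbacks) are hypothetical, the callback signatures are simplified, and `-EAFNOSUPPORT` is an assumed error code that the commit text does not confirm.

```c
#include <errno.h>
#include <stddef.h>

struct sock_model;

/* Simplified stand-in for a read actor callback. */
typedef int (*read_actor_model_t)(void *desc, const char *data, size_t len);

struct proto_ops_model {
    int (*read_sock)(struct sock_model *sk, void *desc,
                     read_actor_model_t actor);
    int (*peek_len)(struct sock_model *sk);
};

struct sock_model {
    const struct proto_ops_model *ops;
};

/* Stub callbacks so the table can be populated in an example. */
static int stub_read_sock(struct sock_model *sk, void *desc,
                          read_actor_model_t actor)
{
    (void)sk; (void)desc; (void)actor;
    return 0;
}

static int stub_peek_len(struct sock_model *sk)
{
    (void)sk;
    return 0;
}

/* Accept any socket type, as long as it provides both operations. */
static int strp_init_model(const struct sock_model *sk)
{
    if (!sk->ops || !sk->ops->read_sock || !sk->ops->peek_len)
        return -EAFNOSUPPORT; /* assumed error code */
    return 0;
}
```

Checking for the operations instead of the protocol type is what lets a non-TCP stream socket be used, provided it fills in both function pointers.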

Tom Herbert [Sun, 28 Aug 2016 21:43:18 +0000 (14:43 -0700)]

tcp: Set read_sock and peek_len proto_ops

In inet_stream_ops we set read_sock to tcp_read_sock and peek_len to
tcp_peek_len (which is just a stub function that calls tcp_inq).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Tom Herbert [Sun, 28 Aug 2016 21:43:17 +0000 (14:43 -0700)]

net: Add read_sock proto_op

Add new function in proto_ops structure. This includes moving the
typedef for sk_read_actor into net.h and removing the definition from
tcp.h.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Wei Yongjun [Fri, 26 Aug 2016 14:35:43 +0000 (14:35 +0000)]

net: ethernet: ti: cpsw: fix error return code in cpsw_set_channels()

Fix to return a negative error code from the cpsw_fill_rx_channels()
error handling case instead of 0, as done elsewhere in this function.

Fixes: ce52c744574b ("net: ethernet: ti: cpsw: add ethtool channels support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Zhu Yanjun [Fri, 26 Aug 2016 14:21:47 +0000 (22:21 +0800)]

vxlan: remove the useless header file protocol.h

This header file is not used in vxlan.c file.

Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Wei Yongjun [Fri, 26 Aug 2016 14:21:08 +0000 (14:21 +0000)]

chcr: Fix non static symbol warning

Fixes the following sparse warning:

drivers/crypto/chelsio/chcr_algo.c:593:5: warning:
symbol 'cxgb4_is_crypto_q_full' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jon Cooper [Fri, 26 Aug 2016 14:13:30 +0000 (15:13 +0100)]

sfc: work around TRIGGER_INTERRUPT command not working on SFC9140

MC_CMD_TRIGGER_INTERRUPT does not work on the SFC9140, as used in the
sfn7x42q and sfn7x24f.
Check for this using the MCDI workaround mechanism.
The command is only used during self test. If it's not supported, skip
the interrupt test.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Edward Cree [Fri, 26 Aug 2016 14:12:57 +0000 (15:12 +0100)]

sfc: remove duplicate assignment

nic_data was already initialised to the right thing, no need to assign
it again.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Edward Cree [Fri, 26 Aug 2016 14:12:41 +0000 (15:12 +0100)]

sfc: include size-binned TX stats on sfn8542q

TX size bins were not supported on the 7000's 40G MAC, but the 8000 series
does support them and the MCPU advertises that via a new capability bit.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Sat, 27 Aug 2016 04:38:42 +0000 (21:38 -0700)]

Merge branch 'tipc-udp-replicast'

Richard Alpe says:

====================
tipc: introduce UDP replicast

This series introduces UDP replicast, a concept where we emulate multicast by
sending multiple unicast messages to configured peers. This allows TIPC to be
used in environments where IP multicast is disabled.

There is a corresponding patch series for the tipc user space tool that
allows a user to add remote addresses to the replicast list.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Richard Alpe [Fri, 26 Aug 2016 08:52:56 +0000 (10:52 +0200)]

tipc: add UDP remoteip dump to netlink API

When using replicast, a UDP bearer can have an arbitrary number of
remote ip addresses associated with it. This means we cannot simply
add all remote ip addresses to an existing bearer data message as it
might fill the message, leaving us with a truncated message that we
can't safely resume. To handle this we introduce the new netlink
command TIPC_NL_UDP_GET_REMOTEIP. This command is intended to be
called when the bearer data message has the
TIPC_NLA_UDP_MULTI_REMOTEIP flag set, indicating there is more than
one remote ip (replicast).

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Richard Alpe [Fri, 26 Aug 2016 08:52:55 +0000 (10:52 +0200)]

tipc: add the ability to get UDP options via netlink

Add UDP bearer options to netlink bearer get message. This is used by
the tipc user space tool to display UDP options.

The UDP bearer information is passed using either a sockaddr_in or
sockaddr_in6 struct. This means the user space receiver should
intermediately store the retrieved data in a large enough struct
(sockaddr_storage) before casting to the proper IP version type.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
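
On the receiving side, the "store first, cast later" advice in this commit could look like the sketch below. `format_udp_addr()` is a hypothetical helper; the netlink parsing that would produce `payload` and `len` is left out, and only standard socket APIs are used.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Copy an address blob of unknown family into a sockaddr_storage, then
 * cast to the matching type and format it. Returns 0 on success. */
static int format_udp_addr(const void *payload, size_t len,
                           char *out, socklen_t outlen)
{
    struct sockaddr_storage ss;

    if (len < sizeof(sa_family_t) || len > sizeof(ss))
        return -1;

    memset(&ss, 0, sizeof(ss));
    memcpy(&ss, payload, len);

    if (ss.ss_family == AF_INET && len >= sizeof(struct sockaddr_in)) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)&ss;
        return inet_ntop(AF_INET, &in4->sin_addr, out, outlen) ? 0 : -1;
    }
    if (ss.ss_family == AF_INET6 && len >= sizeof(struct sockaddr_in6)) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)&ss;
        return inet_ntop(AF_INET6, &in6->sin6_addr, out, outlen) ? 0 : -1;
    }
    return -1; /* unknown family or truncated address */
}
```

Copying into `struct sockaddr_storage` first guarantees enough room and suitable alignment for either address type before the family is known.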

Richard Alpe [Fri, 26 Aug 2016 08:52:54 +0000 (10:52 +0200)]

tipc: add replicast peer discovery

Automatically learn UDP remote IP addresses of communicating peers by
looking at the source IP address of incoming TIPC link configuration
messages (neighbor discovery).

This makes configuration slightly easier and removes the problematic
scenario where a node receives directly addressed neighbor discovery
messages sent using replicast which the node cannot "reply" to using
multicast, leaving the link FSM in a limbo state.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Richard Alpe [Fri, 26 Aug 2016 08:52:53 +0000 (10:52 +0200)]

tipc: introduce UDP replicast

This patch introduces UDP replicast, a concept where we emulate
multicast by sending multiple unicast messages to configured peers.

The purpose of replicast is mainly to be able to use TIPC in cloud
environments where IP multicast is disabled. Using replicas to unicast
multicast messages is costly as we have to copy each skb and send the
copies individually.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
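
The per-peer copy cost mentioned in this commit can be made concrete with a small userspace model. `struct bearer_model`, `xmit_model()` and `replicast_model()` are hypothetical; a heap buffer stands in for an skb and the transmit step only counts what it was given.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PEERS_MODEL 8

struct peer_model {
    unsigned int addr; /* stand-in for a remote IP address */
};

struct bearer_model {
    struct peer_model peers[MAX_PEERS_MODEL];
    int npeers;
    int sent;            /* number of unicast sends performed */
    size_t bytes_copied; /* total bytes duplicated for sending */
};

/* Pretend to transmit one copy to one peer; takes ownership of copy. */
static int xmit_model(struct bearer_model *b, const struct peer_model *p,
                      char *copy, size_t len)
{
    (void)p;
    b->sent++;
    b->bytes_copied += len;
    free(copy);
    return 0;
}

/* Emulate multicast: one private copy and one unicast send per peer.
 * Expects len > 0. */
static int replicast_model(struct bearer_model *b, const char *msg, size_t len)
{
    for (int i = 0; i < b->npeers; i++) {
        char *copy = malloc(len);

        if (!copy)
            return -1;
        memcpy(copy, msg, len);

        int err = xmit_model(b, &b->peers[i], copy, len);
        if (err)
            return err;
    }
    return 0;
}
```

With three configured peers and a 5-byte message, the model performs three sends and copies 15 bytes, which is the linear cost the commit message warns about.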

Richard Alpe [Fri, 26 Aug 2016 08:52:52 +0000 (10:52 +0200)]

tipc: refactor multicast ip check

Add a function to check if a tipc UDP media address is a multicast
address or not. This is a purely cosmetic change.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
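
A helper of the kind this commit describes might look like the userspace sketch below. `addr_is_multicast()` is a hypothetical name, and `struct sockaddr_storage` stands in for the TIPC UDP media address type, whose layout is not shown here.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True if the stored address is an IPv4 or IPv6 multicast address. */
static bool addr_is_multicast(const struct sockaddr_storage *ss)
{
    if (ss->ss_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)ss;
        /* IN_MULTICAST expects the address in host byte order. */
        return IN_MULTICAST(ntohl(in4->sin_addr.s_addr));
    }
    if (ss->ss_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)ss;
        return IN6_IS_ADDR_MULTICAST(&in6->sin6_addr);
    }
    return false;
}
```

Keeping the family switch in one function means callers do not have to repeat the IPv4 and IPv6 cases wherever the distinction matters.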

Richard Alpe [Fri, 26 Aug 2016 08:52:51 +0000 (10:52 +0200)]

tipc: split UDP send function

Split the UDP send function into two. One callback that prepares the
skb and one transmit function that sends the skb. This will come in
handy in later patches, when we introduce UDP replicast.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Richard Alpe [Fri, 26 Aug 2016 08:52:50 +0000 (10:52 +0200)]

tipc: split UDP nl address parsing

Split the UDP netlink parse function so that it only parses one
netlink attribute at a time. This makes the parse function more
generic and allows future UDP API functions to use it for parsing.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Thu, 25 Aug 2016 22:23:41 +0000 (15:23 -0700)]

net: dsa: bcm_sf2: Utilize mask clear/set helpers in bcm_sf2_intr_disable

And while at it, remove the unnecessary writing of zeroes to the CPU_MASK_CLEAR
register since it has no functional use.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Tobias Regnery [Thu, 25 Aug 2016 18:09:53 +0000 (20:09 +0200)]

alx: add tso support

Add tso/tso6 support to the alx driver.
Based on information from the downstream driver at github.com/qca/alx.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Sat, 27 Aug 2016 04:06:58 +0000 (21:06 -0700)]

Merge branch 'mediatek-pdma-rx'

Nelson Chang says:

====================
net: ethernet: mediatek: modify to use the PDMA for Ethernet RX

This series contains modifications and refinements to support Ethernet RX by the PDMA.

changes since v4:
- Remove the redundant OR operation in mtk_hw_init()

changes since v3:
- Add GDM hardware settings to send packets to PDMA for RX

changes since v2:
- Fix the bugs of PDMA cpu index and interrupt settings in mtk_poll_rx()

changes since v1:
- Modify to use the PDMA instead of the QDMA for Ethernet RX
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Nelson Chang [Thu, 25 Aug 2016 17:09:43 +0000 (01:09 +0800)]

net: ethernet: mediatek: modify GDM to send packets to the PDMA for RX

Because we now use the PDMA as the Ethernet RX DMA engine,
this patch sets the GDM to send packets to the PDMA for RX.

Acked-by: John Crispin <john@phrozen.org>
Signed-off-by: Nelson Chang <nelson.chang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nelson Chang [Thu, 25 Aug 2016 17:09:42 +0000 (01:09 +0800)]

net: ethernet: mediatek: modify to use the PDMA instead of the QDMA for Ethernet RX

Because the PDMA has richer features than the QDMA for Ethernet RX
(such as multiple RX rings, HW LRO, etc.),
this patch switches Ethernet RX handling to the PDMA.

Acked-by: John Crispin <john@phrozen.org>
Signed-off-by: Nelson Chang <nelson.chang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Fri, 26 Aug 2016 20:15:49 +0000 (13:15 -0700)]

Merge branch 'bcm_sf2-utilize-b53_common'

Florian Fainelli says:

====================
net: dsa: Make bcm_sf2 utilize b53_common

This patch series makes the bcm_sf2 driver utilize a large number of the core
functions offered by the b53_common driver since the SWITCH_CORE registers are
mostly register compatible with the switches driven by b53_common.

In order to accomplish that, we just override the dsa_driver_ops callbacks that
we need to. There is still integration-specific logic in bcm_sf2 that we
cannot absorb into b53_common because it is just not there, mostly in the area
of link management and power management, but most of the features are within
b53_common now: VLAN, FDB, bridge.

Along the process, we also improve support for the BCM58xx SoCs, since those
also have the same version of the switching IP that 7445 has (for which bcm_sf2
was developed).

Changes in v3:

- rebase against 145dd5f9c88f6ee645662df0be003e8f04bdae93 ("net: flush the
softnet backlog in process context")

Changes in v2:

- rebased against "net: dsa: rename switch operations structure"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Fri, 26 Aug 2016 19:18:34 +0000 (12:18 -0700)]

net: dsa: bcm_sf2: Remove duplicate code

Now that we are using b53_common for most VLAN, FDB and bridge
operations, delete all the redundant code that we had in bcm_sf2.c to
keep only the integration specific logic that we have to deal with:
power management, link management and the external interfaces (RGMII,
MDIO).

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Fri, 26 Aug 2016 19:18:33 +0000 (12:18 -0700)]

net: dsa: bcm_sf2: Utilize core B53 driver when possible

The Broadcom Starfighter2 is almost entirely register compatible with
B53, yet for historical reasons came up first in the tree and is now
being updated to utilize b53_common.c to the fullest extent possible. A
few things need to be adjusted to allow that:

- the switch "core" registers currently operate on a 32-bit address,
  whereas b53 passes a page + reg pair to offset from, so we need to
  convert that; thankfully there is a generic formula to do that

- the link management is not self contained with the B53/CORE register
  set, but instead is in the SWITCH_REG block which is part of the
  integration glue logic, so we keep that entirely custom here because
  this really is part of the existing bcm_sf2 implementation

- there are additional power management constraints on the port's
  memories that make us keep the port_enable/disable callbacks custom
  for now; we also support tagging, which b53_common does not support
  yet

All the VLAN and bridge code is entirely identical though, so avoid
duplicating it. Other things will be migrated in the future like EEE and
possibly Wake-on-LAN.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
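
The commit mentions a generic formula for turning a page + reg pair into a 32-bit address but does not spell it out. The sketch below shows one plausible shape, assuming a 1 KiB window per page and registers spaced 4 bytes apart; both the shift amounts and the name `core_reg_addr()` are assumptions, not taken from the text above.

```c
#include <stdint.h>

/* Assumed layout: each page is a 1 KiB window, each register index is
 * scaled to a 4-byte stride inside that window. */
static uint32_t core_reg_addr(uint8_t page, uint8_t reg)
{
    return ((uint32_t)page << 10) | ((uint32_t)reg << 2);
}
```

Under these assumptions, page 0x34 and register 0x06 map to offset 0xD018 within the switch core block.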

Florian Fainelli [Fri, 26 Aug 2016 19:18:32 +0000 (12:18 -0700)]

net: dsa: b53: Add JOIN_ALL_VLAN support

In order to migrate the bcm_sf2 driver over to the b53 driver for most
VLAN/FDB/bridge operations, we need to add support for the "join all
VLANs" register and behavior which allows us to make a given port join
all VLANs and avoid setting specific VLAN entries when it is leaving the
bridge.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Fri, 26 Aug 2016 19:18:31 +0000 (12:18 -0700)]

net: dsa: b53: Define SF2 MIB layout

The 58xx and 7445 chips use the Starfighter2 code, define its MIB layout
and introduce a helper function: is58xx() which checks for both of these
IDs for now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Fri, 26 Aug 2016 19:18:30 +0000 (12:18 -0700)]

net: dsa: b53: Prepare to support 7445 switch

Allocate a device entry for the Broadcom BCM7445 integrated switch
currently backed by bcm_sf2.c. Since this is the latest generation, it
has 4 ARL entries, 4K VLANs and uses Port 8 for the CPU/IMP port.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Florian Fainelli [Fri, 26 Aug 2016 19:18:29 +0000 (12:18 -0700)]

net: dsa: b53: Initialize ds->ops in b53_switch_alloc

In order to allow drivers to override specific dsa_switch_driver
callbacks, initialize ds->ops to b53_switch_ops earlier, which avoids
having to expose this structure to glue drivers.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
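
The override pattern this commit enables can be sketched as follows. Every name here is hypothetical (`switch_ops_model`, `switch_alloc_model()`, `glue_probe_model()` and the callbacks); the model only shows the ordering: the core installs its default table at allocation time, and the glue driver then replaces individual callbacks without ever naming the table.

```c
struct switch_ops_model {
    int (*setup)(void);
    int (*port_enable)(int port);
};

struct switch_model {
    struct switch_ops_model *ops;
};

/* Core driver defaults; the table stays private to the core. */
static int core_setup(void) { return 1; }
static int core_port_enable(int port) { return port; }

static struct switch_ops_model core_ops = {
    .setup = core_setup,
    .port_enable = core_port_enable,
};

/* Allocation installs the defaults early. */
static void switch_alloc_model(struct switch_model *ds)
{
    ds->ops = &core_ops;
}

/* Glue driver: custom handling for one callback only. */
static int glue_port_enable(int port) { return 100 + port; }

static void glue_probe_model(struct switch_model *ds)
{
    switch_alloc_model(ds);
    ds->ops->port_enable = glue_port_enable; /* override after alloc */
}
```

Note that in this model the override mutates the shared table, so it affects every switch instance that points at it; a real design would have to decide whether that is acceptable.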

David S. Miller [Fri, 26 Aug 2016 20:13:37 +0000 (13:13 -0700)]

Merge branch 'mlxsw-fw-mark-offload'

Jiri Pirko says:

====================
mlxsw: Introduce support for offload forward mark

Ido says:
This patchset enables the forwarding of certain control packets by the
device instead of relying on the CPU to do the forwarding.

The first two patches simplify the current switchdev offload forward
infrastructure and make it usable for stacked devices. This is done by
moving the packet and port marking to the bridge driver instead of the
switch driver.

Patches 3-5 add the mlxsw specific bits to support the forward mark.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Ido Schimmel [Thu, 25 Aug 2016 16:42:40 +0000 (18:42 +0200)]

mlxsw: spectrum: Mirror certain packets to CPU

Instead of trapping certain packets to the CPU and then relying on it to
flood them we can instead make the device mirror them.

The following packet types are mirrored:

* DHCP: Broadcast packets that should be flooded by the device, but also
trapped in case CPU is running the DHCP server.

* IGMP query: Multicast packets that need to be forwarded to other
bridge ports, but also trapped so that receiving netdev will be marked
as a router port by the bridge driver.

* ARP request: Broadcast packets that should be forwarded to other
bridge ports, but also trapped in case requested IP is of the local
machine.

* ARP response: Unicast packets that should be forwarded by the bridge
but also trapped in case response is directed at us.

Set the trap action of such packets to mirror and mark them using
'offload_fwd_mark' to prevent the bridge driver from forwarding them
itself.

Note that OSPF packets are also marked despite their action being trap.
The reason for this is that the device traps such packets in the
pipeline after they were already flooded.
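
The split between trap action and marking described above can be sketched as a small lookup table. This is a userspace illustration only: the dictionary keys are made-up labels rather than the driver's trap identifiers, and `sets_offload_fwd_mark` is a hypothetical helper name.

```python
from enum import Enum


class Action(Enum):
    TRAP = "trap"      # packet is delivered to the CPU only
    MIRROR = "mirror"  # packet is forwarded by the device and copied to the CPU


# Packet classes named in the commit message. The keys are illustrative
# labels, not the identifiers used by the mlxsw driver.
TRAPS = {
    "dhcp": Action.MIRROR,
    "igmp_query": Action.MIRROR,
    "arp_request": Action.MIRROR,
    "arp_response": Action.MIRROR,
    "ospf": Action.TRAP,
}

# OSPF keeps the trap action but is still marked, because the device traps
# it in the pipeline after flooding has already taken place.
MARKED_DESPITE_TRAP = {"ospf"}


def sets_offload_fwd_mark(packet_class: str) -> bool:
    """Return True if the bridge must not forward this packet again."""
    mirrored = TRAPS[packet_class] is Action.MIRROR
    return mirrored or packet_class in MARKED_DESPITE_TRAP
```

In this model every listed class ends up marked; what differs is only whether the device forwards the packet as part of the action (mirror) or has already flooded it before trapping (OSPF).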

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Thu, 25 Aug 2016 16:42:39 +0000 (18:42 +0200)]

mlxsw: spectrum: Allow different traps to have different actions

Up until now we only trapped packets to CPU, but we are going to allow
some packets to be mirrored (trap & forward) to CPU.

Extend the Rx listener with 'action' member.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Thu, 25 Aug 2016 16:42:38 +0000 (18:42 +0200)]

mlxsw: spectrum: Simplify traps definition

Instead of copying & pasting the same struct initialization for every
Rx listener, just use a macro.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Thu, 25 Aug 2016 16:42:37 +0000 (18:42 +0200)]

bridge: switchdev: Add forward mark support for stacked devices

switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.
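
A minimal userspace sketch of the proposed method, assuming each bridge port carries a mark shared by all ports with the same parent ID. `BridgePort`, `fwd_mark` and `egress_ports` are names invented for this illustration and do not correspond to bridge driver symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgePort:
    name: str
    fwd_mark: int  # identical for all bridge ports sharing a parent ID


def egress_ports(ingress, ports, hw_forwarded):
    """Return the ports the software bridge still has to flood to.

    hw_forwarded models the switch driver telling the bridge that the
    device already forwarded this packet.
    """
    out = []
    for port in ports:
        if port is ingress:
            continue  # never send a frame back out of its ingress port
        if hw_forwarded and port.fwd_mark == ingress.fwd_mark:
            continue  # same switch: the hardware already delivered it
        out.append(port)
    return out
```

Because the comparison is made between bridge ports rather than physical netdevs, a VLAN device stacked on a physical port can carry its own mark in this model, which is the case the old per-netdev scheme could not express.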

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Thu, 25 Aug 2016 16:42:36 +0000 (18:42 +0200)]

switchdev: Support parent ID comparison for stacked devices

switchdev_port_same_parent_id() currently expects port netdevs, but we
need it to support stacked devices in the next patch, so drop the
NO_RECURSE flag.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ivan Vecera [Thu, 25 Aug 2016 14:46:44 +0000 (16:46 +0200)]

devlink: remove unused priv_size

Remove unused and useless priv_size member from struct devlink_ops.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Paolo Abeni [Thu, 25 Aug 2016 13:58:44 +0000 (15:58 +0200)]

net: flush the softnet backlog in process context

Currently in process_backlog(), the process_queue dequeuing is
performed with local IRQ disabled, to protect against
flush_backlog(), which runs in hard IRQ context.

This patch moves the flush operation to a work queue and runs the
callback with bottom half disabled to protect the process_queue
against dequeuing.
Since process_queue is now always manipulated in bottom half context,
the irq disable/enable pair around the dequeue operation is removed.

To keep the flush time as low as possible, the flush
works are scheduled on all online CPUs simultaneously, using the
high priority work-queue and statically allocated, per-CPU
work structs.

Overall, this change increases the time required to destroy a device
in exchange for slightly better packet reinjection performance.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nikolay Aleksandrov [Thu, 25 Aug 2016 12:27:51 +0000 (14:27 +0200)]

net: bridge: export also pvid flag in the xstats flags

When I added support to export the vlan entry flags via xstats I forgot to
add support for the pvid since it is manually matched, so check if the
entry matches the vlan_group's pvid and set the flag appropriately.
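
The check described above can be sketched as follows. The flag values are what I understand include/uapi/linux/if_bridge.h to define and should be treated as an assumption here; `exported_flags` is a hypothetical helper, not the function touched by the patch.

```python
# Assumed values, believed to match include/uapi/linux/if_bridge.h.
BRIDGE_VLAN_INFO_PVID = 1 << 1
BRIDGE_VLAN_INFO_UNTAGGED = 1 << 2


def exported_flags(entry_vid, entry_flags, group_pvid):
    """Return the flags to export for one VLAN entry.

    The PVID is not stored in the entry's own flags in this model, so it
    is derived by comparing the entry's VID with the group's PVID.
    """
    flags = entry_flags
    if entry_vid == group_pvid:
        flags |= BRIDGE_VLAN_INFO_PVID
    return flags
```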

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Xin Long [Thu, 25 Aug 2016 05:21:49 +0000 (13:21 +0800)]

veth: sctp: add NETIF_F_SCTP_CRC to device features

Commit b17c706987fa ("loopback: sctp: add NETIF_F_SCTP_CSUM to device
features") added NETIF_F_SCTP_CRC to device features for lo device to
improve the performance of sctp over lo.

This patch is to add NETIF_F_SCTP_CRC to device features for veth to
improve the performance of sctp over veth.

Before this patch:
  ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

  212992 212992  10240    10.00    1117.16

After this patch:
  ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

  212992 212992  10240    10.20    1415.22
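
For reference, the relative gain implied by the two netperf runs above can be worked out directly from the reported throughput figures. This is plain arithmetic on the numbers quoted in the message, not an additional measurement.

```python
before_mbps = 1117.16  # throughput before the patch, 10^6 bits/sec
after_mbps = 1415.22   # throughput after the patch, 10^6 bits/sec

# Relative improvement in percent.
gain_percent = (after_mbps - before_mbps) / before_mbps * 100
print(f"{gain_percent:.1f}%")  # prints "26.7%"
```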

Tested-by: Li Shuang <tjlishuang@yeah.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Florian Fainelli [Wed, 24 Aug 2016 21:21:41 +0000 (14:21 -0700)]

net: systemport: Fix ordering in intrl2_*_mask_clear macro

Since we keep shadow copies of which interrupt sources are enabled
through the intrl2_*_mask_{set,clear} macros, make sure that the two
operations happen in the correct order: update the copy first, then
unmask the register.

This is not currently a problem because we actually do not use them, but
we will in a subsequent patch optimizing register accesses, so better be
safe here.
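
A toy model of the ordering, assuming a 32-bit mask register and a software shadow copy. The class and attribute names are invented for this sketch; the real code is a set of C macros in the driver.

```python
class IntrL2:
    """Toy level-2 interrupt controller with a software shadow mask."""

    def __init__(self):
        self.hw_mask = 0xFFFFFFFF      # hardware state: all sources masked
        self.shadow_mask = 0xFFFFFFFF  # software copy of the same state
        self.log = []                  # records the order of the two steps

    def mask_clear(self, bits):
        # Step 1: update the shadow copy.
        self.shadow_mask &= ~bits & 0xFFFFFFFF
        self.log.append("shadow")
        # Step 2: only then unmask the sources in the (modelled) register.
        self.hw_mask &= ~bits & 0xFFFFFFFF
        self.log.append("register")
```

One plausible reason for this order, offered here as an interpretation rather than something the message states: once a source is unmasked it can fire immediately, so any code consulting the shadow copy should already see the new value.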

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 24 Aug 2016 17:23:34 +0000 (10:23 -0700)]

net: minor optimization in qdisc_qstats_cpu_drop()

per_cpu_inc() is faster (at least on x86) than per_cpu_ptr(xxx)++;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 24 Aug 2016 16:01:23 +0000 (09:01 -0700)]

tcp: md5: add LINUX_MIB_TCPMD5FAILURE counter

Adds SNMP counter for drops caused by MD5 mismatches.

The current syslog might help, but a counter is more precise and helps
monitoring.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Wed, 24 Aug 2016 15:50:24 +0000 (08:50 -0700)]

tcp: md5: increment sk_drops on syn_recv state

TCP MD5 mismatches do increment sk_drops counter in all states but
SYN_RECV.

This is very unlikely to happen in the real world, but worth adding
to help diagnostics.

We increase the parent (listener) sk_drops.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Wei Yongjun [Wed, 24 Aug 2016 15:07:26 +0000 (15:07 +0000)]

vmxnet3: fix non static symbol warning

Fixes the following sparse warning:

drivers/net/vmxnet3/vmxnet3_drv.c:1645:1: warning:
symbol 'vmxnet3_rq_destroy_all_rxdataring' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Wei Yongjun [Wed, 24 Aug 2016 13:47:58 +0000 (13:47 +0000)]

ibmvnic: fix error return code in ibmvnic_probe()

Fix to return error code -ENOMEM from the dma_map_single error
handling case instead of 0, as done elsewhere in this function.

Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Wei Yongjun [Wed, 24 Aug 2016 13:50:03 +0000 (13:50 +0000)]

ibmvnic: convert to use simple_open()

Remove an open coded simple_open() function and replace file
operations references to the function with simple_open()
instead.

Generated by: scripts/coccinelle/api/simple_open.cocci

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Colitti [Wed, 24 Aug 2016 06:46:26 +0000 (15:46 +0900)]

net: diag: allow socket bytecode filters to match socket marks

This allows a privileged process to filter by socket mark when
dumping sockets via INET_DIAG_BY_FAMILY. This is useful on
systems that use mark-based routing such as Android.

The ability to filter socket marks requires CAP_NET_ADMIN, which
is consistent with other privileged operations allowed by the
SOCK_DIAG interface such as the ability to destroy sockets and
the ability to inspect BPF filters attached to packet sockets.
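
A rough userspace sketch of the matching rule, assuming the filter condition is a (mark, mask) pair compared against the socket's mark. The function names and the dictionary layout are hypothetical; the actual check lives in the kernel's bytecode interpreter.

```python
def mark_matches(sk_mark, cond_mark, cond_mask):
    """True if the socket mark equals cond_mark under cond_mask."""
    return (sk_mark & cond_mask) == cond_mark


def filter_sockets(sockets, cond_mark, cond_mask, has_net_admin):
    """Return the sockets whose mark matches, for a privileged caller only.

    has_net_admin stands in for the CAP_NET_ADMIN check described above.
    """
    if not has_net_admin:
        raise PermissionError("filtering by mark requires CAP_NET_ADMIN")
    return [s for s in sockets
            if mark_matches(s["mark"], cond_mark, cond_mask)]
```

For example, with a mask of 0xffff0000 only the upper 16 bits of the mark take part in the comparison, which suits setups that encode a network ID in part of the mark.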

Tested: https://android-review.googlesource.com/261350
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Lorenzo Colitti [Wed, 24 Aug 2016 06:46:25 +0000 (15:46 +0900)]

net: diag: slightly refactor the inet_diag_bc_audit error checks.

This simplifies the code a bit and also allows inet_diag_bc_audit
to send to userspace an error that isn't EINVAL.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Vivien Didelot [Tue, 23 Aug 2016 16:38:56 +0000 (12:38 -0400)]

net: dsa: rename switch operations structure

Now that the dsa_switch_driver structure contains only function pointers
as it is supposed to, rename it to the more appropriate dsa_switch_ops,
consistent with the other operations structures in the kernel.

No functional changes here, basically just the result of something like:
s/dsa_switch_driver *drv/dsa_switch_ops *ops/g
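
The effect of that substitution on a single line can be illustrated as follows. This sketch treats the `*` as a literal asterisk, which is presumably the intent; the sample input line is made up and `rename` is a hypothetical helper.

```python
import re


def rename(line: str) -> str:
    """Apply the rename to one line of source text."""
    # The asterisk is escaped so that it matches a literal '*'.
    return re.sub(r"dsa_switch_driver \*drv", "dsa_switch_ops *ops", line)


sample = "static struct dsa_switch_driver *drv;"
print(rename(sample))  # prints "static struct dsa_switch_ops *ops;"
```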

However keep the {un,}register_switch_driver functions and their
dsa_switch_drivers list as is, since they represent the -- likely to be
deprecated soon -- legacy DSA registration framework.

In the meantime, also fix the following checks from checkpatch.pl to
make it happy with this patch:

    CHECK: Comparison to NULL could be written "!ops"
    #403: FILE: net/dsa/dsa.c:470:
    + if (ops == NULL) {

    CHECK: Comparison to NULL could be written "ds->ops->get_strings"
    #773: FILE: net/dsa/slave.c:697:
    + if (ds->ops->get_strings != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
    #824: FILE: net/dsa/slave.c:785:
    + if (ds->ops->get_ethtool_stats != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
    #835: FILE: net/dsa/slave.c:798:
    + if (ds->ops->get_sset_count != NULL)

    total: 0 errors, 0 warnings, 4 checks, 784 lines checked

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Yuval Mintz [Wed, 24 Aug 2016 10:27:19 +0000 (13:27 +0300)]

bnx2x: Don't flush multicast MACs

When ndo_set_rx_mode() is called for bnx2x, as part of the process of
configuring the new MAC address filters [both unicast & multicast], the
driver begins by flushing the existing configuration and then iterates
over the network device's list of addresses and configures those instead.

This has the side-effect of creating a short gap where traffic wouldn't
be properly classified, as no filters are configured in HW.
While for unicasts this is rather insignificant [as unicast MACs don't
frequently change while interface is actually running],
for multicast traffic it does pose an issue as there are multicast-based
networks where new multicast groups would constantly be removed and
added.

This patch tries to remedy this [at least for the newer adapters] -
Instead of flushing & reconfiguring all existing multicast filters,
the driver would instead create the approximate hash match that would
result from the required filters. It would then compare it against the
currently configured approximate hash match, and only add and remove the
delta between those.
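
The delta computation can be sketched with sets of hash bins. Everything specific here is an assumption made for illustration: the table size, the use of `zlib.crc32` as a stand-in hash, and the function names. The message does not say which hash the hardware uses.

```python
import zlib

NUM_BINS = 256  # assumed size of the approximate-match table


def mac_to_bin(mac: bytes) -> int:
    # Stand-in hash for this sketch; not the adapter's real hash function.
    return zlib.crc32(mac) % NUM_BINS


def approx_match(macs):
    """Return the set of bins that the given multicast MACs occupy."""
    return {mac_to_bin(m) for m in macs}


def delta(current_bins, required_macs):
    """Return (bins_to_add, bins_to_remove) to reach the required state."""
    required = approx_match(required_macs)
    return required - current_bins, current_bins - required
```

Bins present in both sets are never touched, so filters that stay in place are never unconfigured and the classification gap is limited to groups that actually change.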

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 24 Aug 2016 16:43:44 +0000 (09:43 -0700)]

Merge tag 'rxrpc-rewrite-20160824-2' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Add better client conn management strategy

These two patches add a better client connection management strategy.  They
need to be applied on top of the just-posted fixes.

(1) Duplicate the connection list and separate out procfs iteration from
     garbage collection.  This is necessary for the next patch as with that
     client connections no longer appear on a single list and may not
     appear on a list at all - and really don't want to be exposed to the
     old garbage collector.

     (Note that client conns aren't left dangling, they're also in a tree
     rooted in the local endpoint so that they can be found by a user
     wanting to make a new client call.  Service conns do not appear in
     this tree.)

(2) Implement a better lifetime management and garbage collection strategy
     for client connections.

     In this, a client connection can be in one of five cache states
     (inactive, waiting, active, culled and idle).  Limits are set on the
     number of client conns that may be active at any one time and makes
     users wait if they want to start a new call when there isn't capacity
     available.

     To make capacity available, active and idle connections can be culled,
     after a short delay (to allow for retransmission).  The delay is
     reduced if the capacity exceeds a tunable threshold.

     If there is spare capacity, client conns are permitted to hang around
     a fair bit longer (tunable) so as to allow reuse of negotiated
     security contexts.

     After this patch, the client conn strategy is separate from that of
     service conns (which continues to use the old code for the moment).

     This difference in strategy is because the client side retains control
     over when it allows a connection to become active, whereas the service
     side has no control over when it sees a new connection or a new call
     on an old connection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 24 Aug 2016 16:42:57 +0000 (09:42 -0700)]

Merge tag 'rxrpc-rewrite-20160824-1' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: More fixes

Here are a couple of fix patches:

(1) Fix the conn-based retransmission patch posted yesterday.  This breaks
     if it actually has to retransmit.  However, it seems the likelihood of
     this happening is really low, despite the server I'm testing against
     being located >3000 miles away, and some of the time it's handled
     in the call background processor before we manage to disconnect the
     call - hence why I didn't spot it.

(2) /proc/net/rxrpc_calls can cause a crash if accessed whilst a call is
     being torn down.  The window of opportunity is pretty small, however,
     as calls don't stay in this state for long.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 24 Aug 2016 16:41:13 +0000 (09:41 -0700)]

Merge branch 'mlxsw-fdb-learning-offload'

Jiri Pirko says:

====================
mlxsw: Offload FDB learning configuration

Ido says:
This patchset addresses two long standing issues in the mlxsw driver
concerning FDB learning.

Patch 1 limits the number of FDB records processed by the driver in a
single session. This is useful in situations in which many new records
need to be processed, thereby causing the RTNL mutex to be held for
long periods of time.

Patches 2-6 offload the learning configuration (on / off) of bridge
ports to the device instead of having the driver decide whether a
record needs to be learned or not.

The last patch is fallout and removes configuration no longer necessary
after the first patches are applied.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:29 +0000 (12:00 +0200)]

mlxsw: spectrum: Don't set learning when creating vPorts

Before commit 99724c18fc66 ("mlxsw: spectrum: Introduce support for
router interfaces") we used to assign vFIDs to the created vPorts. Since
these vPorts were used for slow path traffic we had to disable learning
for them, as it doesn't make sense to have it enabled.

This is no longer the case and now vPorts are either used for router
interfaces (for which learning is disabled by the firmware) or bridge
ports (for which learning is explicitly enabled by the driver).

Therefore, we can remove the learning configuration upon vPort creation.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:28 +0000 (12:00 +0200)]

mlxsw: spectrum: Remove unnecessary check in FDB processing

We now offload the learning configuration to the device and don't rely
on the driver to decide whether to learn the FDB record, so remove the
check.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:27 +0000 (12:00 +0200)]

mlxsw: spectrum: Offload learning to the switch ASIC

Up until now we simply stored the learning configuration of a bridge
port in the driver and decided whether to learn a new FDB record based
on this value.

However, this is sub-optimal in cases where learning is disabled on the
bridge port, as the device repeatedly generates learning notifications
for the same record.

Instead, offload the learning configuration to the device, thereby
preventing it from generating notifications when learning is disabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:26 +0000 (12:00 +0200)]

mlxsw: spectrum: Configure learning for VLAN-aware bridge port

We are going to prevent the device from generating learning
notifications for a port that was configured with learning disabled.

Since learning configuration is done per {Port, VID} we need to apply
the port's learning configuration for any VID that is added to the
bridge port's VLAN filter list.

When a VID is added to the VLAN filter list of a VLAN-aware bridge port,
configure the {Port, VID} learning status according to the port's
configuration. When the VID is removed, disable learning for the {Port,
VID}.
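
A small model of the per-{Port, VID} behaviour described above. The dictionary stands in for the device's learning table, and the class and method names are hypothetical.

```python
class VlanAwarePort:
    """Toy bridge port whose learning state is applied per VID."""

    def __init__(self, learning: bool):
        self.learning = learning  # the port-level learning configuration
        self.vid_learning = {}    # stand-in for the device's {Port, VID} table

    def vid_add(self, vid: int) -> None:
        # A newly added VID inherits the port's learning configuration.
        self.vid_learning[vid] = self.learning

    def vid_del(self, vid: int) -> None:
        # When the VID is removed, learning is disabled for {Port, VID}.
        self.vid_learning[vid] = False
```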

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:25 +0000 (12:00 +0200)]

mlxsw: spectrum: Don't abort on first error when removing VLANs

When removing VLANs from the VLAN-aware bridge we shouldn't abort on the
first error, as we'll otherwise have resources that will never be freed.
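
The pattern can be sketched as a loop that keeps going after a failure. Whether the driver reports the first error, the last one, or none is not stated in the message; returning the first error is an assumption of this sketch, and `remove_one` is a hypothetical callback.

```python
def remove_vlans(vids, remove_one):
    """Try to remove every VID, even if some removals fail.

    remove_one(vid) returns 0 on success or a negative error code.
    The first error seen is returned after all VIDs were attempted.
    """
    first_err = 0
    for vid in vids:
        err = remove_one(vid)
        if err and not first_err:
            first_err = err  # remember it, but keep releasing the rest
    return first_err
```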

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:24 +0000 (12:00 +0200)]

mlxsw: spectrum: Make VLAN deletion function symmetric

Commit 05978481e77e ("mlxsw: spectrum: Create PVID vPort before
registering netdevice") removed __mlxsw_sp_port_vlans_del() from the
init sequence of the driver, which forced it to be non-symmetric with
regards to __mlxsw_sp_port_vlans_add().

Make both functions symmetric as the constraint no longer exists.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ido Schimmel [Wed, 24 Aug 2016 10:00:23 +0000 (12:00 +0200)]

mlxsw: spectrum: Limit number of FDB records per learning session

Up until now a learning session ended whenever the number of queried
records was zero. This turned out to be problematic in situations where
a large number of MACs (48K) had to be processed by the switch driver,
as RTNL mutex is held during the learning session.

Instead, limit the number of FDB records that can be processed in a
session to 64. This means that every time the device is queried for
learning notifications (currently, every 100ms), up to 64 records will
be processed by the switch driver.
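
The per-session limit can be modelled as a simple budget on a queue of pending records. The constant comes from the message; the queue and the function name are stand-ins for this sketch.

```python
from collections import deque

MAX_RECORDS_PER_SESSION = 64  # limit named in the commit message


def learning_session(pending: deque) -> int:
    """Process at most MAX_RECORDS_PER_SESSION records; return the count."""
    processed = 0
    while pending and processed < MAX_RECORDS_PER_SESSION:
        pending.popleft()  # stand-in for handling one FDB record
        processed += 1
    return processed
```

With this budget, the lock held during a session (RTNL in the driver) is released after a bounded amount of work, and the remaining records wait for the next polling interval.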

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 24 Aug 2016 16:35:35 +0000 (09:35 -0700)]

Merge tag 'shared-for-4.9-2' of git://git./linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-08-24

This series contains some low level and API updates for mlx5 core
driver interface and mlx5_ifc.h, plus mlx5 LAG core driver support,
to be shared as base code for net-next and rdma mlx5 4.9 submissions.

From Alex and Artemy, Update mlx5_ifc for modify RQ and XRC bits.

From Noa, Expose mlx5 link modes so they can be used in RDMA tree for rdma tools.

From Aviv, LAG support needed for RDMA.
    - Add needed hardware structures, layouts and interface
    - mlx5 core driver LAG implementation
    - Introduce mlx5 core driver LAG API for mlx5_ib

From Maor, add two low level patches for mlx5 hardware sniffer QP
infrastructure bits and capabilities, plus the namespace for sniffer
steering tables.  Needed for RDMA subtree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Howells [Wed, 24 Aug 2016 06:30:52 +0000 (07:30 +0100)]

rxrpc: Improve management and caching of client connection objects

Improve the management and caching of client rxrpc connection objects.
From this point, client connections will be managed separately from service
connections because AF_RXRPC controls the creation and re-use of client
connections but doesn't have that luxury with service connections.

Further, there will be limits on the numbers of client connections that may
be live on a machine.  No direct restriction will be placed on the number
of client calls, excepting that each client connection can support a
maximum of four concurrent calls.

Note that, for a number of reasons, we don't want to simply discard a
client connection as soon as the last call is apparently finished:

(1) Security is negotiated per-connection and the context is then shared
     between all calls on that connection.  The context can be negotiated
     again if the connection lapses, but that involves holding up calls
     whilst at least two packets are exchanged and various crypto bits are
     performed - so we'd ideally like to cache it for a little while at
     least.

(2) If a packet goes astray, we will need to retransmit a final ACK or
     ABORT packet.  To make this work, we need to keep around the
     connection details for a little while.

(3) The locally held structures represent some amount of setup time, to be
     weighed against their occupation of memory when idle.

To this end, the client connection cache is managed by a state machine on
each connection.  There are five states:

(1) INACTIVE - The connection is not held in any list and may not have
     been exposed to the world.  If it has been previously exposed, it was
     discarded from the idle list after expiring.

(2) WAITING - The connection is waiting for the number of client conns to
     drop below the maximum capacity.  Calls may be in progress upon it
     from when it was active and got culled.

     The connection is on the rxrpc_waiting_client_conns list which is kept
     in to-be-granted order.  Culled conns with waiters go to the back of
     the queue just like new conns.

(3) ACTIVE - The connection has at least one call in progress upon it, it
     may freely grant available channels to new calls and calls may be
     waiting on it for channels to become available.

     The connection is on the rxrpc_active_client_conns list which is kept
     in activation order for culling purposes.

(4) CULLED - The connection got summarily culled to try and free up
     capacity.  Calls currently in progress on the connection are allowed
     to continue, but new calls will have to wait.  There can be no waiters
     in this state - the conn would have to go to the WAITING state
     instead.

(5) IDLE - The connection has no calls in progress upon it and must have
     been exposed to the world (ie. the EXPOSED flag must be set).  When it
     expires, the EXPOSED flag is cleared and the connection transitions to
     the INACTIVE state.

     The connection is on the rxrpc_idle_client_conns list which is kept in
     order of how soon they'll expire.

A connection in the ACTIVE or CULLED state must have at least one active
call upon it; if in the WAITING state it may have active calls upon it;
other states may not have active calls.
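
The invariant in the paragraph above can be written down as a small check. This is a userspace sketch: the enum mirrors the five states named in the message, the limit of four calls comes from the earlier paragraph, and `calls_allowed` is a hypothetical helper.

```python
from enum import Enum, auto

MAX_CALLS_PER_CONN = 4  # maximum concurrent calls per client connection


class CacheState(Enum):
    INACTIVE = auto()
    WAITING = auto()
    ACTIVE = auto()
    CULLED = auto()
    IDLE = auto()


def calls_allowed(state: CacheState, active_calls: int) -> bool:
    """Check the number of active calls against the connection's state."""
    if not 0 <= active_calls <= MAX_CALLS_PER_CONN:
        return False
    if state in (CacheState.ACTIVE, CacheState.CULLED):
        return active_calls >= 1  # must have at least one active call
    if state is CacheState.WAITING:
        return True               # may or may not have active calls
    return active_calls == 0      # INACTIVE and IDLE: no active calls
```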

As long as a connection remains active and doesn't get culled, it may
continue to process calls - even if there are connections on the wait
queue.  This simplifies things a bit and reduces the amount of checking we
need to do.

There are a couple of flags of relevance to the cache:

(1) EXPOSED - The connection ID got exposed to the world.  If this flag is
     set, an extra ref is added to the connection preventing it from being
     reaped when it has no calls outstanding.  This flag is cleared and the
     ref dropped when a conn is discarded from the idle list.

(2) DONT_REUSE - The connection should be discarded as soon as possible and
     should not be reused.

This commit also provides a number of new settings:

(*) /proc/net/rxrpc/max_client_conns

     The maximum number of live client connections.  Above this number, new
     connections get added to the wait list and must wait for an active
     conn to be culled.  Culled connections can be reused, but they will go
     to the back of the wait list and have to wait.

(*) /proc/net/rxrpc/reap_client_conns

     If the number of desired connections exceeds the maximum above, the
     active connection list will be culled until there are only this many
     left in it.

(*) /proc/net/rxrpc/idle_conn_expiry

     The normal expiry time for a client connection, provided there are
     fewer than reap_client_conns of them around.

(*) /proc/net/rxrpc/idle_conn_fast_expiry

     The expedited expiry time, used when there are more than
     reap_client_conns of them around.
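
The choice between the two expiry settings can be sketched as follows. The message only covers the "fewer than" and "more than" cases, so using the normal expiry when the count equals reap_client_conns is an assumption of this sketch, as are the function name and the example values.

```python
def pick_idle_expiry(nr_conns, reap_client_conns, normal_expiry, fast_expiry):
    """Return the expiry time to apply to an idle client connection."""
    if nr_conns > reap_client_conns:
        return fast_expiry  # under pressure: expire idle conns sooner
    # Assumption: the boundary case is treated like the normal case.
    return normal_expiry


# Example with made-up values (seconds).
print(pick_idle_expiry(nr_conns=950, reap_client_conns=900,
                       normal_expiry=120, fast_expiry=2))  # prints 2
```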

Note that I combined the Tx wait queue with the channel grant wait queue to
save space as only one of these should be in use at once.

Note also that, for the moment, the service connection cache still uses the
old connection management code.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 24 Aug 2016 06:30:52 +0000 (07:30 +0100)]

rxrpc: Dup the main conn list for the proc interface

The main connection list is used for two independent purposes: primarily it
is used to find connections to reap and secondarily it is used to list
connections in procfs.

Split the procfs list out from the reap list. This allows us to stop using
the reap list for client connections when they acquire a separate
management strategy from service connections.

The client connections will not be on a single management list, and sometimes
won't be on a management list at all. This doesn't leave them floating,
however, as they will also be on an rb-tree rooted on the socket so that the
socket can find them to dispatch calls.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 24 Aug 2016 13:31:43 +0000 (14:31 +0100)]

rxrpc: Make /proc/net/rxrpc_calls safer

Make /proc/net/rxrpc_calls safer by stashing a copy of the peer pointer in
the rxrpc_call struct and checking in the show routine that the peer
pointer, the socket pointer and the local pointer obtained from the socket
pointer aren't NULL before we use them.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 24 Aug 2016 12:06:14 +0000 (13:06 +0100)]

rxrpc: Fix conn-based retransmit

If a duplicate packet comes in for a call that has just completed on a
connection's channel then there will be an oops in the data_ready handler
because it tries to examine the connection struct via a call struct (which
we don't have - the pointer is unset).

Since the connection struct pointer is available to us, go direct instead.

Also, the ACK packet to be retransmitted needs three octets of padding
between the soft ack list and the ackinfo.

Fixes: 18bfeba50dfd0c8ee420396f2570f16a0bdbd7de ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David S. Miller [Wed, 24 Aug 2016 06:25:37 +0000 (23:25 -0700)]

Merge branch 'remove-clear_sk'

Eric Dumazet says:

====================
net: remove clear_sk() method

Since IPv6 socket lookups no longer dereference pinet6 pointer
and UDP lost SLAB_DESTROY_BY_RCU special rules, we no longer
need special clear_sk() methods.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Eric Dumazet [Tue, 23 Aug 2016 18:39:29 +0000 (11:39 -0700)]

net: remove clear_sk() method

We no longer use this handler, so we can delete it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Eric Dumazet [Tue, 23 Aug 2016 18:39:28 +0000 (11:39 -0700)]

ipv6: tcp: get rid of tcp_v6_clear_sk()

Now that RCU lookups of IPv6 TCP sockets no longer dereference pinet6,
we do not need tcp_v6_clear_sk() anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Eric Dumazet [Tue, 23 Aug 2016 18:39:27 +0000 (11:39 -0700)]

udp: get rid of sk_prot_clear_portaddr_nulls()

Since we no longer use SLAB_DESTROY_BY_RCU for UDP,
we do not need the sk_prot_clear_portaddr_nulls() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Eric Dumazet [Tue, 23 Aug 2016 18:39:26 +0000 (11:39 -0700)]

ipv6: udp: remove udp_v6_clear_sk()

Now that RCU lookups of IPv6 UDP sockets no longer dereference the
pinet6 field, we can get rid of the udp_v6_clear_sk() helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David Ahern [Wed, 24 Aug 2016 04:06:33 +0000 (21:06 -0700)]

net: diag: support SOCK_DESTROY for UDP sockets

This implements SOCK_DESTROY for UDP sockets similar to what was done
for TCP with commit c1e64e298b8ca ("net: diag: Support destroying TCP
sockets.") A process with a UDP socket targeted for destroy is awakened
and recvmsg fails with ECONNABORTED.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Wei Yongjun [Tue, 23 Aug 2016 23:01:02 +0000 (23:01 +0000)]

tipc: use kfree_skb() instead of kfree()

Use kfree_skb() instead of kfree() to free sk_buff.

Fixes: 0d051bf93c06 ("tipc: make bearer packet filtering generic")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Rami Rosen [Tue, 23 Aug 2016 17:20:17 +0000 (20:20 +0300)]

net: ena: change the return type of ena_set_push_mode() to be void.

This patch changes the return type of ena_set_push_mode() to be void,
as it always returns 0.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Wed, 24 Aug 2016 00:20:59 +0000 (17:20 -0700)]

Merge tag 'rxrpc-rewrite-20160823-2' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Miscellaneous improvements

Here are some improvements that are part of the AF_RXRPC rewrite.  They
need to be applied on top of the just posted cleanups.

(1) Set the connection expiry on the connection becoming idle when its
     last currently active call completes rather than each time put is
     called.

     This means that the connection isn't held open by retransmissions,
     pings and duplicate packets.  Future patches will limit the number of
     live connections that the kernel will support, so making sure that old
     connections don't overstay their welcome is necessary.

(2) Calculate packet serial skew in the UDP data_ready callback rather
     than in the call processor on a work queue.  Deferring it like this
     causes the skew to be elevated by further packets coming in before we
     get to make the calculation.

(3) Move retransmission of the terminal ACK or ABORT packet for a
     connection to the connection processor, using the terminal state
     cached in the rxrpc_connection struct.  This means that once last_call
     is set in a channel to the current call's ID, no more packets will be
     routed to that rxrpc_call struct.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
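
Item (1) of the series above can be sketched as follows. The structure and
function names are hypothetical and the bookkeeping is reduced to two
counters; the sketch only shows the idle timestamp being set when the last
active call completes, and left alone when a reference is merely dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified connection record. */
struct demo_conn {
	int usage;			/* reference count */
	int active_calls;		/* calls currently in progress */
	unsigned long idle_timestamp;	/* when the connection last went idle */
};

/* Dropping a reference does not touch the idle timestamp. */
void demo_conn_put(struct demo_conn *conn)
{
	assert(conn != NULL && conn->usage > 0);
	conn->usage--;
}

/* The expiry clock starts only when the last active call completes. */
void demo_call_completed(struct demo_conn *conn, unsigned long now)
{
	assert(conn != NULL && conn->active_calls > 0);
	if (--conn->active_calls == 0)
		conn->idle_timestamp = now;
}

/* A connection is reapable once it has been idle for at least expiry. */
int demo_conn_expired(const struct demo_conn *conn, unsigned long now,
		      unsigned long expiry)
{
	return conn->active_calls == 0 &&
	       now - conn->idle_timestamp >= expiry;
}
```

Because retransmissions, pings and duplicate packets would only take and
drop references in this model, they cannot push the expiry time forward.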

David S. Miller [Wed, 24 Aug 2016 00:19:59 +0000 (17:19 -0700)]

Merge tag 'rxrpc-rewrite-20160823-1' of git://git./linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Cleanups

Here are some cleanups for the AF_RXRPC rewrite:

(1) Remove some unused bits.

(2) Call releasing on socket closure is now done in the order in which
     calls progress through the phases so that we don't miss a call
     that is actively moving between lists.

(3) The rxrpc_call struct's channel number field is redundant and replaced
     with accesses to the masked off cid field instead.

(4) Use a tracepoint for socket buffer accounting rather than printks.

     Unfortunately, since this would require currently non-existent
     arch-specific help to divine the current instruction location, the
     accounting functions are moved out of line so that
     __builtin_return_address() can be used.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
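
The reasoning in item (4) above can be sketched in user space. If the
accounting function is kept out of line, __builtin_return_address(0) (a
GCC/Clang builtin) yields an address inside the caller, which identifies
where the accounting happened without arch-specific help. The function and
variable names below are hypothetical, and a real tracepoint would record
the address rather than store it in a global.

```c
#include <assert.h>
#include <stddef.h>

static int demo_skb_count;		/* number of buffers accounted for */
static const void *demo_last_caller;	/* where the last update came from */

/*
 * Kept out of line on purpose: if this were inlined, the return address
 * would belong to the caller's caller and the location would be lost.
 */
__attribute__((noinline))
void demo_account_skb(int delta)
{
	demo_skb_count += delta;
	demo_last_caller = __builtin_return_address(0);
}

int demo_get_skb_count(void)
{
	return demo_skb_count;
}

const void *demo_get_last_caller(void)
{
	return demo_last_caller;
}
```

Each call site produces a different recorded address, so two callers of
demo_account_skb() can be told apart in the trace output.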

Wei Yongjun [Tue, 23 Aug 2016 15:11:03 +0000 (15:11 +0000)]

net: hns: remove redundant dev_err call in hns_dsaf_get_cfg()

There is already an error message within devm_ioremap_resource(),
so remove the dev_err() call to avoid a redundant
error message.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Wei Yongjun [Tue, 23 Aug 2016 15:09:49 +0000 (15:09 +0000)]

cxgb4: Remove unused including <linux/version.h>

Remove the include of <linux/version.h>, which is not needed.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Wei Yongjun [Tue, 23 Aug 2016 15:06:05 +0000 (15:06 +0000)]

net: phy: xgmiitorgmii: Fix non static symbol warning

Fixes the following sparse warning:

drivers/net/phy/xilinx_gmii2rgmii.c:61:5: warning:
symbol 'xgmiitorgmii_probe' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Sudarsana Reddy Kalluru [Tue, 23 Aug 2016 14:56:55 +0000 (10:56 -0400)]

qede: Add support for Tx/Rx-only queues.

Add provision for configuring the fastpath queues with Tx (or Rx) only
functionality.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phil Sutter [Tue, 23 Aug 2016 11:14:31 +0000 (13:14 +0200)]

net: rtnetlink: Don't export empty RTAX_FEATURES

Since the features bit field also has bits for internal-only use, it
may happen that the kernel exports the RTAX_FEATURES attribute with a
zero value, which is pointless.

Fix this by making sure the attribute is added only if the exported
value is non-zero.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
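
The fix described in this commit can be sketched as a mask-then-test step.
The mask values, the attribute id and the demo_attr type are hypothetical
and chosen only for illustration; the kernel's netlink helpers are not used
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout: low bits are exported, the top bit is internal. */
#define DEMO_FEATURE_USER_MASK	0x0000000fu
#define DEMO_FEATURE_INTERNAL	0x80000000u
#define DEMO_ATTR_FEATURES	12	/* hypothetical attribute id */

struct demo_attr {
	int type;
	uint32_t value;
};

/*
 * Strip the internal-only bits and emit the attribute only when something
 * is left. Returns the number of attributes written (0 or 1).
 */
int demo_put_features(uint32_t features, struct demo_attr *out)
{
	uint32_t exported = features & DEMO_FEATURE_USER_MASK;

	assert(out != NULL);

	if (!exported)
		return 0;	/* nothing user-visible: skip the attribute */

	out->type = DEMO_ATTR_FEATURES;
	out->value = exported;
	return 1;
}
```

A route whose only feature bit is internal therefore produces no features
attribute at all in this sketch, instead of one carrying zero.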

Hariprasad Shenai [Tue, 23 Aug 2016 06:05:32 +0000 (11:35 +0530)]

cxgb4: Fix issue while re-registering VF mgmt netdev

When SRIOV was disabled, the netdev was unregistered but not freed. The
next time the same netdev was registered, its state was still
'NETREG_UNREGISTERED', so we hit the BUG_ON in register_netdevice(),
which expects the state to be 'NETREG_UNINITIALIZED'.

Allocate the netdevs and register them while configuring SRIOV, and free
them when SRIOV is disabled. Also add a new function to set up ethernet
properties instead of using ether_setup. Set carrier off by default,
since we don't have to do any transmit on the interface.

Fixes: 7829451c695e ("cxgb4: Add control net_device for configuring PCIe VF")
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
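
The state problem described in this commit can be modelled with a small
user-space sketch. The enum and functions are hypothetical simplifications
of the netdev registration states named in the message; where the kernel
would hit a BUG_ON, the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified model of the registration states named in the commit. */
enum demo_reg_state {
	DEMO_UNINITIALIZED,
	DEMO_REGISTERED,
	DEMO_UNREGISTERED,
};

struct demo_netdev {
	enum demo_reg_state reg_state;
};

/* A freshly allocated device always starts out uninitialized. */
struct demo_netdev *demo_alloc_netdev(void)
{
	struct demo_netdev *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->reg_state = DEMO_UNINITIALIZED;
	return dev;
}

/* Registration is only valid from the uninitialized state. */
int demo_register(struct demo_netdev *dev)
{
	assert(dev != NULL);
	if (dev->reg_state != DEMO_UNINITIALIZED)
		return -1;	/* the kernel would hit a BUG_ON here */
	dev->reg_state = DEMO_REGISTERED;
	return 0;
}

void demo_unregister(struct demo_netdev *dev)
{
	assert(dev != NULL);
	dev->reg_state = DEMO_UNREGISTERED;
}

void demo_free_netdev(struct demo_netdev *dev)
{
	free(dev);
}
```

Re-registering the same object after demo_unregister() fails in this model,
while freeing it and allocating a new one succeeds, which mirrors the
allocate-on-enable, free-on-disable approach the commit describes.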

John Crispins staging tree
