openwrt/staging/blogic.git
6 years agoMerge tag 'mac80211-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 29 Mar 2018 20:23:26 +0000 (16:23 -0400)]
Merge tag 'mac80211-next-for-davem-2018-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a fair number of patches, but many of them are from the
first bullet here:
 * EAPoL-over-nl80211 from Denis - this will let us fix
   some long-standing issues with bridging, races with
   encryption and more
 * DFS offload support from the qtnfmac folks
 * regulatory database changes for the new ETSI adaptivity
   requirements
 * various other fixes and small enhancements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'dsa-Add-ATU-VTU-statistics'
David S. Miller [Thu, 29 Mar 2018 19:04:22 +0000 (15:04 -0400)]
Merge branch 'dsa-Add-ATU-VTU-statistics'

Andrew Lunn says:

====================
Add ATU/VTU statistics

Previous patches have added basic support for Address Translation Unit
and VLAN Translation Unit violation interrupts. Add statistics
counters for when these occur, which can be accessed using
ethtool. Downgrade one of the particularly spammy warnings from VTU
violations to debug only, now that we have a counter for it.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Make VTU miss violations less spammy
Andrew Lunn [Wed, 28 Mar 2018 21:50:29 +0000 (23:50 +0200)]
net: dsa: mv88e6xxx: Make VTU miss violations less spammy

VTU miss violations can happen under normal conditions. Don't spam the
kernel log, downgrade the output to debug level only. The statistics
counter will indicate it is happening, if anybody not debugging is
interested.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: dsa: mv88e6xxx: Keep ATU/VTU violation statistics
Andrew Lunn [Wed, 28 Mar 2018 21:50:28 +0000 (23:50 +0200)]
net: dsa: mv88e6xxx: Keep ATU/VTU violation statistics

Count the numbers of various ATU and VTU violation statistics and
return them as part of the ethtool -S statistics.
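
A minimal userspace sketch of the counting scheme follows. It is not the
driver code: the enum, the counter names, struct port_stats and both
helpers are hypothetical, and only show per-port counters being bumped on
a violation and copied out in a fixed order, as a statistics dump would.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical violation kinds; the names are illustrative only. */
enum violation {
	ATU_MEMBER_VIOLATION,
	ATU_MISS_VIOLATION,
	ATU_FULL_VIOLATION,
	VTU_MEMBER_VIOLATION,
	VTU_MISS_VIOLATION,
	VIOLATION_COUNT
};

/* Illustrative counter names, in the same order as the enum. */
static const char *const violation_names[VIOLATION_COUNT] = {
	"atu_member_violation",
	"atu_miss_violation",
	"atu_full_violation",
	"vtu_member_violation",
	"vtu_miss_violation",
};

struct port_stats {
	uint64_t violations[VIOLATION_COUNT];
};

/* Called from the (hypothetical) violation interrupt handler. */
static void count_violation(struct port_stats *stats, enum violation v)
{
	assert(stats != NULL && v < VIOLATION_COUNT);
	stats->violations[v]++;
}

/* Copy the counters out in name order; returns how many were written. */
static size_t get_violation_stats(const struct port_stats *stats,
				  uint64_t *out, size_t n)
{
	size_t i;

	assert(stats != NULL && out != NULL);
	for (i = 0; i < VIOLATION_COUNT && i < n; i++)
		out[i] = stats->violations[i];
	return i;
}
```

Keeping the name table and the counter array indexed by the same enum
means a new counter only needs one new enum entry and one new string.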

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: fix unused lable warning
Arnd Bergmann [Wed, 28 Mar 2018 14:14:56 +0000 (16:14 +0200)]
sctp: fix unused lable warning

The proc file cleanup left a label possibly unused:

net/sctp/protocol.c: In function 'sctp_defaults_init':
net/sctp/protocol.c:1304:1: error: label 'err_init_proc' defined but not used [-Werror=unused-label]

This adds an #ifdef around it to match the respective 'goto'.
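
The pattern can be sketched in userspace C. This is an illustration only,
not the code from net/sctp/protocol.c: HAVE_PROC, proc_init() and
defaults_init() are hypothetical stand-ins for the config option and the
functions involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch standing in for a kernel config option. */
/* #define HAVE_PROC 1 */

#ifdef HAVE_PROC
static int proc_init(void) { return 0; }	/* hypothetical helper */
#endif

/* Returns 0 on success, -1 on failure. */
static int defaults_init(int *state)
{
	assert(state != NULL);
	*state = 1;

#ifdef HAVE_PROC
	if (proc_init() != 0)
		goto err_init_proc;
#endif
	return 0;

#ifdef HAVE_PROC
	/* Guarded exactly like the goto, so the label is never unused. */
err_init_proc:
	*state = 0;
	return -1;
#endif
}
```

The block builds cleanly with -Werror=unused-label whether or not
HAVE_PROC is defined, because the label exists only when its goto does.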

Fixes: d47d08c8ca05 ("sctp: use proc_remove_subtree()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: cavium: use module_pci_driver to simplify the code
Wei Yongjun [Wed, 28 Mar 2018 12:51:55 +0000 (12:51 +0000)]
net: cavium: use module_pci_driver to simplify the code

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bcmgenet: return NULL instead of plain integer
Wei Yongjun [Wed, 28 Mar 2018 12:51:19 +0000 (12:51 +0000)]
net: bcmgenet: return NULL instead of plain integer

Fixes the following sparse warning:

drivers/net/ethernet/broadcom/genet/bcmgenet.c:1351:16: warning:
 Using plain integer as NULL pointer

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotest_bpf: Fix NULL vs IS_ERR() check in test_skb_segment()
Dan Carpenter [Wed, 28 Mar 2018 11:48:36 +0000 (14:48 +0300)]
test_bpf: Fix NULL vs IS_ERR() check in test_skb_segment()

The skb_segment() function returns error pointers on error.  It never
returns NULL.
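
To illustrate why a NULL check is not enough, here is a userspace sketch.
ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified stand-ins for the
kernel helpers of the same names, and segment() and use_segment() are
hypothetical; segment() only mimics a function that reports failure
through an error pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical function that never returns NULL, even on failure. */
static void *segment(int fail)
{
	static int payload = 42;

	return fail ? ERR_PTR(-ENOMEM) : (void *)&payload;
}

/* Returns 0 on success or a negative errno. */
static int use_segment(int fail)
{
	void *seg = segment(fail);

	assert(seg != NULL);	/* holds even on failure: NULL is never returned */
	if (IS_ERR(seg))
		return (int)PTR_ERR(seg);
	return 0;
}
```

A caller that tested only "if (!seg)" would treat the error pointer as a
valid object and dereference it.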

Fixes: 76db8087c4c9 ("net: bpf: add a test for skb_segment in test_bpf module")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfp: allow cotsworks modules
Russell King [Wed, 28 Mar 2018 10:18:25 +0000 (11:18 +0100)]
sfp: allow cotsworks modules

Cotsworks modules fail the checksums - it appears that Cotsworks
reprograms the EEPROM at the end of production with the final product
information (serial, date code, and exact part number for module
options) and fails to update the checksum.

Work around this by detecting the Cotsworks name in the manufacturer
field, and reducing the checksum failures to warnings rather than a
hard error.
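
The workaround can be sketched as follows. This is a sketch under stated
assumptions, not the sfp driver code: struct module_id is a simplified,
hypothetical layout rather than the real EEPROM map, and byte_sum() and
check_module_id() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum id_status { ID_OK, ID_WARN, ID_FAIL };

/* Simplified, hypothetical ID block; not the real SFP EEPROM layout. */
struct module_id {
	char    vendor_name[16];	/* space padded, not NUL terminated */
	uint8_t data[8];
	uint8_t checksum;		/* low 8 bits of the sum of the bytes above */
};

static uint8_t byte_sum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint8_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

static enum id_status check_module_id(const struct module_id *id)
{
	assert(id != NULL);
	if (byte_sum(id, offsetof(struct module_id, checksum)) == id->checksum)
		return ID_OK;
	/* Known vendor with stale checksums: warn instead of failing. */
	if (memcmp(id->vendor_name, "COTSWORKS       ", 16) == 0)
		return ID_WARN;
	return ID_FAIL;
}
```

The vendor match is only consulted after the checksum has failed, so
modules with a correct checksum take the normal path regardless of vendor.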

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'qed-flash-upgrade-support'
David S. Miller [Thu, 29 Mar 2018 18:29:56 +0000 (14:29 -0400)]
Merge branch 'qed-flash-upgrade-support'

Sudarsana Reddy Kalluru says:

====================
qed*: Flash upgrade support.

The patch series adds adapter flash upgrade support for qed/qede drivers.

Please consider applying it to net-next branch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqede: Ethtool flash update support.
Sudarsana Reddy Kalluru [Wed, 28 Mar 2018 12:14:23 +0000 (05:14 -0700)]
qede: Ethtool flash update support.

The patch adds ethtool callback implementation for flash update.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Adapter flash update support.
Sudarsana Reddy Kalluru [Wed, 28 Mar 2018 12:14:22 +0000 (05:14 -0700)]
qed: Adapter flash update support.

This patch adds the required driver support for updating the flash or
non-volatile memory of the adapter. At a high level, a flash upgrade
consists of reading the flash images from the input file, validating the
images and writing them to the respective partitions.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Add APIs for flash access.
Sudarsana Reddy Kalluru [Wed, 28 Mar 2018 12:14:21 +0000 (05:14 -0700)]
qed: Add APIs for flash access.

This patch adds APIs for flash access.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Fix PTT entry leak in the selftest error flow.
Sudarsana Reddy Kalluru [Wed, 28 Mar 2018 12:14:20 +0000 (05:14 -0700)]
qed: Fix PTT entry leak in the selftest error flow.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed: Populate nvm image attribute shadow.
Sudarsana Reddy Kalluru [Wed, 28 Mar 2018 12:14:19 +0000 (05:14 -0700)]
qed: Populate nvm image attribute shadow.

This patch adds support for populating the flash image attributes.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoqed*: Utilize FW 8.33.11.0
Michal Kalderon [Wed, 28 Mar 2018 08:42:16 +0000 (11:42 +0300)]
qed*: Utilize FW 8.33.11.0

This FW contains several fixes and features

RDMA Features
- SRQ support
- XRC support
- Memory window support
- RDMA low latency queue support
- RDMA bonding support

RDMA bug fixes
- RDMA remote invalidate during retransmit fix
- iWARP MPA connect interop issue with RTR fix
- iWARP Legacy DPM support
- Fix MPA reject flow
- iWARP error handling
- RQ WQE validation checks

MISC
- Fix endianness of some HSI types
- New Restriction: vlan insertion in core_tx_bd_data can't be set
  for LB packets

ETH
- HW QoS offload support
- Fix vlan, dcb and sriov flow of VF sending a packet with
  inband VLAN tag instead of default VLAN
- Allow GRE version 1 offloads in RX flow
- Allow VXLAN steering

iSCSI / FCoE
- Fix bd availability checking flow
- Support 256th sge properly in iscsi/fcoe retransmit
- Performance improvement
- Fix handling of iSCSI command arrival with AHS and with immediate
- Fix ipv6 traffic class configuration

DEBUG
- Update debug utilities

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: export ip6 fragments sysctl to unprivileged users
Eric Dumazet [Wed, 28 Mar 2018 02:49:42 +0000 (19:49 -0700)]
ipv6: export ip6 fragments sysctl to unprivileged users

IPv4 was changed in commit 52a773d645e9 ("net: Export ip fragment
sysctl to unprivileged users").

The only sysctl that is not per-netns, ip6frag_secret_interval, is
not used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Prioritize control messages
Intiyaz Basha [Wed, 28 Mar 2018 02:25:18 +0000 (19:25 -0700)]
liquidio: Prioritize control messages

During heavy tx traffic, control messages (sent by the liquidio driver to
the NIC firmware) sometimes do not get processed in a timely manner. The
reason is that the low-level metadata of control messages and that of
egress network packets indicate that they have the same priority.

Fix it by setting a higher priority for control messages through the new
ctrl_qpg field in the oct_txpciq struct.  It is the NIC firmware that does
the actual setting of priority by writing to the new ctrl_qpg field; the
host driver treats that value as opaque and just assigns it to pki_ih3->qpg.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'
David S. Miller [Thu, 29 Mar 2018 18:10:31 +0000 (14:10 -0400)]
Merge branch 'net-Allow-FIB-notifiers-to-fail-add-and-replace'

David Ahern says:

====================
net: Allow FIB notifiers to fail add and replace

I wanted to revisit how resource overload is handled for hardware offload
of FIB entries and rules. At the moment, the in-kernel fib notifier can
tell a driver about a route or rule add, replace, and delete, but the
notifier can not affect the action. Specifically, in the case of mlxsw
if a route or rule add is going to overflow the ASIC resources the only
recourse is to abort hardware offload. Aborting offload is akin to taking
down the switch as the path from data plane to the control plane simply
can not support the traffic bandwidth of the front panel ports. Further,
the current state of FIB notifiers is inconsistent with other resources
where a driver can affect a user request - e.g., enslavement of a port
into a bridge or a VRF.

As a result of the work done over the past 3+ years, I believe we are
at a point where we can bring consistency to the stack and offloads,
and reliably allow the FIB notifiers to fail a request, pushing an error
along with a suitable error message back to the user. Rather than
aborting offload when the switch is out of resources, userspace is simply
prevented from adding more routes and has a clear indication of why.

This set does not resolve the corner case where rules or routes not
supported by the device are installed prior to the driver getting loaded
and registering for FIB notifications. In that case, hardware offload has
not been established and it can refuse to offload anything, sending
errors back to userspace via extack. Since conceptually the driver owns
the netdevices associated with its asic, this corner case mainly applies
to unsupported rules and any races during the bringup phase.

Patch 1 fixes call_fib_notifiers to extract the errno from the encoded
response from handlers.

Patches 2-5 allow the call to call_fib_notifiers to fail the add or
replace of a route or rule.

Patch 6 adds a simple resource controller to netdevsim to illustrate
how a FIB resource controller can limit the number of route entries.

Changes since RFC
- correct return code for call_fib_notifier
- dropped patch 6 exporting devlink symbols
- limited example resource controller to init_net only
- updated Kconfig for netdevsim to use MAY_USE_DEVLINK
- updated cover letter regarding startup case noted by Ido
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonetdevsim: Add simple FIB resource controller via devlink
David Ahern [Wed, 28 Mar 2018 01:22:00 +0000 (18:22 -0700)]
netdevsim: Add simple FIB resource controller via devlink

Add devlink support to netdevsim and use it to implement a simple,
profile based resource controller. Only one controller is needed
per namespace, so the first netdevsim netdevice in a namespace
registers with devlink. If that device is deleted, the resource
settings are deleted.

The resource controller allows a user to limit the number of IPv4 and
IPv6 FIB entries and FIB rules. The resource paths are:
    /IPv4
    /IPv4/fib
    /IPv4/fib-rules
    /IPv6
    /IPv6/fib
    /IPv6/fib-rules

The IPv4 and IPv6 top level resources are unlimited in size and can not
be changed. From there, the number of FIB entries and FIB rule entries
are unlimited by default. A user can specify a limit for the fib and
fib-rules resources:

    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 96
    $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64
    $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16
    $ devlink dev reload netdevsim/netdevsim0

such that the number of rules or routes is limited (96 ipv4 routes in the
example above):
    $ for n in $(seq 1 32); do ip ro add 10.99.$n.0/24 dev eth1; done
    Error: netdevsim: Exceeded number of supported fib entries.

    $ devlink resource show netdevsim/netdevsim0
    netdevsim/netdevsim0:
      name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables non
        resources:
          name fib size 96 occ 96 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables
    ...

With this template in place for resource management, it is fairly trivial
to extend and shows one way to implement a simple counter based resource
controller typical of network profiles.
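
A counter-based controller of this kind can be sketched in a few lines.
This is a hypothetical userspace model, not the netdevsim code: struct
fib_resource, RES_UNLIMITED and both helpers are illustrative names, and
-ENOSPC is an assumed choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

#define RES_UNLIMITED UINT64_MAX

/* Hypothetical per-resource counter, e.g. for the /IPv4/fib path. */
struct fib_resource {
	uint64_t max;	/* configured size, RES_UNLIMITED by default */
	uint64_t occ;	/* current occupancy */
};

/* Account for one added entry; refuse it once the limit is reached. */
static int fib_resource_add(struct fib_resource *res)
{
	assert(res != NULL);
	if (res->occ >= res->max)
		return -ENOSPC;
	res->occ++;
	return 0;
}

/* Release one entry on delete. */
static void fib_resource_del(struct fib_resource *res)
{
	assert(res != NULL && res->occ > 0);
	res->occ--;
}
```

In this model the add helper would be called from the FIB notifier
handler, so that a non-zero return fails the route or rule add.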

Currently, devlink only supports initial namespace. Code is in place to
adapt netdevsim to a per namespace controller once the network namespace
issues are resolved.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv6: Move call_fib6_entry_notifiers up for route adds
David Ahern [Wed, 28 Mar 2018 01:21:59 +0000 (18:21 -0700)]
net/ipv6: Move call_fib6_entry_notifiers up for route adds

Move call to call_fib6_entry_notifiers for new IPv6 routes to right
before the insertion into the FIB. At this point notifier handlers can
decide the fate of the new route with a clean path to delete the
potential new entry if the notifier returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Allow notifier to fail route replace
David Ahern [Wed, 28 Mar 2018 01:21:58 +0000 (18:21 -0700)]
net/ipv4: Allow notifier to fail route replace

Check the return value of the call to call_fib_entry_notifiers for IPv4
route replace. This allows a notifier handler to fail the replace.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/ipv4: Move call_fib_entry_notifiers up for new routes
David Ahern [Wed, 28 Mar 2018 01:21:57 +0000 (18:21 -0700)]
net/ipv4: Move call_fib_entry_notifiers up for new routes

Move call to call_fib_entry_notifiers for new IPv4 routes to right
before the call to fib_insert_alias. At this point the only remaining
failure path is memory allocations in fib_insert_node. Handle that
very unlikely failure with a call to call_fib_entry_notifiers to
tell drivers about it.

At this point notifier handlers can decide the fate of the new route
with a clean path to delete the potential new entry if the notifier
returns non-0.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Move call_fib_rule_notifiers up in fib_nl_newrule
David Ahern [Wed, 28 Mar 2018 01:21:56 +0000 (18:21 -0700)]
net: Move call_fib_rule_notifiers up in fib_nl_newrule

Move call_fib_rule_notifiers up in fib_nl_newrule to the point right
before the rule is inserted into the list. At this point there are no
more failure paths within the core rule code, so if the notifier
does not fail then the rule will be inserted into the list.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Fix fib notifer to return errno
David Ahern [Wed, 28 Mar 2018 01:21:55 +0000 (18:21 -0700)]
net: Fix fib notifer to return errno

Notifier handlers use notifier_from_errno to convert any potential error
to an encoded format. As a consequence the other side, call_fib_notifier{s}
in this case, needs to use notifier_to_errno to return the error from
the handler back to its caller.
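
The encoding round trip can be shown in userspace. The two helpers below
are re-typed from memory of include/linux/notifier.h for illustration, so
treat the constants as assumptions rather than as the authoritative
definitions.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror the kernel's notifier return values. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Encode 0 or a negative errno into a notifier return value. */
static int notifier_from_errno(int err)
{
	assert(err <= 0);
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

Returning the encoded value directly would hand the caller a positive
number instead of a negative errno, which is why the decode step matters.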

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Thu, 29 Mar 2018 18:01:40 +0000 (14:01 -0400)]
Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-03-27 (Misc updates & SQ recovery)

This series contains Misc updates and cleanups for mlx5e rx path
and SQ recovery feature for tx path.

From Tariq: (RX updates)
    - Disable Striding RQ on slow PCI devices. Striding RQ limits the use
      of the CQE compression feature, which is critical for the performance
      of slow PCI devices. With this change, CQE compression is preferred
      over Striding RQ only on specific "slow" PCIe links.
    - RX path cleanups
    - Private flag to enable/disable striding RQ

From Eran: (TX fast recovery)
    - TX timeout logic improvements, fast SQ recovery and TX error reporting.
      Currently, if a HW error occurs while transmitting on a specific SQ, the
      driver ignores the error and waits for a TX timeout to occur, then
      resets all the rings. This series improves resiliency to such HW errors
      by detecting TX completions with errors, reporting them, and performing
      a fast recovery of the specific faulty SQ even before a TX timeout is
      detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'Introduce-net_rwsem-to-protect-net_namespace_list'
David S. Miller [Thu, 29 Mar 2018 17:47:54 +0000 (13:47 -0400)]
Merge branch 'Introduce-net_rwsem-to-protect-net_namespace_list'

Kirill Tkhai says:

====================
Introduce net_rwsem to protect net_namespace_list

The series introduces fine grained rw_semaphore, which will be used
instead of rtnl_lock() to protect net_namespace_list.

This improves scalability and allows non-exclusive, sleepable
iteration with for_each_net(), which is enough for most cases.

scripts/get_maintainer.pl gives an enormous list of people, and I have
added them all to CC.

Note, that this patch is independent of "Close race between
{un, }register_netdevice_notifier and pernet_operations":
https://patchwork.ozlabs.org/project/netdev/list/?series=36495

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Remove rtnl_lock() in nf_ct_iterate_destroy()
Kirill Tkhai [Thu, 29 Mar 2018 16:21:20 +0000 (19:21 +0300)]
net: Remove rtnl_lock() in nf_ct_iterate_destroy()

rtnl_lock() doesn't protect net::ct::count,
and it's not needed for __nf_ct_unconfirmed_destroy()
or for nf_queue_nf_hook_drop().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoovs: Remove rtnl_lock() from ovs_exit_net()
Kirill Tkhai [Thu, 29 Mar 2018 16:21:09 +0000 (19:21 +0300)]
ovs: Remove rtnl_lock() from ovs_exit_net()

Here we iterate with for_each_net() and remove
vports from alive nets to the exiting net.

ovs_net::dps is protected by ovs_mutex,
and the other functions that change it (ovs_dp_cmd_new(),
__dp_destroy()) also take it.
The same applies to the datapath::ports list.

So we remove rtnl_lock() here.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosecurity: Remove rtnl_lock() in selinux_xfrm_notify_policyload()
Kirill Tkhai [Thu, 29 Mar 2018 16:20:56 +0000 (19:20 +0300)]
security: Remove rtnl_lock() in selinux_xfrm_notify_policyload()

rt_genid_bump_all() consists of an ipv4 part and an ipv6 part.
The ipv4 part increments net::ipv4::rt_genid,
and I see many places where it's read without rtnl_lock().

The ipv6 part calls __fib6_clean_all(), which is also
called without rtnl_lock() in other places.

So rtnl_lock() was used here only to iterate net_namespace_list,
and we can remove it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Don't take rtnl_lock() in wireless_nlevent_flush()
Kirill Tkhai [Thu, 29 Mar 2018 16:20:44 +0000 (19:20 +0300)]
net: Don't take rtnl_lock() in wireless_nlevent_flush()

This function iterates over net_namespace_list and flushes
the queue for each namespace. What does this rtnl_lock()
protect? Since we may add skbs to net::wext_nlevents
without rtnl_lock(), it does not protect us against queuers.

It guarantees that two threads can't flush the queue in parallel,
which could change the order, but since skbs can be queued
in any order, it doesn't matter how many threads do this
in parallel. With several threads, this will be even
faster.

So we can remove rtnl_lock() here, as it was used only for
iteration over net_namespace_list.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Introduce net_rwsem to protect net_namespace_list
Kirill Tkhai [Thu, 29 Mar 2018 16:20:32 +0000 (19:20 +0300)]
net: Introduce net_rwsem to protect net_namespace_list

rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
there is no way to do that without an exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in the kernel, but it requires rcu_read_lock(),
so the iteration can't sleep. Also, sometimes it may be necessary
to really prevent net_namespace_list growth, so for_each_net_rcu()
does not fit there.

This patch introduces a new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows non-exclusive iteration over the net
namespaces list. It allows us to stop using rtnl_lock()
in several places (which is done in the next patches) and reduces
the time we hold rtnl_mutex. Here we just add the new lock,
while the explanations of why we can remove rtnl_lock() are
in the next patches.

Fine-grained locks are generally better than one big lock,
so let's do that with net_namespace_list while the situation
allows it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-bgmac-Couple-of-small-bgmac-changes'
David S. Miller [Thu, 29 Mar 2018 16:06:11 +0000 (12:06 -0400)]
Merge branch 'net-bgmac-Couple-of-small-bgmac-changes'

Florian Fainelli says:

====================
net: bgmac: Couple of small bgmac changes

This patch series addresses two minor issues with the bgmac driver:

- provides the interface name through /proc/interrupts rather than "bgmac"
- makes sure the interrupts are masked during probe, in case the block was
  not properly reset
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bgmac: Mask interrupts during probe
Florian Fainelli [Tue, 27 Mar 2018 23:20:02 +0000 (16:20 -0700)]
net: bgmac: Mask interrupts during probe

We can have interrupts left enabled from, e.g., the bootloader, which used
the network device for network boot. Make sure we have those disabled as
early as possible to avoid spurious interrupts.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: bgmac: Use interface name to request interrupt
Florian Fainelli [Tue, 27 Mar 2018 23:20:01 +0000 (16:20 -0700)]
net: bgmac: Use interface name to request interrupt

When the system contains several BGMAC adapters, it is nice to be able
to tell which one is which by looking at /proc/interrupts. Use the
network device name as a name to request_irq() with.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'rxrpc-next-20180327' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Thu, 29 Mar 2018 16:02:08 +0000 (12:02 -0400)]
Merge tag 'rxrpc-next-20180327' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Tracing updates

Here are some patches that update tracing in AF_RXRPC and AFS:

 (1) Add a tracepoint for tracking resend events.

 (2) Use debug_ids in traces rather than pointers (as pointers are now hashed)
     and allow use of the same debug_id in AFS calls as in the corresponding
     AF_RXRPC calls.  This makes filtering the trace output much easier.

 (3) Add a tracepoint for tracking call completion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ethernet: nixge: Add support for National Instruments XGE netdev
Moritz Fischer [Tue, 27 Mar 2018 21:43:15 +0000 (14:43 -0700)]
net: ethernet: nixge: Add support for National Instruments XGE netdev

Add support for the National Instruments XGE 1/10G network device.

It uses the EEPROM on the board via NVMEM.

Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agodt-bindings: net: Add bindings for National Instruments XGE netdev
Moritz Fischer [Tue, 27 Mar 2018 21:43:14 +0000 (14:43 -0700)]
dt-bindings: net: Add bindings for National Instruments XGE netdev

This adds bindings for the NI XGE 1G/10G network device.

Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Thu, 29 Mar 2018 15:22:31 +0000 (11:22 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-03-29

1) Remove a redundant pointer initialization in esp_input_set_header().
   From Colin Ian King.

2) Mark the xfrm kmem_caches as __ro_after_init.
   From Alexey Dobriyan.

3) Do the checksum for an ipsec offload packet in software
   if the device does not advertise NETIF_F_HW_ESP_TX_CSUM.
   From Shannon Nelson.

4) Use booleans for true and false instead of integers
   in xfrm_policy_cache_flush().
   From Gustavo A. R. Silva.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomac80211: don't WARN on bad WMM parameters from buggy APs
Emmanuel Grumbach [Mon, 26 Mar 2018 13:21:04 +0000 (16:21 +0300)]
mac80211: don't WARN on bad WMM parameters from buggy APs

Apparently, some APs are buggy enough to send a zeroed
WMM IE. Don't WARN on this since this is not caused by a bug
on the client's system.

This aligns the condition of the WARNING in drv_conf_tx
with the validity check in ieee80211_sta_wmm_params.
We will now pick the default values whenever we get
a zeroed WMM IE.
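
The fallback can be sketched as below. This assumes the validity check
rejects cw_min == 0 and cw_min > cw_max; struct txq_params, the default
values and both helpers are hypothetical and are not taken from mac80211.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

struct txq_params {
	uint16_t cw_min;
	uint16_t cw_max;
};

/* Hypothetical defaults, for illustration only. */
static const struct txq_params default_params = { 15, 1023 };

/* Assumed validity rule: non-zero cw_min that does not exceed cw_max. */
static bool txq_params_valid(const struct txq_params *p)
{
	return p->cw_min != 0 && p->cw_min <= p->cw_max;
}

/* Use the AP's parameters if they are sane, otherwise quietly use defaults. */
static struct txq_params choose_params(const struct txq_params *from_ap)
{
	assert(from_ap != NULL);
	return txq_params_valid(from_ap) ? *from_ap : default_params;
}
```

Because the same predicate guards both the parsing and the driver call,
a zeroed element can no longer reach the code path that warns.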

This has been reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=199161

Fixes: f409079bb678 ("mac80211: sanity check CW_min/CW_max towards driver")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agoMerge branch 'eapol-over-nl80211' into mac80211-next
Johannes Berg [Thu, 29 Mar 2018 12:04:07 +0000 (14:04 +0200)]
Merge branch 'eapol-over-nl80211' into mac80211-next

This is the EAPoL over nl80211 patchset from Denis Kenzior, minus some
infrastructure patches I'd split out and applied earlier. Denis described
it as follows:

This patchset adds support for running 802.11 authentication mechanisms (e.g.
802.1X, 4-Way Handshake, etc) over NL80211 instead of putting them onto the
network device.  This has the advantage of fixing several long-standing race
conditions that result from userspace operating on multiple transports in order
to manage an 802.11 connection (e.g. NL80211 and wireless netdev, wlan0, etc).

For example, userspace would sometimes see 4-Way handshake packets before
NL80211 signaled that the connection has been established, leading to ugly
hacks or having the STA wait for retransmissions from the AP.

This also provides a way to mitigate a particularly nasty race condition where
the encryption key could be set prior to the 4-way handshake packet 4/4 being
sent.  This would result in the packet being sent encrypted and discarded by
the peer.  The mitigation strategy for this race is for userspace to explicitly
tell the kernel that a particular EAPoL packet should not be encrypted.

To make this possible this patchset introduces a new NL80211 command and several
new attributes.  A userspace that is capable of processing EAPoL packets over
NL80211 includes a new NL80211_ATTR_CONTROL_PORT_OVER_NL80211 attribute in its
NL80211_CMD_ASSOCIATE or NL80211_CMD_CONNECT requests being sent to the kernel.
The previously added NL80211_ATTR_SOCKET_OWNER attribute must also be included.
The latter is used by the kernel to send NL80211_CMD_CONTROL_PORT_FRAME
notifications back to userspace via a netlink unicast.  If the
NL80211_ATTR_CONTROL_PORT_OVER_NL80211 attribute is not specified, then legacy
behavior is kept and control port packets continue to flow over the network
interface.

If control port over nl80211 transport is requested, then control port packets
are intercepted just prior to being handed to the network device and sent over
netlink via the NL80211_CMD_CONTROL_PORT_FRAME notification.
NL80211_ATTR_CONTROL_PORT_ETHERTYPE and NL80211_ATTR_MAC are included to
specify the control port frame protocol and source address respectively.  If
the control port frame was received unencrypted then
NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT flag is also included.  NL80211_ATTR_FRAME
attribute contains the raw control port frame with all transport layer headers
stripped (e.g. this would be the raw EAPoL frame).

Userspace can reply to control port frames either via legacy methods (by sending
frames to the network device) or via NL80211_CMD_CONTROL_PORT_FRAME request.
Userspace would include NL80211_ATTR_FRAME with the raw control port frame as
well as NL80211_ATTR_MAC and NL80211_ATTR_CONTROL_PORT_ETHERTYPE attributes to
specify the destination address and protocol respectively.  This allows
Pre-Authentication (protocol 0x88c7) frames to be sent via this mechanism as
well.  Finally, NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT flag can be included to
tell the driver to send the frame unencrypted, e.g. for 4-Way handshake 4/4
frames.
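
As a rough illustration of the TX side, the sketch below collects the four
attributes named above into one request object. It does not talk to netlink;
ControlPortFrameRequest and build_handshake_reply() are hypothetical names, and
treating message 4/4 as the only unencrypted reply is an assumption that fits
an initial handshake.

```python
from dataclasses import dataclass

ETH_P_PAE = 0x888E      # EAPoL ethertype
ETH_P_PREAUTH = 0x88C7  # Pre-Authentication ethertype, as given in the text


@dataclass(frozen=True)
class ControlPortFrameRequest:
    """Hypothetical stand-in for an NL80211_CMD_CONTROL_PORT_FRAME request."""
    frame: bytes              # NL80211_ATTR_FRAME: raw control port frame
    mac: str                  # NL80211_ATTR_MAC: destination address
    ethertype: int            # NL80211_ATTR_CONTROL_PORT_ETHERTYPE
    no_encrypt: bool = False  # NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT


def build_handshake_reply(frame: bytes, peer: str,
                          message_number: int) -> ControlPortFrameRequest:
    """Build a 4-Way handshake reply; message 4/4 is flagged as unencrypted."""
    return ControlPortFrameRequest(
        frame=frame,
        mac=peer,
        ethertype=ETH_P_PAE,
        # Assumption: only the final message of an initial handshake needs
        # the flag, so that it is not encrypted with the freshly set key.
        no_encrypt=(message_number == 4),
    )
```

A Pre-Authentication frame would use the same request shape with ETH_P_PREAUTH
as the ethertype.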

The proposed patchset has been tested in a mac80211_hwsim based environment with
hostapd and iwd.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: Send control port frames over nl80211
Denis Kenzior [Mon, 26 Mar 2018 17:52:51 +0000 (12:52 -0500)]
mac80211: Send control port frames over nl80211

If userspace requested control port frames to go over nl80211, then do so.
The control packets are intercepted just prior to delivery of the packet
to the underlying network device.

Pre-authentication type frames (protocol: 0x88c7) are also forwarded
over nl80211.

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: Add support for tx_control_port
Denis Kenzior [Mon, 26 Mar 2018 17:52:50 +0000 (12:52 -0500)]
mac80211: Add support for tx_control_port

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add control_port_over_nl80211 to mesh_setup
Denis Kenzior [Mon, 26 Mar 2018 17:52:49 +0000 (12:52 -0500)]
nl80211: Add control_port_over_nl80211 to mesh_setup

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add control_port_over_nl80211 for ibss
Denis Kenzior [Mon, 26 Mar 2018 17:52:48 +0000 (12:52 -0500)]
nl80211: Add control_port_over_nl80211 for ibss

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add CONTROL_PORT_OVER_NL80211 attribute
Denis Kenzior [Mon, 26 Mar 2018 17:52:43 +0000 (12:52 -0500)]
nl80211: Add CONTROL_PORT_OVER_NL80211 attribute

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Implement TX of control port frames
Denis Kenzior [Mon, 26 Mar 2018 17:52:42 +0000 (12:52 -0500)]
nl80211: Implement TX of control port frames

This commit implements the TX side of NL80211_CMD_CONTROL_PORT_FRAME.
Userspace provides the raw EAPoL frame using NL80211_ATTR_FRAME.
Userspace should also provide the destination address and the protocol
type to use when sending the frame.  This is used to implement TX of
Pre-authentication frames.  If NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT is
specified, then the driver will be asked not to encrypt the outgoing
frame.

A new EXT_FEATURE flag is introduced so that nl80211 code can check
whether a given wiphy has capability to pass EAPoL frames over nl80211.

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add CMD_CONTROL_PORT_FRAME API
Denis Kenzior [Mon, 26 Mar 2018 17:52:41 +0000 (12:52 -0500)]
nl80211: Add CMD_CONTROL_PORT_FRAME API

This commit also adds cfg80211_rx_control_port function.  This is used
to generate a CMD_CONTROL_PORT_FRAME event out to userspace.  The
conn_owner_nlportid is used as the unicast destination.  This means that
userspace must specify NL80211_ATTR_SOCKET_OWNER flag if control port
over nl80211 routing is requested in NL80211_CMD_CONNECT,
NL80211_CMD_ASSOCIATE, NL80211_CMD_START_AP or IBSS/mesh join.

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[johannes: fix return value of cfg80211_rx_control_port()]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: remove shadowing duplicated variable
Johannes Berg [Thu, 29 Mar 2018 09:14:30 +0000 (11:14 +0200)]
mac80211: remove shadowing duplicated variable

We already have 'ifmgd' here, and it's already assigned
to the same value, so remove the duplicate.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: allow AP_VLAN operation on crypto controlled devices
Manikanta Pubbisetty [Wed, 28 Mar 2018 13:04:19 +0000 (18:34 +0530)]
mac80211: allow AP_VLAN operation on crypto controlled devices

In the current implementation, mac80211 advertises the support of
AP_VLANs based on the driver's support for AP mode; it also
blocks encrypted AP_VLAN operation on devices advertising
SW_CRYPTO_CONTROL.

The implementation seems weird in its current form and can often be
confusing, because there can be drivers advertising both
SW_CRYPTO_CONTROL and AP mode support (e.g. ath10k), in which case
AP_VLAN will still be supported, but only in an open BSS and not in a
secured BSS.

When SW_CRYPTO_CONTROL is enabled, it makes more sense if the decision
to support AP_VLANs is left to the driver. Mac80211 can then allow
AP_VLAN operations depending on the driver support.

Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: Add API to allow querying regdb for wmm_rule
Haim Dreyfuss [Wed, 28 Mar 2018 10:24:11 +0000 (13:24 +0300)]
cfg80211: Add API to allow querying regdb for wmm_rule

In general, self-managed regulatory devices maintain their own
regulatory profiles, so they don't have to query the regulatory database
on country change.

ETSI has recently introduced a new channel access mechanism for 5GHz
that all wlan devices need to comply with.
These values are stored in the regulatory database.
There are self managed devices which can't maintain these
values on their own. Add API to allow self managed regulatory devices
to query the regulatory database for high band wmm rule.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[johannes: fix documentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: limit wmm params to comply with ETSI requirements
Haim Dreyfuss [Wed, 28 Mar 2018 10:24:10 +0000 (13:24 +0300)]
mac80211: limit wmm params to comply with ETSI requirements

ETSI has recently added new requirements that restrict the WMM
parameter values for 5GHz frequencies.  We need to take care of the
following scenarios in order to comply with these new requirements:

1. When using mac80211 default values;
2. When the userspace tries to configure its own values;
3. When associating to an AP which advertises WMM IE.

When associating to an AP, the client uses the values in the
advertised WMM IE.  But the AP may not comply with the new ETSI
requirements, so the client needs to check the current regulatory
rules and use those limits accordingly.
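
The clamping step can be sketched as follows. This is a Python model rather
than the mac80211 code; WmmParams and limit_wmm_params() are hypothetical
names, and the sketch assumes that the regulatory rule gives lower bounds for
the contention window and AIFSN and an upper bound for the TXOP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WmmParams:
    cw_min: int
    cw_max: int
    aifsn: int
    txop: int  # unit left abstract; only the comparison matters here


def limit_wmm_params(requested: WmmParams, rule: WmmParams) -> WmmParams:
    """Clamp requested values so they are no more aggressive than the rule."""
    return WmmParams(
        cw_min=max(requested.cw_min, rule.cw_min),
        cw_max=max(requested.cw_max, rule.cw_max),
        aifsn=max(requested.aifsn, rule.aifsn),
        txop=min(requested.txop, rule.txop),
    )
```

The same function would apply to all three scenarios, since each of them ends
with a set of candidate values that has to be checked against the rule.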

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: don't require RTNL held for regdomain reads
Johannes Berg [Tue, 27 Feb 2018 10:22:15 +0000 (11:22 +0100)]
cfg80211: don't require RTNL held for regdomain reads

The whole code is set up to allow RCU reads of this data, but
then uses rtnl_dereference() which requires the RTNL. Convert
it to rcu_dereference_rtnl() which makes it require only RCU
or the RTNL, to allow RCU-protected reading of the data.

Reviewed-by: Coelho, Luciano <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: read wmm rules from regulatory database
Haim Dreyfuss [Wed, 28 Mar 2018 10:24:09 +0000 (13:24 +0300)]
cfg80211: read wmm rules from regulatory database

ETSI EN 301 893 v2.1.1 (2017-05) standard defines a new channel access
mechanism that all devices (WLAN and LAA) need to comply with.
The regulatory database can now be loaded into the kernel and also
has the option to load optional data.
In order to be able to comply with ETSI standard, we add wmm_rule into
regulatory rule and add the option to read its value from the regulatory
database.

Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[johannes: fix memory leak in error path]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add SOCKET_OWNER support to START_AP
Denis Kenzior [Mon, 26 Mar 2018 17:52:47 +0000 (12:52 -0500)]
nl80211: Add SOCKET_OWNER support to START_AP

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add SOCKET_OWNER support to JOIN_MESH
Denis Kenzior [Mon, 26 Mar 2018 17:52:46 +0000 (12:52 -0500)]
nl80211: Add SOCKET_OWNER support to JOIN_MESH

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[johannes: fix race with wdev lock/unlock by just acquiring once]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonl80211: Add SOCKET_OWNER support to JOIN_IBSS
Denis Kenzior [Mon, 26 Mar 2018 17:52:45 +0000 (12:52 -0500)]
nl80211: Add SOCKET_OWNER support to JOIN_IBSS

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[johannes: fix race with wdev lock/unlock by just acquiring once]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: Support all iftypes in autodisconnect_wk
Denis Kenzior [Mon, 26 Mar 2018 17:52:44 +0000 (12:52 -0500)]
cfg80211: Support all iftypes in autodisconnect_wk

Currently autodisconnect_wk assumes that only interface types of
P2P_CLIENT and STATION use conn_owner_nlportid.  Change this so all
interface types are supported.

Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: enable use of non-cleared DFS channels for DFS offload
Dmitry Lebed [Mon, 26 Mar 2018 13:36:32 +0000 (16:36 +0300)]
cfg80211: enable use of non-cleared DFS channels for DFS offload

Currently a channel switch/start_ap to a DFS channel cannot be done on a
non-CAC-cleared channel, even if DFS offload is enabled.
Make non-cleared DFS channels available if DFS offload is enabled.
CAC will be started by HW after channel change, start_ap call, etc.
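
A minimal model of the resulting check, assuming three DFS states and assuming
that a channel on which radar was detected stays blocked either way. DfsState
and can_use_dfs_channel() are names invented for this sketch, not the cfg80211
functions.

```python
from enum import Enum


class DfsState(Enum):
    USABLE = "usable"            # radar channel, CAC not yet completed
    AVAILABLE = "available"      # CAC completed, channel cleared
    UNAVAILABLE = "unavailable"  # radar detected


def can_use_dfs_channel(state: DfsState, dfs_offload: bool) -> bool:
    """Decide whether start_ap/channel switch may target this DFS channel."""
    if state is DfsState.AVAILABLE:
        return True
    # With DFS offload the hardware runs CAC itself after the channel change,
    # so a not-yet-cleared channel is acceptable.
    return dfs_offload and state is DfsState.USABLE
```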

Signed-off-by: Dmitry Lebed <dlebed@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: fix CAC_STARTED event handling
Dmitry Lebed [Mon, 26 Mar 2018 13:36:31 +0000 (16:36 +0300)]
cfg80211: fix CAC_STARTED event handling

Exclude CAC_STARTED event from !wdev->cac_started check,
since cac_started will be set later in the same function.

Signed-off-by: Dmitry Lebed <dlebed@quantenna.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: Use proper chan_width enum in sta opmode event
tamizhr@codeaurora.org [Tue, 27 Mar 2018 13:46:17 +0000 (19:16 +0530)]
mac80211: Use proper chan_width enum in sta opmode event

The bandwidth change value reported via nl80211 contains a mac80211
specific enum value (ieee80211_sta_rx_bw), which is not understood by
userspace applications. Map the mac80211 specific value to the
nl80211_chan_width enum value so that userspace does not act on a wrong
value. Use the station's HT/VHT capabilities to map
IEEE80211_STA_RX_BW_20 and IEEE80211_STA_RX_BW_160 to the proper
nl80211 value.
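
The mapping could look roughly like the sketch below. It uses plain strings
instead of the real enum values, the function name and parameters are
hypothetical, and the exact capability checks (HT support for 20 MHz, 80+80
support for 160 MHz) are assumptions about how the two ambiguous cases are
resolved.

```python
def sta_rx_bw_to_chan_width(rx_bw: str, ht_supported: bool,
                            vht_80p80: bool) -> str:
    """Map a mac80211-style RX bandwidth name to an nl80211-style width name."""
    if rx_bw == "RX_BW_20":
        # 20 MHz is ambiguous: a non-HT station gets the NOHT variant.
        return "CHAN_WIDTH_20" if ht_supported else "CHAN_WIDTH_20_NOHT"
    if rx_bw == "RX_BW_40":
        return "CHAN_WIDTH_40"
    if rx_bw == "RX_BW_80":
        return "CHAN_WIDTH_80"
    if rx_bw == "RX_BW_160":
        # 160 MHz is ambiguous: contiguous 160 or 80+80.
        return "CHAN_WIDTH_80P80" if vht_80p80 else "CHAN_WIDTH_160"
    raise ValueError(f"unknown RX bandwidth: {rx_bw}")
```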

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agomac80211: Use proper smps_mode enum in sta opmode event
tamizhr@codeaurora.org [Tue, 27 Mar 2018 13:46:16 +0000 (19:16 +0530)]
mac80211: Use proper smps_mode enum in sta opmode event

The SMPS_MODE change value notified via nl80211 contains a mac80211
specific value (ieee80211_smps_mode), and userspace applications
will not know those values. This patch adds support for mapping
the mac80211 enum value to nl80211_smps_mode, which will be
understood by userspace applications.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agocfg80211: fix data type of sta_opmode_info parameter
tamizhr@codeaurora.org [Tue, 27 Mar 2018 13:46:15 +0000 (19:16 +0530)]
cfg80211: fix data type of sta_opmode_info parameter

Currently bw and smps_mode are u8 values in the sta_opmode_info
structure. These values are filled in by mac80211 from
ieee80211_sta_rx_bandwidth and ieee80211_smps_mode. These enum values are
specific to mac80211, and userspace/cfg80211 doesn't know about them. This
leads to incorrect results or assumptions in userspace applications.
Change the bw and smps_mode parameters to their respective enums in nl80211.

Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6 years agonet/mlx5e: Recover Send Queue (SQ) from error state
Eran Ben Elisha [Tue, 26 Dec 2017 14:02:24 +0000 (16:02 +0200)]
net/mlx5e: Recover Send Queue (SQ) from error state

An error TX completion (CQE) which arrived on a specific SQ indicates
that this SQ got moved by the hardware to error state, which means all
pending and incoming TX requests are dropped or will be dropped and no
further "Good" CQEs will be generated for that SQ.

Before this patch TX completions (CQEs) were not monitored and were
handled as a regular CQE. This caused the SQ to stay in an error state,
making it useless for xmiting new packets.

Mitigation plan:
In case of an error completion, schedule a recovery work which would do
the following:
- Mark the TXQ as DRV_XOFF to stop new packets arriving from the
  stack
- NAPI to flush all pending SQ WQEs (via flush_in_error_en bit) to
  release SW and HW resources (SKB, DMA, etc.) and have the SQ and CQ
  consumer/producer indices synced.
- Modify the SQ state ERR -> RST -> RDY (restart the SQ).
- Reactivate the SQ and reset SQ cc and pc

If we identify two consecutive requests for SQ recover in less than
500 msecs, drop the recover request to avoid CPU overload, as this
scenario most likely happened due to a severe repeated bug.
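
The 500 msec guard can be illustrated with a small Python model. The class and
method names are hypothetical, and the sketch assumes that a dropped request
does not restart the interval.

```python
RECOVER_MIN_INTERVAL_MS = 500  # interval named in the text


class SqRecoveryLimiter:
    """Drops a recover request that follows the last recovery too closely."""

    def __init__(self) -> None:
        self._last_recover_ms = None  # time of the last accepted recovery

    def should_recover(self, now_ms: int) -> bool:
        last = self._last_recover_ms
        if last is not None and now_ms - last < RECOVER_MIN_INTERVAL_MS:
            # Two requests in quick succession: most likely a repeating
            # fault, so skip this one instead of burning CPU.
            return False
        self._last_recover_ms = now_ms
        return True
```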

In addition, add SQ recover SW counter to monitor successful recoveries.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Dump xmit error completions
Eran Ben Elisha [Tue, 9 Jan 2018 14:21:16 +0000 (16:21 +0200)]
net/mlx5e: Dump xmit error completions

Monitor and dump xmit error completions. In addition, add an err_cqe
counter to track the number of error completions per send queue.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agomlx5: Move dump error CQE function out of mlx5_ib for code sharing
Eran Ben Elisha [Sun, 31 Dec 2017 10:55:26 +0000 (12:55 +0200)]
mlx5: Move dump error CQE function out of mlx5_ib for code sharing

Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in
order to use it in a downstream patch from mlx5e.

In addition, use print_hex_dump instead of manual dumping of the buffer.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agomlx5_{ib,core}: Add query SQ state helper function
Eran Ben Elisha [Tue, 26 Dec 2017 13:17:05 +0000 (15:17 +0200)]
mlx5_{ib,core}: Add query SQ state helper function

Move query SQ state function from mlx5_ib to mlx5_core in order to
have it in shared code.

It will be used in a downstream patch from mlx5e.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Move all TX timeout logic to be under state lock
Eran Ben Elisha [Tue, 16 Jan 2018 15:25:06 +0000 (17:25 +0200)]
net/mlx5e: Move all TX timeout logic to be under state lock

Driver callback for handling TX timeout should access some internal
resources (SQ, CQ) in order to decide if the tx timeout work should be
scheduled.  These resources might be unavailable if channels are closed
in parallel (ifdown for example).

The state lock is the mechanism to protect from such races.
Move all TX timeout logic to be in the work under a state lock.

In addition, move the work from the global WQ to the mlx5e WQ to make sure
this work is flushed when the device is detached.

Also, move the mlx5e_tx_timeout_work code to be next to the TX timeout
NDO for better code locality.

Fixes: 3947ca185999 ("net/mlx5e: Implement ndo_tx_timeout callback")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Remove unused max inline related code
Gal Pressman [Sun, 21 Jan 2018 08:52:17 +0000 (10:52 +0200)]
net/mlx5e: Remove unused max inline related code

Commit 58d522912ac7 ("net/mlx5e: Support TX packet copy into WQE")
introduced the max inline WQE as an ethtool tunable. One commit later,
that functionality was made dependent on BlueFlame.

Commit 6982ab609768 ("net/mlx5e: Xmit, no write combining") removed
BlueFlame support, and with it the max inline WQE.
This patch cleans up the leftovers from the removed feature.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Add ethtool priv-flag for Striding RQ
Tariq Toukan [Wed, 7 Feb 2018 12:51:45 +0000 (14:51 +0200)]
net/mlx5e: Add ethtool priv-flag for Striding RQ

Add a control private flag in ethtool to enable/disable
Striding RQ feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Do not reset Receive Queue params on every type change
Tariq Toukan [Sun, 18 Feb 2018 09:37:06 +0000 (11:37 +0200)]
net/mlx5e: Do not reset Receive Queue params on every type change

Do not implicitly call mlx5e_init_rq_type_params() upon every
change in RQ type. It should be called only on channel creation.

Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Remove rq_headroom field from params
Tariq Toukan [Wed, 7 Feb 2018 11:28:35 +0000 (13:28 +0200)]
net/mlx5e: Remove rq_headroom field from params

It can be derived from other params, calculate it
via the dedicated function when needed.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Remove RQ MPWQE fields from params
Tariq Toukan [Wed, 7 Feb 2018 11:21:30 +0000 (13:21 +0200)]
net/mlx5e: Remove RQ MPWQE fields from params

Introduce functions to calculate them when needed.
They can be derived from other params.
This will simplify transition between RQ configurations.

In general, any parameter that is not explicitly set
or controlled, but derived from other parameters,
should not have a control-path field itself, but a
getter function.
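
The design rule can be shown with a short Python analogy. The RqParams class,
its field and both constants are made up for this sketch and do not correspond
to real mlx5e values.

```python
from dataclasses import dataclass

BASE_HEADROOM = 64         # hypothetical value
XDP_EXTRA_HEADROOM = 192   # hypothetical value


@dataclass
class RqParams:
    xdp_enabled: bool  # explicitly controlled parameter

    @property
    def headroom(self) -> int:
        """Derived on demand, so it can never go stale."""
        extra = XDP_EXTRA_HEADROOM if self.xdp_enabled else 0
        return BASE_HEADROOM + extra
```

Because headroom is computed from xdp_enabled on every access, switching the
configuration cannot leave a stale stored copy behind, which is the kind of
bug a separate control-path field invites.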

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Use no-offset function in skb header copy
Tariq Toukan [Thu, 15 Feb 2018 16:27:06 +0000 (18:27 +0200)]
net/mlx5e: Use no-offset function in skb header copy

In copying skb header to skb->data, replace the call to
skb_copy_to_linear_data_offset() with a zero offset with
the call to the no-offset function skb_copy_to_linear_data().

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Separate dma base address and offset in dma_sync call
Tariq Toukan [Mon, 27 Nov 2017 08:35:12 +0000 (10:35 +0200)]
net/mlx5e: Separate dma base address and offset in dma_sync call

Pass the base dma address and offset to dma_sync_single_range_for_cpu(),
instead of doing the pre-calculation.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Remove unused define MLX5_MPWRQ_STRIDES_PER_PAGE
Tariq Toukan [Tue, 21 Nov 2017 15:43:50 +0000 (17:43 +0200)]
net/mlx5e: Remove unused define MLX5_MPWRQ_STRIDES_PER_PAGE

Clean it up as it's not in use.

Fixes: d9d9f156f380 ("net/mlx5e: Expand WQE stride when CQE compression is enabled")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Disable Striding RQ when PCI is slower than link
Tariq Toukan [Sun, 11 Feb 2018 09:58:30 +0000 (11:58 +0200)]
net/mlx5e: Disable Striding RQ when PCI is slower than link

We turn the feature off for servers with PCI BW bounded
by a threshold (16G) and lower than MAX LINK BW.
This improves the effectiveness of the CQE compression feature,
which defaults to ON in the same case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agonet/mlx5e: Unify slow PCI heuristic
Tariq Toukan [Wed, 17 Jan 2018 15:39:07 +0000 (17:39 +0200)]
net/mlx5e: Unify slow PCI heuristic

Get the link/pci speed query and logic into a single function.
Unify the heuristics and use a single PCI threshold (16G) for all.
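
A sketch of what such a single predicate might look like, written in Python
for illustration. The function name, the Mb/s unit and the treatment of a zero
(unknown) speed as "not slow" are assumptions; only the 16G threshold and the
comparison against the link speed come from the commit messages.

```python
PCI_BW_THRESHOLD_MBPS = 16000  # the 16G threshold, expressed in Mb/s


def slow_pci_heuristic(link_speed_mbps: int, pci_bw_mbps: int) -> bool:
    """True when PCI bandwidth is below the threshold and below link speed."""
    if link_speed_mbps <= 0 or pci_bw_mbps <= 0:
        # Assumption: an unknown speed never triggers the heuristic.
        return False
    return (pci_bw_mbps < PCI_BW_THRESHOLD_MBPS
            and pci_bw_mbps < link_speed_mbps)
```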

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
6 years agorxrpc: Trace call completion
David Howells [Tue, 27 Mar 2018 22:08:20 +0000 (23:08 +0100)]
rxrpc: Trace call completion

Add a tracepoint to track rxrpc calls moving into the completed state and
to log the completion type and the recorded error value and abort code.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agorxrpc, afs: Use debug_ids rather than pointers in traces
David Howells [Tue, 27 Mar 2018 22:03:00 +0000 (23:03 +0100)]
rxrpc, afs: Use debug_ids rather than pointers in traces

In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful.  Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.
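
The ID sharing can be pictured with the following Python model. AfsCall,
RxrpcCall and next_debug_id() are illustrative names only; the real code uses
a kernel counter rather than itertools.

```python
import itertools

_debug_id_counter = itertools.count(1)  # monotonically increasing, not reused


def next_debug_id() -> int:
    """Hand out the next debug ID from the shared counter."""
    return next(_debug_id_counter)


class AfsCall:
    def __init__(self) -> None:
        # The kernel service takes its ID from the rxrpc counter.
        self.debug_id = next_debug_id()


class RxrpcCall:
    def __init__(self, debug_id=None) -> None:
        # Reuse the caller's ID when given, so both trace lines carry the
        # same tag; otherwise allocate a fresh one.
        self.debug_id = debug_id if debug_id is not None else next_debug_id()
```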

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agorxrpc: Trace resend
David Howells [Tue, 27 Mar 2018 22:02:47 +0000 (23:02 +0100)]
rxrpc: Trace resend

Add a tracepoint to trace packet resend events and to dump the Tx
annotation buffer for added illumination.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoMerge branch 'sfc-filter-locking'
David S. Miller [Tue, 27 Mar 2018 17:33:21 +0000 (13:33 -0400)]
Merge branch 'sfc-filter-locking'

Edward Cree says:

====================
sfc: rework locking around filter management

The use of a spinlock to protect filter state combined with the need for a
 sleeping operation (MCDI) to apply that state to the NIC (on EF10) led to
 unfixable race conditions, around the handling of filter restoration after
 an MC reboot.
So, this patch series removes the requirement to be able to modify the SW
 filter table from atomic context, by using a workqueue to request
 asynchronous filter operations (which are needed for ARFS).  Then, the
 filter table locks are changed to mutexes, replacing the dance of spinlocks
 and 'busy' flags.  Also, a mutex is added to protect the RSS context state,
 since otherwise a similar race is possible around restoring that after an
 MC reboot.  While we're at it, fix a couple of other related bugs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: fix flow type handling for RSS filters
Edward Cree [Tue, 27 Mar 2018 16:44:51 +0000 (17:44 +0100)]
sfc: fix flow type handling for RSS filters

The FLOW_RSS flag was causing us to insert UDP filters when TCP was wanted.

Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: protect list of RSS contexts under a mutex
Edward Cree [Tue, 27 Mar 2018 16:44:36 +0000 (17:44 +0100)]
sfc: protect list of RSS contexts under a mutex

Otherwise races are possible between ethtool ops and
 efx_ef10_rx_restore_rss_contexts().
Also, don't try to perform the restore on every reset, only after an MC
 reboot, otherwise we'll leak RSS contexts on the NIC.

Fixes: 42356d9a137b ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: return a better error if filter insertion collides with MC reboot
Edward Cree [Tue, 27 Mar 2018 16:44:21 +0000 (17:44 +0100)]
sfc: return a better error if filter insertion collides with MC reboot

If some other operation gets the MCDI lock ahead of us and performs an MC
 reboot, then our attempt to insert the filter will fail with EINVAL,
 because the destination VI (spec->dmaq_id, MC_CMD_FILTER_OP_IN_RX_QUEUE) does
 not exist.  But the caller's request (which might e.g. be an ethtool ntuple
 request from userland) isn't invalid, it just got unlucky; so return EAGAIN.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: use a semaphore to lock farch filters too
Edward Cree [Tue, 27 Mar 2018 16:42:57 +0000 (17:42 +0100)]
sfc: use a semaphore to lock farch filters too

With this change, the spinlock efx->filter_lock is no longer used and is
 thus removed.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: give ef10 its own rwsem in the filter table instead of filter_lock
Edward Cree [Tue, 27 Mar 2018 16:42:28 +0000 (17:42 +0100)]
sfc: give ef10 its own rwsem in the filter table instead of filter_lock

efx->filter_lock remains in place for use on farch, but EF10 now ignores it.
EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: replace asynchronous filter operations
Edward Cree [Tue, 27 Mar 2018 16:41:59 +0000 (17:41 +0100)]
sfc: replace asynchronous filter operations

Instead of having an efx->type->filter_rfs_insert() method, just use
 workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
 efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
 new channel->filter_work work_struct, so the method
 efx->type->filter_rfs_expire_one() is no longer called in atomic context.
 We also add a new mutex efx->rps_mutex to protect the RPS state (efx->
 rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
 that the taking of efx->filter_lock can be moved to
 efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
 allowing them to use sleeping locks in a future patch.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'pernet-all-async'
David S. Miller [Tue, 27 Mar 2018 17:18:10 +0000 (13:18 -0400)]
Merge branch 'pernet-all-async'

Kirill Tkhai says:

====================
Make pernet_operations always read locked

All the pernet_operations are converted, and the last one
is in this patchset (nfsd_net_ops acked by J. Bruce Fields).
So, it's the time to kill pernet_operations::async field,
and make setup_net() and cleanup_net() always require
the rwsem only read locked.

All further pernet_operations have to be developed to fit
this rule. Some of previous patches added a comment to
struct pernet_operations about that.

Also, this patchset renames net_sem to pernet_ops_rwsem
to make the target area of the rwsem more clearly visible,
and adds more comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Add more comments
Kirill Tkhai [Tue, 27 Mar 2018 15:02:32 +0000 (18:02 +0300)]
net: Add more comments

This adds comments to different places to improve
readability.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Rename net_sem to pernet_ops_rwsem
Kirill Tkhai [Tue, 27 Mar 2018 15:02:23 +0000 (18:02 +0300)]
net: Rename net_sem to pernet_ops_rwsem

net_sem is a vague name that does not say which area the
semaphore covers, so it is better to make the area explicit.

Rename it to pernet_ops_rwsem for better readability and
better intelligibility.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Drop pernet_operations::async
Kirill Tkhai [Tue, 27 Mar 2018 15:02:13 +0000 (18:02 +0300)]
net: Drop pernet_operations::async

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Reflect all pernet_operations are converted
Kirill Tkhai [Tue, 27 Mar 2018 15:02:01 +0000 (18:02 +0300)]
net: Reflect all pernet_operations are converted

All pernet_operations are reviewed and converted, hooray!
Reflect this in core code: setup_net() and cleanup_net()
will take down_read() always.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Convert nfsd_net_ops
Kirill Tkhai [Tue, 27 Mar 2018 15:01:51 +0000 (18:01 +0300)]
net: Convert nfsd_net_ops

These pernet_operations look similar to rpcsec_gss_net_ops;
they just create and destroy other caches. So, they can also
be async.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: mvpp2: Use relaxed I/O in data path
Yan Markman [Tue, 27 Mar 2018 14:49:05 +0000 (16:49 +0200)]
net: mvpp2: Use relaxed I/O in data path

Use relaxed I/O on the hot path. This achieves significant performance
improvements. On a 10G link, this makes a basic iperf TCP test go from
an average of 4.5 Gbits/sec to about 9.40 Gbits/sec.

Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message, cosmetic changes]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'mlx5-updates-2018-03-22' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 27 Mar 2018 15:05:23 +0000 (11:05 -0400)]
Merge tag 'mlx5-updates-2018-03-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-03-22 (Misc updates)

This series includes misc updates for the mlx5 core and netdev driver.

Highlights:

From Inbar, three patches to add support for PFC stall prevention
statistics and enable/disable through new ethtool tunable, as requested
from previous submission.

From Moshe, four patches, added more drop counters:
- drop counter for netdev steering miss
- drop counter for when VF logical link is down
- drop counter for when netdev logical link is down.

From Or, three patches to support vlan push/pop offload via tc HW action,
for newer HW (ConnectX-5 and onward) via HW steering flow actions rather
than the emulated path for the older HW brands.

And five more misc small trivial patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoliquidio: Removed duplicate Tx queue status check
Intiyaz Basha [Mon, 26 Mar 2018 20:40:27 +0000 (13:40 -0700)]
liquidio: Removed duplicate Tx queue status check

NAPI already checks the Tx queue status and wakes the Tx queue if required.
The same operation is being done while freeing every Tx buffer.
So remove the duplicate Tx queue status check from the Tx
buffer free functions.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: addrconf: Use normal debugging style
Joe Perches [Mon, 26 Mar 2018 15:35:01 +0000 (08:35 -0700)]
ipv6: addrconf: Use normal debugging style

Remove local ADBG macro and use netdev_dbg/pr_debug

Miscellanea:

o Remove unnecessary debug message after allocation failure as there
  already is a dump_stack() on the failure paths
o Leave the allocation failure message on snmp6_alloc_dev as there
  is one code path that does not do a dump_stack()

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotc-testing: Correct compound statements for namespace execution
Lucas Bates [Mon, 26 Mar 2018 14:46:14 +0000 (10:46 -0400)]
tc-testing: Correct compound statements for namespace execution

If tdc is executing test cases inside a namespace, only the
first command in a compound statement will be executed inside
the namespace by tdc. As a result, the subsequent commands
are not executed inside the namespace and the test will fail.

Example:

for i in {x..y}; do args="foo"; done && tc actions add $args

The namespace execution feature will prepend 'ip netns exec'
to the command:

ip netns exec tcut for i in {x..y}; do args="foo"; done && \
  tc actions add $args

So the actual tc command is not parsed by the shell as being
part of the namespace execution.

Enclosing these compound statements inside a bash invocation
with proper escape characters resolves the problem by creating
a subshell inside the namespace.
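
The quoting idea can be sketched in Python, the language tdc itself is written
in. The helper below is hypothetical and is not part of tdc; it only shows how
a compound statement can be handed to a single bash invocation so that all of
it runs inside the namespace.

```python
import shlex


def wrap_for_namespace(command: str, namespace: str) -> str:
    """Run the whole compound command in one bash subshell inside the netns."""
    # shlex.quote() single-quotes the command, so the outer shell passes it
    # through untouched and the inner bash parses the loop, && and $args.
    return (f"ip netns exec {shlex.quote(namespace)} "
            f"bash -c {shlex.quote(command)}")
```

Single quoting also means that $args is expanded by the bash running inside
the namespace rather than by the shell that launches ip.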

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: tipc_node_create() can be static
Wei Yongjun [Mon, 26 Mar 2018 14:33:13 +0000 (14:33 +0000)]
tipc: tipc_node_create() can be static

Fixes the following sparse warning:

net/tipc/node.c:336:18: warning:
 symbol 'tipc_node_create' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>