Nikolay Aleksandrov [Mon, 31 Oct 2016 12:21:03 +0000 (13:21 +0100)]
net: pim: add a helper to check for IPv4 all pim routers address
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov [Mon, 31 Oct 2016 12:21:02 +0000 (13:21 +0100)]
net: pim: add common pimhdr struct and helpers
Add the common pimhdr structure and helpers to access it, also cleanup the
format of the header file.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Oct 2016 19:52:37 +0000 (15:52 -0400)]
Merge branch 'qed-next'
Yuval Mintz says:
====================
qed*: Patch series
This series does several things. The bigger changes:
- Add new notification APIs [& Defaults] for various fields.
The series then utilizes some of those qed <-> qede APIs to bass WoL
support upon.
- Change the resource allocation scheme to receive the values from
management firmware, instead of equally sharing resources between
functions [that might not need those]. That would, e.g., allow us to
configure additional filters to network interfaces in presence of
storage [PCI] functions from same adapter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Mon, 31 Oct 2016 05:14:27 +0000 (07:14 +0200)]
qed: Learn resources from management firmware
Currently, each interfaces assumes it receives an equal portion
of HW/FW resources, but this is wasteful - different partitions
[and specifically, parititions exposing different protocol support]
might require different resources.
Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge
of all of the functions and can serve as arbiter].
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Mon, 31 Oct 2016 05:14:26 +0000 (07:14 +0200)]
qed: Use VF-queue feature
Driver sets several restrictions about the number of supported VFs
according to available HW/FW resources.
This creates a problem as there are constellations which can't be
supported [as limitation don't accurately describe the resources],
as well as holes where enabling IOV would fail due to supposed
lack of resources.
This introduces a new interal feature - vf-queues, which would
be used to lift some of the restriction and accurately enumerate
the queues that can be used by a given PF's VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Mon, 31 Oct 2016 05:14:25 +0000 (07:14 +0200)]
qed: Learn of RDMA capabilities per-device
Today, RDMA capabilities are learned from management firmware
which provides a per-device indication for all interfaces.
Newer management firmware is capable of providing a per-device
indication [would later be extended to either RoCE/iWARP].
Try using this newer learning mechanism, but fallback in case
management firmware is too old to retain current functionality.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Mon, 31 Oct 2016 05:14:24 +0000 (07:14 +0200)]
qede: Decouple ethtool caps from qed
While the qed_lm_maps is closely tied with the QED_LM_* defines,
when iterating over the array use actual size instead of the qed
define to prevent future possible issues.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Mon, 31 Oct 2016 05:14:23 +0000 (07:14 +0200)]
qed*: Add support for WoL
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Mon, 31 Oct 2016 05:14:22 +0000 (07:14 +0200)]
qed: Add nvram selftest
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Kalluru [Mon, 31 Oct 2016 05:14:21 +0000 (07:14 +0200)]
qed*: Management firmware - notifications and defaults
Management firmware is interested in various tidbits about
the driver - including the driver state & several configuration
related fields [MTU, primtary MAC, etc.].
This adds the necessray logic to update MFW with such configurations,
some of which are passed directly via qed while for others APIs
are provide so that qede would be able to later configure if needed.
This also introduces a new default configuration for MTU which would
replace the default inherited by being an ethernet device.
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 29 Oct 2016 19:37:09 +0000 (21:37 +0200)]
solos-pci: use permission-specific DEVICE_ATTR variants
Use DEVICE_ATTR_RW for read-write attributes. This simplifies the
source code, improves readbility, and reduces the chance of
inconsistencies.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@rw@
declarer name DEVICE_ATTR;
identifier x,x_show,x_store;
@@
DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
@script:ocaml@
x << rw.x;
x_show << rw.x_show;
x_store << rw.x_store;
@@
if not (x^"_show" = x_show && x^"_store" = x_store)
then Coccilib.include_match false
@@
declarer name DEVICE_ATTR_RW;
identifier rw.x,rw.x_show,rw.x_store;
@@
- DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
+ DEVICE_ATTR_RW(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sat, 29 Oct 2016 19:37:06 +0000 (21:37 +0200)]
ptp: use permission-specific DEVICE_ATTR variants
Use DEVICE_ATTR_RO for read only attributes. This simplifies the
source code, improves readbility, and reduces the chance of
inconsistencies.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@ro@
declarer name DEVICE_ATTR;
identifier x,x_show;
@@
DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);
@script:ocaml@
x << ro.x;
x_show << ro.x_show;
@@
if not (x^"_show" = x_show) then Coccilib.include_match false
@@
declarer name DEVICE_ATTR_RO;
identifier ro.x,ro.x_show;
@@
- DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);
+ DEVICE_ATTR_RO(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 29 Oct 2016 00:30:46 +0000 (02:30 +0200)]
bpf, inode: add support for symlinks and fix mtime/ctime
While commit
bb35a6ef7da4 ("bpf, inode: allow for rename and link ops")
added support for hard links that can be used for prog and map nodes,
this work adds simple symlink support, which can be used f.e. for
directories also when unpriviledged and works with cmdline tooling that
understands S_IFLNK anyway. Since the switch in
e27f4a942a0e ("bpf: Use
mount_nodev not mount_ns to mount the bpf filesystem"), there can be
various mount instances with mount_nodev() and thus hierarchy can be
flattened to facilitate object sharing. Thus, we can keep bpf tooling
also working by repointing paths.
Most of the functionality can be used from vfs library operations. The
symlink is stored in the inode itself, that is in i_link, which is
sufficient in our case as opposed to storing it in the page cache.
While at it, I noticed that bpf_mkdir() and bpf_mkobj() don't update
the directories mtime and ctime, so add a common helper for it called
bpf_dentry_finalize() that takes care of it for all cases now.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaron Young [Fri, 28 Oct 2016 18:26:19 +0000 (14:26 -0400)]
ldmvsw: tx queue stuck in stopped state after LDC reset
The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.
The root cause was determined to be from the following series of
events:
1. Guest domain panics - resulting in the guest no longer processing
network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
control due to no more available tx drings and stops the tx queue
for the guest domain
3. The LDC of the network connection for the guest is reset when
the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
the normal method to re-enable the tx queue. But the ACK never comes
because the tx queue was cleared due to the LDC reset.
To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Oct 2016 19:00:48 +0000 (15:00 -0400)]
Merge branch 'xps-DCB'
Alexander Duyck says:
====================
Add support for XPS when using DCB
This patch series enables proper isolation between traffic classes when
using XPS while DCB is enabled. Previously enabling XPS would cause the
traffic to be potentially pulled from one traffic class into another on
egress. This change essentially multiplies the XPS map by the number of
traffic classes and allows us to do a lookup per traffic class for a given
CPU.
To guarantee the isolation I invalidate the XPS map for any queues that are
moved from one traffic class to another, or if we change the number of
traffic classes.
v2: Added sysfs to display traffic class
Replaced do/while with for loop
Cleaned up several other for for loops throughout the patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 28 Oct 2016 15:50:13 +0000 (11:50 -0400)]
net: Add support for XPS with QoS via traffic classes
This patch adds support for setting and using XPS when QoS via traffic
classes is enabled. With this change we will factor in the priority and
traffic class mapping of the packet and use that information to correctly
select the queue.
This allows us to define a set of queues for a given traffic class via
mqprio and then configure the XPS mapping for those queues so that the
traffic flows can avoid head-of-line blocking between the individual CPUs
if so desired.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 28 Oct 2016 15:46:49 +0000 (11:46 -0400)]
net: Refactor removal of queues from XPS map and apply on num_tc changes
This patch updates the code for removing queues from the XPS map and makes
it so that we can apply the code any time we change either the number of
traffic classes or the mapping of a given block of queues. This way we
avoid having queues pulling traffic from a foreign traffic class.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 28 Oct 2016 15:43:49 +0000 (11:43 -0400)]
net: Add sysfs value to determine queue traffic class
Add a sysfs attribute for a Tx queue that allows us to determine the
traffic class for a given queue. This will allow us to more easily
determine this in the future. It is needed as XPS will take the traffic
class for a group of queues into account in order to avoid pulling traffic
from one traffic class into another.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 28 Oct 2016 15:43:20 +0000 (11:43 -0400)]
net: Move functions for configuring traffic classes out of inline headers
The functions for configuring the traffic class to queue mappings have
other effects that need to be addressed. Instead of trying to export a
bunch of new functions just relocate the functions so that we can
instrument them directly with the functionality they will need.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 27 Oct 2016 01:05:22 +0000 (09:05 +0800)]
driver: tun: Use new macro SOCK_IOC_TYPE instead of literal number 0x89
The current codes use _IOC_TYPE(cmd) == 0x89 to check if the cmd is one
socket ioctl command like SIOCGIFHWADDR. But the literal number 0x89 may
confuse readers. So create one macro SOCK_IOC_TYPE to enhance the readability.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Tue, 25 Oct 2016 01:29:13 +0000 (18:29 -0700)]
net: add an ioctl to get a socket network namespace
Each socket operates in a network namespace where it has been created,
so if we want to dump and restore a socket, we have to know its network
namespace.
We have a socket_diag to get information about sockets, it doesn't
report sockets which are not bound or connected.
This patch introduces a new socket ioctl, which is called SIOCGSKNS
and used to get a file descriptor for a socket network namespace.
A task must have CAP_NET_ADMIN in a target network namespace to
use this ioctl.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Oct 2016 14:33:08 +0000 (10:33 -0400)]
mv643xx_eth: Properly resolve merge conflict.
The second SET_NETDEV_DEV() in the hunk should be
removed.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 31 Oct 2016 12:59:27 +0000 (08:59 -0400)]
mv643xx_eth: Fix merge error.
One merge conflict block wasn't resolved.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 30 Oct 2016 21:31:12 +0000 (17:31 -0400)]
Merge tag 'shared-for-4.10-1' of git://git./linux/kernel/git/leon/linux-rdma
Saeed Mahameed says:
====================
Mellanox mlx5 core driver updates 2016-10-25
This series contains some updates and fixes of mlx5 core and
IB drivers with the addition of two features that demand
new low level commands and infrastructure updates.
- SRIOV VF max rate limit support
- mlx5e tc support for FWD rules with counter.
Needed for both net and rdma subsystems.
Updates and Fixes:
From Saeed Mahameed (2):
- mlx5 IB: Skip handling unknown mlx5 events
- Add ConnectX-5 PCIe 4.0 VF device ID
From Artemy Kovalyov (2):
- Update struct mlx5_ifc_xrqc_bits
- Ensure SRQ physical address structure endianness
From Eugenia Emantayev (1):
- Fix length of async_event_mask
New Features:
From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
- Introduce TSAR manipulation firmware commands
- Introduce E-switch QoS management
- Add SRIOV VF max rate configuration support
From Mark Bloch (7): mlx5e Tc support for FWD rule with counter
- Don't unlock fte while still using it
- Use fte status to decide on firmware command
- Refactor find_flow_rule
- Group similar rules under the same fte
- Add multi dest support
- Add option to add fwd rule with counter
- mlx5e tc support for FWD rule with counter
Mark here fixed two trivial issues with the flow steering core, and did
some refactoring in the flow steering API to support adding mulit destination
rules to the same hardware flow table entry at once. In the last two patches
added the ability to populate a flow rule with a flow counter to the same flow entry.
V2: Dropped some patches that added new structures without adding any usage of them.
Added SRIOV VF max rate configuration support patch that introduces
the usage of the TSAR infrastructure.
Added flow steering fixes and refactoring in addition to mlx5 tc
support for forward rule with counter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Tue, 25 Oct 2016 16:41:31 +0000 (18:41 +0200)]
net: bonding: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 30 Oct 2016 20:50:20 +0000 (16:50 -0400)]
Merge branch 'mlxsw-IB'
Jiri Pirko says:
====================
mlxsw: Add Infiniband support for Mellanox switches
This patchset adds basic Infiniband support for SwitchX-2, Switch-IB
and Switch-IB-2 ASIC drivers.
SwitchX-2 ASIC is VPI capable, which means each port can be either
Ethernet or Infiniband. When the port is configured as Infiniband,
the Subnet Management Agent (SMA) is managed by the SwitchX-2 firmware
and not by the host. Port configuration, MTU and more are configured
remotely by the Subnet Manager (SM).
Usage:
$ devlink port show
pci/0000:03:00.0/1: type eth netdev eth0
pci/0000:03:00.0/3: type eth netdev eth1
pci/0000:03:00.0/5: type eth netdev eth2
pci/0000:03:00.0/6: type eth netdev eth3
pci/0000:03:00.0/8: type eth netdev eth4
$ devlink port set pci/0000:03:00.0/1 type ib
$ devlink port show
pci/0000:03:00.0/1: type ib
Switch-IB (FDR) and Switch-IB-2 (EDR 100Gbs) ASICs are Infiniband-only
switches. The support provided in the mlxsw_switchib.ko driver is port
initialization only. The firmware running in the Silicon implements
the SMA.
Please note that this patchset does only very basic port initialization.
ib_device or RDMA implementations are not part of this patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:36:01 +0000 (21:36 +0200)]
mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver
SwitchIB and SwitchIB-2 are Infiniband switches with up to 36 ports. This
driver initialize the hardware and Firmware which implements the IB
management and connection with the SM.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:36:00 +0000 (21:36 +0200)]
mlxsw: switchx2: Add IB port support
SwitchX-2 is IB capable device. This patch add a support to change the
port type between Ethernet and Infiniband.
When the port is set to IB, the FW implements the Subnet Management Agent
(SMA) manage the port. All port attributes can be control remotely by
the SM.
Usage:
$ devlink port show
pci/0000:03:00.0/1: type eth netdev eth0
pci/0000:03:00.0/3: type eth netdev eth1
pci/0000:03:00.0/5: type eth netdev eth2
pci/0000:03:00.0/6: type eth netdev eth3
pci/0000:03:00.0/8: type eth netdev eth4
$ devlink port set pci/0000:03:00.0/1 type ib
$ devlink port show
pci/0000:03:00.0/1: type ib
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:59 +0000 (21:35 +0200)]
mlxsw: switchx2: Add eth prefix to port create and remove
Since we are about to add Infiniband port remove and create we will add
"eth" prefix to port create and remove APIs.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:58 +0000 (21:35 +0200)]
mlxsw: core: Add port type (Eth/IB) set API
Add "port_type_set" API to mlxsw core. The core layer send the change type
callback to the port along with it's private information.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:57 +0000 (21:35 +0200)]
mlxsw: core: Add "eth" prefix to mlxsw_core_port_set
Since we are about to introduce IB port APIs, we will add prefixes to
existing APIs.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:56 +0000 (21:35 +0200)]
mlxsw: switchx2: Add Infiniband switch partition
In order to put a port in Infiniband fabric it should be assigned
to separate swid (Switch partition) that initialized as IB swid.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 28 Oct 2016 19:35:55 +0000 (21:35 +0200)]
mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances
Currently, devlink register/unregister is done directly from
spectrum/switchx2 port create/remove functions. With a need to
introduce a port type change, the devlink port instances have to be
persistent across type changes, therefore across port create/remove
function calls. So do a bit of reshuffling to achieve that.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:54 +0000 (21:35 +0200)]
mlxsw: reg: Add local-port to Infiniband port mapping
In order to change a port type to Infiniband port we should change his
mapping from local-port to Infiniband. Adding the PLIB (Port Local to
InfiniBand) allows this mapping.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:53 +0000 (21:35 +0200)]
mlxsw: reg: Add Infiniband support to PTYS
In order to support Infiniband fabric, we need to introduce IB speeds and
capabilities to PTYS emads.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:52 +0000 (21:35 +0200)]
mlxsw: reg: Add eth prefix to PTYS pack and unpack
We want to add Infiniband support to PTYS. In order to maintain proper
conventions, we will change pack and unpack prefix to eth.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:51 +0000 (21:35 +0200)]
mlxsw: switchx2: Fix port speed configuration
In SwitchX-2 we configure the port speed to negotiate with 40G link only.
Add support for all other supported speeds.
Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:50 +0000 (21:35 +0200)]
mlxsw: switchx2: Add support for physical port names
Export to userspace the front panel name of the port, so that udev can
rename the ports accordingly. The convention suggested by switchdev
documentation is used: pX
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 28 Oct 2016 19:35:49 +0000 (21:35 +0200)]
mlxsw: spectrum: Move port used check outside port remove function
Be symmentrical with create and do the check outside the remove function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 28 Oct 2016 19:35:48 +0000 (21:35 +0200)]
mlxsw: switchx2: Move port used check outside port remove function
Be symmentrical with create and do the check outside the remove function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 28 Oct 2016 19:35:47 +0000 (21:35 +0200)]
mlxsw: switchx2: Check if port is usable before calling port create
Do it in a same way we do it in spectrum. Check if port is usable first
and only in that case create a port instance.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Fri, 28 Oct 2016 19:35:46 +0000 (21:35 +0200)]
mlxsw: core: Zero payload buffers for couple of registers
We recently discovered a bug in the firmware in which a field's length in
one of the registers was incorrectly set. This caused the firmware to
access garbage data that wasn't initialized by the driver and therefore
emit error messages.
While the bug is already fixed and the driver usually zeros the buffers
passed to the firmware, there are a handful of cases where this isn't
done. Zero the buffer in these cases and prevent similar bugs from
recurring, as they tend to be hard to debug.
Fixes: 52581961d83d ("mlxsw: core: Implement fan control using hwmon")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 30 Oct 2016 16:42:58 +0000 (12:42 -0400)]
Merge git://git./linux/kernel/git/davem/net
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Bloch [Tue, 20 Sep 2016 07:58:29 +0000 (07:58 +0000)]
net/mlx5e: Add tc support for FWD rule with counter
When creating a FWD rule using tc create also a HW counter
for this rule.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Tue, 20 Sep 2016 07:59:05 +0000 (07:59 +0000)]
net/mlx5: Add option to add fwd rule with counter
Currently the code supports only drop rules to possess counters,
add that ability also for fwd rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Wed, 31 Aug 2016 11:24:25 +0000 (11:24 +0000)]
net/mlx5: Add multi dest support
Currently when calling mlx5_add_flow_rule we accept
only one flow destination, this commit allows to pass
multiple destinations.
This change forces us to change the return structure to a more
flexible one. We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number for rules created and holds an array
where each cell points the to a flow rule.
From the consumers (of mlx5_add_flow_rule) point of view this
change is only cosmetic and requires only to change the type
of the returned value they store.
From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Sun, 11 Sep 2016 13:03:06 +0000 (13:03 +0000)]
net/mlx5: Group similer rules under the same fte
When adding a new rule, if we can match it with compare_match_value and
flow tag we might be able to insert the rule to the same fte.
In order to do that, there must be an overlap between the actions of the
fte and the new rule.
When updating the action of an existing fte, we must tell the firmware
we are doing so.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Fri, 16 Sep 2016 07:00:10 +0000 (07:00 +0000)]
net/mlx5: Refactor find_flow_rule
The way we compare between two dests will need to be used in other
places in the future, so we factor out the comparison logic
between two dests into a separate function.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Sun, 11 Sep 2016 12:54:50 +0000 (12:54 +0000)]
net/mlx5: Use fte status to decide on firmware command
An fte status becomes FS_FTE_STATUS_EXISTING only after it was
created in HW. We can use this in order to simplify the logic on
what firmware command to use. If the status isn't FS_FTE_STATUS_EXISTING
we need to create the fte, otherwise we need only to update it.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mark Bloch [Mon, 5 Sep 2016 10:58:04 +0000 (10:58 +0000)]
net/mlx5: Don't unlock fte while still using it
When adding a new rule to an fte, we need to hold the fte lock
until we add that rule to the fte and increase the fte ref count.
Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mohamad Haj Yahia [Thu, 11 Aug 2016 08:28:21 +0000 (11:28 +0300)]
net/mlx5: Add SRIOV VF max rate configuration support
Implement the vf set rate ndo by modifying the TSAR vport rate limit.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mohamad Haj Yahia [Thu, 11 Aug 2016 08:26:36 +0000 (11:26 +0300)]
net/mlx5: Introduce E-switch QoS management
Add TSAR to the eswitch which will act as the vports rate limiter.
Create/Destroy TSAR on Enable/Dsiable SRIOV.
Attach/Detach vport to eswitch TSAR on Enable/Disable vport.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Mohamad Haj Yahia [Thu, 11 Aug 2016 08:21:39 +0000 (11:21 +0300)]
net/mlx5: Introduce TSAR manipulation firmware commands
TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.
The arbitration defines the QoS policy between the agents connected to
the TSAR.
The TSAR is a consist two main features:
1) BW Allocation between agents:
The TSAR implements a defecit weighted round robin between the agents.
Each agent attached to the TSAR is assigned with a weight and it is
awarded transmission tokens according to this weight.
2) Rate limer per agent:
Each agent attached to the TSAR is (optionally) assigned with a rate
limit.
TSAR will not allow scheduling for an agent exceeding its defined rate
limit.
In this patch we implement the API of manipulating the TSAR.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Saeed Mahameed [Tue, 18 Oct 2016 18:35:35 +0000 (21:35 +0300)]
net/mlx5: Add ConnectX-5 PCIe 4.0 VF device ID
For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the
device ID "0x101a" to mlx5_core_pci_table.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Eugenia Emantayev [Wed, 28 Sep 2016 14:29:41 +0000 (17:29 +0300)]
net/mlx5: Fix length of async_event_mask
According to PRM async_event_mask have to be 64 bits long.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Artemy Kovalyov [Wed, 31 Aug 2016 05:29:58 +0000 (05:29 +0000)]
net/mlx5: Ensure SRQ physical address structure endianness
SRQ physical address structure field should be in big-endian format.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Artemy Kovalyov [Wed, 31 Aug 2016 05:17:54 +0000 (05:17 +0000)]
net/mlx5: Update struct mlx5_ifc_xrqc_bits
Update struct mlx5_ifc_xrqc_bits according to last specification
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Saeed Mahameed [Thu, 29 Sep 2016 16:35:38 +0000 (19:35 +0300)]
IB/mlx5: Skip handling unknown events
Do not dispatch unknown mlx5 core events on mlx5_ib_event.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Linus Torvalds [Sun, 30 Oct 2016 03:33:20 +0000 (20:33 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.
1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.
2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.
3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.
4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.
12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.
14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.
19) Revert a netns locking change that causes regressions, from Paul
Moore.
20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
...
Stefan Richter [Sat, 29 Oct 2016 20:16:58 +0000 (22:16 +0200)]
firewire: net: really fix maximum possible MTU
The maximum unicast datagram size /without/ link fragmentation is
4096 - 4 = 4092 (max IEEE 1394 async payload size at >= S800 bus speed,
minus unfragmented encapssulation header). Max broadcast datagram size
without fragmentation is 8 bytes less than that (due to GASP header).
The maximum datagram size /with/ link fragmentation is 0xfff = 4095
for unicast and broadcast. This is because the RFC 2734 fragment
encapsulation header field for datagram size is only 12 bits wide.
Fixes: 5d48f00d836a('firewire: net: fix maximum possible MTU')
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Fri, 28 Oct 2016 23:01:41 +0000 (16:01 -0700)]
genetlink: Fix generic netlink family unregister
This patch fixes a typo in unregister operation.
Following crash is fixed by this patch. It can be easily reproduced
by repeating modprobe and rmmod module that uses genetlink.
[ 261.446686] BUG: unable to handle kernel paging request at
ffffffffa0264088
[ 261.448921] IP: [<
ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.450494] PGD
1c09067
[ 261.451266] PUD
1c0a063
[ 261.452091] PMD
8068d5067
[ 261.452525] PTE 0
[ 261.453164]
[ 261.453618] Oops: 0000 [#1] SMP
[ 261.454577] Modules linked in: openvswitch(+) ...
[ 261.480753] RIP: 0010:[<
ffffffff813cb70e>] [<
ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.483069] RSP: 0018:
ffffc90003c0bc28 EFLAGS:
00010282
[ 261.510145] Call Trace:
[ 261.510896] [<
ffffffff816f10ca>] genl_family_find_byname+0x5a/0x70
[ 261.512819] [<
ffffffff816f2319>] genl_register_family+0xb9/0x630
[ 261.514805] [<
ffffffffa02840bc>] dp_init+0xbc/0x120 [openvswitch]
[ 261.518268] [<
ffffffff8100217d>] do_one_initcall+0x3d/0x160
[ 261.525041] [<
ffffffff811808a9>] do_init_module+0x60/0x1f1
[ 261.526754] [<
ffffffff8110687f>] load_module+0x22af/0x2860
[ 261.530144] [<
ffffffff81107026>] SYSC_finit_module+0x96/0xd0
[ 261.531901] [<
ffffffff8110707e>] SyS_finit_module+0xe/0x10
[ 261.533605] [<
ffffffff8100391e>] do_syscall_64+0x6e/0x180
[ 261.535284] [<
ffffffff817c2faf>] entry_SYSCALL64_slow_path+0x25/0x25
[ 261.546512] RIP [<
ffffffff813cb70e>] strcmp+0xe/0x30
[ 261.550198] ---[ end trace
76505a814dd68770 ]---
Fixes: 2ae0f17df1c ("genetlink: use idr to track families").
Reported-by: Jarno Rajahalme <jarno@ovn.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Fri, 28 Oct 2016 16:59:16 +0000 (09:59 -0700)]
geneve: avoid using stale geneve socket.
This patch is similar to earlier vxlan patch.
Geneve device close operation frees geneve socket. This
operation can race with geneve-xmit function which
dereferences geneve socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Fri, 28 Oct 2016 16:59:15 +0000 (09:59 -0700)]
vxlan: avoid using stale vxlan socket.
When vxlan device is closed vxlan socket is freed. This
operation can race with vxlan-xmit function which
dereferences vxlan socket. Following patch uses RCU
mechanism to avoid this situation.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sat, 29 Oct 2016 14:04:35 +0000 (17:04 +0300)]
qede: Fix out-of-bound fastpath memory access
Driver allocates a shadow array for transmitted SKBs with X entries;
That means valid indices are {0,...,X - 1}. [X == 8191]
Problem is the driver also uses X as a mask for a
producer/consumer in order to choose the right entry in the
array which allows access to entry X which is out of bounds.
To fix this, simply allocate X + 1 entries in the shadow array.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 28 Oct 2016 10:10:11 +0000 (12:10 +0200)]
net: phy: Add support for Microsemi VSC 8530/40 Fast Ethernet PHY
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 21:28:45 +0000 (17:28 -0400)]
Merge tag 'mac80211-next-for-davem-2016-10-28' of git://git./linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Among various cleanups and improvements, we have the following:
* client FILS authentication support in mac80211 (Jouni)
* AP/VLAN multicast improvements (Michael Braun)
* config/advertising support for differing beacon intervals on
multiple virtual interfaces (Purushottam Kushwaha, myself)
* deprecate the old WDS mode for cfg80211-based drivers, the
mode is hardly usable since it doesn't support any "modern"
features like WPA encryption (2003), HT (2009) or VHT (2014),
I'm not even sure WEP (introduced in 1997) could be done.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Fri, 28 Oct 2016 09:40:20 +0000 (12:40 +0300)]
net: phy: dp83848: add dp83822 PHY support
This PHY has a compatible register set with DP83848x so
add support for it.
Acked-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 27 Oct 2016 23:01:03 +0000 (16:01 -0700)]
enic: fix rq disable
When MTU is changed from 9000 to 1500 while there is burst of inbound 9000
bytes packets, adaptor sometimes delivers 9000 bytes packets to 1500 bytes
buffers. This causes memory corruption and sometimes crash.
This is because of a race condition in adaptor between "RQ disable"
clearing descriptor mini-cache and mini-cache valid bit being set by
completion of descriptor fetch. This can result in stale RQ desc being
cached and used when packets arrive. In this case, the stale descriptor
have old MTU value.
Solution is to write RQ->disable twice. The first write will stop any
further desc fetches, allowing the second disable to clear the mini-cache
valid bit without danger of a race.
Also, the check for rq->running becoming 0 after writing rq->enable to 0
is not done properly. When incoming packets are flooding the interface,
rq->running will pulse high for each dropped packet. Since the driver was
waiting for 10us between each poll, it is possible to see rq->running = 1
1000 times in a row, even though it is not actually stuck running.
This results in false failure of vnic_rq_disable(). Fix is to try more
than 1000 time without delay between polls to ensure we do not miss when
running goes low.
In old adaptors rq->enable needs to be re-written to 0 when posted_index
is reset in vnic_rq_clean() in order to keep rq->prefetch_index in sync.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 27 Oct 2016 22:51:55 +0000 (18:51 -0400)]
tipc: fix broadcast link synchronization problem
In commit
2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.
We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.
We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.
To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.
This minor protocol update is fully backwards compatible.
Reported-by: John Thompson <thompa.atl@gmail.com>
Tested-by: John Thompson <thompa.atl@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 27 Oct 2016 17:28:52 +0000 (12:28 -0500)]
ibmvnic: Fix missing brackets in init_sub_crq_irqs
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 27 Oct 2016 17:28:51 +0000 (12:28 -0500)]
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 21:18:17 +0000 (17:18 -0400)]
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
This reverts commit
8d7533e5aaad1c94386a8101a36b0617987966b7.
It introduced kbuild failures, new version coming.
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 27 Oct 2016 20:32:22 +0000 (22:32 +0200)]
rocker: set physical device for port netdevice
Do this so the sysfs has "device" link correctly set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 21:14:19 +0000 (17:14 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-10-27
This series contains fixes to ixgbe and i40e.
Emil fixes a NULL pointer dereference when a macvlan interface is brought
up while the PF is still down.
David root caused the original panic that was fixed by commit id
(
a036244c068612 "i40e: Fix kernel panic on enable/disable LLDP") and the
fix was not quite correct, so removed the get_default_tc() and replaced
it with a #define since there is only one TC supported as a default.
Guilherme Piccoli fixes an issue where if we modprobe the driver module
without enough MSI-X interrupts, then unload the module and reload it
again, the kernel would crash. So if we fail to allocate enough MSI-X
interrupts, we should disable them since they were previously enabled.
Huaibin Wang found that the order of the arguments for
ndo_dflt_bridge_getlink() were in the correct order, so fix the order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 27 Oct 2016 17:26:37 +0000 (13:26 -0400)]
tcp_bbr: add a state transition diagram and accompanying comment
Document the possible state transitions for a BBR flow, and also add a
prose summary of the state machine, covering the life of a typical BBR
flow.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 27 Oct 2016 14:30:06 +0000 (16:30 +0200)]
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
Commit
01cfbad "ipv4: Update parameters for csum_tcpudp_magic to their
original types" changed parameters for csum_tcpudp_magic and
csum_tcpudp_nofold for many platforms but not for PowerPC.
Fixes: 01cfbad "ipv4: Update parameters for csum_tcpudp_magic to their original types"
Cc: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 29 Oct 2016 20:52:02 +0000 (13:52 -0700)]
Linux 4.9-rc3
Linus Torvalds [Sat, 29 Oct 2016 20:42:44 +0000 (13:42 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 bugfix from Thomas Gleixner:
"A single bugfix for the recent changes related to registering the boot
cpu when this has not happened before prefill_possible_map().
The main problem with this change got fixed already, but we missed the
case where the local APIC is not yet mapped, when prefill_possible_map()
is invoked, so the registration of the boot cpu which has the APIC bit
set in CPUID will explode.
I should have seen that issue earlier, but all I can do now is feeling
embarassed"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smpboot: Init apic mapping before usage
David S. Miller [Sat, 29 Oct 2016 20:26:50 +0000 (16:26 -0400)]
Merge tag 'batadv-next-for-davem-
20161027' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This code cleanup patchset includes the following changes (chronological
order):
- bump version strings, by Simon Wunderlich
- README updates/clean up, by Sven Eckelmann (4 patches)
- Code clean up and restructuring by Sven Eckelmann (2 patches)
- Kerneldoc fix in forw_packet structure, by Linus Luessing
- Remove unused argument in dbg_arp, by Antonio Quartulli
- Add support to build batman-adv without wireless, by Linus Luessing
- Restructure error handling for is_ap_isolated, by Markus Elfring
- Remove unused initialization in various functions, by Sven Eckelmann
- Use better names for fragment and gateway list heads, by Sven
Eckelmann (2 patches)
- Convert to octal permissions for files, by Sven Eckelmann
- Avoid precedence issues for some macros, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 20:23:49 +0000 (16:23 -0400)]
Merge branch 'mlx4-fixes'
Tariq Toukan says:
====================
mlx4 misc fixes for 4.9
This patchset contains several bug fixes from the team to the
mlx4 Eth and Core drivers.
Series generated against net commit:
ecc515d7238f 'sctp: fix the panic caused by route update'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Thu, 27 Oct 2016 13:27:22 +0000 (16:27 +0300)]
net/mlx4_en: Save slave ethtool stats command
Following the previous patch, as an optimization, the slave will
not even bother sending the DUMP_ETH_STATS command over the
comm channel.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Thu, 27 Oct 2016 13:27:21 +0000 (16:27 +0300)]
net/mlx4_en: Fix potential deadlock in port statistics flow
mlx4_en_DUMP_ETH_STATS took the *counter mutex* and then
called the FW command, with WRAPPED attribute. As a result, the fw command
is wrapped on the Hypervisor when it calls mlx4_en_DUMP_ETH_STATS.
The FW command wrapper flow on the hypervisor takes the *slave_cmd_mutex*
during processing.
At the same time, a VF could be in the process of coming up, and could
call mlx4_QUERY_FUNC_CAP. On the hypervisor, the command flow takes the
*slave_cmd_mutex*, then executes mlx4_QUERY_FUNC_CAP_wrapper.
mlx4_QUERY_FUNC_CAP wrapper calls mlx4_get_default_counter_index(),
which takes the *counter mutex*. DEADLOCK.
The fix is that the DUMP_ETH_STATS fw command should be called with
the NATIVE attribute, so that on the hypervisor, this command does not
enter the wrapper flow.
Since the Hypervisor no longer goes through the wrapper code, we also
simply return 0 in mlx4_DUMP_ETH_STATS_wrapper (i.e.the function succeeds,
but the returned data will be all zeroes).
No need to test if it is the Hypervisor going through the wrapper.
Fixes: f9baff509f8a ("mlx4_core: Add "native" argument to mlx4_cmd ...")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Thu, 27 Oct 2016 13:27:20 +0000 (16:27 +0300)]
net/mlx4: Fix firmware command timeout during interrupt test
Currently interrupt test that is part of ethtool selftest runs the
check over all interrupt vectors of the device.
In mlx4_en package part of interrupt vectors are uninitialized since
mlx4_ib doesn't exist. This causes NOP FW command to time out.
Change logic to test current port interrupt vectors only.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Thu, 27 Oct 2016 13:27:19 +0000 (16:27 +0300)]
net/mlx4_core: Do not access comm channel if it has not yet been initialized
In the Hypervisor, there are several FW commands which are invoked
before the comm channel is initialized (in mlx4_multi_func_init).
These include MOD_STAT_CONFIG, QUERY_DEV_CAP, INIT_HCA, and others.
If any of these commands fails, say with a timeout, the Hypervisor
driver enters the internal error reset flow. In this flow, the driver
attempts to notify all slaves via the comm channel that an internal error
has occurred.
Since the comm channel has not yet been initialized (i.e., mapped via
ioremap), this will cause dereferencing a NULL pointer.
To fix this, do not access the comm channel in the internal error flow
if it has not yet been initialized.
Fixes: 55ad359225b2 ("net/mlx4_core: Enable device recovery flow with SRIOV")
Fixes: ab9c17a009ee ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Thu, 27 Oct 2016 13:27:18 +0000 (16:27 +0300)]
net/mlx4_en: Fix panic during reboot
Fix a kernel panic that occurs as a result of an asynchronous event
handled in roce_gid_mgmt:
mlx4_en_get_drvinfo is called and accesses freed resources.
This happens in a shutdown flow only, since pci device is destroyed
while netdevice is still alive.
Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erez Shitrit [Thu, 27 Oct 2016 13:27:17 +0000 (16:27 +0300)]
net/mlx4_en: Process all completions in RX rings after port goes up
Currently there is a race between incoming traffic and
initialization flow. HW is able to receive the packets
after INIT_PORT is done and unicast steering is configured.
Before we set priv->port_up NAPI is not scheduled and
receive queues become full. Therefore we never get
new interrupts about the completions.
This issue could happen if running heavy traffic during
bringing port up.
The resolution is to schedule NAPI once port_up is set.
If receive queues were full this will process all cqes
and release them.
Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eugenia Emantayev [Thu, 27 Oct 2016 13:27:16 +0000 (16:27 +0300)]
net/mlx4_en: Resolve dividing by zero in 32-bit system
When doing roundup_pow_of_two for large enough number with
bit 31, an overflow will occur and a value equal to 1 will
be returned. In this case 1 will be subtracted from the return
value and division by zero will be reached.
Fixes: 31c128b66e5b ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moshe Lazer [Thu, 27 Oct 2016 13:27:15 +0000 (16:27 +0300)]
net/mlx4_core: Change the default value of enable_qos
Change the default status of quality of service back to disabled,
as it hurts performance in some cases.
Fixes: 38438f7c7e8c ("net/mlx4: Set enhanced QoS support by default when ...")
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Thu, 27 Oct 2016 13:27:14 +0000 (16:27 +0300)]
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
When only one port type is supported, it should be read only.
We reject changing requests, even to the auto sense mode.
Fixes: 27bf91d6a0d5 ("mlx4_core: Add link type autosensing")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Thu, 27 Oct 2016 13:27:13 +0000 (16:27 +0300)]
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
The resource type enum in the resource tracker was incorrect.
RES_EQ was put in the position of RES_NPORT_ID (a FC resource).
Since the remaining resources maintain their current values,
and RES_EQ is not passed from slaves to the hypervisor in any
FW command, this change affects only the hypervisor.
Therefore, there is no backwards-compatibility issue.
Fixes: 623ed84b1f95 ("mlx4_core: initial header-file changes for SRIOV support")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 20:19:41 +0000 (16:19 -0400)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-10-28
This series contains updates to i40e and i40evf only.
Carolyn provides a couple of fixes, first resolving a problem in the
client interface that was causing random stack traces in the RDMA driver
which was due to a timing related NULL pointer dereference. Fixed a
problem where it could take a very long time to print the link down
notification, by changing how often we update link info from firmware.
Alex provides a number of changes, first is a re-write of the bust wait
loop in the Flow Director transmit function to reduce code size. Cleans
up unused code in favor of the same functionality which can be inlined.
Dropped the functionality for SCTP since we cannot currently support it.
Cleans up redundant code in the receive clean-up path. Finally cleaned
up the convoluted configuration for how the driver handled the debug
flags contained in msg_level.
Filip fixes an incorrect bit mask which was being used for testing the
"get link status". Cleaned up a workaround that is no longer needed
for production NICs and was causing frames to pass while disregarding
the VLAN tagging.
Mitch brings another fix for the client interface supporting the VF RDMA
driver to allow clients to recover from reset by re-opening existing
clients.
Alan fixes a bug in which a "perfect storm" can occur and cause interrupts
to fail to be correctly affinitized.
Lihong fixes a confusing dmesg reported when users were using ethtool -L
option.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 29 Oct 2016 20:15:24 +0000 (13:15 -0700)]
Merge tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs
Pull ubi/ubifs fixes from Richard Weinberger:
"This contains fixes for issues in both UBI and UBIFS:
- A regression wrt overlayfs, introduced in -rc2.
- An UBI issue, found by Dan Carpenter's static checker"
* tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs:
ubifs: Fix regression in ubifs_readdir()
ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
shamir rabinovitch [Thu, 27 Oct 2016 09:46:38 +0000 (05:46 -0400)]
rds: debug messages are enabled by default
rds use Kconfig option called "RDS_DEBUG" to enable rds debug messages.
This option cause the rds Makefile to add -DDEBUG to the rds gcc command
line.
When CONFIG_DYNAMIC_DEBUG is enabled, the "DEBUG" macro is used by
include/linux/dynamic_debug.h to decide if dynamic debug prints should
be sent by default to the kernel log.
rds should not enable this macro for production builds. rds dynamic
debug work as expected follow this fix.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 27 Oct 2016 09:23:51 +0000 (11:23 +0200)]
bpf: Print function name in addition to function id
The verifier currently prints raw function ids when printing CALL
instructions or when complaining:
5: (85) call 23
unknown func 23
print a meaningful function name instead:
5: (85) call bpf_redirect#23
unknown func bpf_redirect#23
Moves the function documentation to a single comment and renames all
helpers names in the list to conform to the bpf_ prefix notation so
they can be greped in the kernel source.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 29 Oct 2016 19:54:16 +0000 (15:54 -0400)]
Merge tag 'mac80211-for-davem-2016-10-27' of git://git./linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two fixes:
* a fix to process all events while suspending, so any
potential calls into the driver are done before it is
suspended
* small markup fixes for the sphinx documentation conversion
that's coming into the tree via the doc tree
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 26 Oct 2016 20:21:33 +0000 (13:21 -0700)]
net: dev: Fix non-RCU based lower dev walker
netdev_walk_all_lower_dev is not properly walking the lower device
list. Commit
1a3f060c1a47 made netdev_walk_all_lower_dev similar
to netdev_walk_all_upper_dev_rcu and netdev_walk_all_lower_dev_rcu
but failed to update its netdev_next_lower_dev iterator. This patch
fixes that.
Fixes: 1a3f060c1a47 ("net: Introduce new api for walking upper and
lower devices")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 26 Oct 2016 18:57:38 +0000 (13:57 -0500)]
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe [Wed, 26 Oct 2016 17:47:02 +0000 (11:47 -0600)]
net: mv643xx_eth: Fetch the phy connection type from DT
The MAC is capable of RGMII mode and that is probably a more typical
connection type than GMII today (eg it is used by Marvell Reference
designs for several SOCs). Let DT users specify the standard
phy-connection-type = "rgmii-id";
On a phy node.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 26 Oct 2016 16:49:46 +0000 (18:49 +0200)]
flow_dissector: __skb_get_hash_symmetric arg can be const
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 26 Oct 2016 16:27:57 +0000 (09:27 -0700)]
tcp/dccp: drop SYN packets if accept queue is full
Per listen(fd, backlog) rules, there is really no point accepting a SYN,
sending a SYNACK, and dropping the following ACK packet if accept queue
is full, because application is not draining accept queue fast enough.
This behavior is fooling TCP clients that believe they established a
flow, while there is nothing at server side. They might then send about
10 MSS (if using IW10) that will be dropped anyway while server is under
stress.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>