openwrt/staging/blogic.git
8 years agoliquidio CN23XX: VF interrupt
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:40 +0000 (16:54 -0800)]
liquidio CN23XX: VF interrupt

Adds support for VF interrupt processing.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF mailbox
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:39 +0000 (16:54 -0800)]
liquidio CN23XX: VF mailbox

Adds support for VF mailbox setup.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: init VF softcommand queues
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:38 +0000 (16:54 -0800)]
liquidio CN23XX: init VF softcommand queues

Adds support for initializing softcommand, dispatch and
instructions queues for VF.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF register access
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:37 +0000 (16:54 -0800)]
liquidio CN23XX: VF register access

This patch adds support for VF device register access.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF queue setup
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:36 +0000 (16:54 -0800)]
liquidio CN23XX: VF queue setup

Adds support for configuring VF input/output queues.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF config setup
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:35 +0000 (16:54 -0800)]
liquidio CN23XX: VF config setup

Adds support for setting up VF configuration.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF registration
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:34 +0000 (16:54 -0800)]
liquidio CN23XX: VF registration

Adds support for cn23xx VF probe and registration.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoliquidio CN23XX: VF register definitions
Raghu Vatsavayi [Tue, 29 Nov 2016 00:54:33 +0000 (16:54 -0800)]
liquidio CN23XX: VF register definitions

Adds support for CN23xx VF registers.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosamples: bpf: Refactor test_cgrp2_attach -- use getopt, and add mode
Sargun Dhillon [Mon, 28 Nov 2016 22:52:42 +0000 (14:52 -0800)]
samples: bpf: Refactor test_cgrp2_attach -- use getopt, and add mode

This patch modifies test_cgrp2_attach to use getopt so we can use standard
command line parsing.

It also adds an option to run the program in detach only mode. This does
not attach a new filter at the cgroup, but only runs the detach command.

Lastly, it changes the attach code to not detach and then attach. It relies
on the 'hotswap' behaviour of CGroup BPF programs to be able to change
in-place. If detach-then-attach behaviour needs to be tested, the example
can be run in detach only mode prior to attachment.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: brocade: bna: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Mon, 28 Nov 2016 22:52:19 +0000 (23:52 +0100)]
net: brocade: bna: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Rasesh Mody <Rasesh.Mody@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf, xdp: allow to pass flags to dev_change_xdp_fd
Daniel Borkmann [Mon, 28 Nov 2016 22:16:54 +0000 (23:16 +0100)]
bpf, xdp: allow to pass flags to dev_change_xdp_fd

Add an IFLA_XDP_FLAGS attribute that can be passed for setting up
XDP along with IFLA_XDP_FD, which eventually allows user space to
implement typical add/replace/delete logic for programs. Right now,
calling into dev_change_xdp_fd() will always replace previous programs.

When passed XDP_FLAGS_UPDATE_IF_NOEXIST, we can handle this more
graceful when requested by returning -EBUSY in case we try to
attach a new program, but we find that another one is already
attached. This will be used by upcoming front-end for iproute2 as
well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'broadcom-phy-internal-counters'
David S. Miller [Wed, 30 Nov 2016 15:22:28 +0000 (10:22 -0500)]
Merge branch 'broadcom-phy-internal-counters'

Florian Fainelli says:

====================
net: phy: broadcom: Support for PHY counters

This patch series adds support for reading the Broadcom PHYs internal counters.

Changes in v3:

- fixed the allocation of priv->stats in bcm7xxx

Changes in v2:

- fixed modular build reported by kbuild

- constify array of stats
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: bcm7xxx: Plug in support for reading PHY error counters
Florian Fainelli [Tue, 29 Nov 2016 17:57:18 +0000 (09:57 -0800)]
net: phy: bcm7xxx: Plug in support for reading PHY error counters

Broadcom BCM7xxx internal PHYs can leverage the Broadcom PHY library
module PHY error counters helper functions, just implement the
appropriate PHY driver function calls to do so. We need to allocate some
storage space for our PHY statistics, and provide it to the Broadcom PHY
library, so do this in a specific probe function, and slightly wrap the
get_stats function call.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: broadcom: Add support code for reading PHY counters
Florian Fainelli [Tue, 29 Nov 2016 17:57:17 +0000 (09:57 -0800)]
net: phy: broadcom: Add support code for reading PHY counters

Broadcom PHYs expose a number of PHY error counters: receive errors,
false carrier sense, SerDes BER count, local and remote receive errors.
Add support code to allow retrieving these error counters. Since the
Broadcom PHY library code is used by several drivers, make it possible
for them to specify the storage for the software copy of the statistics.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver
Edward Cree [Mon, 28 Nov 2016 18:55:34 +0000 (18:55 +0000)]
sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver

Rationale: The differences between Falcon and Siena are in many ways larger
 than those between Siena and EF10 (despite Siena being nominally "Falcon-
 architecture"); for instance, Falcon has no MCPU, so there is no MCDI.
 Removing Falcon support from the sfc driver should simplify the latter,
 and avoid the possibility of Falcon support being broken by changes to sfc
 (which are rarely if ever tested on Falcon, it being end-of-lifed hardware).

The sfc-falcon driver created in this changeset is essentially a copy of the
 sfc driver, but with Siena- and EF10-specific code, including MCDI, removed
 and with the "efx_" identifier prefix changed to "ef4_" (for "EFX 4000-
 series") to avoid collisions when both drivers are built-in.

This changeset removes Falcon from the sfc driver's PCI ID table; then in
 sfc I've removed obvious Falcon-related code: I removed the Falcon NIC
 functions, Falcon PHY code, and EFX_REV_FALCON_*, then fixed up everything
 that referenced them.

Also, increment minor version of both drivers (to 4.1).

For now, CONFIG_SFC selects CONFIG_SFC_FALCON, so that updating old configs
 doesn't cause Falcon support to disappear; but that should be undone at
 some point in the future.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocpsw: ethtool: add support for nway reset
Yegor Yefremov [Mon, 28 Nov 2016 09:47:52 +0000 (10:47 +0100)]
cpsw: ethtool: add support for nway reset

This patch adds support for ethtool's '-r' command. Restarting
N-WAY negotiation can be useful to activate newly changed EEE
settings etc.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'tcp-sender-chronographs'
David S. Miller [Wed, 30 Nov 2016 15:04:32 +0000 (10:04 -0500)]
Merge branch 'tcp-sender-chronographs'

Yuchung Cheng says:

====================
tcp: sender chronographs instrumentation

This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.

Thie patch aims to help users get the high level understanding of
where sending process is limited by, similar to the TCP_INFO design.
It is not to replace detailed kernel tracing and instrumentation
facilities.

In addition this patch set provide a new option to the timestamping
work to instrument these limits on application data unit. For exampe,
one can use SO_TIMESTAMPING and this patch set to measure the how
long a particular HTTP response is limited by small receive window.

Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.
====================

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
Francis Yan [Mon, 28 Nov 2016 07:07:18 +0000 (23:07 -0800)]
tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: export sender limits chronographs to TCP_INFO
Francis Yan [Mon, 28 Nov 2016 07:07:17 +0000 (23:07 -0800)]
tcp: export sender limits chronographs to TCP_INFO

This patch exports all the sender chronograph measurements collected
in the previous patches to TCP_INFO interface. Note that busy time
exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is jiffy but externally
the measurements are in microseconds for future extensions.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: instrument how long TCP is limited by insufficient send buffer
Francis Yan [Mon, 28 Nov 2016 07:07:16 +0000 (23:07 -0800)]
tcp: instrument how long TCP is limited by insufficient send buffer

This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate either the send buffer autotuning or user SO_SNDBUF
setting has resulted network under-utilization.

The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account the time elapsed till the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: instrument how long TCP is limited by receive window
Francis Yan [Mon, 28 Nov 2016 07:07:15 +0000 (23:07 -0800)]
tcp: instrument how long TCP is limited by receive window

This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: instrument how long TCP is busy sending
Francis Yan [Mon, 28 Nov 2016 07:07:14 +0000 (23:07 -0800)]
tcp: instrument how long TCP is busy sending

This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.

Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agotcp: instrument tcp sender limits chronographs
Francis Yan [Mon, 28 Nov 2016 07:07:13 +0000 (23:07 -0800)]
tcp: instrument tcp sender limits chronographs

This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:

1) idle (unspec)
2) busy sending data other than 3-4 below
3) rwnd-limited
4) sndbuf-limited

The limits are enumerated 'tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the life time of a
connection, the sender must be in one of the four states.

If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over
the other conditions. So that if something "more interesting"
starts happening, stop the previous chrono and start a new one.

The time unit is jiffy(u32) in order to save space in tcp_sock.
This implies application must sample the stats no longer than every
49 days of 1ms jiffy.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agocpsw: ethtool: add support for getting/setting EEE registers
Yegor Yefremov [Mon, 28 Nov 2016 08:41:33 +0000 (09:41 +0100)]
cpsw: ethtool: add support for getting/setting EEE registers

Add the ability to query and set Energy Efficient Ethernet parameters
via ethtool for applicable devices.

This patch doesn't activate full EEE support in cpsw driver, but it
enables reading and writing EEE advertising settings. This way one
can disable advertising EEE for certain speeds.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agohv_netvsc: remove excessive logging on MTU change
Vitaly Kuznetsov [Mon, 28 Nov 2016 17:25:44 +0000 (18:25 +0100)]
hv_netvsc: remove excessive logging on MTU change

When we change MTU or the number of channels on a netvsc device we get the
following logged:

 hv_netvsc bf5edba8...: net device safe to remove
 hv_netvsc: hv_netvsc channel opened successfully
 hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560
 hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up

This information is useful as debug at most.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlxsw-enhancements'
David S. Miller [Wed, 30 Nov 2016 01:48:52 +0000 (20:48 -0500)]
Merge branch 'mlxsw-enhancements'

Jiri Pirko says:

====================
mlxsw: couple of enhancements and fixes

Couple of enhancements and fixes from Ido.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Change order of operations in removal path
Ido Schimmel [Mon, 28 Nov 2016 17:01:26 +0000 (18:01 +0100)]
mlxsw: core: Change order of operations in removal path

We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.

This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Add missing rollback in error path
Ido Schimmel [Mon, 28 Nov 2016 17:01:25 +0000 (18:01 +0100)]
mlxsw: core: Add missing rollback in error path

Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.

Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.

Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum_buffers: Limit size of pools
Ido Schimmel [Mon, 28 Nov 2016 17:01:24 +0000 (18:01 +0100)]
mlxsw: spectrum_buffers: Limit size of pools

The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.

While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.

Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.

Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned

With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: resources: Add maximum buffer size
Ido Schimmel [Mon, 28 Nov 2016 17:01:23 +0000 (18:01 +0100)]
mlxsw: resources: Add maximum buffer size

We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: switchib: add MLXSW_PCI dependency
Arnd Bergmann [Mon, 28 Nov 2016 14:26:04 +0000 (15:26 +0100)]
mlxsw: switchib: add MLXSW_PCI dependency

The newly added switchib driver fails to link if MLXSW_PCI=m:

drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function^Cmlxsw_sib_module_exit':
switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'

The other two such sub-drivers have a dependency, so add the same one
here. In theory we could allow this driver if MLXSW_PCI is disabled,
but it's probably not worth it.

Fixes: d1ba52638456 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: Fix use after free in phy_detach()
Geert Uytterhoeven [Mon, 28 Nov 2016 14:18:31 +0000 (15:18 +0100)]
net: phy: Fix use after free in phy_detach()

If device_release_driver(&phydev->mdio.dev) is called, it releases all
resources belonging to the PHY device. Hence the subsequent call to
phy_led_triggers_unregister() will access already freed memory when
unregistering the LEDs.

Move the call to phy_led_triggers_unregister() before the possible call
to device_release_driver() to fix this.

Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Zach Brown <zach.brown@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agostmmac: fix comments, make debug output consistent
Pavel Machek [Mon, 28 Nov 2016 11:55:59 +0000 (12:55 +0100)]
stmmac: fix comments, make debug output consistent

Fix comments, add some new, and make debugfs output consistent.

Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: cgroup: fix documentation of __cgroup_bpf_update()
Daniel Mack [Mon, 28 Nov 2016 13:11:04 +0000 (14:11 +0100)]
bpf: cgroup: fix documentation of __cgroup_bpf_update()

There's a 'not' missing in one paragraph. Add it.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rami Rosen <roszenrami@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'eee-broken-modes'
David S. Miller [Wed, 30 Nov 2016 00:38:31 +0000 (19:38 -0500)]
Merge branch 'eee-broken-modes'

Jerome Brunet says:

====================
Fix OdroidC2 Gigabit Tx link issue

This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
The platform seems to enter LPI on the Rx path too often while performing
relatively high TX transfer. This eventually break the link (both Tx and
Rx), and require to bring the interface down and up again to get the Rx
path working again.

The root cause of this issue is not fully understood yet but disabling EEE
advertisement on the PHY prevent this feature to be negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.

The patchset adds options in the generic phy driver to disable EEE
advertisement, through device tree. The way it is done is very similar
to the handling of the max-speed property.

Changes since V2: [2]
 - Rename "eee-advert-disable" to "eee-broken-modes" to make the intended
   purpose of this option clear (flag broken configuration, not a
   configuration option)
 - Add DT bindings constants so the DT configuration is more user friendly
 - Submit to net-next instead of net.

Changes since V1: [1]
 - Disable the advertisement of EEE in the generic code instead of the
   realtek driver.

[1] : http://lkml.kernel.org/r/1479220154-25851-1-git-send-email-jbrunet@baylibre.com
[2] : http://lkml.kernel.org/r/1479742524-30222-1-git-send-email-jbrunet@baylibre.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodt: bindings: add ethernet phy eee-broken-modes option documentation
jbrunet [Mon, 28 Nov 2016 09:46:48 +0000 (10:46 +0100)]
dt: bindings: add ethernet phy eee-broken-modes option documentation

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agodt-bindings: net: add EEE capability constants
jbrunet [Mon, 28 Nov 2016 09:46:47 +0000 (10:46 +0100)]
dt-bindings: net: add EEE capability constants

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: phy: add an option to disable EEE advertisement
jbrunet [Mon, 28 Nov 2016 09:46:46 +0000 (10:46 +0100)]
net: phy: add an option to disable EEE advertisement

This patch adds an option to disable EEE advertisement in the generic PHY
by providing a mask of prohibited modes corresponding to the value found in
the MDIO_AN_EEE_ADV register.

On some platforms, PHY Low power idle seems to be causing issues, even
breaking the link some cases. The patch provides a convenient way for these
platforms to disable EEE advertisement and work around the issue.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues
Niklas Cassel [Thu, 24 Nov 2016 14:36:33 +0000 (15:36 +0100)]
net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues

The dwmac4 IP can synthesized with 1-8 number of tx queues.
On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled
by default. For these IPs, the bitfield TXQEN is R/W.

Always enable tx queue 0. The write will have no effect on IPs synthesized
with DWC_EQOS_NUM_TXQ == 1.

The driver does still not utilize more than one tx queue in the IP.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: arc_emac: add dependencies on associated arches and compile test
Peter Robinson [Mon, 28 Nov 2016 07:12:37 +0000 (07:12 +0000)]
net: arc_emac: add dependencies on associated arches and compile test

Add dependencies on the architectures that support these devices and
add compile test to ensure ongoing code build coverage.

Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMAINTAINERS: Add device tree bindings to mv88e6xx section
Andreas Färber [Sun, 27 Nov 2016 22:59:51 +0000 (23:59 +0100)]
MAINTAINERS: Add device tree bindings to mv88e6xx section

Also include the netdev list for convenience, as done elsewhere.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx4: give precise rx/tx bytes/packets counters
Eric Dumazet [Fri, 25 Nov 2016 15:46:20 +0000 (07:46 -0800)]
mlx4: give precise rx/tx bytes/packets counters

mlx4 stats are chaotic because a deferred work queue is responsible
to update them every 250 ms.

Even sampling stats every one second with "sar -n DEV 1" gives
variations like the following :

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
Average:         eth0 142827.50 3179259.70   9206.30 4700578.16

This patch allows rx/tx bytes/packets counters being folded at the
time we need stats.

We now can fetch stats every 1 ms if we want to check NIC behavior
on a small time window. It is also easier to detect anomalies.

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
Average:         eth0 142835.66 3182165.93   9206.98 4704874.08

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agogeneve: fix ip_hdr_len reserved for geneve6 tunnel.
Haishuang Yan [Mon, 28 Nov 2016 05:26:58 +0000 (13:26 +0800)]
geneve: fix ip_hdr_len reserved for geneve6 tunnel.

It shold reserved sizeof(ipv6hdr) for geneve in ipv6 tunnel.

Fixes: c3ef5aa5e5 ('geneve: Merge ipv4 and ipv6 geneve_build_skb()')
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'phy-doc-improvements'
David S. Miller [Mon, 28 Nov 2016 21:07:43 +0000 (16:07 -0500)]
Merge branch 'phy-doc-improvements'

Florian Fainelli says:

====================
Documentation: net: phy: Improve documentation

This patch series addresses discussions and feedback that was recently received
on the mailing-list in the area of: flow control/pause frames, interpretation of
phy_interface_t and finally add some links to useful standards documents.

Changes in v3:

- add Timur's feedback into patch 3

Changes in v2:

- clarify a few things in the RGMII section, add a paragraph about common issues
  with RGMII delay mismatches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: net: phy: Add links to several standards documents
Florian Fainelli [Mon, 28 Nov 2016 02:45:15 +0000 (18:45 -0800)]
Documentation: net: phy: Add links to several standards documents

Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: net: phy: Add blurb about RGMII
Florian Fainelli [Mon, 28 Nov 2016 02:45:14 +0000 (18:45 -0800)]
Documentation: net: phy: Add blurb about RGMII

RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: net: phy: Add a paragraph about pause frames/flow control
Florian Fainelli [Mon, 28 Nov 2016 02:45:13 +0000 (18:45 -0800)]
Documentation: net: phy: Add a paragraph about pause frames/flow control

Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoDocumentation: net: phy: remove description of function pointers
Florian Fainelli [Mon, 28 Nov 2016 02:45:12 +0000 (18:45 -0800)]
Documentation: net: phy: remove description of function pointers

Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count
Andreas Färber [Sun, 27 Nov 2016 22:26:28 +0000 (23:26 +0100)]
net: dsa: mv88e6xxx: Fix mv88e6xxx_g1_irq_free() interrupt count

mv88e6xxx_g1_irq_setup() sets up chip->g1_irq.nirqs interrupt mappings,
so free the same amount. This will be 8 or 9 in practice, less than 16.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlx5-DCBX-and-ethtool-updates'
David S. Miller [Mon, 28 Nov 2016 20:09:37 +0000 (15:09 -0500)]
Merge branch 'mlx5-DCBX-and-ethtool-updates'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 DCBX and ethtool updates

This series provides the following mlx5 updates:

From Huy:
DCBX CEE API and DCBX firmware/host modes support.
    - 1st patch ensures the dcbnl_rtnl_ops is published only when the qos
      capability bits is on.
    - 2nd patch adds the support for CEE interfaces into mlx5 dcbnl_rtnl_ops
    - 3rd patch refactors ETS query to read ETS configuration directly from
      firmware rather than having a software shadow to it. The existing IEEE
      interfaces stays the same.
    - 4th patch adds the support for MLX5_REG_DCBX_PARAM and MLX5_REG_DCBX_APP
      firmware commands to manipulate mlx5 DCBX mode.
    - 5th patch adds the driver support for the new DCBX firmware.  This ensures
      the backward compatibility versus the old and new firmware. With the new DCBX
      firmware, qos settings can be controlled by either firmware or software
      depending on the DCBX mode.

From Kamal and Saeed:
    - mlx5 self-test support.

From Shaker:
    - Private flag to give the user the ability to enable/disable mlx5 CQE
      compression.

V1->V2:
    - Check ETS capability where needed in:
("net/mlx5e: Read ETS settings directly from firmware")
    - Fix return value of mlx5e_dcbnl_switch_to_host_mode in:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
    - Update commit message of:
("net/mlx5e: ConnectX-4 firmware support for DCBX")
    - Fix two sparse static check warnings in en_selftest.c

This series was generated against commit:
e5f12b3f5ebb ("Merge branch 'mlxsw-trap-groups-and-policers'")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add CQE compression user control
Shaker Daibes [Sun, 27 Nov 2016 15:02:12 +0000 (17:02 +0200)]
net/mlx5e: Add CQE compression user control

The user can now override the automatic driver decision using the
rx_cqe_compress flag, which is the preference for CQE compression.
The flag is initialized with the automatic driver decision.

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Moves pflags to priv->params
Shaker Daibes [Sun, 27 Nov 2016 15:02:11 +0000 (17:02 +0200)]
net/mlx5e: Moves pflags to priv->params

pflags is a configuration parameter for the netdev, naturally it belongs
to priv->params.
Also introduce MLX5E_GET_PFLAG

Signed-off-by: Shaker Daibes <shakerd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add support for loopback selftest
Saeed Mahameed [Sun, 27 Nov 2016 15:02:10 +0000 (17:02 +0200)]
net/mlx5e: Add support for loopback selftest

Extend the self diagnostic tests to support loopback test.

The loopback test doesn't require the offline flag, it will use the
generic dev_queue_xmit and a dedicated packet_type to capture and verify
mlx5e selftest loopback packets.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add support for ethtool self diagnostics test
Kamal Heib [Sun, 27 Nov 2016 15:02:09 +0000 (17:02 +0200)]
net/mlx5e: Add support for ethtool self diagnostics test

The self diagnostics test implementaion include the following features:
1. Link Test: Check that link is in up state.
2. Speed Test: Check that link was negotiated correctly.
3. Health Test: Check the device health.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add DCBX control interface
Huy Nguyen [Sun, 27 Nov 2016 15:02:08 +0000 (17:02 +0200)]
net/mlx5e: Add DCBX control interface

Use setdcbx interface to set the DCBX mode to firmware or os.
If setdcbx is called with mode value of zero, the DCBX mode
is set to firmware.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: ConnectX-4 firmware support for DCBX
Huy Nguyen [Sun, 27 Nov 2016 15:02:07 +0000 (17:02 +0200)]
net/mlx5e: ConnectX-4 firmware support for DCBX

DBCX by default is controlled by firmware where dcbx capability bit
is set. In this mode, firmware is responsible for reading/sending the
TLV packets from/to the remote partner.

This patch sets up the infrastructure to move between HOST/FW DCBX
control mode.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5: Add DCBX firmware commands support
Huy Nguyen [Sun, 27 Nov 2016 15:02:06 +0000 (17:02 +0200)]
net/mlx5: Add DCBX firmware commands support

Add set/query commands for DCBX_PARAM register

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Read ETS settings directly from firmware
Huy Nguyen [Sun, 27 Nov 2016 15:02:05 +0000 (17:02 +0200)]
net/mlx5e: Read ETS settings directly from firmware

Issue description:
Current implementation saves the ETS settings from user in
a temporal soft copy and returns this settings when user
queries the ETS settings.

With the new DCBX firmware, the ETS settings can be changed
by firmware when the DCBX is in firmware controlled mode. Therefore,
user will obtain wrong values from the temporal soft copy.

Solution:
1. Read the ETS settings directly from firmware.
2. For tc_tsa:
   a. Initialize tc_tsa to vendor IEEE_8021QAZ_TSA_VENDOR at netdev
      creation.
   b. When reading ETS setting from FW, if the traffic class bandwidth
      is less than 100, set tc_tsa to IEEE_8021QAZ_TSA_ETS. This
      implementation solves the scenarios when the DCBX is in FW control
      and willing bit is on which means the ETS setting is dictated
      by remote switch.

Also check ETS capability where needed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Support DCBX CEE API
Huy Nguyen [Sun, 27 Nov 2016 15:02:04 +0000 (17:02 +0200)]
net/mlx5e: Support DCBX CEE API

Add DCBX CEE API interface for ConnectX-4. Configurations are stored in
a temporary structure and are applied to the card's firmware when
the CEE's setall callback function is called.

Note:
  priority group in CEE is equivalent to traffic class in ConnectX-4
  hardware spec.

  bw allocation per priority in CEE is not supported because ConnectX-4
  only supports bw allocation per traffic class.

  user priority in CEE does not have an equivalent term in ConnectX-4.
  Therefore, user priority to priority mapping in CEE is not supported.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet/mlx5e: Add qos capability check
Huy Nguyen [Sun, 27 Nov 2016 15:02:03 +0000 (17:02 +0200)]
net/mlx5e: Add qos capability check

Make sure firmware supports qos before exposing the DCB API.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agovirtio-net: enable multiqueue by default
Jason Wang [Fri, 25 Nov 2016 04:37:26 +0000 (12:37 +0800)]
virtio-net: enable multiqueue by default

We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:

- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
  tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
  cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'MV88E6097-fixes'
David S. Miller [Mon, 28 Nov 2016 16:58:57 +0000 (11:58 -0500)]
Merge branch 'MV88E6097-fixes'

Stefan Eichenberger says:

====================
Fix support for the MV88E6097

This patchset fixes the following two issues for the MV88E6097:
- Add missing definition of g1_irqs
- Add missing comment
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add missing comment for MV88E6097
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:30 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add missing comment for MV88E6097

Add a missing comment for the MV88E6097 because of unification.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonet: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097
Stefan Eichenberger [Fri, 25 Nov 2016 08:41:29 +0000 (09:41 +0100)]
net: dsa: mv88e6xxx: add g1_irqs definition for MV88E6097

Add the missing definition of g1_irqs for MV88E6097.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'bpf-misc-next'
David S. Miller [Mon, 28 Nov 2016 01:38:49 +0000 (20:38 -0500)]
Merge branch 'bpf-misc-next'

Daniel Borkmann says:

====================
BPF cleanups and misc updates

This patch set adds couple of cleanups in first few patches,
exposes owner_prog_type for array maps as well as mlocked mem
for maps in fdinfo, allows for mount permissions in fs and
fixes various outstanding issues in selftests and samples.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: fix multiple issues in selftest suite and samples
Daniel Borkmann [Sat, 26 Nov 2016 00:28:09 +0000 (01:28 +0100)]
bpf: fix multiple issues in selftest suite and samples

1) The test_lru_map and test_lru_dist fails building on my machine since
   the sys/resource.h header is not included.

2) test_verifier fails in one test case where we try to call an invalid
   function, since the verifier log output changed wrt printing function
   names.

3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
   retrieving the number of possible CPUs. This is broken at least in our
   scenario and really just doesn't work.

   glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
   First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
   if that fails, depending on the config, it either tries to count CPUs
   in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
   If /proc/cpuinfo has some issue, it returns just 1 worst case. This
   oddity is nothing new [1], but semantics/behaviour seems to be settled.
   _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
   that fails it looks into /proc/stat for cpuX entries, and if also that
   fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
   unlikely all breaks down).

   While that might match num_possible_cpus() from the kernel in some
   cases, it's really not guaranteed with CPU hotplugging, and can result
   in a buffer overflow since the array in user space could have too few
   number of slots, and on perpcu map lookup, the kernel will write beyond
   that memory of the value buffer.

   William Tu reported such mismatches:

     [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
     happens when CPU hotadd is enabled. For example, in Fusion when
     setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
     -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
     will be 2 [2]. [...]

   Documentation/cputopology.txt says /sys/devices/system/cpu/possible
   outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
   so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
   own implementation. Later, we could add support to bpf(2) for passing
   a mask via CPU_SET(3), for example, to just select a subset of CPUs.

   BPF samples code needs this fix as well (at least so that people stop
   copying this). Thus, define bpf_num_possible_cpus() once in selftests
   and import it from there for the sample code to avoid duplicating it.
   The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.

After all three issues are fixed, the test suite runs fine again:

  # make run_tests | grep self
  selftests: test_verifier [PASS]
  selftests: test_maps [PASS]
  selftests: test_lru_map [PASS]
  selftests: test_kmod.sh [PASS]

  [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
  [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html

Fixes: 3059303f59cf ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191d2 ("Add sample for adding simple drop program to link")
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e15596717948 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1a3 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf989f ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: allow for mount options to specify permissions
Daniel Borkmann [Sat, 26 Nov 2016 00:28:08 +0000 (01:28 +0100)]
bpf: allow for mount options to specify permissions

Since we recently converted the BPF filesystem over to use mount_nodev(),
we now have the possibility to also hold mount options in sb's s_fs_info.
This work implements mount options support for specifying permissions on
the sb's inode, which will be used by tc when it manually needs to mount
the fs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: add owner_prog_type and accounted mem to array map's fdinfo
Daniel Borkmann [Sat, 26 Nov 2016 00:28:07 +0000 (01:28 +0100)]
bpf: add owner_prog_type and accounted mem to array map's fdinfo

Allow for checking the owner_prog_type of a program array map. In some
cases bpf(2) can return -EINVAL /after/ the verifier passed and did all
the rewrites of the bpf program.

The reason that lets us fail at this late stage is that program array
maps are incompatible. Allow users to inspect this earlier after they
got the map fd through BPF_OBJ_GET command. tc will get support for this.

Also, display how much we charged the map with regards to RLIMIT_MEMLOCK.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: reuse dev_is_mac_header_xmit for redirect
Daniel Borkmann [Sat, 26 Nov 2016 00:28:06 +0000 (01:28 +0100)]
bpf: reuse dev_is_mac_header_xmit for redirect

Commit dcf800344a91 ("net/sched: act_mirred: Refactor detection whether
dev needs xmit at mac header") added dev_is_mac_header_xmit(); since it's
also useful elsewhere, move it to if_arp.h and reuse it for BPF.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: drop useless bpf_fd member from cls/act
Daniel Borkmann [Sat, 26 Nov 2016 00:28:05 +0000 (01:28 +0100)]
bpf: drop useless bpf_fd member from cls/act

After setup we don't need to keep user space fd number around anymore, as
it also has no useful meaning for anyone, just remove it.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agobpf: drop unnecessary context cast from BPF_PROG_RUN
Daniel Borkmann [Sat, 26 Nov 2016 00:28:04 +0000 (01:28 +0100)]
bpf: drop unnecessary context cast from BPF_PROG_RUN

Since long already bpf_func is not only about struct sk_buff * as
input anymore. Make it generic as void *, so that callers don't
need to cast for it each time they call BPF_PROG_RUN().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agosfc: remove unneeded variable
Dan Carpenter [Fri, 25 Nov 2016 10:43:04 +0000 (13:43 +0300)]
sfc: remove unneeded variable

We don't use ->heap_buf after commit 46d1efd852cc ("sfc: remove Software
TSO") so let's remove the last traces.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git.kernel.org/pub...
David S. Miller [Mon, 28 Nov 2016 01:26:59 +0000 (20:26 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-11-25' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.10

Major changes:

iwlwifi

* finalize and enable dynamic queue allocation
* use dev_coredumpmsg() to prevent locking the driver
* small fix to pass the AID to the FW
* use FW PS decisions with multi-queue

ath9k

* add device tree bindings
* switch to use mac80211 intermediate software queues to reduce
  latency and fix bufferbloat

wl18xx

* allow scanning in AP mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agonetdevice: fix sparse warning for HARD_TX_LOCK
Michael S. Tsirkin [Thu, 24 Nov 2016 05:04:08 +0000 (07:04 +0200)]
netdevice: fix sparse warning for HARD_TX_LOCK

sparse warns about context imbalance in any code
that uses HARD_TX_LOCK/UNLOCK - this is because it's
unable to determine that flags don't change so
lock and unlock are paired.

Seems easy enough to fix by adding __acquire/__release
calls.

With this patch af_packet.c is now sparse-clean,

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoptp: gianfar: Use high resolution frequency method.
Ulrik De Bie [Wed, 23 Nov 2016 20:11:04 +0000 (21:11 +0100)]
ptp: gianfar: Use high resolution frequency method.

This patch depends on commit d8d263541913 ("ptp: Introduce a high
resolution frequency adjustment method.")

The gianfar devices offer a frequency resolution of about 0.46 ppb
(depends on actual value of tmr_add, for the calculation assumed
0x80000000). This patch lets users of the device benefit from the increased
frequency resolution when tuning the clock. Thanks to the rounding the
maximum error between the requested frequency and the applied frequency
will then be about 0.23 ppb.

Tested on a v3.3.8 kernel on a real gianfar device. Verified compilation
on net-next (currently at v4.9-rc5).

Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()
Eric Dumazet [Wed, 23 Nov 2016 17:46:52 +0000 (09:46 -0800)]
mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation()

Per RX ring packets/bytes counters are not protected by global
priv->stats_lock.

Better not confuse the reader, and use READ_ONCE() to show we read
these counters without surrounding synchronization.

Interrupt moderation is best effort, and we do not really care of
ultra precise counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Sun, 27 Nov 2016 04:42:21 +0000 (23:42 -0500)]
Merge git://git./linux/kernel/git/davem/net

udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.

Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 27 Nov 2016 01:21:13 +0000 (17:21 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull vfs splice fix from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix default_file_splice_read()

8 years agofix default_file_splice_read()
Al Viro [Sun, 27 Nov 2016 01:05:42 +0000 (20:05 -0500)]
fix default_file_splice_read()

Botched calculation of number of pages.  As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 26 Nov 2016 23:28:34 +0000 (15:28 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Here is a revert and two bugfixes for the I2C designware driver.

  Please note that we are still hunting down a regression for the
  i2c-octeon driver. While there is a fix pending, we have unclear
  feedback from the testers currently. An rc8 would be quite helpful
  for this case"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: designware: do not disable adapter after transfer"
  i2c: designware: fix rx fifo depth tracking
  i2c: designware: report short transfers

8 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sat, 26 Nov 2016 23:26:20 +0000 (15:26 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fix from Russell King:
 "This resolves the ksyms issues by reverting the commit which
  introduced the breakage"

There was what I consider to be a better fix, but it's late in the rc
game, so I'll take the revert.

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  Revert "arm: move exports to definitions"

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 26 Nov 2016 21:05:05 +0000 (13:05 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix leak in fsl/fman driver, from Dan Carpenter.

 2) Call flow dissector initcall earlier than any networking driver can
    register and start to use it, from Eric Dumazet.

 3) Some dup header fixes from Geliang Tang.

 4) TIPC link monitoring compat fix from Jon Paul Maloy.

 5) Link changes require EEE re-negotiation in bcm_sf2 driver, from
    Florian Fainelli.

 6) Fix bogus handle ID passed into tfilter_notify_chain(), from Roman
    Mashak.

 7) Fix dump size calculation in rtnl_calcit(), from Zhang Shengju.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  tipc: resolve connection flow control compatibility problem
  mvpp2: use correct size for memset
  net/mlx5: drop duplicate header delay.h
  net: ieee802154: drop duplicate header delay.h
  ibmvnic: drop duplicate header seq_file.h
  fsl/fman: fix a leak in tgec_free()
  net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS
  tipc: improve sanity check for received domain records
  tipc: fix compatibility bug in link monitoring
  net: ethernet: mvneta: Remove IFF_UNICAST_FLT which is not implemented
  dwc_eth_qos: drop duplicate headers
  net sched filters: fix filter handle ID in tfilter_notify_chain()
  net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
  bnxt: do not busy-poll when link is down
  udplite: call proper backlog handlers
  ipv6: bump genid when the IFA_F_TENTATIVE flag is clear
  net/mlx4_en: Free netdev resources under state lock
  net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit"
  rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit()
  bnxt_en: Fix a VXLAN vs GENEVE issue
  ...

8 years agoMerge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdim...
Linus Torvalds [Sat, 26 Nov 2016 20:24:47 +0000 (12:24 -0800)]
Merge branch 'libnvdimm-fixes' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

 - Fix a crash that occurs at driver initialization if the memory region
   is already busy (request_mem_region() fails).

 - Fix a vma validation check that mistakenly allows a private device-
   dax mapping to be established. Device-dax explicitly forbids private
   mappings so it can guarantee a given fault granularity and backing
   memory type.

 Both of these fixes have soaked in -next and are tagged for -stable.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: fail all private mapping attempts
  device-dax: check devm_nsio_enable() return value

8 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 26 Nov 2016 20:18:59 +0000 (12:18 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "Four fixes for bugs found by syzkaller on x86, all for stable"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: check for pic and ioapic presence before use
  KVM: x86: fix out-of-bounds accesses of rtc_eoi map
  KVM: x86: drop error recovery in em_jmp_far and em_ret_far
  KVM: x86: fix out-of-bounds access in lapic

8 years agoMerge tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 26 Nov 2016 19:24:03 +0000 (11:24 -0800)]
Merge tag 'powerpc-4.9-6' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes marked for stable:
   - Set missing wakeup bit in LPCR on POWER9
   - Fix the early OPAL console wrappers
   - Fixup kernel read only mapping

  Fixes for code merged this cycle:
   - Fix missing CRCs, add more asm-prototypes.h declarations"

* tag 'powerpc-4.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Fixup kernel read only mapping
  powerpc/boot: Fix the early OPAL console wrappers
  powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
  powerpc: Set missing wakeup bit in LPCR on POWER9

8 years agotipc: resolve connection flow control compatibility problem
Jon Paul Maloy [Thu, 24 Nov 2016 23:47:07 +0000 (18:47 -0500)]
tipc: resolve connection flow control compatibility problem

In commit 10724cc7bb78 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of on ack per
256 received message, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also is counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not trigged to send an acknowledge, with a permanently
blocked sender as result.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by a making minimal change to the
condition used for determining connection congestion.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agoMerge branch 'mlxsw-trap-groups-and-policers'
David S. Miller [Sat, 26 Nov 2016 02:22:28 +0000 (21:22 -0500)]
Merge branch 'mlxsw-trap-groups-and-policers'

Jiri Pirko says:

====================
mlxsw: traps, trap groups and policers

Nogah says:

For a packet to be sent from the HW to the cpu, it needs to be trapped.
For a trap to be activate it should be assigned to a trap group.
Those trap groups can have policers, to limit the packet rate (the max
number of packets that can be sent to the cpu in a time slot, the rest
will be discarded) or the data rate (the same, but the count is not by the
number of packets but by their total length in bytes).

This patchset rearrange the trap setting API, re-write the traps and the
trap groups list in spectrum and assign them policers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Add policers for trap groups
Nogah Frankel [Fri, 25 Nov 2016 09:33:47 +0000 (10:33 +0100)]
mlxsw: spectrum: Add policers for trap groups

Configure policers and connect them to trap groups.

While many trap groups share policer's configuration they don't share
the actual policer because each trap group represents a different
flow / protocol and we don't want one of them to be able to exceed its
rate on behalf of another.
For example, if STP and LLDP gets to send 128 packets/sec each, if we
put them in one 256 packets/sec policer, one can send 200 packets while
the other only 50.

Note that IP2ME covers lots of flows, so it's limit is set to match the
cpu ability to handle data.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: reg: Add QoS Policer Configuration Register
Nogah Frankel [Fri, 25 Nov 2016 09:33:46 +0000 (10:33 +0100)]
mlxsw: reg: Add QoS Policer Configuration Register

The QPCR register is used to create and control policers.
A policer can discard or change the color of packets that are
trapped by a specific trap.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: resources: Add max cpu policers resource
Nogah Frankel [Fri, 25 Nov 2016 09:33:45 +0000 (10:33 +0100)]
mlxsw: resources: Add max cpu policers resource

Add a new resource to resources query: max cpu policers which tells us how
many policers can be used to limit the data rate to the cpu port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: Create a different trap group list for each device
Nogah Frankel [Fri, 25 Nov 2016 09:33:44 +0000 (10:33 +0100)]
mlxsw: Create a different trap group list for each device

Trap groups can be used to control traps priority, both in terms of
which trap "wins" if a packet matches two traps (priority) and in terms
of packets from which trap group will be scheduled to the cpu first (tc).
They can also be used to set rate limiters (policers) on them (will be
added in the next patches).

Currently, we support two trap groups. In Spectrum we want a better
resolution, so every protocol / flow will have a different trap group,
so we can control its parameters separately. Once the policers will be
implemented, it will also allow us limit the rate of each protocol by
itself.

This patch change the trap group list to include:
* the emad trap group, which is shared for all the devices.
* Switchx2's trap groups, which are a copy of the current trap groups.
* Spectrum's new trap groups, in order to match the above guidelines.
(Switchib is using only the emad trap group, so it require no changes).

This patch also includes new configuration for Spectrum's trap groups,
with primary priority order within them.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Add BGP trap
Nogah Frankel [Fri, 25 Nov 2016 09:33:43 +0000 (10:33 +0100)]
mlxsw: spectrum: Add BGP trap

Add a trap for BGP protocol that was previously trapped by the generic trap
for IP2ME. This trap will allow us to have better control (over priority
and rate) of the traffic.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: Change trap groups setting
Nogah Frankel [Fri, 25 Nov 2016 09:33:42 +0000 (10:33 +0100)]
mlxsw: Change trap groups setting

Trap groups have many options which we currently set to default values.
In the next patches we will use many of them with non-default values.

Some of these options have no default value, so this patch sets them as
params for the trap group set function. Others almost always use the same
values, so the set function will use this default values. In the rare cases
when they will need to be with other values, these values can be set
directly (using the macros for fields in registers).

Parameters without default value:
TC - the traffic class for packets that hit this trap group.
    (old default is the max tc)
priority - if one packet hits multiple trap groups, the group with the
   higher priority will "catch" it. (old default is 0)
policer - limit rate policer (old default is disabled)

Default parameters:
swid - switch id, relevant for the emad trap only, ignored on Spectrum.
       (new default is 0)
rdq - CPU receive descriptor queue (new default is identical to trap
      group id)

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: resources: Add max trap groups resource
Nogah Frankel [Fri, 25 Nov 2016 09:33:41 +0000 (10:33 +0100)]
mlxsw: resources: Add max trap groups resource

Add the max number of trap groups to resource query.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: core: Change emad trap group settings
Nogah Frankel [Fri, 25 Nov 2016 09:33:40 +0000 (10:33 +0100)]
mlxsw: core: Change emad trap group settings

Currently, the emad trap init was done in the core. In the future we will
want to add some changes to the traps groups, according to device type.
This commit create a driver function to create the trap group for the
emad, so later it can be changed by devices. It also changes the emad
registration to use the new generic functions.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: Add option to choose trap group
Nogah Frankel [Fri, 25 Nov 2016 09:33:39 +0000 (10:33 +0100)]
mlxsw: Add option to choose trap group

Currently, we set the trap group to pre-determined option, based on whether
it is an rx or event trap.
This commit adds a possibility to chose the trap group, so it can be set
to different values in the following patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: Change trap set function
Nogah Frankel [Fri, 25 Nov 2016 09:33:38 +0000 (10:33 +0100)]
mlxsw: Change trap set function

Change trap setting function so instead of determining the trap group by
trap id, it gets it as a parameter (so later we can have different trap
groups for Spectrum and Switchx2).
Add "is_ctrl" parameter to the trap setting function. It control whether
the trapped packets wait in a designated control buffer or in their
default one. This parameter is ignored by Switchx2 and Switchib.
Add these parameters to the traps array in Spectrum, Switchx2 and
Switchib.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: switchib: Use generic listener struct for events
Nogah Frankel [Fri, 25 Nov 2016 09:33:37 +0000 (10:33 +0100)]
mlxsw: switchib: Use generic listener struct for events

Change the event handling in Switchib to be comptible with Spectrum and
Switchx2.
Use the generic listener struct for the events. Init and fini them by loop
(and not by calling each event by its name).

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: switchx2: Use generic listener struct for events
Nogah Frankel [Fri, 25 Nov 2016 09:33:36 +0000 (10:33 +0100)]
mlxsw: switchx2: Use generic listener struct for events

Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8 years agomlxsw: spectrum: Use generic listener struct for events
Nogah Frankel [Fri, 25 Nov 2016 09:33:35 +0000 (10:33 +0100)]
mlxsw: spectrum: Use generic listener struct for events

Change the events to use the generic listener struct.
Merge the event list into the trap list, so the same functions will
handle both.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>