Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:49 +0000 (18:41 +0530)]
net: emaclite: Fix block comments style
This patch fixes the checkpatch warnings below:
WARNING: Block comments use a trailing */ on a separate line
WARNING: Block comments use * on subsequent lines
WARNING: networking block comments don't use an empty /* line,
use /* Comment
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:48 +0000 (18:41 +0530)]
net: emaclite: update kernel-doc comments
This patch fixes the kernel-doc warnings below:
Function parameter or member 'maxlen' not described in 'xemaclite_recv_data'
Function parameter or member 'address' not described in 'xemaclite_set_mac_address'
Excess function parameter 'addr' description in 'xemaclite_set_mac_address'
No description found for return value of 'xemaclite_interrupt'
No description found for return value of 'xemaclite_mdio_write'
Function parameter or member 'dev' not described in 'xemaclite_mdio_setup'
Excess function parameter 'ofdev' description in 'xemaclite_mdio_setup'
No description found for return value of 'xemaclite_open'
No description found for return value of 'xemaclite_close'
Excess function parameter 'match' description in 'xemaclite_of_probe'
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:47 +0000 (18:41 +0530)]
net: emaclite: Simplify if-else statements
Remove else as it is not required when the if branch returns.
It also coalesces the format onto a single line and adds the
missing space after the comma. Fixes the checkpatch warning below:
WARNING: else is not generally useful after a break or return
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Radhey Shyam Pandey [Thu, 28 Jun 2018 13:11:46 +0000 (18:41 +0530)]
net: emaclite: Use __func__ instead of hardcoded name
Replace the hardcoded function name with a reference to __func__, making
the code more maintainable. Address the checkpatch warnings below:
WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
this function's name, in a string
+ "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
this function's name, in a string
+ "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
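A minimal userspace sketch of the __func__ idiom described above, not the driver code. The helper name format_mdio_read and its buffer-based interface are assumptions made for illustration; the driver itself prints through the kernel's debug facilities.

```c
#include <stdio.h>

/* Hypothetical helper: formats a debug line into a caller-supplied buffer
 * so the result can be inspected. */
static int format_mdio_read(char *buf, size_t len, int phy_id, int reg, int val)
{
	/* __func__ expands to the enclosing function's name, so the message
	 * stays correct if the function is ever renamed. */
	return snprintf(buf, len, "%s(phy_id=%i, reg=%x) == %x\n",
			__func__, phy_id, reg, val);
}
```

Because the name comes from the compiler, renaming the function cannot leave a stale name in the log string.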
David S. Miller [Sat, 30 Jun 2018 09:54:09 +0000 (18:54 +0900)]
Merge branch 'mvpp2-Add-big-endian-support'
Maxime Chevallier says:
====================
net: mvpp2: Add big-endian support
This series allows PPv2 to be used on systems built as big endian.
The first patch fixes the way we represent TX and RX descriptors, so that
they use fixed little endianness as expected by the PPv2 controller.
The second reworks the way we handle the software representation of the
Header Parser entries, so that we don't use a union of arrays.
The last two patches fix some incorrect byte swapping logic that went
unnoticed on little-endian.
This whole series doesn't fix any existing bug for little-endian systems, and
since big-endian never worked for this driver, I didn't include 'fixes' tags.
This was tested on MacchiatoBin (Armada 8040).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Thu, 28 Jun 2018 12:42:07 +0000 (14:42 +0200)]
net: mvpp2: Use htons when checking protocol info
When checking the skb->protocol field, we have to make sure we use the
proper endianness using htons, and not swab16.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
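A small userspace sketch of why htons is the right tool here, under the assumption that the protocol field is stored in network byte order (as skb->protocol is). The names DEMO_ETH_P_IP, demo_swab16 and proto_is_ipv4 are illustrative and do not come from the driver.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define DEMO_ETH_P_IP 0x0800 /* IPv4 EtherType as a host-order constant */

/* Unconditional 16-bit byte swap, shown only for comparison. */
static uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 if a network-order protocol field holds the IPv4 EtherType. */
static int proto_is_ipv4(uint16_t protocol_be)
{
	/* htons() swaps on little-endian hosts and does nothing on
	 * big-endian hosts, so the comparison holds on both. */
	return protocol_be == htons(DEMO_ETH_P_IP);
}
```

An unconditional swap gives the same answer as htons only on little-endian hosts, which is why the mistake stays hidden there.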
Maxime Chevallier [Thu, 28 Jun 2018 12:42:06 +0000 (14:42 +0200)]
net: mvpp2: prs: Drop unnecessary swab16 in vlan detection
Vlan IDs must not be swapped when creating Header Parser entries. This
has no effect on little-endian systems, but is wrong for big-endian.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maxime Chevallier [Thu, 28 Jun 2018 12:42:05 +0000 (14:42 +0200)]
net: mvpp2: prs: Drop unions representing TCAM and SRAM entries
PPv2's Header Parser uses large TCAM and SRAM entries that are
duplicated in software so that we can write them to hardware only when
we are done modifying them.
Currently, PPv2 uses a union containing arrays of u32 and u8 to represent
these entries, to facilitate byte per byte access. This representation is
broken when we want to support big endian, and this makes the code
confusing to read.
This patch drops the union, and simply stores the TCAM and SRAM entries
as u32 arrays, each entry corresponding to a 32-bit register.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
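The following sketch illustrates one way to get byte access on a plain u32 array without a union. The struct name, the array size and the helper names are assumptions for illustration only and are not taken from the driver.

```c
#include <stdint.h>

#define DEMO_TCAM_WORDS 6 /* assumed size, for illustration only */

struct demo_prs_entry {
	uint32_t tcam[DEMO_TCAM_WORDS]; /* one element per 32-bit register */
};

/* Set byte 'offs' (0 = least significant byte of word 0) using shifts,
 * so the result does not depend on host endianness. */
static void demo_tcam_set_byte(struct demo_prs_entry *pe, unsigned int offs,
			       uint8_t byte)
{
	unsigned int word = offs / 4;
	unsigned int shift = (offs % 4) * 8;

	pe->tcam[word] &= ~((uint32_t)0xff << shift);
	pe->tcam[word] |= (uint32_t)byte << shift;
}

/* Read the same byte back, again through shifts rather than aliasing. */
static uint8_t demo_tcam_get_byte(const struct demo_prs_entry *pe,
				  unsigned int offs)
{
	return (uint8_t)(pe->tcam[offs / 4] >> ((offs % 4) * 8));
}
```

With a union of u8 and u32 arrays, the byte at a given index lands in a different part of the register value depending on host endianness; shifting inside the u32 avoids that.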
Maxime Chevallier [Thu, 28 Jun 2018 12:42:04 +0000 (14:42 +0200)]
net: mvpp2: Make TX / RX descriptors little-endian
The PPv2 controller always expects descriptors to be in little endian. We
must therefore force descriptors to use that format, and convert to the
host endianness when necessary.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
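A portable userspace sketch of keeping a descriptor field in little-endian form and converting at the access points. The kernel would normally use __le32 fields with cpu_to_le32() and le32_to_cpu(); the byte-wise helpers and the struct below are stand-ins assumed for this example.

```c
#include <stdint.h>

/* Hypothetical descriptor: the field is kept as raw bytes in device order. */
struct demo_tx_desc {
	uint8_t command[4];	/* 32-bit field, little-endian for the device */
};

/* Store a host-order value as little-endian bytes on any host. */
static void demo_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v);
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Load little-endian bytes back into a host-order value. */
static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```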
Yafang Shao [Thu, 28 Jun 2018 04:22:56 +0000 (00:22 -0400)]
tcp: add new SNMP counter for drops when try to queue in rcv queue
When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.
In that situation, if the skb was headed for the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track it.
If the skb was headed for the receive queue, however, nothing is recorded.
So a new SNMP counter is introduced to track this behavior.
LINUX_MIB_TCPRCVQDROP: Number of packets meant to be queued in rcv queue
but dropped because socket rcvbuf limit hit.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 28 Jun 2018 01:32:23 +0000 (20:32 -0500)]
bnx2x: Mark expected switch fall-throughs
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
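A short sketch of what a marked fall-through looks like. The enum, flags and function are hypothetical and unrelated to bnx2x; the point is only the comment placed where one case intentionally continues into the next.

```c
/* Hypothetical modes: the extended mode implies the basic flag as well. */
enum demo_mode { DEMO_MODE_BASIC, DEMO_MODE_EXTENDED };

#define DEMO_F_BASIC 0x1u
#define DEMO_F_EXT   0x2u

static unsigned int demo_mode_flags(enum demo_mode mode)
{
	unsigned int flags = 0;

	switch (mode) {
	case DEMO_MODE_EXTENDED:
		flags |= DEMO_F_EXT;
		/* fall through */
	case DEMO_MODE_BASIC:
		flags |= DEMO_F_BASIC;
		break;
	}
	return flags;
}
```

GCC recognizes this comment form at its default -Wimplicit-fallthrough level, so the intentional case stays quiet while an accidental missing break would still be reported.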
Jose Abreu [Wed, 27 Jun 2018 14:57:02 +0000 (15:57 +0100)]
net: stmmac: Add support for CBS QDISC
This adds support for CBS reconfiguration using the TC application.
A new callback was added to TC ops struct and another one to DMA ops to
reconfigure the channel mode.
Tested in GMAC5.10.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 29 Jun 2018 14:54:31 +0000 (23:54 +0900)]
Merge tag 'mlx5e-updates-2018-06-28' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-06-28
mlx5e netdevice driver updates:
- Boris Pismenny added the support for UDP GSO in the first two patches.
Impressive performance numbers are included in the commit message,
at line rate with ~half of the CPU utilization compared to non-offload
or no GSO at all.
- From Tariq Toukan:
- Convert large order kzalloc allocations to kvzalloc.
- Added performance diagnostic statistics to several places in data path.
From Saeed and Eran,
- Update NIC HW stats on demand only, this is to eliminate the background
thread needed to update some HW statistics in the driver cache in
order to report error and drop counters from HW in ndo_get_stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 29 Jun 2018 14:50:27 +0000 (23:50 +0900)]
Merge branch 'net-Geneve-options-support-for-TC-act_tunnel_key'
Jakub Kicinski says:
====================
net: Geneve options support for TC act_tunnel_key
Simon & Pieter say:
This set adds Geneve Options support to the TC tunnel key action.
It provides the plumbing required to configure Geneve variable length
options. The options can be configured in the form CLASS:TYPE:DATA,
where CLASS is represented as a 16bit hexadecimal value, TYPE as an 8bit
hexadecimal value and DATA as a variable length hexadecimal value.
Additionally multiple options may be listed using a comma delimiter.
v2:
- fix sparse warnings in patches 3 and 4 (first one reported by
build bot).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 27 Jun 2018 04:39:37 +0000 (21:39 -0700)]
net/sched: add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.
Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
geneve_opts 0102:80:00800022,0102:80:00800022 \
action mirred egress redirect dev geneve0
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
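To make the class:type:data form concrete, here is a userspace sketch that parses a single option token such as 0102:80:00800022. It is not the kernel or iproute2 parser; the struct, the 32-byte data cap and the function names are assumptions, and splitting a comma-delimited list into tokens is left to the caller.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct demo_geneve_opt {
	uint16_t opt_class;
	uint8_t type;
	uint8_t data[32];	/* arbitrary cap for this sketch */
	size_t data_len;
};

static int demo_hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse one "CLASS:TYPE:DATA" token; returns 0 on success, -1 otherwise. */
static int demo_parse_geneve_opt(const char *s, struct demo_geneve_opt *opt)
{
	unsigned int cls, type;
	int consumed = 0;
	const char *data;
	size_t hexlen, i;

	/* %n records how many characters the "CLASS:TYPE:" prefix used; it
	 * stays 0 if the second colon is missing. */
	if (sscanf(s, "%4x:%2x:%n", &cls, &type, &consumed) != 2 || consumed == 0)
		return -1;

	data = s + consumed;
	hexlen = strlen(data);
	if (hexlen == 0 || hexlen % 2 != 0 || hexlen / 2 > sizeof(opt->data))
		return -1;

	for (i = 0; i < hexlen / 2; i++) {
		int hi = demo_hex_nibble(data[2 * i]);
		int lo = demo_hex_nibble(data[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		opt->data[i] = (uint8_t)((hi << 4) | lo);
	}
	opt->opt_class = (uint16_t)cls;
	opt->type = (uint8_t)type;
	opt->data_len = hexlen / 2;
	return 0;
}
```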
Pieter Jansen van Vuuren [Wed, 27 Jun 2018 04:39:36 +0000 (21:39 -0700)]
net: check tunnel option type in tunnel flags
Check the tunnel option type stored in tunnel flags when creating options
for tunnels, thereby ensuring we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.
Make sure all users of the infrastructure set correct flags. For the BPF
helper we have to set all bits to keep backward compatibility.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 27 Jun 2018 04:39:35 +0000 (21:39 -0700)]
net/sched: act_tunnel_key: add extended ack support
Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.
Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Wed, 27 Jun 2018 04:39:34 +0000 (21:39 -0700)]
net/sched: act_tunnel_key: disambiguate metadata dst error cases
Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst
Disambiguate these cases by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.
This is in preparation for using extended ack to provide more information
to users when parsing their input.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
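A userspace sketch of the two error paths, assuming a hypothetical metadata struct and an injectable allocator so that an allocation failure can be simulated. None of these names come from act_tunnel_key.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_metadata {
	unsigned int key_id;
};

/* Allocator that always fails, used to exercise the -ENOMEM path. */
static void *demo_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* has_key models "the user supplied the required input". */
static int demo_build_metadata(int has_key, unsigned int key_id,
			       void *(*alloc)(size_t),
			       struct demo_metadata **out)
{
	struct demo_metadata *md;

	if (!has_key)
		return -EINVAL;	/* missing user input */

	md = alloc(sizeof(*md));
	if (!md)
		return -ENOMEM;	/* failure to allocate the metadata */

	md->key_id = key_id;
	*out = md;
	return 0;
}
```

Keeping the two causes on separate return paths is what later allows a distinct message to be attached to each one.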
Arjun Vynipadath [Tue, 26 Jun 2018 11:40:50 +0000 (17:10 +0530)]
cxgb4: Support ethtool private flags
This is used to change TX workrequests, which helps in
host->vf communication.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Tue, 26 Jun 2018 11:40:25 +0000 (17:10 +0530)]
cxgb4: Add support for FW_ETH_TX_PKT_VM_WR
The present TX workrequest (FW_ETH_TX_PKT_WR) can't be used for
host->vf communication, since it doesn't loopback the outgoing
packets to virtual interfaces on the same port. This can be done
using FW_ETH_TX_PKT_VM_WR.
This fix depends on ethtool_flags to determine what WR to use for
the TX path. Support for setting these flags by the user is added in
the next commit.
Based on the original work by : Casey Leedom <leedom@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 28 Jun 2018 07:31:00 +0000 (15:31 +0800)]
sctp: add support for SCTP_REUSE_PORT sockopt
This feature is actually already supported by sk->sk_reuse which can be
set by socket level opt SO_REUSEADDR. But it's not working exactly as
RFC6458 demands in section 8.1.27, like:
- This option only supports one-to-one style SCTP sockets
- This socket option must not be used after calling bind()
or sctp_bindx().
Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
work in linux.
To separate it from the socket level version, this patch adds 'reuse' in
sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
setup limitations that are needed when it is being enabled.
"It should be noted that the behavior of the socket-level socket option
to reuse ports and/or addresses for SCTP sockets is unspecified", so it
leaves SO_REUSEADDR as is for the compatibility.
Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.
Thanks to Neil for making this clear.
v1->v2:
- add sctp_sk->reuse to separate it from the socket level version.
v2->v3:
- improve changelog according to Marcelo's suggestion.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Wu [Thu, 28 Jun 2018 01:33:21 +0000 (09:33 +0800)]
net: ethernet: stmmac: dwmac-rk: Add GMAC support for px30
Add constants and callback functions for the dwmac on the px30 SoC.
The base structure is the same, but registers and the bits in
them are moved slightly. Also add clk_mac_speed for selecting
the MAC speed.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Thu, 28 Jun 2018 01:45:24 +0000 (20:45 -0500)]
tg3: Mark expected switch fall-throughs
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 29 Jun 2018 02:32:55 +0000 (11:32 +0900)]
Merge branch 'ila-Cleanup'
Tom Herbert says:
====================
ila: Cleanup
Perform some cleanup in ILA code. This includes:
- Fix rhashtable walk for cases where nl dumps are done with multiple
function calls. Add a skip index to skip over entries in
a node that have been previously visited. Call rhashtable_walk_peek
to avoid dropping items between calls to ila_nl_dump.
- Call alloc_bucket_spinlocks to create bucket locks.
- Split out module initialization and netlink definitions into
separate files.
- Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 27 Jun 2018 21:39:02 +0000 (14:39 -0700)]
ila: Flush netlink command to clear xlat table
Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 27 Jun 2018 21:39:01 +0000 (14:39 -0700)]
ila: Create main ila source file
Create a main ila file that contains the module initialization functions
as well as netlink definitions. Previously these were defined in
ila_xlat and ila_common. This approach allows better extensibility.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 27 Jun 2018 21:39:00 +0000 (14:39 -0700)]
ila: Call library function alloc_bucket_locks
To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 27 Jun 2018 21:38:59 +0000 (14:38 -0700)]
ila: Fix use of rhashtable walk in ila_xlat.c
Perform better EAGAIN handling, handle case where ila_dump_info
fails and we missed objects in the dump, and add a skip index
to skip over ila entries in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
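The skip-index idea can be sketched in userspace as a dump that is resumed across calls. This is a simplified model with fixed-size buckets, not the rhashtable walker API; all names and sizes are assumptions for illustration.

```c
#include <stddef.h>

#define DEMO_BUCKETS    3
#define DEMO_PER_BUCKET 4

struct demo_table {
	int entries[DEMO_BUCKETS][DEMO_PER_BUCKET];
	size_t count[DEMO_BUCKETS];
};

/* Resume state kept between calls, much like a dump callback's saved args. */
struct demo_dump_state {
	size_t bucket;	/* bucket to resume from */
	size_t skip;	/* entries in that bucket already emitted */
};

/* Emit up to 'room' entries into 'out' and return how many were written.
 * With room > 0, a return of 0 means the dump is complete. */
static size_t demo_dump(const struct demo_table *t, struct demo_dump_state *st,
			int *out, size_t room)
{
	size_t n = 0;

	while (st->bucket < DEMO_BUCKETS) {
		while (st->skip < t->count[st->bucket]) {
			if (n == room)
				return n; /* full: state says where to resume */
			out[n++] = t->entries[st->bucket][st->skip++];
		}
		st->bucket++;
		st->skip = 0;
	}
	return n;
}
```

Without the skip count, a call that stopped in the middle of a bucket would either repeat or lose the entries of that bucket on the next call.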
David S. Miller [Fri, 29 Jun 2018 02:06:35 +0000 (11:06 +0900)]
Merge branch 'hns3-a-few-code-improvements'
Peng Li says:
====================
net: hns3: a few code improvements
This patchset fixes a few code stylistic issues from
concentrated review, no functional changes introduced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 28 Jun 2018 04:12:29 +0000 (12:12 +0800)]
net: hns3: use lower_32_bits and upper_32_bits
The lower_32_bits and upper_32_bits macros extract bits 0-31
and bits 32-63 of a number, so use them.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
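For readers outside the kernel tree, a userspace stand-in for the two helpers could look like the sketch below. The demo_ names are assumptions; the kernel provides lower_32_bits() and upper_32_bits() as macros.

```c
#include <stdint.h>

/* Bits 0-31 of a 64-bit value. */
static uint32_t demo_lower_32_bits(uint64_t n)
{
	return (uint32_t)n;
}

/* Bits 32-63 of a 64-bit value. */
static uint32_t demo_upper_32_bits(uint64_t n)
{
	return (uint32_t)(n >> 32);
}
```

A typical use is splitting a 64-bit DMA address across two 32-bit registers.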
Huazhong Tan [Thu, 28 Jun 2018 04:12:28 +0000 (12:12 +0800)]
net: hns3: remove back in struct hclge_hw
hclge_hw is embedded in hclge_dev, so use container_of instead of
back to get hclge_dev.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
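A minimal sketch of recovering the outer structure from an embedded member, which is what makes a stored back pointer unnecessary. The struct layouts and the demo_container_of macro are simplified assumptions; the kernel's container_of adds type checking.

```c
#include <stddef.h>

/* Simplified container_of: step back from the member to its container. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_hw {
	int id;
};

struct demo_dev {
	int flags;
	struct demo_hw hw;	/* embedded, so no back pointer is needed */
};

static struct demo_dev *demo_hw_to_dev(struct demo_hw *hw)
{
	return demo_container_of(hw, struct demo_dev, hw);
}
```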
Peng Li [Thu, 28 Jun 2018 04:12:27 +0000 (12:12 +0800)]
net: hns3: remove the Redundant put_vector in hns3_client_uninit
The interface h->ae_algo->ops->put_vector is called in both
hns3_nic_dealloc_vector_data and hns3_nic_uninit_vector_data in
hns3_client_uninit, which causes the vector to be freed twice.
This patch removes the redundant put_vector so that the vector is freed
only once.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:26 +0000 (12:12 +0800)]
net: hns3: print the ret value in error information
Printing the ret value in error information can help find the cause.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:25 +0000 (12:12 +0800)]
net: hns3: extraction an interface for state init|uninit
Extract an interface for state init|uninit to make the code
easier to read.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:24 +0000 (12:12 +0800)]
net: hns3: remove unused head file in hnae3.c
linux/slab.h is not used in hnae3.h, this patch removes it.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:23 +0000 (12:12 +0800)]
net: hns3: add unlikely for error check
An invalid first BD of a packet and an invalid ring head for the TX
IRQ are not frequent; they may occur only when there is an error.
Adding unlikely to the error check branches is better for performance.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
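A sketch of the unlikely() hint in userspace, assuming a GCC-compatible compiler for __builtin_expect and falling back to a plain test elsewhere. The ring-head check is a hypothetical example, not the hns3 code.

```c
/* Tell the compiler that the condition is expected to be false. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define demo_unlikely(x) (x)
#endif

/* Returns 0 for a valid ring head, -1 for an out-of-range one. */
static int demo_check_head(int head, int ring_size)
{
	if (demo_unlikely(head < 0 || head >= ring_size))
		return -1;	/* rare error path */
	return 0;
}
```

The hint changes only code layout and branch prediction hints, never the result of the check.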
Peng Li [Thu, 28 Jun 2018 04:12:22 +0000 (12:12 +0800)]
net: hns3: add l4_type check for both ipv4 and ipv6
HW supports checksum for UDP, TCP and SCTP packets on both ipv4 and
ipv6, but does not support checksum for other packet types on ipv4 or
ipv6.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:21 +0000 (12:12 +0800)]
net: hns3: add vector status check before free vector
If hdev->vector_status[vector_id] is already HCLGE_INVALID_VPORT,
log the error and return.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:20 +0000 (12:12 +0800)]
net: hns3: rename the interface for init_client_instance and uninit_client_instance
The interfaces init_client_instance and uninit_client_instance
do not register anything; they only initialize the client instance.
This patch renames the related interfaces so that the function
names indicate the purpose.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 28 Jun 2018 04:12:19 +0000 (12:12 +0800)]
net: hns3: remove hclge_get_vector_index from hclge_bind_ring_with_vector
In hclge_unmap_ring_frm_vector, there are 2 steps:
step 1: get vector index.
step 2: unbind ring with vector.
But the vector id is fetched again inside the step 2 interface. This patch
removes hclge_get_vector_index from hclge_bind_ring_with_vector,
and makes the steps the same as in the hns3 PF driver.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 24 May 2018 01:26:09 +0000 (18:26 -0700)]
net/mlx5e: Update NIC HW stats on demand only
Disable periodic stats update background thread and update stats in
background on demand when ndo_get_stats is called.
Having a background thread running in the driver all the time is bad for
power consumption and normally a user space daemon will query the stats
once every specific interval, so ideally the background thread and its
interval can be done in user space.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Tariq Toukan [Sun, 13 May 2018 10:42:16 +0000 (13:42 +0300)]
net/mlx5e: Add counter for total num of NOP operations
A per-ring counter for NOP operations already exists.
Here I add a counter that sums them up.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 28 Jun 2017 16:27:18 +0000 (19:27 +0300)]
net/mlx5e: Add counter for MPWQE filler strides
Add ethtool counter to indicate the number of strides consumed
by filler CQEs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 13 Mar 2018 09:19:28 +0000 (11:19 +0200)]
net/mlx5e: Add channel events counter
Add per-channel and global ethtool counters for channel events.
Each event indicates an interrupt on one of the channel's
completion queues.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 4 Mar 2018 12:25:00 +0000 (14:25 +0200)]
net/mlx5e: Add a counter for congested UMRs
Add per-ring and global ethtool counters for congested UMR requests.
These events indicate congestion in UMR handlers in HW.
Such an event is inferred when there's an outstanding UMR post,
yet the SW has consumed at least two additional MPWQEs in the meantime.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 2 May 2018 15:29:42 +0000 (18:29 +0300)]
net/mlx5e: Add NAPI statistics
Add per-channel and global ethtool counters for NAPI.
This helps us monitor and analyze performance in general.
- ch[i]_poll:
the number of times the channel's NAPI poll was invoked.
- ch[i]_arm:
the number of times the channel's NAPI poll completed
and armed the completion queues.
- ch[i]_aff_change:
the number of times the channel's NAPI poll explicitly
stopped execution on a cpu due to a change in affinity.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 4 Mar 2018 08:35:00 +0000 (10:35 +0200)]
net/mlx5e: Add XDP_TX completions statistics
Add per-ring and global ethtool counters for XDP_TX completions.
This helps us monitor and analyze XDP_TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Wed, 18 Apr 2018 10:33:15 +0000 (13:33 +0300)]
net/mlx5e: Add TX completions statistics
Add per-ring and global ethtool counters for TX completions.
This helps us monitor and analyze TX flow performance.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Sun, 3 Jun 2018 14:41:48 +0000 (17:41 +0300)]
net/mlx5e: RX, Use existing WQ local variable
Local variable 'wq' already points to &sq->wq, use it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tariq Toukan [Tue, 5 Jun 2018 08:47:04 +0000 (11:47 +0300)]
net/mlx5e: Convert large order kzalloc allocations to kvzalloc
Replace calls to kzalloc_node with kvzalloc_node, as it falls back
to lower-order pages if the higher-order attempts fail.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Boris Pismenny [Mon, 11 Jun 2018 14:24:58 +0000 (17:24 +0300)]
net/mlx5e: Add UDP GSO remaining counter
This patch adds a counter for tx UDP GSO packets that contain a segment
that is not aligned to MSS - remaining segment.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Boris Pismenny [Thu, 31 May 2018 12:29:42 +0000 (15:29 +0300)]
net/mlx5e: Add UDP GSO support
This patch enables UDP GSO support. We enable this by using two WQEs:
the first is a UDP LSO WQE for all segments with equal length, and the
second is for the last segment in case it has a different length.
Due to HW limitation, before sending, we must adjust the packet length fields.
We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
We compare single stream UDP, UDP GSO and UDP GSO with offload.
Performance:
                | MSS (bytes) | Throughput (Gbps) | CPU utilization (%)
UDP GSO offload | 1472        | 35.6              | 8%
UDP GSO         | 1472        | 25.5              | 17%
UDP             | 1472        | 10.2              | 17%
UDP GSO offload | 1024        | 35.6              | 8%
UDP GSO         | 1024        | 19.2              | 17%
UDP             | 1024        | 5.7               | 17%
UDP GSO offload | 512         | 33.8              | 16%
UDP GSO         | 512         | 10.4              | 17%
UDP             | 512         | 3.5               | 17%
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
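The split into equal-length segments plus an optional shorter last segment is plain integer arithmetic. The sketch below only illustrates that arithmetic under assumed names; it says nothing about how the driver builds the WQEs themselves.

```c
#include <stddef.h>

struct demo_gso_split {
	size_t full_segs;	/* segments of exactly mss bytes */
	size_t remainder;	/* bytes in the trailing short segment, 0 if none */
};

/* Split a UDP payload into full-MSS segments and a possible remainder. */
static struct demo_gso_split demo_udp_gso_split(size_t payload_len, size_t mss)
{
	struct demo_gso_split s = { 0, 0 };

	if (mss == 0)
		return s;	/* nothing sensible to compute */

	s.full_segs = payload_len / mss;
	s.remainder = payload_len % mss;
	return s;
}
```

For example, a 4000-byte payload with an MSS of 1472 gives two full segments and a 1056-byte remainder, so both parts would be needed; a 2944-byte payload needs only the equal-length part.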
David S. Miller [Thu, 28 Jun 2018 13:21:33 +0000 (22:21 +0900)]
Merge branch 'net-preserve-sock-reference-when-scrubbing-the-skb'
Flavio Leitner says:
====================
net: preserve sock reference when scrubbing the skb.
The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.
XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.
TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.
Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action,
but the transmit side needs some extra checking included in the
first patch.
The first patch will update netfilter to check if the socket
netns is local before using it.
The second patch removes the skb_orphan() from the skb_scrub_packet()
and improves the documentation.
ChangeLog:
- split into two (Eric)
- addressed Paolo's offline feedback to swap the checks in xt_socket.c
to preserve original behavior.
- improved ip-sysctl.txt (reported by Cong)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Wed, 27 Jun 2018 13:34:26 +0000 (10:34 -0300)]
skbuff: preserve sock reference when scrubbing the skb.
The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.
XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.
TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.
Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action,
and on the TX side netfilter checks if the reference is local before
using it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Wed, 27 Jun 2018 13:34:25 +0000 (10:34 -0300)]
netfilter: check if the socket netns is correct.
Netfilter assumes that if the socket is present in the skb, then
it can be used because that reference is cleaned up while the skb
is crossing netns.
We want to change that to preserve the socket reference in a future
patch, so this is a preparation updating netfilter to check if the
socket netns matches before using it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Jun 2018 13:12:03 +0000 (22:12 +0900)]
Merge branch 'net-sched-actions-code-style-cleanup-and-fixes'
Roman Mashak says:
====================
net sched actions: code style cleanup and fixes
The patchset fixes a few code stylistic issues and typos, as well as one
issue detected by the sparse semantic checker.
No functional changes introduced.
Patch 1 & 2 fix coding style bits caught by the checkpatch.pl script
Patch 3 fixes an issue with a shadowed variable
Patch 4 adds sizeof() operator instead of magic number for buffer length
Patch 5 fixes typos in diagnostics messages
Patch 6 explicitly sets unsigned char for bitwise operation
v2:
- submit for net-next
- added Reviewed-by tags
- use u8* instead of char* as per Davide Caratti suggestion
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Wed, 27 Jun 2018 17:33:35 +0000 (13:33 -0400)]
net sched actions: avoid bitwise operation on signed value in pedit
Since char can be unsigned or signed, and bitwise operators may have
implementation-dependent results when performed on signed operands,
declare 'u8 *' operand instead.
Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
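A small sketch of the concern: reading the byte through an unsigned type makes shifts and masks well defined regardless of whether plain char is signed on the platform. The helper name is an assumption and the code is unrelated to the actual pedit logic.

```c
#include <stdint.h>

/* Extract the high nibble of the byte at 'data' via an unsigned pointer. */
static unsigned int demo_high_nibble(const void *data)
{
	const uint8_t *p = data;

	/* *p is in 0..255, so the right shift never sees a negative value. */
	return (unsigned int)(*p >> 4);
}
```

With a signed char holding 0xf0, the value would be negative and the result of the right shift would be implementation-defined.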
Roman Mashak [Wed, 27 Jun 2018 17:33:34 +0000 (13:33 -0400)]
net sched actions: fix misleading text strings in pedit action
Change "tc filter pedit .." to "tc actions pedit .." in error
messages to clearly refer to pedit action.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Wed, 27 Jun 2018 17:33:33 +0000 (13:33 -0400)]
net sched actions: use sizeof operator for buffer length
Replace constant integer with sizeof() to clearly indicate
the destination buffer length in skb_header_pointer() calls.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Wed, 27 Jun 2018 17:33:32 +0000 (13:33 -0400)]
net sched actions: fix sparse warning
The variable _data in include/asm-generic/sections.h defines sections,
this causes sparse warning in pedit:
net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
./include/asm-generic/sections.h:36:13: originally declared here
Therefore rename the variable.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Wed, 27 Jun 2018 17:33:31 +0000 (13:33 -0400)]
net sched actions: fix coding style in pedit headers
Fix coding style issues in tc pedit headers detected by the
checkpatch script.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Wed, 27 Jun 2018 17:33:30 +0000 (13:33 -0400)]
net sched actions: fix coding style in pedit action
Fix coding style issues in tc pedit action detected by the
checkpatch script.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yousuk Seung [Wed, 27 Jun 2018 17:32:19 +0000 (10:32 -0700)]
netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.
Commit f043efeae2f1 ("netem: support delivering packets in delayed
time slots") added the slotting feature to approximate the behaviors
of media with packet aggregation but only supported a uniform
distribution for delays between transmission attempts. Tests with TCP
BBR with emulated wifi links with non-uniform distributions produced
more useful results.
Syntax:
slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
[bytes MAX_BYTES]
The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.
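The delay rule above can be sketched in Python as follows. This is only an illustration, assuming table entries are integers scaled by NETEM_DIST_SCALE (8192 in the kernel UAPI header) and that DELAY and JITTER are given in microseconds. The helper name slot_delay_us and the toy table are hypothetical, and the kernel's integer arithmetic differs in detail.

```python
import random

# Scale factor applied to distribution table entries (8192 in pkt_sched.h).
NETEM_DIST_SCALE = 8192


def slot_delay_us(delay_us, jitter_us, table, rng=random):
    """Return DELAY + x * JITTER, where x is a randomly chosen table entry
    divided by NETEM_DIST_SCALE (hypothetical helper, not the kernel code)."""
    x = rng.choice(table) / NETEM_DIST_SCALE
    return delay_us + x * jitter_us


# Toy table with entries at -1, 0 and +1 times the scale (illustrative only).
toy_table = [-NETEM_DIST_SCALE, 0, NETEM_DIST_SCALE]
```

With the toy table, slot_delay_us(800, 100, toy_table) yields 700, 800 or 900 microseconds depending on the entry drawn.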
Examples:
Normal distribution delay with mean = 800us and stdev = 100us.
> tc qdisc add dev eth0 root netem slot dist normal 800us 100us
Optionally set the max slot size in bytes and/or packets.
> tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
bytes 64k packets 42
Signed-off-by: Yousuk Seung <ysseung@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 26 Jun 2018 19:39:18 +0000 (12:39 -0700)]
netlink: Return extack message if attribute validation fails
Have one extack message for parsing and validating.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Maier [Tue, 26 Jun 2018 17:50:50 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Check read_status results
We're ignoring the result of the attached phy device's read_status().
Return it so we can detect errors.
Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Maier [Tue, 26 Jun 2018 17:50:49 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Use correct mdio bus
The xgmiitorgmii is using the mii_bus of the device it's attached to,
instead of the bus it was given during probe.
Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Maier [Tue, 26 Jun 2018 17:50:48 +0000 (12:50 -0500)]
net: phy: xgmiitorgmii: Check phy_driver ready before accessing
A phy_device is added to the global mdio_bus list during
phy_device_register(), but a phy_device's phy_driver doesn't get
attached until phy_probe(). It's therefore possible that
of_phy_find_device() in xgmiitorgmii will return a valid phy with a NULL
phy_driver, leading to a NULL pointer access during the memcpy().
Fixes this Oops:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.40 #1
Hardware name: Xilinx Zynq Platform
task: ce4c8d00 task.stack: ce4ca000
PC is at memcpy+0x48/0x330
LR is at xgmiitorgmii_probe+0x90/0xe8
pc : [<c074bc68>]  lr : [<c0529548>]  psr: 20000013
sp : ce4cbb54  ip : 00000000  fp : ce4cbb8c
r10: 00000000  r9 : 00000000  r8 : c0c49178
r7 : 00000000  r6 : cdc14718  r5 : ce762800  r4 : cdc14710
r3 : 00000000  r2 : 00000054  r1 : 00000000  r0 : cdc14718
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 18c5387d  Table: 0000404a  DAC: 00000051
Process swapper/0 (pid: 1, stack limit = 0xce4ca210)
...
[<c074bc68>] (memcpy) from [<c0529548>] (xgmiitorgmii_probe+0x90/0xe8)
[<c0529548>] (xgmiitorgmii_probe) from [<c0526a94>] (mdio_probe+0x28/0x34)
[<c0526a94>] (mdio_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbd58>] (__device_attach_driver+0xac/0x10c)
[<c04dbd58>] (__device_attach_driver) from [<c04d96f4>] (bus_for_each_drv+0x84/0xc8)
[<c04d96f4>] (bus_for_each_drv) from [<c04db5bc>] (__device_attach+0xd0/0x134)
[<c04db5bc>] (__device_attach) from [<c04dbdd4>] (device_initial_probe+0x1c/0x20)
[<c04dbdd4>] (device_initial_probe) from [<c04da8fc>] (bus_probe_device+0x98/0xa0)
[<c04da8fc>] (bus_probe_device) from [<c04d8660>] (device_add+0x43c/0x5d0)
[<c04d8660>] (device_add) from [<c0526cb8>] (mdio_device_register+0x34/0x80)
[<c0526cb8>] (mdio_device_register) from [<c0580b48>] (of_mdiobus_register+0x170/0x30c)
[<c0580b48>] (of_mdiobus_register) from [<c05349c4>] (macb_probe+0x710/0xc00)
[<c05349c4>] (macb_probe) from [<c04dd700>] (platform_drv_probe+0x44/0x80)
[<c04dd700>] (platform_drv_probe) from [<c04db98c>] (driver_probe_device+0x254/0x414)
[<c04db98c>] (driver_probe_device) from [<c04dbc58>] (__driver_attach+0x10c/0x118)
[<c04dbc58>] (__driver_attach) from [<c04d9600>] (bus_for_each_dev+0x8c/0xd0)
[<c04d9600>] (bus_for_each_dev) from [<c04db1fc>] (driver_attach+0x2c/0x30)
[<c04db1fc>] (driver_attach) from [<c04daa98>] (bus_add_driver+0x50/0x260)
[<c04daa98>] (bus_add_driver) from [<c04dc440>] (driver_register+0x88/0x108)
[<c04dc440>] (driver_register) from [<c04dd6b4>] (__platform_driver_register+0x50/0x58)
[<c04dd6b4>] (__platform_driver_register) from [<c0b31248>] (macb_driver_init+0x24/0x28)
[<c0b31248>] (macb_driver_init) from [<c010203c>] (do_one_initcall+0x60/0x1a4)
[<c010203c>] (do_one_initcall) from [<c0b00f78>] (kernel_init_freeable+0x15c/0x1f8)
[<c0b00f78>] (kernel_init_freeable) from [<c0763d10>] (kernel_init+0x18/0x124)
[<c0763d10>] (kernel_init) from [<c0112d74>] (ret_from_fork+0x14/0x20)
Code: ba000002 f5d1f03c f5d1f05c f5d1f07c (e8b151f8)
---[ end trace 3e4ec21905820a1f ]---
Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Jun 2018 07:10:08 +0000 (16:10 +0900)]
Merge branch 'ipsec-selftests-updates'
Shannon Nelson says:
====================
Updates for ipsec selftests
Fix up the existing ipsec selftest and add tests for
the ipsec offload driver API.
v2: addressed formatting nits in netdevsim from Jakub Kicinski
v3: a couple more nits from Jakub
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 26 Jun 2018 17:07:55 +0000 (10:07 -0700)]
selftests: rtnetlink: add ipsec offload API test
Using the netdevsim as a device for testing, try out the XFRM commands
for setting up IPsec hardware offloads.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 26 Jun 2018 17:07:54 +0000 (10:07 -0700)]
netdevsim: add ipsec offload testing
Implement the IPsec/XFRM offload API for testing.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 26 Jun 2018 17:07:53 +0000 (10:07 -0700)]
selftests: rtnetlink: use dummydev as a test device
We really shouldn't mess with local system settings, so let's
use the already created dummy device instead for ipsec testing.
Oh, and let's put the temp file into a proper directory.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 26 Jun 2018 17:07:52 +0000 (10:07 -0700)]
selftests: rtnetlink: clear the return code at start of ipsec test
Following the custom from the other functions, clear the global
ret code before starting the test so as to not have previously
failed tests cause us to think this test has failed.
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Tue, 26 Jun 2018 16:41:36 +0000 (18:41 +0200)]
l2tp: define helper for parsing struct sockaddr_pppol2tp*
'sockaddr_len' is checked against various values when entering
pppol2tp_connect(), to verify its validity. It is used again later, to
find out which sockaddr structure was passed from user space. This
patch combines these two operations into one new function in order to
simplify pppol2tp_connect().
A new structure, l2tp_connect_info, is used to pass sockaddr data back
to pppol2tp_connect(), to avoid passing too many parameters to
l2tp_sockaddr_get_info(). Also, the first parameter is void* in order
to avoid casting between all sockaddr_* structures manually.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 26 Jun 2018 15:45:49 +0000 (08:45 -0700)]
tcp: remove one indentation level in tcp_create_openreq_child
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Tue, 26 Jun 2018 15:42:33 +0000 (18:42 +0300)]
sh_eth: fix *enum* {A|M}PR_BIT
The *enum* {A|M}PR_BIT were declared in the commit 86a74ff21a7a ("net:
sh_eth: add support for Renesas SuperH Ethernet") adding SH771x support,
however the SH771x manual doesn't have the APR/MPR registers described
and the code writing to them for SH7710 was later removed by the commit
380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
sh_eth_cpu_data""). All the newer SoC manuals have these registers
documented as having a 16-bit TIME parameter of the PAUSE frame, not
1-bit -- update the *enum* accordingly, fixing up the APR/MPR writes...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keara Leibovitz [Tue, 26 Jun 2018 14:16:28 +0000 (10:16 -0400)]
tc-tests: add an extreme-case csum action test
Added an extreme-case test for all 7 csum action headers.
Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Jun 2018 05:18:49 +0000 (14:18 +0900)]
Merge branch 'mscc-ocelot-add-more-features'
Alexandre Belloni says:
====================
net: mscc: ocelot: add more features
This series adds link aggregation and VLAN filtering hardware offload
support to the ocelot driver.
PTP support will be sent later.
changes in v2:
- rebased on v4.18-rc1
- check for aggregation type and only offload it when type is hash (balance-xor
or 802.3ad)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 26 Jun 2018 12:28:49 +0000 (14:28 +0200)]
net: mscc: ocelot: add VLAN filtering
Add hardware VLAN filtering offloading on ocelot.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandre Belloni [Tue, 26 Jun 2018 12:28:48 +0000 (14:28 +0200)]
net: mscc: ocelot: add bonding support
Add link aggregation hardware offload support for Ocelot.
ocelot_get_link_ksettings() is not great but it does work until the driver
is reworked to switch to phylink.
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Tue, 26 Jun 2018 09:21:13 +0000 (14:51 +0530)]
cxgb4: Add new T5 PCI device id 0x50ae
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Casey Leedom [Tue, 26 Jun 2018 09:18:48 +0000 (14:48 +0530)]
cxgb4: Add flag tc_flower_initialized
Add flag tc_flower_initialized to indicate the
completion of tc flower initialization.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Tue, 26 Jun 2018 03:32:53 +0000 (20:32 -0700)]
neighbour: force neigh_invalidate when NUD_FAILED update is from admin
In systems where neigh gc thresholds are set to high values,
admin deleted neigh entries (eg ip neigh flush or ip neigh del) can
linger around in NUD_FAILED state for a long time until periodic gc kicks
in. This patch forces neigh_invalidate when NUD_FAILED neigh_update is
from an admin.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 Jun 2018 01:42:13 +0000 (10:42 +0900)]
Merge branch 'Multipath-tests-for-tunnel-devices'
Petr Machata says:
====================
Multipath tests for tunnel devices
This patchset adds a test for ECMP and weighted ECMP between two GRE
tunnels.
In patches #1 and #2, the function multipath_eval() is first moved from
router_multipath.sh to lib.sh for ease of reuse, and then fixed up.
In patch #3, the function tc_rule_stats_get() is parameterized to be
useful for egress rules as well.
In patch #4, a new function __simple_if_init() is extracted from
simple_if_init(). This covers the logic that needs to be done for the
usual interface: VRF migration, upping and installation of IP addresses.
Patch #5 then adds the test itself.
Additionally in patch #6, a requirement to add diagrams to selftests is
documented.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:08:17 +0000 (02:08 +0200)]
selftests: forwarding: README: Require diagrams
ASCII art diagrams are well suited for presenting the topology that a
test uses while being easy to embed directly in the test file itself.
They make the information very easy to grasp even for simple topologies,
and for more complex ones they are almost essential, as figuring out the
interconnects from the script itself proves to be difficult.
Therefore state the requirement for topology ASCII art in README.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:08:05 +0000 (02:08 +0200)]
selftests: forwarding: Test multipath tunneling
Add a GRE-tunneling test such that there are two tunnels involved, with
a multipath route listing both as next hops. Similarly to
router_multipath.sh, test that the distribution of traffic to the
tunnels honors the configured weights.
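As a rough illustration of what "honors the configured weights" means, the check can be thought of as comparing the ratio of the configured weights with the ratio of the packet counts seen on each tunnel. The Python sketch below is not the selftest's shell code; the function name multipath_ok and the 15% tolerance are assumptions made for the example.

```python
def multipath_ok(weight1, weight2, packets1, packets2, tolerance=0.15):
    """Check that the observed packet ratio is close to the configured
    weight ratio (hypothetical helper; the tolerance is an assumed value)."""
    if packets1 == 0 or packets2 == 0:
        # No meaningful ratio if one of the paths carried no traffic.
        return False
    weights_ratio = weight1 / weight2
    packets_ratio = packets1 / packets2
    # Relative difference between the expected and the observed ratio.
    return abs(weights_ratio - packets_ratio) / weights_ratio <= tolerance
```

For example, weights 2:1 with 2000 and 1000 packets pass, while weights 1:1 with 3000 and 1000 packets do not.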
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:08:00 +0000 (02:08 +0200)]
selftests: forwarding: lib: Extract interface-init functions
The function simple_if_init() does two things: it creates a VRF, then
moves an interface into this VRF and configures addresses. The latter
comes in handy when adding more interfaces into a VRF later on. The
situation is similar for simple_if_fini().
Therefore split the interface remastering and address de/initialization
logic to a new pair of helpers __simple_if_init() / __simple_if_fini(),
and defer to these helpers from simple_if_init() and simple_if_fini().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:07:45 +0000 (02:07 +0200)]
selftests: forwarding: tc_rule_stats_get: Parameterize direction
The GRE multipath tests need stats on an egress counter. Change
tc_rule_stats_get() to take direction as an optional argument, with
default of ingress.
Take the opportunity to change line continuation character from | to \.
Move the | to the next line, and indent that line.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:07:08 +0000 (02:07 +0200)]
selftests: forwarding: multipath_eval(): Improve style
- Change the indentation of the function body from 7 spaces to one tab.
- Move initialization of weights_ratio up so that it can be referenced
from the error message about packet difference being zero.
- Move |'s consistently to the continuation line, and reindent it.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 26 Jun 2018 00:06:06 +0000 (02:06 +0200)]
selftests: forwarding: Move multipath_eval() to lib.sh
This function will be useful for the GRE multipath test that is coming
later.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Mon, 25 Jun 2018 23:55:05 +0000 (16:55 -0700)]
net/tls: Remove VLA usage on nonce
It looks like the prior VLA removal, commit b16520f7493d ("net/tls: Remove
VLA usage"), and a new VLA addition, commit c46234ebb4d1e ("tls: RX path
for ktls"), passed in the night. This removes the newly added VLA, which
happens to have its bounds based on the same max value.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 25 Jun 2018 23:20:32 +0000 (01:20 +0200)]
selftests: forwarding: mirror_gre_vlan_bridge_1q: Unset rp_filter
The IP addresses of tunnel endpoint at H3 are set at the VLAN device
$h3.555. Therefore when test_gretap_untagged_egress() sets vlan 555 to
egress untagged at $swp3, $h3's rp_filter rejects these packets. The
test then spuriously fails.
Therefore turn off net.ipv4.conf.{all, $h3}.rp_filter.
Fixes: 9c7c8a82442c ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Mon, 25 Jun 2018 22:49:49 +0000 (15:49 -0700)]
mdio-mux-gpio: Remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the values buffer during the callback instead of putting it
on the stack.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 26 Jun 2018 14:21:33 +0000 (23:21 +0900)]
Merge branch 'net-sched-support-replay-of-filter-offload-when-binding-to-block'
Jakub Kicinski says:
====================
net: sched: support replay of filter offload when binding to block
This series from John adds the ability to replay filter offload requests
when new offload callback is being registered on a TC block. This is most
likely to take place for shared blocks today, when a block which already
has rules is bound to another interface. Prior to this patch set if any
of the rules were offloaded the block bind would fail.
A new tcf_proto_op is added to generate a filter-specific offload request.
The new 'offload' op is supporting extack from day 0, hence we need to
propagate extack to .ndo_setup_tc TC_BLOCK_BIND/TC_BLOCK_UNBIND and
through tcf_block_cb_register() to tcf_block_playback_offloads().
The immediate use of this patch set is to simplify life of drivers which
require duplicating rules when sharing blocks. Switch drivers (mlxsw)
can bind ports to rule lists dynamically, NIC drivers generally don't
have that ability and need the rules to be duplicated for each ingress
they match on. In code terms this means that switch drivers don't
register multiple callbacks for each port. NIC drivers do, and get a
separate request and hence rule per-port, as if the block was not shared.
The registration fails today, however, if some rules were already present.
As John notes in description of patch 7, drivers which register multiple
callbacks to shared blocks will likely need to flush the rules on block
unbind. This set makes the core not only replay the offload add
requests but also offload remove requests when callback is unregistered.
v2:
- name parameters in patch 2;
- use unsigned int instead of u32 for in_hw_count;
- improve extack message in patch 7.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:10 +0000 (14:30 -0700)]
net: sched: call reoffload op on block callback reg
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a
block when a callback tries to register to a block that already has
offloaded rules. If not all existing rules can be offloaded then the
registration is rejected. This replaces the previous policy of rejecting
such callback registration outright.
On unregistration of a callback, the rules are flushed for that given cb.
The implementation of block sharing in the NFP driver, for example,
duplicates shared rules to all devs bound to a block. This meant that
rules could still exist in hw even after a device is unbound from a block
(assuming the block still remains active).
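The registration and unregistration behaviour described above can be modelled with a small Python sketch. It is a simplified illustration and not the kernel implementation: the names register_block_cb and unregister_block_cb are hypothetical, rules are plain values, every rejection is treated as fatal (the classifier commits say an error is returned only for a 'hardware only' rule), and undoing already replayed rules on failure is an assumption of this sketch.

```python
def register_block_cb(rules, cb):
    """Replay an 'add' offload request for every existing rule to a newly
    registered callback. If the callback rejects a rule, remove the rules
    already replayed and reject the registration (simplified model)."""
    replayed = []
    for rule in rules:
        if not cb("add", rule):
            # Roll back in reverse order so no partial state is left behind.
            for done in reversed(replayed):
                cb("del", done)
            return False
        replayed.append(rule)
    return True


def unregister_block_cb(rules, cb):
    """Flush every rule for the callback that is being unregistered."""
    for rule in rules:
        cb("del", rule)
```

Here cb(command, rule) stands in for the driver callback and returns True when the request is accepted.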
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:09 +0000 (14:30 -0700)]
net: sched: cls_bpf: implement offload tcf_proto_op
Add the offload tcf_proto_op in cls_bpf to generate an offload message for
each bpf prog in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' prog.
A prog contains a flag to indicate if it is in hardware or not. To
ensure the offload function properly maintains this flag, keep a reference
counter for the number of instances of the prog that are in hardware. Only
update the flag when this counter changes from or to 0.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:08 +0000 (14:30 -0700)]
net: sched: cls_u32: implement offload tcf_proto_op
Add the offload tcf_proto_op in cls_u32 to generate an offload message for
each filter and the hashtable in the given tcf_proto. Call the specified
callback with this new offload message. The function only returns an error
if the callback rejects adding a 'hardware only' rule.
A filter contains a flag to indicate if it is in hardware or not. To
ensure the offload function properly maintains this flag, keep a reference
counter for the number of instances of the filter that are in hardware.
Only update the flag when this counter changes from or to 0.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:07 +0000 (14:30 -0700)]
net: sched: cls_matchall: implement offload tcf_proto_op
Add the reoffload tcf_proto_op in matchall to generate an offload message
for each filter in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' rule.
Ensure matchall flags correctly report if the rule is in hw by keeping a
reference counter for the number of instances of the rule offloaded. Only
update the flag when this counter changes from or to 0.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:06 +0000 (14:30 -0700)]
net: sched: cls_flower: implement offload tcf_proto_op
Add the reoffload tcf_proto_op in flower to generate an offload message
for each filter in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' rule.
A filter contains a flag to indicate if it is in hardware or not. To
ensure the reoffload function properly maintains this flag, keep a
reference counter for the number of instances of the filter that are in
hardware. Only update the flag when this counter changes from or to 0. Add
a generic helper function to implement this behaviour.
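The counting rule can be illustrated with a short Python model. This is a sketch under stated assumptions and not the kernel helper: the class Filter, the function offload_cnt_update and the two flag bits are hypothetical names chosen for the example.

```python
IN_HW = 0x1      # hypothetical flag bit: filter is present in hardware
NOT_IN_HW = 0x2  # hypothetical flag bit: filter is not in hardware


class Filter:
    """Minimal stand-in for a classifier filter with an in-hardware count."""

    def __init__(self):
        self.in_hw_count = 0
        self.flags = NOT_IN_HW


def offload_cnt_update(flt, add):
    """Adjust the number of hardware instances and flip the flag only when
    the counter moves from or to 0 (illustrative model of the behaviour)."""
    if add:
        if flt.in_hw_count == 0:
            flt.flags = (flt.flags & ~NOT_IN_HW) | IN_HW
        flt.in_hw_count += 1
    elif flt.in_hw_count > 0:
        flt.in_hw_count -= 1
        if flt.in_hw_count == 0:
            flt.flags = (flt.flags & ~IN_HW) | NOT_IN_HW
```

Keeping a counter rather than a boolean means that removing the filter from one device does not clear the flag while another device still holds it in hardware.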
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:05 +0000 (14:30 -0700)]
net: sched: add tcf_proto_op to offload a rule
Create a new tcf_proto_op called 'reoffload' that generates a new offload
message for each node in a tcf_proto. Pointers to the tcf_proto and
whether the offload request is to add or delete the node are included.
Also included is a callback function to send the offload message to and
the option of priv data to go with the cb.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 25 Jun 2018 21:30:04 +0000 (14:30 -0700)]
net: sched: pass extack pointer to block binds and cb registration
Pass the extack struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of fails in the bind process.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>