Jarno Rajahalme [Fri, 18 Nov 2016 23:40:39 +0000 (15:40 -0800)]
virtio_net.h: Fix comment.
Fix incorrect comment after the final #endif.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Fri, 18 Nov 2016 23:40:38 +0000 (15:40 -0800)]
virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().
No point storing the return value of virtio_net_hdr_to_skb() or
virtio_net_hdr_from_skb() to a variable when the value is used only
once as a boolean in an immediately following if statement.
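The pattern being simplified can be sketched as follows. This is a minimal illustration only: parse_hdr() is a hypothetical stand-in for a helper that returns 0 on success and nonzero on failure, not the kernel function itself.

```c
#include <errno.h>

/* Hypothetical stand-in for a helper that returns 0 on success and a
 * negative errno on failure. */
static int parse_hdr(int hdr_len)
{
	return hdr_len < 0 ? -EINVAL : 0;
}

/* Before: the return value is stored only to be tested once. */
static int rx_before(int hdr_len)
{
	int err;

	err = parse_hdr(hdr_len);
	if (err)
		return -1;	/* drop */
	return 0;
}

/* After: the call is tested directly in the if statement. */
static int rx_after(int hdr_len)
{
	if (parse_hdr(hdr_len))
		return -1;	/* drop */
	return 0;
}
```

Both variants behave identically; the second just drops the single-use local variable.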
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai [Fri, 18 Nov 2016 11:07:40 +0000 (16:37 +0530)]
cxgb4: Allocate Tx queues dynamically
Allocate resources dynamically for Upper Layer Drivers (ULDs) like
cxgbit, iw_cxgb4, cxgb4i and chcr. The resources allocated include Tx
queues, which are allocated when a ULD registers with the cxgb4 driver and
freed when it unregisters. Tx queues that are shared by ULDs are
allocated by the first driver to register and freed by the last driver
to unregister.
Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 18 Nov 2016 11:47:35 +0000 (14:47 +0300)]
liquidio CN23XX: bitwise vs logical AND typo
We obviously intended a bitwise AND here, not a logical one.
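The class of bug can be sketched like this. The flag value below is illustrative and not taken from the liquidio driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; not taken from the liquidio driver. */
#define EXAMPLE_FLAG_RESP 0x2u

/* Buggy: logical AND is true whenever both operands are nonzero,
 * regardless of which bits are set. */
static bool has_flag_logical(uint32_t flags)
{
	return flags && EXAMPLE_FLAG_RESP;
}

/* Intended: bitwise AND tests only the bit named by the mask. */
static bool has_flag_bitwise(uint32_t flags)
{
	return (flags & EXAMPLE_FLAG_RESP) != 0;
}
```

With flags set to 0x1 the logical version reports the flag as present even though bit 1 is clear, which is the kind of false positive the fix removes.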
Fixes: 8c978d059224 ("liquidio CN23XX: Mailbox support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Woojung Huh [Thu, 17 Nov 2016 22:10:02 +0000 (22:10 +0000)]
lan78xx: relocate mdix setting to phy driver
Relocate mdix code to phy driver to be called at config_init().
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 18:54:22 +0000 (13:54 -0500)]
Merge branch 'net-marvell-freescale-compile-test'
Florian Fainelli says:
====================
net: Enable COMPILE_TEST for Marvell & Freescale drivers
This patch series allows building the Freescale and Marvell Ethernet network
drivers with COMPILE_TEST.
Changes in v4:
- add proper HAS_DMA to fix build errors on m32r
- provide an inline stub for mvebu_mbus_get_dram_win_info
- added an additional patch to fix build errors with mv88e6xxx on m32r
Changes in v3:
- reorder patches to avoid introducing a build warning between commits
Changes in v2:
- rename register define clash when building for i386 (spotted by LKP)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:14 +0000 (11:19 -0800)]
net: dsa: mv88e6xxx: Select IRQ_DOMAIN
Some architectures (like m32r) may not define IRQ_DOMAIN. Selecting it
fixes undefined references to IRQ_DOMAIN functions.
Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:13 +0000 (11:19 -0800)]
net: marvell: Allow drivers to be built with COMPILE_TEST
All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:12 +0000 (11:19 -0800)]
bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info
In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().
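The stub technique can be sketched as below. EXAMPLE_HAVE_MBUS and example_get_dram_win_info() are hypothetical names, and the error code returned by the stub is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* EXAMPLE_HAVE_MBUS is a hypothetical stand-in for the real Kconfig
 * symbol; it is left undefined here so the stub branch is compiled. */
#ifdef EXAMPLE_HAVE_MBUS
int example_get_dram_win_info(unsigned long addr, unsigned char *target,
			      unsigned char *attr);
#else
/* Stub: lets callers build when the bus driver is not configured. */
static inline int example_get_dram_win_info(unsigned long addr,
					    unsigned char *target,
					    unsigned char *attr)
{
	(void)addr;
	(void)target;
	(void)attr;
	return -EINVAL;
}
#endif
```

Callers then compile unchanged in both configurations and only see an error at run time when the real implementation is absent.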
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:11 +0000 (11:19 -0800)]
net: fsl: Allow most drivers to be built with COMPILE_TEST
There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:
* FEC, for which we would need to define a default register layout if no
supported architecture is defined
* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
to a generic location)
* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
architectures (like m32r)
We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)
which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF being set.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 17 Nov 2016 19:19:10 +0000 (11:19 -0800)]
net: gianfar_ptp: Rename FS bit to FIPERST
FS is a global symbol used by the x86 32-bit architecture. Renaming the
bit fixes this build redefinition warning:
>> drivers/net/ethernet/freescale/gianfar_ptp.c:75:0: warning: "FS"
>> redefined
#define FS (1<<28) /* FIPER start indication */
In file included from arch/x86/include/uapi/asm/ptrace.h:5:0,
from arch/x86/include/asm/ptrace.h:6,
from arch/x86/include/asm/math_emu.h:4,
from arch/x86/include/asm/processor.h:11,
from include/linux/mutex.h:19,
from include/linux/kernfs.h:13,
from include/linux/sysfs.h:15,
from include/linux/kobject.h:21,
from include/linux/device.h:17,
from drivers/net/ethernet/freescale/gianfar_ptp.c:23:
arch/x86/include/uapi/asm/ptrace-abi.h:15:0: note: this is the
location of the previous definition
#define FS 9
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 17 Nov 2016 14:43:37 +0000 (08:43 -0600)]
amd-xgbe: Update connection validation for backplane mode
Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 17:12:15 +0000 (12:12 -0500)]
Merge branch 'ethtool-phy-downshift'
Allan W. Nielsen says:
====================
Adding PHY-Tunables and downshift support
(This is a re-post of the v3 patch set with a new cover letter - I was not
aware that cover letters are used as commit comments in merge commits).
This series adds support for PHY tunables, and uses this facility to
configure downshifting. The downshifting mechanism is implemented for MSCC
PHYs.
This series tries to address the comments provided back in mid-October when
this feature was posted along with fast-link-failure. Fast-link-failure has
been separated out, but we would like to continue with that if/when we
agree on how the PHY tunables and downshifting should be done.
The proposed generic interface is similar to
ETHTOOL_GTUNABLE/ETHTOOL_STUNABLE, it uses the same type
(ethtool_tunable/tunable_type_id) but a new enum (phy_tunable_id) is added
to reflect the PHY tunable.
The implementation just calls the newly added get_tunable/set_tunable
function pointers in the phy_driver structure.
To configure downshifting, the ethtool_tunable structure is used. 'id' must
be set to 'ETHTOOL_PHY_DOWNSHIFT', 'type_id' must be set to
'ETHTOOL_TUNABLE_U8', and the 'data' value configures the number of
downshift retries.
If configured to DOWNSHIFT_DEV_DISABLE, downshift is disabled. If
configured to DOWNSHIFT_DEV_DEFAULT_COUNT, it is up to the device to
choose a device-specific retry count.
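A downshift request built along these lines might look like the sketch below. All ex_/EX_ names are local stand-ins with illustrative values and a simplified layout; the real definitions live in the ethtool uapi header.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins with illustrative values; the real definitions are
 * in the ethtool uapi header. */
enum { EX_TUNABLE_U8 = 1 };
enum { EX_PHY_DOWNSHIFT = 1 };
#define EX_DOWNSHIFT_DEV_DEFAULT_COUNT 0xff
#define EX_DOWNSHIFT_DEV_DISABLE 0

/* Simplified request layout (assumption: one u8 payload). */
struct ex_tunable {
	uint32_t id;
	uint32_t type_id;
	uint32_t len;
	uint8_t data;
};

/* Fill a downshift request with the given retry count. */
static void ex_downshift_request(struct ex_tunable *t, uint8_t count)
{
	t->id = EX_PHY_DOWNSHIFT;
	t->type_id = EX_TUNABLE_U8;
	t->len = sizeof(t->data);
	t->data = count;
}

/* Reject requests whose id, type or length do not match a u8 downshift. */
static int ex_downshift_valid(const struct ex_tunable *t)
{
	if (t->id != EX_PHY_DOWNSHIFT || t->type_id != EX_TUNABLE_U8 ||
	    t->len != sizeof(uint8_t))
		return -EINVAL;
	return 0;
}
```

Passing EX_DOWNSHIFT_DEV_DISABLE as the count models "downshift off", and EX_DOWNSHIFT_DEV_DEFAULT_COUNT models "let the device pick".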
Tested on Beaglebone Black with VSC 8531 PHY.
Change set:
v0:
- Link Speed downshift and Fast Link failure-2 features coded by using
Device tree.
v1:
- Split the Downshift and FLF2 features in different set of patches.
- Removed DT access and implemented IOCTL access suggested by Andrew.
- Added function pointers in get_tunable/set_tunable phy_device structure
v2:
- Added trace message with a hint, printed when downshifting could not
be enabled with the requested count
- (ethtool) Syntax is changed from "--set-phy-tunable downshift on|off|%d"
to "--set-phy-tunable [downshift on|off [count N]]" - as requested by
Andrew.
v3:
- Fixed Spelling in "net: phy: Add downshift get/set support in Microsemi
PHYs driver"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:24 +0000 (13:07 +0100)]
net: phy: Add downshift get/set support in Microsemi PHYs driver
Implement the PHY tunable function pointers and the downshift
functionality for MSCC PHYs.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:23 +0000 (13:07 +0100)]
ethtool: Core impl for ETHTOOL_PHY_DOWNSHIFT tunable
Add validation support for ETHTOOL_PHY_DOWNSHIFT. The functional
implementation needs to be done in the individual PHY drivers.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:22 +0000 (13:07 +0100)]
ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables
For operation in cabling environments that are incompatible with
1000BASE-T, a PHY device may provide an automatic link speed downshift
operation. When enabled, the device automatically changes its 1000BASE-T
auto-negotiation to the next slower speed after a configured number of
failed attempts at 1000BASE-T. This feature is useful when setting up
networks using older cable installations that include only pairs A and B,
and not pairs C and D.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:21 +0000 (13:07 +0100)]
ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE
Add get_tunable/set_tunable function pointers to the phy_driver
structure, and use these function pointers to implement the
ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE ioctls.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Thu, 17 Nov 2016 12:07:20 +0000 (13:07 +0100)]
ethtool: (uapi) Add ETHTOOL_PHY_GTUNABLE and ETHTOOL_PHY_STUNABLE
Define a generic API to get/set PHY tunables. The API uses the
existing ethtool_tunable/tunable_type_id types, which are already used
for MAC-level tunables.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 17:08:58 +0000 (12:08 -0500)]
Merge branch 'mlx5-next'
Saeed Mahameed says:
====================
Mellanox 100G mlx5 update 2016-11-15
This series contains four humble mlx5 features.
From Gal,
- Add the support for PCIe statistics and expose them in ethtool
From Huy,
- Add the support for port module events reporting and statistics
- Add the support for driver version setting into FW (for display purposes only)
From Mohamad,
- Extended the command interface cache flexibility
This series was generated against commit
6a02f5eb6a8a ("Merge branch 'mlxsw-i2c")
V2:
- Changed plain "unsigned" to "unsigned int"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Thu, 17 Nov 2016 11:46:02 +0000 (13:46 +0200)]
net/mlx5e: Expose PCIe statistics to ethtool
This patch exposes two groups of PCIe counters:
- Performance counters.
- Timers and states counters.
Queried with ethtool -S <devname>.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Thu, 17 Nov 2016 11:46:01 +0000 (13:46 +0200)]
net/mlx5: Add MPCNT register infrastructure
Add the needed infrastructure for future use of MPCNT register.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:46:00 +0000 (13:46 +0200)]
net/mlx5: Set driver version into firmware
If driver_version capability bit is enabled, set driver version
to firmware after the init HCA command, for display purposes.
Example of driver version: "Linux,mlx5_core,3.0-1"
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Thu, 17 Nov 2016 11:45:59 +0000 (13:45 +0200)]
net/mlx5: Set driver version infrastructure
Add the driver_version capability bit and the set driver
version command to the mlx5_ifc firmware header. The only purpose
of this command is to store a driver version/OS string in FW
to be reported and displayed in various management systems,
such as IPMI/BMC.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:58 +0000 (13:45 +0200)]
net/mlx5e: Add port module event counters to ethtool stats
Add port module event counters to ethtool -S command
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:57 +0000 (13:45 +0200)]
net/mlx5: Add handling for port module event
For each asynchronous port module event:
1. print with ratelimit to the dmesg log
2. increment the corresponding event counter
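The two steps above can be sketched as follows. The types and limits are illustrative; in particular, this sketch uses a simple message budget instead of the kernel's time-based rate limiting.

```c
#include <stdio.h>

#define EX_MODULE_EVENT_TYPES 4	/* illustrative number of event types */
#define EX_BURST 2		/* illustrative: messages allowed before muting */

struct ex_port_events {
	unsigned long counters[EX_MODULE_EVENT_TYPES];
	unsigned int printed;
};

/* Returns 1 if a message was printed, 0 if suppressed, -1 on bad type. */
static int ex_handle_module_event(struct ex_port_events *pe, unsigned int type)
{
	int printed = 0;

	if (type >= EX_MODULE_EVENT_TYPES)
		return -1;

	/* 1. print, subject to the (simplified) rate limit */
	if (pe->printed < EX_BURST) {
		pe->printed++;
		printf("port module event: type %u\n", type);
		printed = 1;
	}

	/* 2. always count the event, even when the message is suppressed */
	pe->counters[type]++;
	return printed;
}
```

The point of the split is that the counters stay accurate under an event storm while the log does not get flooded.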
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Nov 2016 11:45:56 +0000 (13:45 +0200)]
net/mlx5: Port module event hardware structures
Add hardware structures and constants definitions needed for module
events support.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mohamad Haj Yahia [Thu, 17 Nov 2016 11:45:55 +0000 (13:45 +0200)]
net/mlx5: Make the command interface cache more flexible
Add more cache command size sets and more entries for each set, based on
the sizes and frequency of the commands currently in use.
Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 18 Nov 2016 16:55:39 +0000 (11:55 -0500)]
Merge branch 'sfc-tso-v2'
Edward Cree says:
====================
sfc: Firmware-Assisted TSO version 2
The firmware on 8000 series SFC NICs supports a new TSO API ("FATSOv2"), and
7000 series NICs will also support this in an imminent release. This series
adds driver support for this TSO implementation.
The series also removes SWTSO, as it's now equivalent to GSO. This does not
actually remove very much code, because SWTSO was grotesquely intertwingled
with FATSOv1, which will also be removed once 7000 series supports FATSOv2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:52:36 +0000 (10:52 +0000)]
sfc: remove Software TSO
It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:52:07 +0000 (10:52 +0000)]
sfc: handle failure to allocate TSOv2 contexts
If we fail to init the TXQ because of insufficient TSOv2 contexts,
try again with TSOv2 disabled.
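The retry can be sketched as below. The structures, the context pool and the -ENOSPC error code are assumptions made for illustration, not the sfc driver's actual interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a NIC with a limited pool of TSOv2 contexts. */
struct ex_nic { int tso2_contexts_free; };
struct ex_txq { bool tso2; };

/* Initialise one queue, optionally claiming a TSOv2 context. */
static int ex_txq_init_hw(struct ex_nic *nic, struct ex_txq *q, bool want_tso2)
{
	if (want_tso2) {
		if (nic->tso2_contexts_free == 0)
			return -ENOSPC;
		nic->tso2_contexts_free--;
	}
	q->tso2 = want_tso2;
	return 0;
}

/* Try with TSOv2 first; on context exhaustion, retry without it. */
static int ex_txq_init(struct ex_nic *nic, struct ex_txq *q)
{
	int rc = ex_txq_init_hw(nic, q, true);

	if (rc == -ENOSPC)
		rc = ex_txq_init_hw(nic, q, false);
	return rc;
}
```

A queue that ends up without a context still works; it simply cannot use TSOv2.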
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Thu, 17 Nov 2016 10:51:54 +0000 (10:51 +0000)]
sfc: Firmware-Assisted TSO version 2
Add support for FATSOv2 to the driver. FATSOv2 offloads far more of the task
of TCP segmentation to the firmware, such that we now just pass a single
super-packet to the NIC. This means TSO has a great deal in common with a
normal DMA transmit, apart from adding a couple of option descriptors.
NIC-specific checks have been moved off the fast path and in to
initialisation where possible.
This also moves FATSOv1/SWTSO to a new file (tx_tso.c). The end of transmit
and some error handling is now outside TSO, since it is common with other
code.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:51:39 +0000 (10:51 +0000)]
sfc: Update EF10 register definitions
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 17 Nov 2016 10:51:30 +0000 (10:51 +0000)]
sfc: Update MCDI protocol definitions
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Thu, 17 Nov 2016 01:58:21 +0000 (04:58 +0300)]
netns: make struct pernet_operations::id unsigned int
Make struct pernet_operations::id unsigned.
There are 2 reasons to do so:
1) This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.
2) On x86_64, unsigned 32-bit data that is mixed with pointers
via array indexing, or via offsets added to or subtracted from pointers,
is preferred to signed 32-bit data.
An "int" used as an array index needs to be sign-extended
to 64 bits before being used.
void f(long *p, int i)
{
g(p[i]);
}
roughly translates to
movsx rsi, esi
mov rdi, [rsi+...]
call g
MOVSX is a 3-byte instruction which isn't necessary if the variable is
unsigned, because x86_64 zero-extends by default.
Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:
static inline void *net_generic(const struct net *net, int id)
{
...
ptr = ng->ptr[id - 1];
...
}
And this function is used a lot, so those sign extensions add up.
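To see the difference outside the kernel, the two index types can be compared with a pair of otherwise identical functions. This is a sketch with hypothetical function names; the exact instructions emitted depend on the compiler and optimisation level.

```c
/* With a signed index the compiler must sign-extend i to 64 bits
 * before forming the address on x86_64. */
static long load_signed(const long *p, int i)
{
	return p[i];
}

/* With an unsigned index no separate extension instruction is needed,
 * because writes to a 32-bit register zero the upper half on x86_64. */
static long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}
```

Compiling this with optimisation enabled and inspecting the assembly (for example with gcc -O2 -S) is one way to check whether the sign-extension instruction appears only in the first function.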
Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation, with the register
allocator being used differently. gcc decides that some variable
needs to live in the new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
used, which is longer than [r8].
However, overall balance is in negative direction:
add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
function                                     old     new   delta
nfsd4_lock                                  3886    3959     +73
tipc_link_build_proto_msg                   1096    1140     +44
mac80211_hwsim_new_radio                    2776    2808     +32
tipc_mon_rcv                                1032    1058     +26
svcauth_gss_legacy_init                     1413    1429     +16
tipc_bcbase_select_primary                   379     392     +13
nfsd4_exchange_id                           1247    1260     +13
nfsd4_setclientid_confirm                    782     793     +11
...
put_client_renew_locked                      494     480     -14
ip_set_sockfn_get                            730     716     -14
geneve_sock_add                              829     813     -16
nfsd4_sequence_done                          721     703     -18
nlmclnt_lookup_host                          708     686     -22
nfsd4_lockt                                 1085    1063     -22
nfs_get_client                              1077    1050     -27
tcf_bpf_init                                1106    1076     -30
nfsd4_encode_fattr                          5997    5930     -67
Total: Before=154856051, After=154854321, chg -0.00%
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 17:10:42 +0000 (09:10 -0800)]
udp: enable busy polling for all sockets
UDP busy polling is restricted to connected UDP sockets.
This is because sk_busy_loop() only takes care of one NAPI context.
There are cases where it could be extended.
1) Some hosts receive traffic on a single NIC, with one RX queue.
2) Some applications use SO_REUSEPORT and associated BPF filter
to split the incoming traffic on one UDP socket per RX
queue/thread/cpu
3) Some UDP sockets are used to send/receive traffic for one flow, but
they do not bother with connect()
This patch records the napi_id of first received skb, giving more
reach to busy polling.
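The "record once" idea can be sketched as below, using simplified stand-in structures rather than the kernel's struct sock and struct sk_buff.

```c
/* Simplified stand-ins for the socket and skb fields involved. */
struct ex_skb { unsigned int napi_id; };
struct ex_sock { unsigned int napi_id; };	/* 0 means "not recorded" */

/* Record the NAPI id of the first received skb only; later packets
 * arriving from other queues do not overwrite it. */
static void ex_mark_napi_id_once(struct ex_sock *sk, const struct ex_skb *skb)
{
	if (!sk->napi_id)
		sk->napi_id = skb->napi_id;
}
```

An unconnected socket thereby gets a NAPI context to busy poll on, which pays off in the three cases listed above, where traffic for the socket arrives on a single queue.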
Tested:
lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done
Before patch :
27867 28870 37324 41060 41215
36764 36838 44455 41282 43843
After patch :
73920 73213 70147 74845 71697
68315 68028 75219 70082 73707
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Nov 2016 18:35:19 +0000 (13:35 -0500)]
Merge branch 'rds-ha-failover-fixes'
Sowmini Varadhan says:
====================
RDS: TCP: HA/Failover fixes
This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:
while (1); do
# modprobe rds-tcp on test nodes
# run rds-stress in bi-dir mode between test machine pair
# modprobe -r rds-tcp on test nodes
done
rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.
Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes, we have
been able to run the test overnight without any issues.
Each patch has a detailed description of the root-cause fixed
by the patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:50 +0000 (13:29 -0800)]
RDS: TCP: Force every connection to be initiated by numerically smaller IP address
When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically larger IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").
The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged, for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and the sender removes datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a TCP socket that got reset as part of duel arbitration (but
before data was delivered to the receiver's RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.
This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.
A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().
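The decision taken on TCP_ESTABLISHED can be sketched roughly as follows. This is a simplified model with hypothetical names: it compares IPv4 addresses in host byte order only and leaves out the port check and the RDS state machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Addresses are IPv4 in host byte order. A connection is treated as
 * canonically ordered when the initiating side has the numerically
 * smaller address. */
static bool ex_canonical(uint32_t initiator_addr, uint32_t acceptor_addr)
{
	return initiator_addr < acceptor_addr;
}

enum ex_action { EX_MARK_UP, EX_DROP_AND_RECONNECT };

/* Decision taken when the TCP handshake completes. */
static enum ex_action ex_on_established(uint32_t initiator_addr,
					uint32_t acceptor_addr)
{
	return ex_canonical(initiator_addr, acceptor_addr) ?
		EX_MARK_UP : EX_DROP_AND_RECONNECT;
}
```

Because both peers apply the same rule, only one of two duelling connections can ever be marked up.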
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:49 +0000 (13:29 -0800)]
RDS: TCP: Track peer's connection generation number
The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
but the RDS socket/connection layer on both sides stays
the same
(b) when the peer's RDS layer itself resets (e.g., due to module
reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().
In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.
To achieve this, the RDS handshake probe added as part of
commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.
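The receiver-side check can be sketched as below. The structure and field names are illustrative, and the treatment of 0 as "not learned yet" is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-connection state; field names are illustrative. */
struct ex_conn {
	uint32_t peer_gen_num;	/* 0 means "not learned yet" */
	uint64_t next_rx_seq;
};

/* Process a generation number carried by a handshake probe. Returns
 * true when the peer restarted and the state was reset. */
static bool ex_recv_gen_num(struct ex_conn *c, uint32_t gen)
{
	if (c->peer_gen_num == 0) {
		c->peer_gen_num = gen;	/* first contact: just record it */
		return false;
	}
	if (c->peer_gen_num == gen)
		return false;		/* same incarnation, case (a) */

	c->peer_gen_num = gen;		/* new incarnation, case (b) */
	c->next_rx_seq = 0;		/* forget the old sequence space */
	return true;
}
```

A plain transport reset leaves the generation number unchanged, so sequence state survives; only a restarted peer triggers the reset.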
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 16 Nov 2016 21:29:48 +0000 (13:29 -0800)]
RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
As noted in rds_recv_incoming(), sequence numbers on data packets
can decrease in the failover case, and the Rx path is equipped
to recover from this if RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.
RDS_FLAG_RETRANSMITTED in the header is predicated on the corresponding
retransmitted flag in the rds_message, so make sure the flag is set on
messages queued for retransmission.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:41 +0000 (20:09 +0100)]
net: stmmac: replace if (netif_msg_type) by their netif_xxx counterpart
As suggested by Joe Perches, we can replace all
if (netif_msg_type(priv)) dev_xxx(priv->devices, ...)
with the simpler macro netif_xxx(priv, hw, priv->dev, ...)
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:40 +0000 (20:09 +0100)]
net: stmmac: replace hardcoded function name by __func__
Some print statements have the function name hardcoded.
It is better to use __func__ instead.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Wed, 16 Nov 2016 19:09:39 +0000 (20:09 +0100)]
net: stmmac: replace all pr_xxx by their netdev_xxx counterpart
The stmmac driver uses lots of pr_xxx functions to print information.
This is bad since we cannot know which device logs the information
(especially if two stmmac devices are present).
Furthermore, it seems to wrongly assume that all logs will always
be consecutive, by using a dev_xxx followed by some indented pr_xxx like this:
kernel: sun7i-dwmac 1c50000.ethernet: no reset control found
kernel: Ring mode enabled
kernel: No HW DMA feature register supported
kernel: Normal descriptors
kernel: TX Checksum insertion supported
So this patch replaces all pr_xxx by their netdev_xxx counterpart,
except for some prints where netdev "causes" ugly output like:
sun7i-dwmac 1c50000.ethernet (unnamed net_device) (uninitialized): no reset control found
In those cases, I keep dev_xxx.
At the same time I remove some "stmmac:" prefixes, since
they would duplicate what dev_xxx already displays.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Nov 2016 17:48:30 +0000 (09:48 -0800)]
net_sched: sch_fq: use hash_ptr()
When I wrote sch_fq.c, hash_ptr() on 64bit arches was awful,
and I chose hash_32().
Linus Torvalds and George Spelvin fixed this issue, so we can
use hash_ptr() to get more entropy on 64bit arches with Terabytes
of memory, and avoid the cast games.
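The difference can be illustrated with userspace approximations of multiplicative hashing. The ex_ helpers below are sketches that mimic the general shape of hash_32()/hash_64() and are not the kernel implementations; bits is assumed to be between 1 and 32.

```c
#include <stdint.h>

#define EX_GOLDEN_RATIO_32 0x61C88647u
#define EX_GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Hash only the low 32 bits of the pointer (requires a cast). */
static uint32_t ex_hash_32(uint32_t val, unsigned int bits)
{
	return (val * EX_GOLDEN_RATIO_32) >> (32 - bits);
}

/* Hash all 64 bits, so the upper address bits also contribute. */
static uint32_t ex_hash_64(uint64_t val, unsigned int bits)
{
	return (uint32_t)((val * EX_GOLDEN_RATIO_64) >> (64 - bits));
}
```

Two addresses that differ only above bit 31 always collide under the 32-bit variant, while the 64-bit variant can tell them apart.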
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 14:21:34 +0000 (06:21 -0800)]
net/mlx5e: remove napi_hash_del() calls
Calling napi_hash_del() after netif_napi_del() is pointless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 13:49:22 +0000 (05:49 -0800)]
net/mlx4_en: remove napi_hash_del() call
There is no need to call napi_hash_del()+synchronize_rcu() before
calling netif_napi_del();
netif_napi_del() does this already.
Using napi_hash_del() in a driver is useful only when dealing with
a batch of NAPI structures, so that a single synchronize_rcu() can
be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Nov 2016 04:29:05 +0000 (23:29 -0500)]
Merge branch 'mlxsw-i2c'
Jiri Pirko says:
====================
mlxsw: Introduce support for I2C bus
Vadim says:
This patchset adds I2C access support for SwitchX, SwitchX2, SwitchIB,
SwitchIB2 and Spectrum silicones.
It contains:
- Small changes in mlxsw core code, needed for I2C bus support;
- I2C driver, which obtains I2C input/output mailboxes setting and
provides command interface implementation.
- Minimal driver, which works on top of I2C driver and allows running
of mlxsw command interface over I2C bus;
Use case:
On a system that has no PCI connection to the ASIC (e.g. a BMC), hwmon
functionality (sensors, pwm, tacho) will be available through I2C.
Usage (manual probing):
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
Sysfs interface:
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/pwm1
/sys/bus/i2c/devices/2-0048/hwmon/hwmon5/temp1_input
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:46 +0000 (15:20 +0100)]
mlxsw: minimal: Add I2C support for Mellanox ASICs
Add I2C access support for Mellanox ASICs:
- Virtual Protocol Interconnect switches SwitchX, SwitchX2,
providing InfiniBand, Ethernet and Fibre Channel connectivity;
- InfiniBand switches SwitchIB, SwitchIB2;
- Ethernet switch Spectrum.
Example of probing activation:
echo mlxsw_minimal 0x48 > /sys/bus/i2c/devices/i2c-2/new_device
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:45 +0000 (15:20 +0100)]
mlxsw: Invoke driver's init/fini methods only if defined
We are going to add a minimal driver on top of the mlxsw core
infrastructure, which will be mainly used for hardware monitoring in
Baseboard management controller (BMC) installations.
Unlike the switch drivers (e.g., spectrum, switchx2), this driver does not
initialize the ASIC and therefore doesn't need to implement the init() and
fini() methods in its 'mlxsw_driver' struct.
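The optional-callback pattern described above can be sketched in user space
as follows. This is a minimal model, not the mlxsw code: the names
demo_driver, demo_core_register and demo_core_unregister are hypothetical,
and only the NULL checks before calling init()/fini() mirror the idea in
this commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver descriptor with optional hooks. */
struct demo_driver {
    const char *kind;
    int (*init)(void *priv);   /* may be NULL */
    void (*fini)(void *priv);  /* may be NULL */
};

/* Call init() only when the driver provides one; a missing hook is success. */
static int demo_core_register(const struct demo_driver *drv, void *priv)
{
    if (drv->init)
        return drv->init(priv);
    return 0;
}

/* Call fini() only when the driver provides one. */
static void demo_core_unregister(const struct demo_driver *drv, void *priv)
{
    if (drv->fini)
        drv->fini(priv);
}

/* A "switch-like" driver whose hooks bump and drop a counter. */
static int demo_switch_init(void *priv) { ++*(int *)priv; return 0; }
static void demo_switch_fini(void *priv) { --*(int *)priv; }

static const struct demo_driver demo_switch = {
    .kind = "demo_switch",
    .init = demo_switch_init,
    .fini = demo_switch_fini,
};

/* A "minimal" driver that implements neither hook. */
static const struct demo_driver demo_minimal = { .kind = "demo_minimal" };
```

Registering demo_minimal succeeds without touching the private data, while
demo_switch still gets both hooks invoked.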
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:44 +0000 (15:20 +0100)]
mlxsw: Introduce support for I2C bus
Add I2C bus implementation for Mellanox Technologies Switch ASICs.
This includes command interface implementation using input / output mailboxes,
whose location is retrieved from the firmware during probe time.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Pasternak [Wed, 16 Nov 2016 14:20:43 +0000 (15:20 +0100)]
mlxsw: Add bus capability flag
The mlxsw core infrastructure currently assumes that communication with
the ASIC is always possible using Ethernet management datagrams (EMADs),
but this is only possible when the PCI bus is used.
The bus capability flag is added to indicate EMAD support and make core
initialize EMAD communication only when it's set. Otherwise, register
access is done using command interface.
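A minimal sketch of the capability check, assuming a single feature bit per
bus. The flag name DEMO_BUS_F_TXRX and the struct/enum names are
hypothetical; the commit text does not name the actual flag.

```c
#include <stdbool.h>

/* Hypothetical capability bit: the bus can carry EMAD traffic. */
#define DEMO_BUS_F_TXRX (1u << 0)

struct demo_bus_info {
    const char *name;
    unsigned int features;
};

enum demo_reg_path { DEMO_REG_VIA_EMAD, DEMO_REG_VIA_CMD };

static bool demo_bus_supports_emad(const struct demo_bus_info *bus)
{
    return (bus->features & DEMO_BUS_F_TXRX) != 0;
}

/* Pick the register access path: EMAD if the bus supports it, else the
 * command interface. */
static enum demo_reg_path demo_reg_access_path(const struct demo_bus_info *bus)
{
    return demo_bus_supports_emad(bus) ? DEMO_REG_VIA_EMAD : DEMO_REG_VIA_CMD;
}
```

A PCI-like bus would set the bit and get the EMAD path; an I2C-like bus
would leave it clear and fall back to the command interface.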
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Wed, 16 Nov 2016 10:43:33 +0000 (11:43 +0100)]
net: netcp: replace IS_ERR_OR_NULL by IS_ERR
knav_queue_open always returns an ERR_PTR value, never NULL. This can be
confirmed by unfolding the function calls and conforms to the function's
documentation. Thus, replace IS_ERR_OR_NULL by IS_ERR in error checks.
The change is made using the following semantic patch:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x;
statement S;
@@
x = knav_queue_open(...);
if (
- IS_ERR_OR_NULL
+ IS_ERR
(x)) S
// </smpl>
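For readers unfamiliar with the error-pointer convention, here is a
simplified user-space model of it. The demo_* helpers only imitate the
kernel's ERR_PTR/IS_ERR family, and demo_queue_open is a hypothetical
function that, like the one discussed above, never returns NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: the top 4095 addresses encode negative errno values. */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool demo_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static inline bool demo_is_err_or_null(const void *ptr)
{
    return ptr == NULL || demo_is_err(ptr);
}

/* Hypothetical opener: returns a valid pointer or an error pointer,
 * never NULL, so demo_is_err() alone is a sufficient check. */
static int demo_queue_storage;

static void *demo_queue_open(int id)
{
    if (id < 0)
        return demo_err_ptr(-EINVAL);
    return &demo_queue_storage;
}
```

When a function is documented to return only valid or error pointers, the
NULL half of the combined check can never fire, which is why the narrower
test is enough.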
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Tue, 15 Nov 2016 15:23:11 +0000 (23:23 +0800)]
sctp: use new rhlist interface on sctp transport rhashtable
Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
to hash a node to one chain. If in one host thousands of assocs connect
to one server with the same lport and different laddrs (although it's
not a normal case), all the transports would be hashed into the same
chain.
Inserting a new node may then keep returning -EBUSY because the chain is
too long, and since sctp retries the insertion in a loop, this could even
lead to a system hang.
The new rhlist interface handles the case where many nodes with the same
key land in one chain. It puts them into a list and makes this list a
single node of the chain.
This patch replaces the rhashtable_ interface with the rhltable_ interface.
Since a chain no longer grows too long and insertion no longer returns
-EBUSY with this fix, the reinsert loop is also removed here.
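The idea of hanging same-key nodes off one chain entry can be sketched as
below. This is a toy single-bucket model, not the rhltable code; the
struct and function names are hypothetical.

```c
#include <stddef.h>

/* One entry; entries with equal keys are linked through 'dup'. */
struct demo_entry {
    int key;
    struct demo_entry *dup;    /* next entry with the same key */
    struct demo_entry *chain;  /* next distinct key (list heads only) */
};

/* Insert into a single bucket chain; duplicates never lengthen the chain. */
static void demo_insert(struct demo_entry **bucket, struct demo_entry *e)
{
    struct demo_entry *head;

    for (head = *bucket; head; head = head->chain) {
        if (head->key == e->key) {
            /* Same key: link behind the existing list head. */
            e->dup = head->dup;
            e->chain = NULL;
            head->dup = e;
            return;
        }
    }
    /* New key: becomes a list head at the front of the chain. */
    e->dup = NULL;
    e->chain = *bucket;
    *bucket = e;
}

/* Number of distinct keys in the bucket chain. */
static size_t demo_chain_length(const struct demo_entry *bucket)
{
    size_t n = 0;

    for (; bucket; bucket = bucket->chain)
        n++;
    return n;
}

/* Number of entries stored under a given key. */
static size_t demo_count_key(const struct demo_entry *bucket, int key)
{
    const struct demo_entry *e;
    size_t n = 0;

    for (; bucket; bucket = bucket->chain)
        if (bucket->key == key)
            for (e = bucket; e; e = e->dup)
                n++;
    return n;
}
```

With this layout the chain length depends only on the number of distinct
keys, so thousands of same-key entries do not make the chain itself longer.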
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 17 Nov 2016 04:11:07 +0000 (23:11 -0500)]
Merge branch 'bnxt_en-next'
Michael Chan says:
====================
bnxt_en: Updates.
New firmware spec. update, autoneg update, and UDP RSS support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 17 Nov 2016 02:13:10 +0000 (21:13 -0500)]
bnxt_en: Add ethtool -n|-N rx-flow-hash support.
To display and modify the RSS hash.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 17 Nov 2016 02:13:09 +0000 (21:13 -0500)]
bnxt_en: Add UDP RSS support for 57X1X chips.
The newer chips have proper support for 4-tuple UDP RSS.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 17 Nov 2016 02:13:08 +0000 (21:13 -0500)]
bnxt_en: Enhance autoneg support.
On some dual port NICs, the speed setting on one port can affect the
available speed on the other port. Add logic to detect these changes
and adjust the advertised speed settings when necessary.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 17 Nov 2016 02:13:07 +0000 (21:13 -0500)]
bnxt_en: Update firmware interface spec to 1.5.4.
Use the new FORCE_LINK_DWN bit to shut down the link during close.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 22:54:50 +0000 (14:54 -0800)]
netpoll: more efficient locking
Callers of netpoll_poll_lock() own NAPI_STATE_SCHED.
Callers of netpoll_poll_unlock() have BH blocked between
the NAPI_STATE_SCHED being cleared and poll_lock being released.
We can avoid the spinlock which has no contention, and use cmpxchg()
on poll_owner which we need to set anyway.
This removes a possible lockdep violation after the cited commit,
since sk_busy_loop() re-enables BH before calling busy_poll_stop().
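As an illustration of replacing an uncontended spinlock with a
compare-and-swap on the owner field, here is a user-space sketch using C11
atomics. The names are hypothetical, -1 is assumed to mean "no owner", and
the try-lock form is a simplification of whatever the kernel code does on
contention.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: -1 means "no owner", otherwise a CPU id. */
struct demo_poll_ctx {
    atomic_int poll_owner;
};

/* Claim ownership with one compare-and-swap instead of taking a lock
 * and then storing the owner separately. */
static bool demo_poll_trylock(struct demo_poll_ctx *ctx, int cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong(&ctx->poll_owner, &expected, cpu);
}

/* Release ownership by resetting the owner field. */
static void demo_poll_unlock(struct demo_poll_ctx *ctx)
{
    atomic_store(&ctx->poll_owner, -1);
}
```

Because the owner field has to be written anyway, the swap records the
owner and provides mutual exclusion in the same operation.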
Fixes: 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafal Ozieblo [Wed, 16 Nov 2016 10:02:34 +0000 (10:02 +0000)]
cadence: Add LSO support.
New Cadence GEM hardware supports Large Segment Offload (LSO):
TCP segmentation offload (TSO) as well as UDP fragmentation
offload (UFO). Support for those features was added to the driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 16 Nov 2016 14:10:49 +0000 (15:10 +0100)]
netronome: don't access real_num_rx_queues directly
The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
is enabled, so we now get a build failure when that is turned off:
netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
As far as I can tell, the check here is only used as an optimization that
we can skip in order to fix the compilation. If sysfs is disabled,
the following netif_set_real_num_rx_queues() has no effect.
Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Nov 2016 14:01:47 +0000 (06:01 -0800)]
sfc: remove napi_hash_del() call
Calling napi_hash_del() after netif_napi_del() is pointless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Wed, 16 Nov 2016 09:05:46 +0000 (10:05 +0100)]
lwtunnel: subtract tunnel headroom from mtu on output redirect
This patch changes the lwtunnel_headroom() function which is called
in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
value when the lwtunnel state is OUTPUT_REDIRECT.
This patch enables e.g. SR-IPv6 encapsulations to work without
manually setting the route mtu.
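A rough sketch of the MTU adjustment, assuming the tunnel state carries a
redirect flag set and a headroom value. The flag and struct names are
hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical redirect flags; the names are illustrative only. */
#define DEMO_OUTPUT_REDIRECT (1u << 0)
#define DEMO_XMIT_REDIRECT   (1u << 1)

struct demo_lwt_state {
    unsigned int flags;
    unsigned int headroom;  /* bytes the encapsulation adds */
};

/* Report headroom for either redirect type, as long as it fits the MTU. */
static unsigned int demo_lwt_headroom(const struct demo_lwt_state *st,
                                      unsigned int mtu)
{
    if (st &&
        (st->flags & (DEMO_OUTPUT_REDIRECT | DEMO_XMIT_REDIRECT)) &&
        st->headroom < mtu)
        return st->headroom;
    return 0;
}

/* Effective route MTU: device MTU minus the tunnel headroom. */
static unsigned int demo_route_mtu(const struct demo_lwt_state *st,
                                   unsigned int dev_mtu)
{
    return dev_mtu - demo_lwt_headroom(st, dev_mtu);
}
```

For example, with a device MTU of 1500 and 48 bytes of encapsulation
overhead, the route MTU becomes 1452 without any manual configuration.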
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 16 Nov 2016 08:51:58 +0000 (09:51 +0100)]
mlxsw: spectrum_router: Adjust placement of FIB abort warning
The recent merge commit
bb598c1b8c9b ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
the FIB abort warning to fire whenever we flush the FIB tables - either
during module removal or actual abort.
Move it back to its rightful location in the FIB abort function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 16 Nov 2016 03:26:48 +0000 (04:26 +0100)]
net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit
The SPEED_UNFORCED indicates the MAC & PHY should perform
auto-negotiation to determine a speed which works. If this is called
for, don't set the force bit. If it is set, the MAC actually does
10Gbps, which the internal PHYs don't support.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:57:45 +0000 (13:57 -0500)]
Merge branch 'amd-xgbe-next'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2016-11-15
This patch series addresses some minor issues found in the recently
accepted patch series for the AMD XGBE driver.
The following fixes are included in this driver update series:
- Fix a possibly uninitialized variable in the debugfs support
- Fix the GPIO pin number constraint check
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 15 Nov 2016 22:11:15 +0000 (16:11 -0600)]
amd-xgbe: Fix maximum GPIO value check
The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated
from 0 to 15. The driver uses the wrong value (16) to validate the GPIO
pin range in the routines to set and clear the GPIO output pins. Update
the code to use the correct value (15).
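The off-by-one can be illustrated with a small user-space model. The
register is a plain variable here and all names are hypothetical; only the
bound (highest valid pin is 15) comes from the description above.

```c
#include <errno.h>

#define DEMO_GPIO_MAX_PIN 15u  /* 16 pins, numbered 0..15 */

/* Set one output pin; reject anything above the highest valid pin. */
static int demo_gpio_set(unsigned int *reg, unsigned int pin)
{
    if (pin > DEMO_GPIO_MAX_PIN)
        return -EINVAL;
    *reg |= 1u << pin;
    return 0;
}

/* Clear one output pin, with the same range check. */
static int demo_gpio_clr(unsigned int *reg, unsigned int pin)
{
    if (pin > DEMO_GPIO_MAX_PIN)
        return -EINVAL;
    *reg &= ~(1u << pin);
    return 0;
}
```

Comparing against 16 instead of 15 would accept pin 16, which does not
exist on a 16-pin block.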
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Tue, 15 Nov 2016 22:11:05 +0000 (16:11 -0600)]
amd-xgbe: Fix possible uninitialized variable
The debugfs support in the driver uses a common routine to write the
debugfs values. In this routine, if the input file position is non-zero
then the write routine will not return an error and an output parameter
will not have been set. Because an error isn't returned an uninitialized
value will be written into a register.
Fix the common write routine to return an error if the input file position
is non-zero, which will propagate back to the caller.
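A user-space sketch of the fixed behaviour, assuming a write helper that
parses a hexadecimal string into an output parameter. The function name,
buffer size and parsing details are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a hex string into *value. Returns the number of bytes consumed
 * or a negative errno; *value is only written on success. */
static long demo_common_write(const char *buf, size_t count,
                              long long *ppos, unsigned int *value)
{
    char work[16];
    char *end;
    unsigned long v;

    /* The fix: a non-zero position is an error, so the caller never
     * goes on to use an unset *value. */
    if (*ppos != 0)
        return -EINVAL;
    if (count >= sizeof(work))
        return -ENOSPC;

    memcpy(work, buf, count);
    work[count] = '\0';

    v = strtoul(work, &end, 16);
    if (end == work)
        return -EINVAL;

    *value = (unsigned int)v;
    *ppos += (long long)count;
    return (long)count;
}
```

A caller that checks the return value before writing the register now
skips the register write whenever the position is non-zero.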
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:44:02 +0000 (13:44 -0500)]
Merge branch 'nway-reset'
Florian Fainelli says:
====================
net: Implement ethtool::nway_reset for a few drivers
This patch series depends on "net: phy: Centralize auto-negotation restart"
since it provides phy_ethtool_nway_reset as a helper function.
The drivers here already support PHYLIB, so there really is no reason why
restarting auto-negotiation would not be possible with these.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:49 +0000 (11:19 -0800)]
net: ethernet: marvell: pxa168_eth: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:48 +0000 (11:19 -0800)]
net: ethernet: mvpp2: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:47 +0000 (11:19 -0800)]
net: ethernet: mvneta: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:46 +0000 (11:19 -0800)]
net: ethoc: Implement ethtool::nway_reset
Implement ethtool::nway_reset using phy_ethtool_nway_reset. We are
already using dev->phydev all over the place so this comes for free.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 15 Nov 2016 19:19:45 +0000 (11:19 -0800)]
net: stmmac: Implement ethtool::nway_reset
Utilize the generic phy_ethtool_nway_reset() helper function to
implement an autonegotiation restart.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 18:40:59 +0000 (13:40 -0500)]
Merge branch 'busypoll-preemption-and-other-optimizations'
Eric Dumazet says:
====================
net: busy-poll: allow preemption and other optimizations
It is time to have preemption points in sk_busy_loop() and improve
its scalability.
Also napi_complete() and friends can tell drivers when it is safe to
not re-enable device interrupts, saving some overhead under
high busy polling.
mlx4 and bnx2x are changed accordingly, to show how this busy polling
status can be exploited by drivers.
Next steps will implement Zach Brown's suggestion, where NAPI polling
would be enabled all the time for some chosen queues.
This is needed for efficient epoll() support anyway.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:15 +0000 (10:15 -0800)]
bnx2x: switch to napi_complete_done()
Switch from napi_complete() to napi_complete_done()
for better GRO support (gro_flush_timeout) and core NAPI
features.
Do not rearm interrupts if we are busy polling,
to reduce bus and interrupt overhead.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:14 +0000 (10:15 -0800)]
net/mlx4_en: use napi_complete_done() return value
Do not rearm interrupts if we are busy polling.
mlx4 uses separate CQ for TX and RX, so number of TX interrupts
does not change, unfortunately.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:13 +0000 (10:15 -0800)]
net: busy-poll: return busypolling status to drivers
NAPI drivers use napi_complete_done() or napi_complete() when
they have drained the RX ring, right before re-enabling device interrupts.
In busy polling, we can avoid interrupts being delivered since
we are polling the RX ring in a controlled loop.
Drivers can choose to use the napi_complete_done() return value
to reduce interrupt overhead while busy polling is active.
This is optional, legacy drivers should work fine even
if not updated.
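How a driver poll routine might use the return value can be modelled in
user space as follows. The demo_* names are hypothetical and the state
handling is heavily simplified; only the "re-enable interrupts when the
completion call returns true" pattern reflects the text above.

```c
#include <stdbool.h>

struct demo_napi {
    bool in_busy_poll;         /* a busy-poll loop owns this context */
    bool scheduled;            /* models the SCHED state bit */
    unsigned int irq_enables;  /* how often interrupts were re-armed */
};

/* Model of the completion call: returns false when the caller should
 * NOT re-enable device interrupts because busy polling is active. */
static bool demo_napi_complete_done(struct demo_napi *n, int work_done)
{
    (void)work_done;
    if (n->in_busy_poll)
        return false;
    n->scheduled = false;
    return true;
}

/* Driver poll routine: 'pending' packets wait, at most 'budget' are
 * processed in one call. */
static int demo_poll(struct demo_napi *n, int pending, int budget)
{
    int work_done = pending < budget ? pending : budget;

    if (work_done < budget && demo_napi_complete_done(n, work_done))
        n->irq_enables++;  /* stands in for re-arming the device IRQ */
    return work_done;
}
```

A driver that ignores the return value keeps working as before; it simply
re-arms interrupts that the busy-polling loop did not need.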
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:12 +0000 (10:15 -0800)]
net: busy-poll: remove need_resched() from sk_can_busy_loop()
Now that sk_busy_loop() can schedule by itself, we can remove the
need_resched() check from sk_can_busy_loop().
Also add a const to its struct sock parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2016 18:15:11 +0000 (10:15 -0800)]
net: busy-poll: allow preemption in sk_busy_loop()
After commit
4cd13c21b207 ("softirq: Let ksoftirqd do its job"),
sk_busy_loop() needs a bit of care :
softirqs might be delayed since we do not allow preemption yet.
This patch adds preemption points in sk_busy_loop(),
and makes sure no unnecessary cache line dirtying
or atomic operations are done while looping.
A new flag is added into napi->state : NAPI_STATE_IN_BUSY_POLL
This prevents napi_complete_done() from clearing NAPIF_STATE_SCHED,
so that sk_busy_loop() does not have to grab it again.
Similarly, netpoll_poll_lock() is done one time.
This gives about 10 to 20 % improvement in various busy polling
tests, especially when many threads are busy polling in
configurations with large number of NIC queues.
This should allow experimenting with bigger delays without
hurting overall latencies.
Tested:
On a 40Gb mlx4 NIC, 32 RX/TX queues.
echo 70 >/proc/sys/net/core/busy_read
for i in `seq 1 40`; do echo -n $i: ; ./super_netperf $i -H lpaa24 -t UDP_RR -- -N -n; done
      Before:    After:
 1:     90072     92819
 2:    157289    184007
 3:    235772    213504
 4:    344074    357513
 5:    394755    458267
 6:    461151    487819
 7:    549116    625963
 8:    544423    716219
 9:    720460    738446
10:    794686    837612
11:    915998    923960
12:    937507    925107
13:   1019677    971506
14:   1046831   1113650
15:   1114154   1148902
16:   1105221   1179263
17:   1266552   1299585
18:   1258454   1383817
19:   1341453   1312194
20:   1363557   1488487
21:   1387979   1501004
22:   1417552   1601683
23:   1550049   1642002
24:   1568876   1601915
25:   1560239   1683607
26:   1640207   1745211
27:   1706540   1723574
28:   1638518   1722036
29:   1734309   1757447
30:   1782007   1855436
31:   1724806   1888539
32:   1717716   1944297
33:   1778716   1869118
34:   1805738   1983466
35:   1815694   2020758
36:   1893059   2035632
37:   1843406   2034653
38:   1888830   2086580
39:   1972827   2143567
40:   1877729   2181851
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Adam Belay <abelay@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yuval Mintz <Yuval.Mintz@cavium.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Tue, 15 Nov 2016 19:00:04 +0000 (11:00 -0800)]
bpf: Fix compilation warning in __bpf_lru_list_rotate_inactive
gcc-6.2.1 gives the following warning:
kernel/bpf/bpf_lru_list.c: In function ‘__bpf_lru_list_rotate_inactive.isra.3’:
kernel/bpf/bpf_lru_list.c:201:28: warning: ‘next’ may be used uninitialized in this function [-Wmaybe-uninitialized]
The "next" is currently initialized in the while() loop which must have >=1
iterations.
This patch initializes next to get rid of the compiler warning.
Fixes: 3a08c2fd7634 ("bpf: LRU List")
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 15 Nov 2016 15:14:04 +0000 (16:14 +0100)]
ipv6: sr: add option to control lwtunnel support
This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
support of encapsulation with the lightweight tunnels. When this option
is enabled, CONFIG_LWTUNNEL is automatically selected.
Fix commit
6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Without a proper option to control lwtunnel support for SR-IPv6, if
CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
of seg6_iptunnel_init() failure with EOPNOTSUPP:
NET: Registered protocol family 10
IPv6: Attempt to unregister permanent protocol 6
IPv6: Attempt to unregister permanent protocol 136
IPv6: Attempt to unregister permanent protocol 17
NET: Unregistered protocol family 10
Tested (compiling, booting, and loading ipv6 module when relevant)
with possible combinations of CONFIG_IPV6={y,m,n},
CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:46:31 +0000 (22:46 -0500)]
Merge branch 'alx-multiqueue-support'
Tobias Regnery says:
====================
alx: add multi queue support
This patchset lays the groundwork for multi queue support in the alx driver
and enables multi queue support for the tx path by default. The hardware
supports up to 4 tx queues.
Benefits are better utilization of multi core cpus and the usage of the
msi-x support by default which splits the handling of rx / tx and misc
other interrupts.
The rx path is a little bit harder because apparently (based on the limited
information from the downstream driver) the hardware supports up to 8 rss
queues but only has one hardware descriptor ring on the rx side. So the rx
path will be part of another patchset.
Tested on my AR8161 ethernet adapter with different tests:
- there are no regressions observed during my daily usage
- iperf tcp and udp tests show no performance regressions
- netperf TCP_RR and UDP_RR show a slight performance increase of about
1-2% with this patchset applied
This work is based on the downstream driver at github.com/qca/alx
Changes in V2:
- drop unneeded casts in alx_alloc_rx_ring (Patch 1)
- add additional information about testing and benefit to the
changelog
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:16 +0000 (12:43 +0100)]
alx: enable multiple tx queues
Enable multiple tx queues by default based on the number of online cpus. The
hardware supports up to four tx queues.
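The queue-count rule stated above amounts to a clamp. A minimal sketch,
with a hypothetical constant and function name:

```c
/* Hardware limit taken from the description above. */
#define DEMO_MAX_TX_QUEUES 4

/* One tx queue per online CPU, capped at the hardware limit and never
 * fewer than one. */
static int demo_num_tx_queues(int online_cpus)
{
    if (online_cpus < 1)
        return 1;
    return online_cpus < DEMO_MAX_TX_QUEUES ? online_cpus
                                            : DEMO_MAX_TX_QUEUES;
}
```

A dual-core machine would thus get two tx queues, and anything with four
or more online CPUs would get all four.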
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:15 +0000 (12:43 +0100)]
alx: enable msi-x interrupts by default
Remove the module parameter to enable msi-x support and enable msi-x
interrupts unconditionally by default. This is a preparatory step to enable
multi queue support by default, because this is only working with msi-x
interrupts.
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:14 +0000 (12:43 +0100)]
alx: prepare tx path for multi queue support
This patch prepares the tx path to send data on multiple tx queues. It
introduces per queue register addresses and uses them in the alx_tx_queue
structs.
There are new helper functions for the queue mapping in the tx path.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:13 +0000 (12:43 +0100)]
alx: prepare resource allocation for multi queue support
Allocate, initialise and free alx_tx_queue structs based on the number of
alx_napi structures. Also increase the size of the descriptor memory based
on the number of tx queues in use.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:12 +0000 (12:43 +0100)]
alx: prepare interrupt functions for multiple queues
Extend the interrupt bringup code and the interrupt handler for msi-x
interrupts in order to handle multiple queues.
We must change the poll function because with multiple queues it is possible
that an alx_napi structure has only a tx or only a rx queue pointer.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:11 +0000 (12:43 +0100)]
alx: switch to per queue data structures
Remove the tx and rx queue structures from the alx_priv structure and switch
everything over to the queue pointers in the alx_napi structure.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:10 +0000 (12:43 +0100)]
alx: add ability to allocate and free alx_napi structures
Add new functions to allocate and free the alx_napi structures and use them
in __alx_open and __alx_stop. We only allocate one of these structures for
now, as the rest of the driver is not yet ready for multiple queues.
We switch over the setup of the interrupt mask and the call to netif_napi_add
to the new function because we must adjust these later on a per queue basis.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:09 +0000 (12:43 +0100)]
alx: extend data structures for multi queue support
Extend the driver data structures to be able to handle multiple queues.
Based on the downstream driver at github.com/qca/alx
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Regnery [Tue, 15 Nov 2016 11:43:08 +0000 (12:43 +0100)]
alx: refactor descriptor allocation
Split the allocation of descriptor memory and the buffer allocation into a
tx and rx function. This is in preparation for multiple queues where we
need to iterate over the new functions.
While at it drop the unneeded casting on the rx side.
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 16 Nov 2016 03:34:27 +0000 (22:34 -0500)]
Merge branch 'dpaa_eth-next'
Madalin Bucur says:
====================
dpaa_eth: Add the QorIQ DPAA Ethernet driver
This patch series adds the Ethernet driver for the Freescale
QorIQ Data Path Acceleration Architecture (DPAA).
This version includes changes following the feedback received
on previous versions from Eric Dumazet, Bob Cochran, Joe Perches,
Paul Bolle, Joakim Tjernlund, Scott Wood, David Miller - thank you.
Together with the driver a managed version of alloc_percpu
is provided that simplifies the release of per-CPU memory.
The Freescale DPAA architecture consists of a series of hardware
blocks that support the Ethernet connectivity. The Ethernet driver
depends upon the following drivers that are currently in the Linux
kernel:
- Peripheral Access Memory Unit (PAMU)
drivers/iommu/fsl_*
- Frame Manager (FMan) added in v4.4
drivers/net/ethernet/freescale/fman
- Queue Manager (QMan), Buffer Manager (BMan) added in v4.9-rc1
drivers/soc/fsl/qbman
dpaa_eth interfaces mapping to FMan MACs:
  dpaa_eth       /eth0\     ...       /ethN\
  driver        |      |             |      |
  -------------   ----   -----------   ----   -------------
       -Ports  / Tx  Rx \    ...    / Tx  Rx \
  FMan        |          |         |          |
       -MACs  |   MAC0   |         |   MACN   |
             /   dtsec0   \  ...  /   dtsecN   \ (or tgec)
            /              \     /              \(or memac)
  ---------  --------------  ---  --------------  ---------
      FMan, FMan Port, FMan SP, FMan MURAM drivers
  ---------------------------------------------------------
      FMan HW blocks: MURAM, MACs, Ports, SP
  ---------------------------------------------------------
dpaa_eth relation to QMan, FMan:
              ________________________________
  dpaa_eth   /            eth0                \
  driver    /                                  \
  ---------   -^-   -^-   -^-   ---    ---------
  QMan driver / \   / \   / \  \   /  | BMan    |
             |Rx |  |Rx | |Tx | |Tx |  | driver  |
  ---------  |Dfl|  |Err| |Cnf| |FQs|  |         |
  QMan HW    |FQ |  |FQ | |FQ | |   |  |         |
             /   \  /   \ /   \  \ /   |         |
  ---------   ---   ---   ---   -v-    ---------
            |        FMan QMI         |         |
            | FMan HW       FMan BMI  | BMan HW |
              -----------------------   --------
where the acronyms used above (and in the code) are:
DPAA = Data Path Acceleration Architecture
FMan = DPAA Frame Manager
QMan = DPAA Queue Manager
BMan = DPAA Buffers Manager
QMI = QMan interface in FMan
BMI = BMan interface in FMan
FMan SP = FMan Storage Profiles
MURAM = Multi-user RAM in FMan
FQ = QMan Frame Queue
Rx Dfl FQ = default reception FQ
Rx Err FQ = Rx error frames FQ
Tx Cnf FQ = Tx confirmation FQ
Tx FQs = transmission frame queues
dtsec = datapath three speed Ethernet controller (10/100/1000 Mbps)
tgec = ten gigabit Ethernet controller (10 Gbps)
memac = multirate Ethernet MAC (10/100/1000/10000)
Changes from v7:
- remove the debug option to use a common buffer pool for all the
interfaces
Changes from v6:
- fixed an issue on an error path in dpaa_set_mac_address()
- removed NDO operation definitions that were not needed
- sorted the local variable declarations
- cleaned up a few checkpatch checks
- removed friendly network interface naming code
Changes from v5:
- adapt to the latest Q/BMan drivers API
- use build_skb() on Rx path instead of buffer pool refill path
- proper support for multiple buffer pools
- align function, variable names, code cleanup
- driver file structure cleanup
Changes from v4:
- addressed feedback from Scott Wood and Joe Perches
- fixed spelling
- fixed leak of uninitialized stack to userspace
- fix prints
- replace raw_cpu_ptr() with this_cpu_ptr()
- remove _s from the end of structure names
- remove underscores at start of functions, goto labels
- remove likely in error paths
- use container_of() instead of open casts
- remove priv from the driver name
- move return type on same line with function name
- drop DPA_READ_SKB_PTR/DPA_WRITE_SKB_PTR
Changes from v3:
- removed bogus delay and comment in .ndo_stop implementation
- addressed minor issues reported by David Miller
Changes from v2:
- removed debugfs, moved exports to ethtool statistics
- removed congestion groups Kconfig params
Changes from v1:
- bpool level Kconfig options removed
- print format using pr_fmt, cleaned up prints
- __hot/__cold removed
- gratuitous unlikely() removed
- code style aligned, consistent spacing for declarations
- comment formatting
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:09 +0000 (10:41 +0200)]
arch/powerpc: Enable dpaa_eth
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:08 +0000 (10:41 +0200)]
arch/powerpc: Enable FSL_FMAN
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:07 +0000 (10:41 +0200)]
arch/powerpc: Enable FSL_PAMU
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:06 +0000 (10:41 +0200)]
dpaa_eth: add trace points
Add trace points on the hot processing path.
Signed-off-by: Ruxandra Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:05 +0000 (10:41 +0200)]
dpaa_eth: add sysfs exports
Export Frame Queue and Buffer Pool IDs through sysfs.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:04 +0000 (10:41 +0200)]
dpaa_eth: add ethtool statistics
Add a series of counters to be exported through ethtool:
- add detailed counters for reception errors;
- add detailed counters for QMan enqueue reject events;
- count the number of fragmented skbs received from the stack;
- count all frames received on the Tx confirmation path;
- add congestion group statistics;
- count the number of interrupts for each CPU.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:03 +0000 (10:41 +0200)]
dpaa_eth: add ethtool functionality
Add support for basic ethtool operations.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 15 Nov 2016 08:41:02 +0000 (10:41 +0200)]
dpaa_eth: add support for DPAA Ethernet
This introduces the Freescale Data Path Acceleration Architecture
(DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
the Freescale DPAA QorIQ platforms.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>