openwrt/staging/blogic.git
7 years agosh_eth: no need for *else* after *goto*
Sergei Shtylyov [Wed, 4 Jan 2017 12:10:50 +0000 (15:10 +0300)]
sh_eth: no need for *else* after *goto*

Well, checkpatch.pl complains about *else* after *return* and *break* but
not after *goto*... and it probably should have complained about the code
in sh_eth_error().  Win couple LoCs by removing that *else*. :-)

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosh_eth: handle only enabled E-MAC interrupts
Sergei Shtylyov [Wed, 4 Jan 2017 12:10:21 +0000 (15:10 +0300)]
sh_eth: handle only enabled E-MAC interrupts

The driver should only handle the enabled E-MAC interrupts, like it does
for the E-DMAC interrupts since commit 3893b27345ac ("sh_eth: workaround
for spurious ECI interrupt"),  so mask ECSR with  ECSIPR when reading it
in sh_eth_error().

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipvlan: assign unique dev-id for each slave device.
Mahesh Bandewar [Tue, 3 Jan 2017 20:47:16 +0000 (12:47 -0800)]
ipvlan: assign unique dev-id for each slave device.

IPvlan setup uses one mac-address (of master). The IPv6 link-local
addresses are derived using the mac-address on the link. Lack of
dev-ids makes these link-local addresses same for all slaves including
that of master device. dev-ids are necessary to add differentiation
when L2 address is shared.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: remove out label in dsa_switch_setup_one
Vivien Didelot [Tue, 3 Jan 2017 19:31:49 +0000 (14:31 -0500)]
net: dsa: remove out label in dsa_switch_setup_one

The "out" label in dsa_switch_setup_one() is useless, thus remove it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoliquidio: remove PTP support in 23XX adapters
Prasad Kanneganti [Tue, 3 Jan 2017 19:27:33 +0000 (11:27 -0800)]
liquidio: remove PTP support in 23XX adapters

liquidio driver incorrectly indicates that PTP is supported in 23XX
adapters; this patch fixes that.  PTP is supported in 66XX and 68XX
adapters, and the driver correctly indicates that.

Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Assert at build time the assumptions we make about the CMSG header.
David S. Miller [Wed, 4 Jan 2017 18:24:19 +0000 (13:24 -0500)]
net: Assert at build time the assumptions we make about the CMSG header.

It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).

Otherwise there are missing adjustments in the various calculations
that parse and build these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoscm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))
yuan linyu [Tue, 3 Jan 2017 12:42:17 +0000 (20:42 +0800)]
scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))

sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/sched: cls_matchall: Fix error path
Yotam Gigi [Tue, 3 Jan 2017 17:20:24 +0000 (19:20 +0200)]
net/sched: cls_matchall: Fix error path

Fix several error paths in matchall:
 - Release reference to actions in case the hardware fails offloading
   (relevant to skip_sw only)
 - Fix error path in case tcf_exts initialization/validation fail

Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 3 Jan 2017 21:23:01 +0000 (16:23 -0500)]
Merge branch '10GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2017-01-03

This series contains updates to ixgbe and ixgbevf only.

Emil fixes ixgbe to use the NVM settings for FEC, so do not override the
settings.  Fixed the indirection table for x550, where newer devices can
support up to 64 RSS queues.  Extends the rtnl_lock() to protect the call
to netif_device_detach() and ixgbe_clear_interrupt_scheme() to avoid
against a double free WARN and/or a BUG in free_msi_irqs().  Fixed AER
error handling by making sure that the driver frees the IRQs in
ixgbe_io_error_detected() when responding to a PCIe AER error, and to
restore them when the interface recovers.

Tony updates the driver to report the driver version to the firmware using
the host interface command for x550 devices.  Fixed the PHY reset check
for x550em_ext_t PHY type.  Fixed bounds checking for x540 devices to
ensure the index is valid for the LED function.  Fixed the BaseT adapters
which support 100Mb capability and were not reporting the capability.

Ken Cox adds a missing check for the trusted bit before trying to set the
MACVLAN MAC address.

Yusuke Suzuki fixes an issue with 82599 and x540 devices where receive
timestamps were not working becase the bitwise operation for RX_HWSTAMP
falg was incorrect.

Don ensures that x553 KR/KX devices correctly advertise link speeds.
Adds the mailbox message to allow for VF promiscuous mode support.

Mark fixes two issues with EEPROM access, where the semaphore was not
being held until the entire response was read and the acquiring/releasing
of the semaphore is slow.  Cleaned up firmware version method and
functions which are no longer used.  Added new interfaces for firmware
commands to access some new PHYs.

v2: fixed tab indentation in patch 12 and mis-spelled words in patch 15
    based on feedback from Sergei Shtylyov and Rami Rosen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoixgbe: Add PF support for VF promiscuous mode
Don Skidmore [Fri, 16 Dec 2016 02:18:32 +0000 (21:18 -0500)]
ixgbe: Add PF support for VF promiscuous mode

This patch extends the xcast mailbox message to include support for
unicast promiscuous mode.  To allow a VF to enter this mode the PF
must be in promiscuous mode.

A later patch will add the support needed in the VF driver (ixgbevf)

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbevf: Add support for VF promiscuous mode
Don Skidmore [Fri, 16 Dec 2016 02:18:31 +0000 (21:18 -0500)]
ixgbevf: Add support for VF promiscuous mode

This patch extends the mailbox message to allow for VF promiscuous
mode support.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Implement support for firmware-controlled PHYs
Mark Rustad [Wed, 14 Dec 2016 19:02:16 +0000 (11:02 -0800)]
ixgbe: Implement support for firmware-controlled PHYs

Implement support for devices that have firmware-controlled PHYs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Implement firmware interface to access some PHYs
Mark Rustad [Wed, 14 Dec 2016 19:02:11 +0000 (11:02 -0800)]
ixgbe: Implement firmware interface to access some PHYs

Implement new interface for firmware commands to access some PHYs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Remove unused firmware version functions and method
Mark Rustad [Wed, 14 Dec 2016 19:02:06 +0000 (11:02 -0800)]
ixgbe: Remove unused firmware version functions and method

The firmware version method and functions are not used anywhere, so
remove them all.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Fix issues with EEPROM access
Mark Rustad [Wed, 14 Dec 2016 19:02:00 +0000 (11:02 -0800)]
ixgbe: Fix issues with EEPROM access

There are two problems with EEPROM access. One is that it needs to
hold the semaphore until the entire response is read or else the
response can be corrupted by other firmware accesses. The second
problem is that acquiring and releasing the semaphore is slow, so
it should be taken and released once when multiple EEPROM accesses
will be done.

Both of these issues can be solved by adding a new function,
ixgbe_hic_unlocked, to issue firmware commands that will assume
that the caller has acquired the needed semaphore.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Configure advertised speeds correctly for KR/KX backplane
Don Skidmore [Wed, 14 Dec 2016 01:34:51 +0000 (20:34 -0500)]
ixgbe: Configure advertised speeds correctly for KR/KX backplane

This patch ensures that the advertised link speeds are configured
for X553 KR/KX backplane.  Without this patch the link remains at
1G when resuming from low power after being downshifted by LPLU.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbevf: restore hw_addr on resume or error
Emil Tantilov [Wed, 23 Nov 2016 19:24:08 +0000 (11:24 -0800)]
ixgbevf: restore hw_addr on resume or error

Restore adapter->hw.hw_addr after handling an error, or a resume
operation to make sure we can access the registers.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Fix incorrect bitwise operations of PTP Rx timestamp flags
Yusuke Suzuki [Mon, 21 Nov 2016 06:48:45 +0000 (06:48 +0000)]
ixgbe: Fix incorrect bitwise operations of PTP Rx timestamp flags

Rx timestamp does not work on 82599 and X540 because bitwise operation
of RX_HWTSTAMP flags is incorrect and ixgbe_ptp_rx_hwtstamp() is never
called. This patch fixes it to enable Rx timestamp on 82599 and X540.

Without this fix:
ptp4l[278.730]: selected /dev/ptp8 as PTP clock
ptp4l[278.733]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.733]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.834]: port 1: received SYNC without timestamp
ptp4l[278.835]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[279.834]: port 1: received SYNC without timestamp
ptp4l[280.834]: port 1: received SYNC without timestamp
ptp4l[281.834]: port 1: received SYNC without timestamp
ptp4l[282.834]: port 1: received SYNC without timestamp
ptp4l[282.835]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[282.835]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[283.834]: port 1: received SYNC without timestamp

With this fix:
ptp4l[239.154]: selected /dev/ptp8 as PTP clock
ptp4l[239.157]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[239.157]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[240.989]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[244.989]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[244.989]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[246.977]: master offset -899583339542096 s0 freq      +0 path delay     16222
ptp4l[247.977]: master offset -899583339617265 s1 freq  -75169 path delay     16177
ptp4l[248.977]: master offset       -130 s2 freq  -75299 path delay     16177
ptp4l[248.977]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[249.977]: master offset         -9 s2 freq  -75217 path delay     16177
ptp4l[250.977]: master offset         88 s2 freq  -75123 path delay     16132

Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Yusuke Suzuki <yus-suzuki@uf.jp.nec.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbevf: fix AER error handling
Emil Tantilov [Wed, 16 Nov 2016 19:25:34 +0000 (11:25 -0800)]
ixgbevf: fix AER error handling

Make sure that we free the IRQs in ixgbevf_io_error_detected() when
responding to an PCIe AER error and also restore them when the
interface recovers from it.

Previously it was possible to trigger BUG_ON() check in free_msix_irqs()
in the case where we call ixgbevf_remove() after a failed recovery from
AER error because the interrupts were not freed.

Also moved the down and free functions into ixgbevf_close_suspend()
same as with ixgbe.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: fix AER error handling
Emil Tantilov [Wed, 16 Nov 2016 17:48:02 +0000 (09:48 -0800)]
ixgbe: fix AER error handling

Make sure that we free the IRQs in ixgbe_io_error_detected() when
responding to an PCIe AER error and also restore them when the
interface recovers from it.

Previously it was possible to trigger BUG_ON() check in free_msix_irqs()
in the case where we call ixgbe_remove() after a failed recovery from
AER error because the interrupts were not freed.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: test for trust in macvlan adjustments for VF
Ken Cox [Tue, 15 Nov 2016 19:00:37 +0000 (13:00 -0600)]
ixgbe: test for trust in macvlan adjustments for VF

There are two methods for setting mac addresses in a Macvlan, that
differentiate themselves in the function macvlan_set_mac_Address.
If the macvlan mode is passthru, then we use the dev_set_mac_address
method, otherwise we use the dev_uc api via macvlan_sync_addresses.
The latter method (which would stem from using any non-passthru mode,
like bridge, or vepa), calls down into the driver in a path that terminates
in ixgbevf_set_uc_addr_vf, which sends a IXGBE_VF_SET_MACVLAN message,
which causes the pf to spawn the noted error message.  This occurs because
it appears that the guest is trying to delete the mac address of the macvlan
before adding another.

The other path in macvlan_set_mac_address uses dev_set_mac_address, which
calls into ixgbevf_set_mac which uses the IXGBE_VF_SET_MAC_ADDR to the
pf to set the macvlan mac address.

The discrepancy here is in the handlers.  The handler function for
IXGBE_VF_SET_MAC_ADDR (ixgbe_set_vf_mac_addr) has a check for
the vfinfo[].trusted bit to allow the operation if the vf is trusted.
In comparison, the IXGBE_VF_SET_MACVLAN message handler
(ixgbe_set_vf_macvlan_msg) has no such check of the trusted bit.

Signed-off-by: Ken Cox <jkc@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbevf: handle race between close and suspend on shutdown
Emil Tantilov [Fri, 11 Nov 2016 18:12:51 +0000 (10:12 -0800)]
ixgbevf: handle race between close and suspend on shutdown

When an interface is part of a namespace it is possible that
ixgbevf_close() may be called while ixgbevf_suspend() is running
which ends up in a double free WARN and/or a BUG in free_msi_irqs()

To handle this situation we extend the rtnl_lock() to protect the
call to netif_device_detach() and check for !netif_device_present()
to avoid entering close while in suspend.

Also added rtnl locks to ixgbevf_queue_reset_subtask().

CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: handle close/suspend race with netif_device_detach/present
Emil Tantilov [Fri, 11 Nov 2016 18:07:47 +0000 (10:07 -0800)]
ixgbe: handle close/suspend race with netif_device_detach/present

When an interface is part of a namespace it is possible that
ixgbe_close() may be called while __ixgbe_shutdown() is running
which ends up in a double free WARN and/or a BUG in free_msi_irqs().

To handle this situation we extend the rtnl_lock() to protect the
call to netif_device_detach() and ixgbe_clear_interrupt_scheme()
in __ixgbe_shutdown() and check for netif_device_present()
to avoid clearing the interrupts second time in ixgbe_close();

Also extend the rtnl lock in ixgbe_resume() to netif_device_attach().

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Fix reporting of 100Mb capability
Tony Nguyen [Fri, 11 Nov 2016 00:00:33 +0000 (16:00 -0800)]
ixgbe: Fix reporting of 100Mb capability

BaseT adapters that are capable of supporting 100Mb are not reporting this
capability.  This patch corrects the reporting so that 100Mb is shown as
supported on those adapters.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Reduce I2C retry count on X550 devices
Tony Nguyen [Thu, 10 Nov 2016 17:57:29 +0000 (09:57 -0800)]
ixgbe: Reduce I2C retry count on X550 devices

A retry count of 10 is likely to run into problems on X550 devices that
have to detect and reset unresponsive CS4227 devices. So, reduce the I2C
retry count to 3 for X550 and above. This should avoid any possible
regressions in existing devices.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Add bounds check for x540 LED functions
Tony Nguyen [Wed, 9 Nov 2016 18:48:48 +0000 (10:48 -0800)]
ixgbe: Add bounds check for x540 LED functions

This is an extension of commit 003287e0f087 ("ixgbevf: Correct parameter
sent to LED function"); add bounds checking to x540 functions to ensure the
index is valid.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: add mask for 64 RSS queues
Emil Tantilov [Fri, 4 Nov 2016 21:03:03 +0000 (14:03 -0700)]
ixgbe: add mask for 64 RSS queues

The indirection table was reported incorrectly for X550 and newer
where we can support up to 64 RSS queues.

Reported-by Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Fix check for ixgbe_phy_x550em_ext_t reset
Tony Nguyen [Mon, 31 Oct 2016 19:11:58 +0000 (12:11 -0700)]
ixgbe: Fix check for ixgbe_phy_x550em_ext_t reset

The generic PHY reset check we had previously is not sufficient for the
ixgbe_phy_x550em_ext_t PHY type.  Check 1.CC02.0 instead - same as
ixgbe_init_ext_t_x550().

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: Report driver version to firmware for x550 devices
Tony Nguyen [Wed, 26 Oct 2016 23:25:18 +0000 (16:25 -0700)]
ixgbe: Report driver version to firmware for x550 devices

Some x550 devices require the driver version reported to its firmware; this
patch sends the driver version string to the firmware through the host
interface command for x550 devices.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoixgbe: do not disable FEC from the driver
Emil Tantilov [Wed, 28 Sep 2016 23:01:48 +0000 (16:01 -0700)]
ixgbe: do not disable FEC from the driver

FEC is configured by the NVM and the driver should not be
overriding it.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoMerge branch 'tipc-link-starvation'
David S. Miller [Tue, 3 Jan 2017 16:13:06 +0000 (11:13 -0500)]
Merge branch 'tipc-link-starvation'

Jon Maloy says:

====================
tipc: improve interaction socket-link

We fix a very real starvation problem that may occur when a link
encounters send buffer congestion. At the same time we make the
interaction between the socket and link layer simpler and more
consistent.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: reduce risk of user starvation during link congestion
Jon Paul Maloy [Tue, 3 Jan 2017 15:55:11 +0000 (10:55 -0500)]
tipc: reduce risk of user starvation during link congestion

The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and let him re-try later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is attempted sent via a
congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: modify struct tipc_plist to be more versatile
Jon Paul Maloy [Tue, 3 Jan 2017 15:55:10 +0000 (10:55 -0500)]
tipc: modify struct tipc_plist to be more versatile

During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come to use in the next commits.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions
Jon Paul Maloy [Tue, 3 Jan 2017 15:55:09 +0000 (10:55 -0500)]
tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions

The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicates of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'TPACKET_V3-TX_RING-support'
David S. Miller [Tue, 3 Jan 2017 16:00:28 +0000 (11:00 -0500)]
Merge branch 'TPACKET_V3-TX_RING-support'

Sowmini Varadhan says:

====================
TPACKET_V3 TX_RING support

This patch series allows an application to use a single PF_PACKET
descriptor and leverage the best implementations of TX_RING
and RX_RING that exist today.

Patch 1 adds the kernel/Documentation changes for TX_RING
support and patch2 adds the associated test case in selftests.

Changes since v2: additional sanity checks for setsockopt
input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
test code to avoid code duplication from V2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: test case for TPACKET_V3/TX_RING support
Sowmini Varadhan [Tue, 3 Jan 2017 14:31:48 +0000 (06:31 -0800)]
tools: test case for TPACKET_V3/TX_RING support

Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoaf_packet: TX_RING support for TPACKET_V3
Sowmini Varadhan [Tue, 3 Jan 2017 14:31:47 +0000 (06:31 -0800)]
af_packet: TX_RING support for TPACKET_V3

Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.

This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc-falcon: declare module version (same as ethtool drvinfo version)
Edward Cree [Tue, 3 Jan 2017 15:46:15 +0000 (15:46 +0000)]
sfc-falcon: declare module version (same as ethtool drvinfo version)

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosfc: declare module version (same as ethtool drvinfo version)
Edward Cree [Tue, 3 Jan 2017 15:46:00 +0000 (15:46 +0000)]
sfc: declare module version (same as ethtool drvinfo version)

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries
Nikolay Aleksandrov [Tue, 3 Jan 2017 11:13:39 +0000 (12:13 +0100)]
ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries

While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodsa:mv88e6xxx: allow address 0x1 in smi_init
Volodymyr Bendiuga [Tue, 3 Jan 2017 09:49:09 +0000 (10:49 +0100)]
dsa:mv88e6xxx: allow address 0x1 in smi_init

Some devices, such as the mv88e6097 do have ADDR[0] external and so it
is possible to configure the device to use SMI address 0x1. Remove the
restriction, as there are boards using this address.

Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: freescale: dpaa: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Mon, 2 Jan 2017 22:49:15 +0000 (23:49 +0100)]
net: freescale: dpaa: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for_4.11/net-next/rds_v3' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Tue, 3 Jan 2017 14:45:44 +0000 (09:45 -0500)]
Merge branch 'for_4.11/net-next/rds_v3' of git://git./linux/kernel/git/ssantosh/linux

Santosh Shilimkar says:

====================
net: RDS updates

v2->v3:
- Re-based against latest net-next head.
- Dropped a user visible change after discussing with David Miller.
  It needs some more work to fully support old/new tools matrix.
- Addressed Dave's comment about bool usage in patch
  "RDS: IB: track and log active side..."

v1->v2:
Re-aligned indentation in patch 'RDS: mark few internal functions..."

Series consist of:
 - RDMA transport fixes for map failure, listen sequence, handler panic and
   composite message notification.
 - Couple of sparse fixes.
 - Message logging improvements for bind failure, use once mr semantics
   and connection remote address, active end point.
 - Performance improvement for RDMA transport by reducing the post send
   pressure on the queue and spreading the CQ vectors.
 - Useful statistics for socket send/recv usage and receive cache usage.
 - Additional RDS CMSG used by application to track the RDS message
   stages for certain type of traffic to find out latency spots.
   Can be enabled/disabled per socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRDS: add receive message trace used by application
Santosh Shilimkar [Tue, 5 Jul 2016 05:35:15 +0000 (22:35 -0700)]
RDS: add receive message trace used by application

Socket option to tap receive path latency in various stages
in nano seconds. It can be enabled on selective sockets using
using SO_RDS_MSG_RXPATH_LATENCY socket option. RDS will return
the data to application with RDS_CMSG_RXPATH_LATENCY in defined
format. Scope is left to add more trace points for future
without need of change in the interface.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: make message size limit compliant with spec
Avinash Repaka [Mon, 29 Feb 2016 23:30:57 +0000 (15:30 -0800)]
RDS: make message size limit compliant with spec

RDS support max message size as 1M but the code doesn't check this
in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size
and its enforced irrespective of underlying transport.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: add stat for socket recv memory usage
Venkat Venkatsubra [Sun, 10 Jul 2016 00:36:20 +0000 (17:36 -0700)]
RDS: add stat for socket recv memory usage

Tracks the receive side memory added to scokets and removed from sockets.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: fix panic due to handlers running post teardown
Santosh Shilimkar [Thu, 29 Sep 2016 18:07:11 +0000 (11:07 -0700)]
RDS: IB: fix panic due to handlers running post teardown

Shutdown code reaping loop takes care of emptying the
CQ's before they being destroyed. And once tasklets are
killed, the hanlders are not expected to run.

But because of core tasklet code issues, tasklet handler could
still run even after tasklet_kill,
RDS IB shutdown code already reaps the CQs before freeing
cq/qp resources so as such the handlers have nothing left
to do post shutdown.

On other hand any handler running after teardown and trying
to access already freed qp/cq resources causes issues
Patch fixes this race by  makes sure that handlers returns
without any action post teardown.

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: RDMA: Fix the composite message user notification
Santosh Shilimkar [Fri, 19 Feb 2016 04:06:47 +0000 (20:06 -0800)]
RDS: RDMA: Fix the composite message user notification

When application sends an RDS RDMA composite message consist of
RDMA transfer to be followed up by non RDMA payload, it expect to
be notified *only* when the full message gets delivered. RDS RDMA
notification doesn't behave this way though.

Thanks to Venkat for debug and root casuing the issue
where only first part of the message(RDMA) was
successfully delivered but remainder payload delivery failed.
In that case, application should not be notified with
a false positive of message delivery success.

Fix this case by making sure the user gets notified only after
the full message delivery.

Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: Add vector spreading for cqs
Santosh Shilimkar [Mon, 4 Jul 2016 23:16:36 +0000 (16:16 -0700)]
RDS: IB: Add vector spreading for cqs

Based on available device vectors, allocate cqs accordingly to
get better spread of completion vectors which helps performace
great deal..

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: add few useful cache stasts
Santosh Shilimkar [Sun, 10 Jul 2016 00:14:02 +0000 (17:14 -0700)]
RDS: IB: add few useful cache stasts

Tracks the ib receive cache total, incoming and frag allocations.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: track and log active side endpoint in connection
Santosh Shilimkar [Sun, 10 Jul 2016 01:31:38 +0000 (18:31 -0700)]
RDS: IB: track and log active side endpoint in connection

Useful to know the active and passive end points in a
RDS IB connection.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: RDMA: silence the use_once mr log flood
Santosh Shilimkar [Mon, 4 Jul 2016 02:14:10 +0000 (19:14 -0700)]
RDS: RDMA: silence the use_once mr log flood

In absence of extension headers, message log will keep
flooding the console. As such even without use_once we can
clean up the MRs so its not really an error case message
so make it debug message

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: split the mr registration and invalidation path
Santosh Shilimkar [Tue, 8 Mar 2016 17:19:01 +0000 (09:19 -0800)]
RDS: IB: split the mr registration and invalidation path

MR invalidation in RDS is done in background thread and not in
data path like registration. So break the dependency between them
which helps to remove the performance bottleneck.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: RDMA: return appropriate error on rdma map failures
Santosh Shilimkar [Tue, 5 Jul 2016 00:04:37 +0000 (17:04 -0700)]
RDS: RDMA: return appropriate error on rdma map failures

The first message to a remote node should prompt a new
connection even if it is RDMA operation. For RDMA operation
the MR mapping can fail because connections is not yet up.

Since the connection establishment is asynchronous,
we make sure the map failure because of unavailable
connection reach to the user by appropriate error code.
Before returning to the user, lets trigger the connection
so that its ready for the next retry.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: RDMA: start rdma listening after init
Qing Huang [Mon, 4 Jul 2016 23:29:13 +0000 (16:29 -0700)]
RDS: RDMA: start rdma listening after init

This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: RDMA: fix the ib_map_mr_sg_zbva() argument
Santosh Shilimkar [Mon, 5 Dec 2016 00:25:43 +0000 (16:25 -0800)]
RDS: RDMA: fix the ib_map_mr_sg_zbva() argument

Fixes warning: Using plain integer as NULL pointer

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: make the transport retry count smallest
Santosh Shilimkar [Mon, 4 Jul 2016 22:31:21 +0000 (15:31 -0700)]
RDS: IB: make the transport retry count smallest

Transport retry is not much useful since it indicate packet loss
in fabric so its better to failover fast rather than longer retry.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: IB: include faddr in connection log
Santosh Shilimkar [Mon, 14 Mar 2016 14:43:55 +0000 (07:43 -0700)]
RDS: IB: include faddr in connection log

Also use pr_* for it.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: mark few internal functions static to make sparse build happy
Santosh Shilimkar [Mon, 5 Dec 2016 00:41:29 +0000 (16:41 -0800)]
RDS: mark few internal functions static to make sparse build happy

Fixes below warnings:
warning: symbol 'rds_send_probe' was not declared. Should it be static?
warning: symbol 'rds_send_ping' was not declared. Should it be static?
warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agoRDS: log the address on bind failure
Santosh Shilimkar [Wed, 4 Nov 2015 21:42:39 +0000 (13:42 -0800)]
RDS: log the address on bind failure

It's useful to know the IP address when RDS fails to bind a
connection. Thus, adding it to the error message.

Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
7 years agonet: fealnx: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Mon, 2 Jan 2017 19:47:27 +0000 (20:47 +0100)]
net: fealnx: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: faraday: ftmac100: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Mon, 2 Jan 2017 18:53:11 +0000 (19:53 +0100)]
net: faraday: ftmac100: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: emulex: benet: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Mon, 2 Jan 2017 16:42:14 +0000 (17:42 +0100)]
net: emulex: benet: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dlink: sundance: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 19:52:12 +0000 (20:52 +0100)]
net: dlink: sundance: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dlink: dl2k: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 19:49:26 +0000 (20:49 +0100)]
net: dlink: dl2k: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

The previous implementation of set_settings was modifying
the value of speed and duplex, but with the new API, it's not
possible. The structure ethtool_link_ksettings is defined
as const.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dec: winbond-840: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 19:47:01 +0000 (20:47 +0100)]
net: dec: winbond-840: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dec: uli526x: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 18:11:06 +0000 (19:11 +0100)]
net: dec: uli526x: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dec: de2104x: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 18:05:38 +0000 (19:05 +0100)]
net: dec: de2104x: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: sfc: falcon: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Sun, 1 Jan 2017 18:02:46 +0000 (19:02 +0100)]
net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mdio: add mdio45_ethtool_ksettings_get
Philippe Reynes [Sun, 1 Jan 2017 18:02:45 +0000 (19:02 +0100)]
net: mdio: add mdio45_ethtool_ksettings_get

There is a function in mdio for the old ethtool api gset.
We add a new function mdio45_ethtool_ksettings_get for the
new ethtool api glinksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlx5-odp'
David S. Miller [Mon, 2 Jan 2017 20:51:21 +0000 (15:51 -0500)]
Merge branch 'mlx5-odp'

Saeed Mahameed says:

====================
Mellanox mlx5 core and ODP updates 2017-01-01

The following eleven patches mainly come from Artemy Kovalyov
who expanded mlx5 on-demand-paging (ODP) support. In addition
there are three cleanup patches which don't change any functionality,
but are needed to align codebase prior accepting other patches.

Memory region (MR) in IB can be huge and ODP (on-demand paging)
technique allows to use unpinned memory, which can be consumed and
released on demand. This allows to applications do not pin down
the underlying physical pages of the address space, and save from them
need to track the validity of the mappings.

Rather, the HCA requests the latest translations from the OS when pages
are not present, and the OS invalidates translations which are no longer
valid due to either non-present pages or mapping changes.

In existing ODP implementation applications is needed to register
memory buffers for communication, though registered memory regions
need not have valid mappings at registration time.

This patch set performs the following steps to expand
current ODP implementation:

1. It refactors UMR to support large regions, by introducing generic
   function to perform HCA translation table modifications. This
   function supports both atomic and process contexts and is not limited
   by number of modified entries.

   This function allows to enable reallocated memory regions of
   arbitrary size, so adding MR cache buckets to support up to 16GB MRs.

2. It changes page fault event format and refactor page faults logic
   together with addition of atomic support.

3. It prepares mlx5 core code to support implicit registration with
   simplified and relaxed semantics.

   Implicit ODP semantics allows to applications provide special memory
   key that represents their complete address space. Thus all IO accesses
   referencing to this key (with proper access rights associated with the key)
   wouldn't need not register any virtual address range.

Thanks,
        Artemy, Ilya and Leon

v1->v2:
  - Don't use 'inline' in .c files
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Improve MR check
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:48 +0000 (11:37 +0200)]
IB/mlx5: Improve MR check

Add "type" field to mlx5_core MKEY struct.
Check whether page fault happens on MKEY corresponding to MR.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Add ODP atomics support
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:47 +0000 (11:37 +0200)]
IB/mlx5: Add ODP atomics support

Handle ODP atomic operations. When initiator of RDMA atomic
operation use ODP MR to provide source data handle pagefault properly.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years ago{net,IB}/mlx5: Refactor page fault handling
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:46 +0000 (11:37 +0200)]
{net,IB}/mlx5: Refactor page fault handling

* Update page fault event according to last specification.
* Separate code path for page fault EQ, completion EQ and async EQ.
* Move page fault handling work queue from mlx5_ib static variable
  into mlx5_core page fault EQ.
* Allocate memory to store ODP event dynamically as the
  events arrive, since in atomic context - use mempool.
* Make mlx5_ib page fault handler run in process context.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Update PAGE_FAULT_RESUME layout
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:45 +0000 (11:37 +0200)]
net/mlx5: Update PAGE_FAULT_RESUME layout

Update PAGE_FAULT_RESUME command layout.

Three bit fields describing page fault: rdma, rdma_write, req_res gave 8
possible combinations, while only a few were legal. Now they
are interpreted as three-bit type field, where former legal
combinations turns into corresponding types and unused were added as new
types.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Add MR cache for large UMR regions
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:44 +0000 (11:37 +0200)]
IB/mlx5: Add MR cache for large UMR regions

In this change we turn mlx5_ib_update_mtt() into generic
mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions
supporting both atomic and process contexts and not limited by number
of modified entries.
Using this function we increase preallocated MRs up to 16GB.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Add support for big MRs
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:43 +0000 (11:37 +0200)]
IB/mlx5: Add support for big MRs

Make use of extended UMR translation offset.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Refactor UMR post send format
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:42 +0000 (11:37 +0200)]
IB/mlx5: Refactor UMR post send format

* Update struct mlx5_wqe_umr_ctrl_seg.
* Currenlty UMR send_flags aim only certain use cases: enabled/disable
  cached MR, modifying XLT for ODP. By making flags independent make UMR
  more flexible allowing arbitrary manipulations.
* Since different UMR formats have different entry sizes UMR request
  should receive exact size of translation table update instead of
  number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr
  and update relevant code accordingly.
* Add support of length64 bit.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Support new MR features
Artemy Kovalyov [Mon, 2 Jan 2017 09:37:41 +0000 (11:37 +0200)]
net/mlx5: Support new MR features

This patch adds the following items to IFC file.

1. MLX5_MKC_ACCESS_MODE_KSM enum value for creating KSM memory keys.
KSM access mode used when indirect MKey associated with fixed memory
size entries.

2. null_mkey field that is used to indicate non-present KLM/KSM
entries, where it causes the device to generate page fault event
when trying to access it.

3. struct mlx5_ifc_cmd_hca_cap_bits capability bits indicating
related value/field is supported:
* fixed_buffer_size - MLX5_MKC_ACCESS_MODE_KSM
* umr_extended_translation_offset - translation_offset_42_16
    in UMR ctrl segment
* null_mkey - null_mkey in QUERY_SPECIAL_CONTEXTS

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Add helper mlx5_ib_post_send_wait
Binoy Jayan [Mon, 2 Jan 2017 09:37:40 +0000 (11:37 +0200)]
IB/mlx5: Add helper mlx5_ib_post_send_wait

Clean up the following common code (to post a list of work requests to the
send queue of the specified QP) at various places and add a helper function
'mlx5_ib_post_send_wait' to implement the same.

 - Initialize 'mlx5_ib_umr_context' on stack
 - Assign "mlx5_umr_wr:wr:wr_cqe to umr_context.cqe
 - Acquire the semaphore
 - call ib_post_send with a single ib_send_wr
 - wait_for_completion()
 - Check for umr_context.status
 - Release the semaphore

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoIB/mlx5: Reorder code in query device command
Leon Romanovsky [Mon, 2 Jan 2017 09:37:39 +0000 (11:37 +0200)]
IB/mlx5: Reorder code in query device command

The order of features exposed by private mlx5-abi.h
file is CQE zipping, packet pacing and multi-packet WQE.

The internal order implemented in mlx5_ib_query_device() is
multi-packet WQE, CQE zipping and packet pacing.

Such difference hurts code readability, so let's sync,
while mlx5-abi.h (exposed to userspace) is the primary
order.

This commit doesn't change any functionality.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx5: Fix offset naming for reserved fields in hca_cap_bits
Max Gurtovoy [Mon, 2 Jan 2017 09:37:38 +0000 (11:37 +0200)]
net/mlx5: Fix offset naming for reserved fields in hca_cap_bits

Fix offset for reserved fields.

Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the IFC header file")
Fixes: 7d5e14237a55 ("net/mlx5: Update mlx5_ifc hardware features")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'wireless-drivers-next-for-davem-2017-01-02' of git://git.kernel.org/pub...
David S. Miller [Mon, 2 Jan 2017 20:23:34 +0000 (15:23 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2017-01-02' of git://git./linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.11

The most notable change here is the inclusion of airtime fairness
scheduling to ath9k. It prevents slow clients from hogging all the
airtime and unfairly slowing down faster clients.

Otherwise smaller changes and cleanup.

Major changes:

ath9k

* cleanup eeprom endian handling
* add airtime fairness scheduling

ath10k

* fix issues for new QCA9377 firmware version
* support dev_coredump() for firmware crash dump
* enable channel 169 on 5 GHz band
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: remove unused duplicate property snps,axi_all
Niklas Cassel [Fri, 30 Dec 2016 12:56:46 +0000 (13:56 +0100)]
net: stmmac: remove unused duplicate property snps,axi_all

For core revision 3.x Address-Aligned Beats is available in two registers.
The DT property snps,aal was created for AAL in the DMA bus register,
which is a read/write bit.
The DT property snps,axi_all was created for AXI_AAL in the AXI bus mode
register, which is a read only bit that reflects the value of AAL in the
DMA bus register.

Since the value of snps,axi_all is never used in the driver,
and since the property was created for a bit that is read only,
it should be safe to remove the property.

Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'qed-driver-updates'
David S. Miller [Mon, 2 Jan 2017 02:02:21 +0000 (21:02 -0500)]
Merge branch 'qed-driver-updates'

Yuval Mintz says:

====================
qed*: Driver updates

The more interesting changes in this series include:
  - Restructuring of the qede files - qede_main.c has grown big and this
    series splits it into 3 parts [patches #2 and #3].
  - Some significant changes in the API through which RSS indirection
    table gets configured [#8].
  - Support for ndo_set_vf_trust() [#9] which would regulate which VFs
    are allowed to use promisc/multi-promisc mode.

It also contains various minor changes to qed/qede, as well as
non-functional changes [#1, #12] to complement other changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed*: Advance driver versions to 8.10.10.20.
Mintz, Yuval [Sun, 1 Jan 2017 11:57:11 +0000 (13:57 +0200)]
qed*: Advance driver versions to 8.10.10.20.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Conserve RDMA resources when !QEDR
Ram Amrani [Sun, 1 Jan 2017 11:57:10 +0000 (13:57 +0200)]
qed: Conserve RDMA resources when !QEDR

If qedr isn't part of the kernel then don't allocate RDMA resources
for it in qed.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed: Support Multicast on Tx-switching
Mintz, Yuval [Sun, 1 Jan 2017 11:57:09 +0000 (13:57 +0200)]
qed: Support Multicast on Tx-switching

Currently multicast traffic wouldn't be routed internally to
listener; Instead it would only be sent to network via the
physical carrier.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed*: Add support for ndo_set_vf_trust
Mintz, Yuval [Sun, 1 Jan 2017 11:57:08 +0000 (13:57 +0200)]
qed*: Add support for ndo_set_vf_trust

Trusted VFs would be allowed to receive promiscuous and
multicast promiscuous data.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed*: RSS indirection based on queue-handles
Mintz, Yuval [Sun, 1 Jan 2017 11:57:07 +0000 (13:57 +0200)]
qed*: RSS indirection based on queue-handles

A step toward having qede agnostic to the queue configurations
in firmware/hardware - let the RSS indirections use queue handles
instead of actual queue indices.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede: Remove unnecessary datapath dereference
Mintz, Yuval [Sun, 1 Jan 2017 11:57:06 +0000 (13:57 +0200)]
qede: Remove unnecessary datapath dereference

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede - mark SKB as encapsulated
Manish Chopra [Sun, 1 Jan 2017 11:57:05 +0000 (13:57 +0200)]
qede - mark SKB as encapsulated

When driver receives a recognized encapsulated packet it needs
to set the skb->encapsulation field as well.

Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede: Postpone reallocation until NAPI end
Mintz, Yuval [Sun, 1 Jan 2017 11:57:04 +0000 (13:57 +0200)]
qede: Postpone reallocation until NAPI end

During Rx flow driver allocates a replacement buffer each time
it consumes an Rx buffer. Failing to do so, it would consume the
currently processed buffer and re-post it on the ring.
As a result, the Rx ring is always completely full [from driver POV].

We now allow the Rx ring to shorten by doing the re-allocations
at the end of the NAPI run. The only limitation is that we still want to
make sure each time we reallocate that we'd still have sufficient
elements in the Rx ring to guarantee that FW would be able to post
additional data and trigger an interrupt.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed*: Change maximal number of queues
Mintz, Yuval [Sun, 1 Jan 2017 11:57:03 +0000 (13:57 +0200)]
qed*: Change maximal number of queues

Today qede requests contexts that would suffice for 64 'whole'
combined queues [192 meant for 64 rx, tx and xdp tx queues],
but registers netdev and limits the number of queues based on
information received by qed. In turn, qed doesn't take context
into account when informing qede how many queues it can support.

This would lead to a configuration problem in case user tries
configuring >64 combined queues to interface [or >96 in case
xdp isn't enabled]. Since we don't have a mangement firware
that actually provides so many interrupt lines to a single
device we're currently safe but that's about to change soon.

The new maximum is hence changed:
  - For RoCE devices, the limit would remain 64.
  - For non-RoCE devices, the limit might be higher [depending
    on the actual configuration of the device].
qed would start enforcing that limit in both scenarios.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede: Split filtering logic to its own file
Mintz, Yuval [Sun, 1 Jan 2017 11:57:02 +0000 (13:57 +0200)]
qede: Split filtering logic to its own file

This takes the various filtering logic of the driver and
moves them into their own dedicated file - qede_filter.c.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqede: Break datapath logic into its own file
Mintz, Yuval [Sun, 1 Jan 2017 11:57:01 +0000 (13:57 +0200)]
qede: Break datapath logic into its own file

This adds a new file qede_fp.c and relocates the datapath-related
logic into it [from qede_main.c].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoqed*: Update to dual-license
Mintz, Yuval [Sun, 1 Jan 2017 11:57:00 +0000 (13:57 +0200)]
qed*: Update to dual-license

Since the submission of the qedr driver, there's inconsistency
in the licensing of the various qed/qede files - some are GPLv2
and some are dual-license.
Since qedr requires dual-license and it's dependent on both,
we're updating the licensing of all qed/qede source files.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortlwifi: fix spelling mistake: "encrypiton" -> "encryption"
Colin Ian King [Fri, 30 Dec 2016 14:50:27 +0000 (14:50 +0000)]
rtlwifi: fix spelling mistake: "encrypiton" -> "encryption"

trivial fix to spelling mistake in RT_TRACE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
7 years agowlcore: fix spelling mistake in wl1271_warning
Colin Ian King [Thu, 29 Dec 2016 20:14:00 +0000 (20:14 +0000)]
wlcore: fix spelling mistake in wl1271_warning

trivial fix to spelling mistake of function name in wl1271_warning,
should be dynamic_ps_timeout instead of dyanmic_ps_timeout.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
7 years agoMerge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
Kalle Valo [Sun, 1 Jan 2017 18:48:37 +0000 (20:48 +0200)]
Merge ath-next from git://git./linux/kernel/git/kvalo/ath.git

ath.git patches for 4.11. Major changes:

ath9k

* cleanup eeprom endian handling
* add airtime fairness scheduling

ath10k

* fix issues for new QCA9377 firmware version
* support dev_coredump() for firmware crash dump
* enable channel 169 on 5 GHz band