Daniel Borkmann [Wed, 27 Aug 2014 09:11:27 +0000 (11:11 +0200)]
net: add skb_get_tx_queue() helper
Replace occurences of skb_get_queue_mapping() and follow-up
netdev_get_tx_queue() with an actual helper function.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francois Romieu [Tue, 26 Aug 2014 20:40:38 +0000 (22:40 +0200)]
r8169: add missing MODULE_FIRMWARE.
Leftover from
6e1d0b8988188956dac091441c1492a79a342666 ("r8169:add
support for RTL8168H and RTL8107E").
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Chun-Hao Lin <hau@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 21:19:38 +0000 (14:19 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-08-27
This series contains updates to i40e and i40evf.
Carolyn provides two patches, first changes the wording of the flow
director add/remove and asynchronous failure messages to include the
fd_id to try and add some way to track the operations on a given fd_id.
Second adds a check during handle_link_event for unqualified modules
when link is down and there is a module plugged in.
Anjali provides four patches to i40e/i40evf. First update flow director
messages so that a user can tell if a filter was added or deleted. Then
updates the ATR policy to not auto-disable ATR when we have errors in
programming. The disabling of ATR when we got programming errors was
buggy and was still adding new rules and causing continuous errors.
With this policy change, we flush instead when we see too many errors.
In addition she adds a flow director flush counter to ethtool to help
know how many times the interface had to flush and replay the flow
director filter table. Updates the driver to ignores a driver
perceived transmit hang if the number of descriptors pending is less
than 4, and instead log a stat when this situation happens. This is
because the queue progresses forward and the stack never experiences
a real hang in these situations.
Shannon provides three patches for i40e/i40evf, first enables the
l2tsel bit on receive queue contexts that are assigned to VFs so that
the VF can get the stripped VLAN tag. Then adds a max buffer size
parameter to the print helper to be sure the code knows when to stop.
Lastly, remove the complaint when removing the default MAC VLAN filter.
This was because old firmware had an incorrect MAC VLAN filter that
needed to be replaced at startup, and now newer firmware does not have
this problem. So now we only add the new filter if the removal
succeeded and no need to complain if the removal fails.
Ashish provides a change to vsi->num_queue_pairs to equal the number
that is configured by the VF. This limits the number of queues that
are enabled/disabled and fixes the mismatch case for when a VF
configures fewer queues than is allocated to it by the PF.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 08:39:49 +0000 (01:39 -0700)]
virtio_net: flush when in xmit_more mode and under descriptor pressure
Mirror the changes made to ixgbe in commit
2367a17390138f68b3aa28f2f220b8d7ff8d91f4
("ixgbe: flush when in xmit_more mode and under descriptor pressure")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 08:39:31 +0000 (01:39 -0700)]
igb: flush when in xmit_more mode and under descriptor pressure
Mirror the changes made to ixgbe in commit
2367a17390138f68b3aa28f2f220b8d7ff8d91f4
("ixgbe: flush when in xmit_more mode and under descriptor pressure")
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 26 Aug 2014 17:34:18 +0000 (19:34 +0200)]
ixgbe: flush when in xmit_more mode and under descriptor pressure
When xmit_more mode is being used and the ring is about to
become full or the stack has stopped the ring, enforce a tail
pointer write to the hw. Otherwise, we could risk a TX hang.
Code suggested by Alexander Duyck.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 06:16:19 +0000 (23:16 -0700)]
Merge branch 'bcm7xxx'
Florian Fainelli says:
====================
Broadcom BCM7xxx PHY updates for new entries
Another week, another set of updates for the Broadcom BCM7xxx PHY driver. This
patch set cleanups the existing definitions, adds a macro to ease the addition
of future chips, and finally add two new SoCs to the list of supported chips.
Resending since the first patch did not make it to the list, sorry about that.
Changes in v2:
- rephrased commit message for patch 1 to make it pass majordomo
capital triple X was rejected
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 26 Aug 2014 20:15:27 +0000 (13:15 -0700)]
net: phy: bcm7xxx: add BCM7250 and BCM7364 PHY entries
Add two new entries to the Broadcom BCM7xxx internal PHY driver for
BCM7250 and BCM7364 chips. Those chips share the usual 28nm process
Gigabit PHY sequence and require the same workarounds so far.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 26 Aug 2014 20:15:26 +0000 (13:15 -0700)]
net: phy: broadcom: add new Broadcom OUI
Broadcom started to use a new OUI for its 2013 and newer products:
D4-01-29 which translates into 0xae025000 for a 32-bits OUI, add its
definition.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 26 Aug 2014 20:15:25 +0000 (13:15 -0700)]
net: phy: broadcom: fix PHY_BCM_OUI_4
PHY_BCM_OUI_4 is missing two significant digits that actually make it an
OUI, add those missing bits so it becomes usable again for matching.
Fixes: b560a58c45c6 ("net: phy: add Broadcom BCM7xxx internal PHY driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 26 Aug 2014 20:15:24 +0000 (13:15 -0700)]
net: phy: bcm7xxx: introduce helper macro
All 28nm Gigabit PHYs supported by the driver have the same
callbacks, the only differences being the 32-bits OUI and the name. Use
a macro to factor this, making it easier in the future to add new
entries.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 06:07:18 +0000 (23:07 -0700)]
Merge branch 'sf2'
Florian Fainelli says:
====================
dsa: Broadcom Starfighter 2 switch support
This patch series adds support for the Broadcom Starfighter 2 (Roboswitch
successor) using the existing DSA infrastructure. This integrated switch
is heavily used in Set Top Box, Cable gateways and DSL gateways products
from Broadcom, and to a larger extent the new ARM-based Wi-Fi routers although
slightly differently.
Changes in v5 are the introduction of ETH_P_XDSA as suggested by Alexander to
help capture applications see this is a multiplexed DSA approach now.
Changes in v4 are the introducing of an indirection level for DSA switch tag
protocols receive and transmit functions.
I intentionnaly did not address one comment from Alexander who suggested to
move port_names and port_dn in a separate structure since that involves
touching arch/arm/ and arch/blackfin/ code which I am not yet comfortable
doing.
Notable changes in v3 is the preliminary patch that reworks the skb->protocol
override helpers for non-Ethertype switch tags, based on feedback from
Alexander Duyck.
The biggest changes from v1 of this patch series are:
- use the new fixed PHY helpers
- improved the switch driver with more complete features (interrupts,
(RG)MII configuration, memory arrays power down/up, port disabling/enable
VLAN separation
Future work will focus on bringing the upstream driver in feature parity with
the current downstream driver, including:
- adding Wake-on-LAN support to the switch
- adding suspend/resume callbacks for S2/S3 Power Management modes
- extending the switch register interface to cover BCM5310X SoCs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 05:59:26 +0000 (22:59 -0700)]
sungem: Fix global namespace pollution of phy accessors.
The sungem driver has "phy_read()" and "phy_write()" functions, which
we need to rename because the generic phy layer is about to export
generic interfaces with the same name.
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:58 +0000 (17:04 -0700)]
Documentation: devicetree: add Broadcom Starfighter 2 binding
Add the binding documentation for the Broadcom Starfighter 2 integrated
switch hardware.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:57 +0000 (17:04 -0700)]
Documentation: devicetree: update dsa binding with optional properties
Add documentation for a bunch of new optional properties described in
ethernet.txt and fixed-link.txt, this includes: 'phy-handle', 'phy-mode'
and the 'fixed-link' subnode.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:56 +0000 (17:04 -0700)]
net: dsa: add Broadcom SF2 switch driver
Add support for the Broadcom Starfigther 2 switch chip using a DSA
driver. This switch driver supports the following features:
- configuration of the external switch port interface: MII, RevMII,
RGMII and RGMII_NO_ID are supported
- support for the per-port MIB counters
- support for link interrupts for special ports (e.g: MoCA)
- powering up/down of switch memories to conserve power when ports are
unused
Finally, update the compatible property for the DSA core code to match
our switch top-level compatible node.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:55 +0000 (17:04 -0700)]
net: dsa: add Broadcom tag RX/TX handler
Add support for the 4-bytes Broadcom tag that built-in switches such as
the Starfighter 2 might insert when receiving packets, or that we need
to insert while targetting specific switch ports. We use a fake local
EtherType value for this 4-bytes switch tag: ETH_P_BRCMTAG to make sure
we can assign DSA-specific network operations within the DSA drivers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:54 +0000 (17:04 -0700)]
net: dsa: allow updating fixed PHY link information
Allow switch drivers to hook a PHY link update callback to perform
port-specific link work.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:53 +0000 (17:04 -0700)]
net: dsa: allow drivers to do link adjustment
Whenever libphy determines that the link status of a given PHY/port has
changed, allow to call into the switch driver link adjustment callback
so proper actions can be taken care of by the switch driver upon link
notification.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:52 +0000 (17:04 -0700)]
net: dsa: allow switches to work without tagging
In case switch port tagging is disabled (voluntarily, or the switch just
does not support it), allow us to continue using the defined set of
dsa_device_ops in net/dsa/slave.c.
We introduce dsa_protocol_is_tagged() to check whether we need to
override skb->protocol and go through the DSA-specifif packet_type
function, or if we just go on and receive the SKB through the normal
path.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:51 +0000 (17:04 -0700)]
net: dsa: allow for more complex PHY setups
Modify the DSA slave interface to be bound to an arbitray PHY, not just
the ones that are available as child PHY devices of the switch MDIO bus.
This allows us for instance to have external PHYs connected to a
separate MDIO bus, but yet also connected to a given switch port.
Under certain configurations, the physical port mask might not be a 1:1
mapping to the MII PHYs mask. This is the case, if e.g: Port 1 of the
switch is used and connects to a PHY at a MDIO address different than 1.
Introduce a phys_mii_mask variable which allows driver to implement and
divert their own MDIO read/writes operations for a subset of the MDIO
PHY addresses.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:50 +0000 (17:04 -0700)]
net: dsa: retain a per-port device_node pointer
We will later use the per-port device_node pointer to fetch a bunch of
port-specific properties.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:49 +0000 (17:04 -0700)]
net: dsa: provide a switch device device tree node pointer
We might need to fetch additional resources from the device tree node
pointer, such as register ranges or other properties. Keep a device_node
pointer around for this.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:48 +0000 (17:04 -0700)]
net: phy: provide stub for fixed_phy_set_link_update
In preparation for updating the DSA code and avoid using ifdefs there,
provide an empty stub for fixed_phy_set_link_update when
CONFIG_FIXED_PHY is not selected.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:47 +0000 (17:04 -0700)]
net: phy: add generic UniMAC MDIO bus driver
Add a generic UniMAC MDIO bus driver and its Device Tree binding, which
can be used by the BCMGENET driver as-is, and the upcoming Starfighter 2
Ethernet switch MDIO bus controller.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 28 Aug 2014 00:04:46 +0000 (17:04 -0700)]
net: dsa: reduce number of protocol hooks
DSA is currently registering one packet_type function per EtherType it
needs to intercept in the receive path of a DSA-enabled Ethernet device.
Right now we have three of them: trailer, DSA and eDSA, and there might
be more in the future, this will not scale to the addition of new
protocols.
This patch proceeds with adding a new layer of abstraction and two new
functions:
dsa_switch_rcv() which will dispatch into the tag-protocol specific
receive function implemented by net/dsa/tag_*.c
dsa_slave_xmit() which will dispatch into the tag-protocol specific
transmit function implemented by net/dsa/tag_*.c
When we do create the per-port slave network devices, we iterate over
the switch protocol to assign the DSA-specific receive and transmit
operations.
A new fake ethertype value is used: ETH_P_XDSA to illustrate the fact
that this is no longer going to look like ETH_P_DSA or ETH_P_TRAILER
like it used to be.
This allows us to greatly simplify the check in eth_type_trans() and
always override the skb->protocol with ETH_P_XDSA for Ethernet switches
tagged protocol, while also reducing the number repetitive slave
netdevice_ops assignments.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 05:59:26 +0000 (22:59 -0700)]
sungem: Fix global namespace pollution of phy accessors.
The sungem driver has "phy_read()" and "phy_write()" functions, which
we need to rename because the generic phy layer is about to export
generic interfaces with the same name.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 01:24:16 +0000 (18:24 -0700)]
tulip: dmfe: Fix global namespace pollution of phy accessors.
The dmfe driver has "phy_read()" and "phy_write()" functions, which
we need to rename because the generic phy layer is about to export
generic interfaces with the same name.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Aug 2014 00:05:53 +0000 (17:05 -0700)]
f_ncm: Don't use netdev_start_xmit().
Unfortunately, the USB gadget layer has this weird things where NULL
skbs are passed into ops->ndo_start_xmit() in order to trigger the
dev->wrap() calls to build packets.
This is completely outside of the allowable range of sane arguments
for the ndo_start_xmit method. All invocations of ndo_start_xmit()
should be with non-NULL SKB arguments.
Put back the direct call, but with a comment explaining how this
is not acceptable in the long term.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Tue, 26 Aug 2014 13:14:51 +0000 (13:14 +0000)]
ethernet: arc: Add support for specific SoC layer device tree bindings
Some platforms have special bank registers which might be used to
select the correct clock or the right mode for Media Indepent Interface
controllers. Sometimes, it is also required to activate vcc regulators
in the right order to supply the ethernet controller at the right time.
This patch is an architecture refactoring of the arc-emac device driver.
It adds a new software design which allows to add specific platform
glue layer. Each platform has now its own module which performs custom
initialization and remove for the target and then calls to the
core driver.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Tue, 26 Aug 2014 13:14:50 +0000 (13:14 +0000)]
ethernet: arc: mdio changes for future SoC glue layer devtree support
This is an api changes for the emac_mdio.c module.
It will be required later when arc_emac_probe/arc_emac_remove
will no longer use 'struct platform_device'.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Tue, 26 Aug 2014 13:14:49 +0000 (13:14 +0000)]
ethernet: arc: remove use of 'struct platform_device'
This is a preparation of an api changes for the emac_main.c module.
The involved functions are arc_emac_probe and arc_emac_remove.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 26 Aug 2014 10:55:53 +0000 (12:55 +0200)]
tcp: syncookies: mark cookie_secret read_mostly
only written once.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 26 Aug 2014 07:24:41 +0000 (10:24 +0300)]
bnx2x: Fix static checker warning regarding `txdata_ptr'
Incorrect checking of array instead of array contents in panic_dump
flow - results of commit
e261199872a2 ("bnx2x: Safe bnx2x_panic_dump()").
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Tue, 26 Aug 2014 02:08:23 +0000 (10:08 +0800)]
r8152: replace strncpy with strlcpy
Replace the strncpy with strlcpy, and use sizeof to determine the
length.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 27 Aug 2014 21:39:04 +0000 (14:39 -0700)]
net: Update sk_buff flag bit availability comment.
We lost one when xmit_more was added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Catherine Sullivan [Thu, 10 Jul 2014 07:58:26 +0000 (07:58 +0000)]
i40e/i40evf: Bump i40e & i40evf version
Bump versions for i40e to 1.0.4 and i40evf to 1.0.1.
Change-ID: I960c04da2c91bdf1d02f8e5011e68c34a634122d
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-By: Jim Young <jamesx.m.young@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 10 Jul 2014 07:58:25 +0000 (07:58 +0000)]
i40e/i40evf: Ignore a driver perceived Tx hang if the number of desc pending < 4
We are seeing situations where the driver sees a hang with less than 4
desc pending, if the driver chooses to ignore it the queue progresses
forward and the stack never experiences a real hang.
With this patch we will log a stat when this situation happens
"tx_sluggish" will increment and we can see some more details
at a higher debug level. Other than that we will ignore this
particular case of Tx hang.
Change-ID: I7d1d1666d990e2b12f4f6bed0d17d22e1b6410d5
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Tue, 29 Jul 2014 04:01:50 +0000 (04:01 +0000)]
i40e: quiet complaints when removing default MAC VLAN filter and make set_mac reversible
Older firmware has an incorrect MAC VLAN filter that needs to be replaced
at startup, and now newer firmware doesn't have this problem. With this
change we no longer complain if the remove fails, and we only add the
new filter if the remove succeeded.
Setting a new LAA worked the first time, but didn't work well in successive
operations, including returning to the HW default address. This simplifies
the code that was trying to be too smart.
Lastly, this pulls the hardware default mac address out into separate
handling code and keeps the broadcast filtering from getting munged.
Change-ID: I1f54b002def04ffef2546febb9a4044385452f85
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 10 Jul 2014 07:58:20 +0000 (07:58 +0000)]
i40e/i40evf: add max buf len to aq debug print helper
There is at least one case in the Firmware API where the response to a
command changes the buffer size field in the AQ descriptor to a larger
number than what the request's buffer size started as. This is in addition
to setting an error flag and is in order to tell the requester how much
larger a buffer is required for the answer. We need to be sure not to
use that number when dumping the contents of the data buffer because it
can send us into the weeds and generate an invalid pointer exception.
This patch adds a max buffer size parameter to the print helper to be
sure the code knows when to stop.
Change-ID: Ib84f7ed72140fe9d600086d8f2002fc5d8753092
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 10 Jul 2014 07:58:19 +0000 (07:58 +0000)]
i40e: Add checks and message for Qualified Module info
This patch adds a check during handle_link_event for unqualified
module when link is down and there is a module plugged. If found,
print a message.
Change-ID: Ibd8666d77d3044c2a3dd4d762d3ae9ac6e18e943
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ashish Shah [Thu, 10 Jul 2014 07:58:15 +0000 (07:58 +0000)]
i40e: set num_queue_pairs to num configured by VF
Change vsi->num_queue_pairs to equal the number that are configured
by the VF. This, in turn, limits the number of queues that are
enable/disabled. This fixes the mismatched case for when a VF configures
fewer queues than is allocated to it by the PF.
Change other sections to use alloc_queue_pairs as warranted.
Change-ID: I0de1b55c9084e7be6acc818da8569f12128a82c2
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Tue, 29 Jul 2014 04:01:03 +0000 (04:01 +0000)]
i40e: Enable l2tsel bit for VLAN tag control
Enable the l2tsel bit on Rx queue contexts that are assigned to VFs so
that the VF can get the stripped VLAN tag.
Change-ID: I7d9bc56238a9ea9baf5e8a97e69b9e27ebb9d169
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Wed, 9 Jul 2014 07:46:23 +0000 (07:46 +0000)]
i40e: Add a FD flush counter to ethtool
This helps know how many times the interface had to flush and replay FD
filter table, which gives an indication on how often we are getting FD
table full situation.
Also check on certain pf states before proceeding to add or delete
filters since we can't add or delete filters if we are in those states.
Change-ID: I97f5bbbea7146833ea61af0e08ea794fccba1780
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Thu, 10 Jul 2014 08:03:26 +0000 (08:03 +0000)]
i40e: ATR policy change to flush the table to clean stale ATR rules
Instead of disabling ATR when we get a programming error, we now
will wait it out to see if some room gets created by ATR rule deletion.
If we still have too many errors and ATR filter count did not change
much, its time to flush and replay. We no more auto-disable ATR when
we have errors in programming.
The disabling of ATR when we get programming error was buggy and
was still adding new rules and causing continuous errors. With this
policy change we flush instead when we see too many errors.
ATR is still disabled if we add a SB rule for TCP/IPv4 flow type,
more logic is added to re-enable it once all SB TCP/IPv4 rules are gone.
Change-ID: I77edcbeab9500c72a7e0bd7b5c5b113ced133a9c
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai Jain [Wed, 9 Jul 2014 07:46:16 +0000 (07:46 +0000)]
i40e: Some FD message fixes
Change the message that gets printed when adding/deleting a filter to
the SB, so that user can tell if a filter was added or deleted.
Print filter add failures only in case of SB filters. For ATR the
information is not useful to the user and hence suppress it unless in
higher debug mode.
Change-ID: I78d7a7a6ecfa82a38a582b0d7b4da038355e3735
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Wed, 9 Jul 2014 07:46:12 +0000 (07:46 +0000)]
i40e: Update flow director error messages to reduce user confusion
This patch changes the wording of the flow director add/remove and
asynchronous failure messages to include fd_id to try and add some
way to track the operations on a given fd_id. Its not perfect, but
its better than what we had as PCTYPE can apply to several different
filter requests.
This patch also removes a redundant message when filter
addition fails due to full condition.
Change-ID: Icf58b0603d4f162d9fc542f11a74866a907049f2
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
stephen hemminger [Mon, 25 Aug 2014 22:05:30 +0000 (15:05 -0700)]
neigh: document gc_thresh2
Missing documentation for gc_thresh2 sysctl.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 25 Aug 2014 20:11:48 +0000 (13:11 -0700)]
net: bnx2x: fix build error with ptp
bnx2x uses ptp functions, so it should select the provider of
those functions (PTP_1588_CLOCK). Fixes these build errors:
drivers/built-in.o: In function `__bnx2x_remove':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:13409:
undefined reference to `ptp_clock_unregister'
drivers/built-in.o: In function `bnx2x_register_phc':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:13202:
undefined reference to `ptp_clock_register'
drivers/built-in.o: In function `bnx2x_get_ts_info':
/home/jim/linux/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c:3498:
undefined reference to `ptp_clock_index'
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 25 Aug 2014 19:38:27 +0000 (21:38 +0200)]
team: set IFF_TEAM_PORT priv_flag after rx_handler is registered
When one tries to add eth as a port into team and that eth is already in
use by other rx_handler device (macvlan, bond, bridge, ...) a bug in
team_port_add() causes that IFF_TEAM_PORT flag is set before rx_handler
is registered. In between, netdev nofifier is called and
team_device_event() sees IFF_TEAM_PORT and thinks that rx_handler_data
pointer is set to team_port. But it isn't.
Fix this by reordering rx_handler register and IFF_TEAM_PORT priv flag
set so it is very similar to how bonding does this.
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Fixes: 3d249d4ca7 "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov [Mon, 25 Aug 2014 19:27:02 +0000 (12:27 -0700)]
bpf: x86: add missing 'shift by register' instructions to x64 eBPF JIT
'shift by register' operations are supported by eBPF interpreter, but were
accidently left out of x64 JIT compiler. Fix it and add a testcase.
Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 26 Aug 2014 00:30:27 +0000 (17:30 -0700)]
Merge branch 'bnx2x-next'
Yuval Mintz says:
====================
bnx2x: `fixes' patch-series
This series contains mostly bug fixes, but never the less is intended
for `net-next' and not `net', as:
- Some of the fixes are quite insignificant [`VF clean statistics',
`ethtool -d might cause timeout in log'].
- Some only recently were submitted to `net-next' [`Fix timesync endianity'].
- Some are not usually compiled as part of the kernel [`Fix stop-on-error'].
Dave - please consider applying this series to `net-next'; If you prefer,
I can break this series into 2 parts [one for `net' and the other for
`net-next'] - but personally I don't see much benefit in it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kalderon [Mon, 25 Aug 2014 14:48:33 +0000 (17:48 +0300)]
bnx2x: Fix timesync endianity
Commit
eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
has a missing conversion to LE32, which will prevent the feature from working
on big endian machines.
Signed-off-by: Michal Kalderon <Michal.Kalderon@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 25 Aug 2014 14:48:32 +0000 (17:48 +0300)]
bnx2x: Be more forgiving toward SW GRO
This introduces 2 new relaxations in the bnx2x driver regarding GRO:
1. Don't prevent SW GRO if HW GRO is disabled.
2. If all aggregations are disabled, when GRO configuration changes
there's no need to perform an inner-reload [since it will have no
actual effect].
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Mon, 25 Aug 2014 14:48:31 +0000 (17:48 +0300)]
bnx2x: VF clean statistics
During statistics initialization of a VF we need to clean its statistics.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Mon, 25 Aug 2014 14:48:30 +0000 (17:48 +0300)]
bnx2x: Fix stop-on-error
When STOP_ON_ERROR is set driver will not compile. Even if it did,
traffic will not pass without this patch as several fields which are
verified by FW/HW on the Tx path are not properly set.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Mon, 25 Aug 2014 14:48:29 +0000 (17:48 +0300)]
bnx2x: ethtool -d might cause timeout in log
This changes slightly the set of registers read during `ethtool -d'.
Without this change, it's possible the HW will generate a grc Attention which
will be logged into system logs as `grc timeout'.
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 26 Aug 2014 00:03:47 +0000 (17:03 -0700)]
net: make skb an optional parameter for__skb_flow_dissect()
Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 26 Aug 2014 00:03:46 +0000 (17:03 -0700)]
net: fix comments for __skb_flow_get_ports()
Fixes: commit 690e36e726d00d2 (net: Allow raw buffers to be passed into the flow dissector)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 24 Aug 2014 13:42:16 +0000 (15:42 +0200)]
ixgbe: support skb->xmit_more in netdev_ops->ndo_start_xmit()
This implements the deferred tail pointer flush API for the ixgbe
driver. Similar version also proposed longer time ago by Alexander Duyck.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Aug 2014 22:51:53 +0000 (15:51 -0700)]
net: Remove ndo_xmit_flush netdev operation, use signalling instead.
As reported by Jesper Dangaard Brouer, for high packet rates the
overhead of having another indirect call in the TX path is
non-trivial.
There is the indirect call itself, and then there is all of the
reloading of the state to refetch the tail pointer value and
then write the device register.
Move to a more passive scheme, which requires very light modifications
to the device drivers.
The signal is a new skb->xmit_more value, if it is non-zero it means
that more SKBs are pending to be transmitted on the same queue as the
current SKB. And therefore, the driver may elide the tail pointer
update.
Right now skb->xmit_more is always zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Aug 2014 22:42:25 +0000 (15:42 -0700)]
Merge branch 'is_kdump_kernel'
Amir Vadai says:
====================
Make is_kdump_kernel() accessible from modules
I'm re-spinning this patchset. At the begining it was suggested to use a
different name for the parameter, but at the end [3] the resolution was to
leave it as it is in this patch.
Drivers need to know if running from kdump kernel in order to change their
memory profile - since kdump environment is limited by available memory.
Currently there are drivers that are using reset_devices as suggested in [2].
In [2] it was suggested to use reset_devices, but the context was, to enable
driver to know when the hardware device is needed to be reset, and not if this
is a kdump environment. We think that is_kdump_kernel() is better suited to
select between different memory profiles.
The first patch in this patchset exports a needed symbol in order to make
is_kdump_kernel() accessible from the drivers. The rest of the patches change
from reset_devices to is_kdump_kernel() in 2 networking drivers.
The idea of this patchset was suggested by Vivek Goyal.
Tested (only build) and applied on top of commit
8fc54f6: ("net: use
reciprocal_scale() helper")
[1] -
ea1c1af: ("net/mlx4_en: Reduce memory consumption on kdump kernel")
[2] - https://lkml.org/lkml/2011/1/27/341
[3] - http://www.spinics.net/lists/netdev/msg291492.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 25 Aug 2014 13:06:54 +0000 (16:06 +0300)]
net/bnx2x: Use is_kdump_kernel() to detect kdump kernel
Use is_kdump_kernel() to detect kdump kernel, instead of
reset_devices.
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 25 Aug 2014 13:06:53 +0000 (16:06 +0300)]
net/mlx4: Use is_kdump_kernel() to detect kdump kernel
Use is_kdump_kernel() to detect kdump kernel, instead of reset_devices.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Mon, 25 Aug 2014 13:06:52 +0000 (16:06 +0300)]
crash_dump: Make is_kdump_kernel() accessible from modules
In order to make is_kdump_kernel() accessible from modules, need to
make elfcorehdr_addr exported.
This was rejected in the past [1] because reset_devices was prefered in
that context (reseting the device in kdump kernel), but now there are
some network drivers that need to reduce memory usage when loaded from
a kdump kernel. And in that context, is_kdump_kernel() suits better.
[1] - https://lkml.org/lkml/2011/1/27/341
CC: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Machek [Mon, 25 Aug 2014 11:31:16 +0000 (13:31 +0200)]
stmmac: simple cleanups
This adds simple cleanups for stmmac, removing test we know is always
true, fixing whitespace, and moving code out of if().
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Mon, 25 Aug 2014 07:53:00 +0000 (15:53 +0800)]
r8152: check code with checkpatch.pl
626: CHECK: Alignment should match open parenthesis
646: CHECK: Alignment should match open parenthesis
655: CHECK: Alignment should match open parenthesis
695: CHECK: Alignment should match open parenthesis
729: CHECK: Alignment should match open parenthesis
739: CHECK: Alignment should match open parenthesis
976: WARNING: externs should be avoided in .c files
1314: CHECK: Alignment should match open parenthesis
1358: WARNING: networking block comments don't use an empty /* line, use /* Comment...
1402: WARNING: networking block comments don't use an empty /* line, use /* Comment...
1521: CHECK: multiple assignments should be avoided
1775: CHECK: Alignment should match open parenthesis
1838: CHECK: multiple assignments should be avoided
1843: CHECK: multiple assignments should be avoided
1847: CHECK: multiple assignments should be avoided
1850: WARNING: Missing a blank line after declarations
1864: CHECK: Alignment should match open parenthesis
1872: CHECK: braces {} should be used on all arms of this statement
1906: CHECK: usleep_range is preferred over udelay
2865: WARNING: networking block comments don't use an empty /* line, use /* Comment...
3088: CHECK: Alignment should match open parenthesis
total: 0 errors, 5 warnings, 16 checks, 3567 lines checked
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Aug 2014 06:02:53 +0000 (23:02 -0700)]
Merge branch 'ndo_xmit_flush'
Basic deferred TX queue flushing infrastructure.
Over time, and specifically and more recently at the Networking
Workshop during Kernel SUmmit in Chicago, we have discussed the idea
of having some way to optimize transmits of multiple TX packets at
a time.
There are several areas of overhead that could be amortized with such
schemes. One has to do with locking and transactional overhead, the
other has to do with device specific costs.
This patch set here is more aimed at device specific costs.
Typically a device queues up a packet in the TX queue and then has to
do something to have the device start processing that new entry.
Sometimes this is composed of doing an MMIO write to a "tail"
register, and in other cases it can involve something as expensive as
a hypervisor call.
The basic setup defined here is that when the driver supports deferred
TX queue flushing, ndo_start_xmit should no longer perform that
operation. Instead a new operation, ndo_xmit_flush, should do it.
I have converted IGB and virtio_net as example initial users. The IGB
conversion is tested, virtio_net is not but it does compile :-)
All ndo_start_xmit call sites have been abstracted behind a new helper
called netdev_start_xmit().
This just adds the infrastructure, it does not actually add any
instances of actually doing multiple ndo_start_xmit calls per
ndo_xmit_flush invocation.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 Aug 2014 20:18:10 +0000 (13:18 -0700)]
virtio_net: Support netdev_ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 Aug 2014 00:24:49 +0000 (17:24 -0700)]
igb: Support netdev_ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 22 Aug 2014 23:21:53 +0000 (16:21 -0700)]
net: Add ops->ndo_xmit_flush()
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Morris [Sun, 24 Aug 2014 20:53:12 +0000 (21:53 +0100)]
ipv6: White-space cleansing : gaps between function and symbol export
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
This patch removes some blank lines between the end of a function
definition and the EXPORT_SYMBOL_GPL macro in order to prevent
checkpatch warning that EXPORT_SYMBOL must immediately follow
a function.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Morris [Sun, 24 Aug 2014 20:53:11 +0000 (21:53 +0100)]
ipv6: White-space cleansing : Structure layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
This patch addresses structure definitions, specifically it cleanses the brace
placement and replaces spaces with tabs in a few places.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Morris [Sun, 24 Aug 2014 20:53:10 +0000 (21:53 +0100)]
ipv6: White-space cleansing : Line Layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.
Both objdump and diff -w show no differences.
A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.
Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Darek Marcinkiewicz [Sun, 24 Aug 2014 18:40:16 +0000 (20:40 +0200)]
net: ec_bhf: remove excessive debug messages
This cuts down the number of debug information spit out by
the driver.
Signed-off-by: Dariusz Marcinkiewicz <reksio@newterm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 23 Aug 2014 15:03:28 +0000 (17:03 +0200)]
random32: improvements to prandom_bytes
This patch addresses a couple of minor items, mostly addesssing
prandom_bytes(): 1) prandom_bytes{,_state}() should use size_t
for length arguments, 2) We can use put_unaligned() when filling
the array instead of open coding it [ perhaps some archs will
further benefit from their own arch specific implementation when
GCC cannot make up for it ], 3) Fix a typo, 4) Better use unsigned
int as type for getting the arch seed, 5) Make use of
prandom_u32_max() for timer slack.
Regarding the change to put_unaligned(), callers of prandom_bytes()
which internally invoke prandom_bytes_state(), don't bother as
they expect the array to be filled randomly and don't have any
control of the internal state what-so-ever (that's also why we
have periodic reseeding there, etc), so they really don't care.
Now for the direct callers of prandom_bytes_state(), which
are solely located in test cases for MTD devices, that is,
drivers/mtd/tests/{oobtest.c,pagetest.c,subpagetest.c}:
These tests basically fill a test write-vector through
prandom_bytes_state() with an a-priori defined seed each time
and write that to a MTD device. Later on, they set up a read-vector
and read back that blocks from the device. So in the verification
phase, the write-vector is being re-setup [ so same seed and
prandom_bytes_state() called ], and then memcmp()'ed against the
read-vector to check if the data is the same.
Akinobu, Lothar and I also tested this patch and it runs through
the 3 relevant MTD test cases w/o any errors on the nandsim device
(simulator for MTD devs) for x86_64, ppc64, ARM (i.MX28, i.MX53
and i.MX6):
# modprobe nandsim first_id_byte=0x20 second_id_byte=0xac \
third_id_byte=0x00 fourth_id_byte=0x15
# modprobe mtd_oobtest dev=0
# modprobe mtd_pagetest dev=0
# modprobe mtd_subpagetest dev=0
We also don't have any users depending directly on a particular
result of the PRNG (except the PRNG self-test itself), and that's
just fine as it e.g. allowed us easily to do things like upgrading
from taus88 to taus113.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Lothar Waßmann <LW@KARO-electronics.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Aug 2014 01:09:58 +0000 (18:09 -0700)]
Merge branch 'csums-next'
Tom Herbert says:
====================
net: Checksum offload changes - Part V
I am working on overhauling RX checksum offload. Goals of this effort
are:
- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code
What is in this fifth patch set:
- Added GRO checksum validation functions
- Call the GRO validations functions from TCP and GRE gro_receive
- Perform checksum verification in the UDP gro_receive path using
GRO functions and add support for gro_receive in UDP6
Changes in V2:
- Change ip_summed to CHECKSUM_UNNECESSARY instead of moving it
to CHECKSUM_COMPLETE from GRO checksum validation. This avoids
performance penalty in checksumming bytes which are before the header
GRO is at.
Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
----
Test results with this patch set are below. I did not notice any
performace regression.
Tests run:
TCP_STREAM: super_netperf with 200 streams
TCP_RR: super_netperf with 200 streams and -r 1,1
Device bnx2x (10Gbps):
No GRE RSS hash (RX interrupts occur on one core)
UDP RSS port hashing enabled.
* GRE with checksum with IPv4 encapsulated packets
With fix:
TCP_STREAM
9.91% CPU utilization
5163.78 Mbps
TCP_RR
50.64% CPU utilization
219/347/502 90/95/99% latencies
834103 tps
Without fix:
TCP_STREAM
10.05% CPU utilization
5186.22 tps
TCP_RR
49.70% CPU utilization
227/338/486 90/95/99% latencies
813450 tps
* GRE without checksum with IPv4 encapsulated packets
With fix:
TCP_STREAM
10.18% CPU utilization
5159 Mbps
TCP_RR
51.86% CPU utilization
214/325/471 90/95/99% latencies
865943 tps
Without fix:
TCP_STREAM
10.26% CPU utilization
5307.87 Mbps
TCP_RR
50.59% CPU utilization
224/325/476 90/95/99% latencies
846429 tps
*** Simulate device returns CHECKSUM_COMPLETE
* VXLAN with checksum
With fix:
TCP_STREAM
13.03% CPU utilization
9093.9 Mbps
TCP_RR
95.96% CPU utilization
161/259/474 90/95/99% latencies
1.14806e+06 tps
Without fix:
TCP_STREAM
13.59% CPU utilization
9093.97 Mbps
TCP_RR
93.95% CPU utilization
160/259/484 90/95/99% latencies
1.10262e+06 tps
* VXLAN without checksum
With fix:
TCP_STREAM
13.28% CPU utilization
9093.87 Mbps
TCP_RR
95.04% CPU utilization
155/246/439 90/95/99% latencies
1.15e+06 tps
Without fix:
TCP_STREAM
13.37% CPU utilization
9178.45 Mbps
TCP_RR
93.74% CPU utilization
161/257/469 90/95/99% latencies
1.1068e+06 Mbps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:34:52 +0000 (13:34 -0700)]
gre: When GRE csum is present count as encap layer wrt csum
In GRE demux if the GRE checksum pop rcv encapsulation so that any
encapsulated checksums are treated as tunnel checksums.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:34:44 +0000 (13:34 -0700)]
udp: additional GRO support
Implement GRO for UDPv6. Add UDP checksum verification in gro_receive
for both UDP4 and UDP6 calling skb_gro_checksum_validate_zero_check.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:34:30 +0000 (13:34 -0700)]
tcp: Call skb_gro_checksum_validate
In tcp[64]_gro_receive call skb_gro_checksum_validate to validate TCP
checksum in the gro context.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:34:22 +0000 (13:34 -0700)]
gre: call skb_gro_checksum_simple_validate
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:34:04 +0000 (13:34 -0700)]
net: add gro_compute_pseudo functions
Add inet_gro_compute_pseudo and ip6_gro_compute_pseudo. These are
the logical equivalents of inet_compute_pseudo and ip6_compute_pseudo
for GRO path. The IP header is taken from skb_gro_network_header.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 22 Aug 2014 20:33:47 +0000 (13:33 -0700)]
net: skb_gro_checksum_* functions
Add skb_gro_checksum_validate, skb_gro_checksum_validate_zero_check,
and skb_gro_checksum_simple_validate, and __skb_gro_checksum_complete.
These are the cognates of the normal checksum functions but are used
in the gro_receive path and operate on GRO related fields in sk_buffs.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 23 Aug 2014 18:58:54 +0000 (20:58 +0200)]
net: use reciprocal_scale() helper
Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 Aug 2014 19:13:41 +0000 (12:13 -0700)]
net: Allow raw buffers to be passed into the flow dissector.
Drivers, and perhaps other entities we have not yet considered,
sometimes want to know how deep the protocol headers go before
deciding how large of an SKB to allocate and how much of the packet to
place into the linear SKB area.
For example, consider a driver which has a device which DMAs into
pools of pages and then tells the driver where the data went in the
DMA descriptor(s). The driver can then build an SKB and reference
most of the data via SKB fragments (which are page/offset/length
triplets).
However at least some of the front of the packet should be placed into
the linear SKB area, which comes before the fragments, so that packet
processing can get at the headers efficiently. The first thing each
protocol layer is going to do is a "pskb_may_pull()" so we might as
well aggregate as much of this as possible while we're building the
SKB in the driver.
Part of supporting this is that we don't have an SKB yet, so we want
to be able to let the flow dissector operate on a raw buffer in order
to compute the offset of the end of the headers.
So now we have a __skb_flow_dissect() which takes an explicit data
pointer and length.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 Aug 2014 18:39:24 +0000 (11:39 -0700)]
Merge branch 'bcm7xxx_apd_eee'
Florian Fainelli says:
====================
net: phy: bcm7xxx: APD and EEE support
This patch series enables Auto-power down and EEE for the BCM7xxx integrated
Gigabit PHYs.
I also put a fix for the fixed PHY that would allow clause 45 over clause 22
reads/writes but would return bogus data by using e.g: ethtool --show-eee
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:45 +0000 (18:55 -0700)]
net: phy: bcm7xxx: enable EEE at the PHY level
The 28nm Gigabit PHY on BCM7xxx chips comes out of reset with absolutely
no EEE capabilities, such that we would actually return that we do not
support EEE when accessing 3.20 (MDIO_PCS_EEE_ABLE) registers.
Poke through the vendor-specific C45 register to enable EEE globally at
the PHY level, and advertise supported EEE modes.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:44 +0000 (18:55 -0700)]
net: phy: allow phy_init_eee() to work with internal PHYs
Internal PHYs do not have any specific phy_interface_t defined because
they are within an Ethernet MAC or a larger IC, they will fail the early
check in phy_init_eee(). Allow these PHYs to proceed with EEE
initialization and report error/success by checking the standard C45
EEE-related registers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:43 +0000 (18:55 -0700)]
net: phy: export phy_{read,write}_mmd_indirect
Some PHY drivers might need to access Clause 45 registers in Clause 22
compatibility mode to e.g: properly advertise EEE support when disabled
by default.
Export these two helper functions: phy_read_mmd_indirect() and
phy_write_mmd_indirect() for drivers to use them.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:42 +0000 (18:55 -0700)]
net: phy: fixed: return an error for Clause 45 over 22 reads
The fixed PHY driver does not properly emulate Clause 45 over Clause 22
MDIO reads, and as such, will return bogus values when we access such
registers.
Return an error when accessing these registers in order to prevent
advertising bogus capabilities such as EEE support and such.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:41 +0000 (18:55 -0700)]
net: phy: bcm7xxx: enable auto power down
The 28nm process BCM7xxx internal Gigabit PHYs all support automatic
power down, turn on that feature as part of the configuration
initialization callback.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:40 +0000 (18:55 -0700)]
net: phy: broadcom: move shadow 0x1C register accessors to brcmphy.h
The shadow register 0x1C is used both by the BCM54xxx PHYs and the
BCM7xxx internal PHYs, move the accessors to a common location so both
drivers can use them.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 23 Aug 2014 01:55:39 +0000 (18:55 -0700)]
net: phy: broadcom: extract all registers to brcmphy.h
Commit
439d39a9ac8fbbba9c04581361188f33f21ced50 ("net: phy: broadcom:
extract register definitions") added a bunch of registers to brcmphy.h
but left some to broadcom.c, move all of them to the header file since
the BCM54xx and BCM7xxx PHY drivers do share all of these registers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 Aug 2014 18:18:41 +0000 (11:18 -0700)]
Merge branch 'tipc-next'
Jon Maloy says:
====================
tipc: Merge port and socket layer code
After the removal of the TIPC native interface, there is no reason to
keep a distinction between a "generic" port layer and a "specific"
socket layer in the code. Throughout the last months, we have posted
several series that aimed at facilitating removal of the port layer,
and in particular the port_lock spinlock, which in reality duplicates
the role normally kept by lock_sock()/bh_lock_sock().
In this series, we finalize this work, by making a significant number of
changes to the link, node, port and socket code, all with the aim of
reducing dependencies between the layers. In the final commits, we then
remove the port spinlock, port.c and port.h altogether.
After this series, we have a socket layer that has only few dependencies
to the rest of the stack, so that it should be possible to continue
cleanups of its code without significantly affecting other code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:20 +0000 (18:09 -0400)]
tipc: merge struct tipc_port into struct tipc_sock
We complete the merging of the port and socket layer by aggregating
the fields of struct tipc_port directly into struct tipc_sock, and
moving the combined structure into socket.c.
We also move all functions and macros that are not any longer
exposed to the rest of the stack into socket.c, and rename them
accordingly.
Despite the size of this commit, there are no functional changes.
We have only made such changes that are necessary due of the removal
of struct tipc_port.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:19 +0000 (18:09 -0400)]
tipc: remove files ref.h and ref.c
The reference table is now 'socket aware' instead of being generic,
and has in reality become a socket internal table. In order to be
able to minimize the API exposed by the socket layer towards the rest
of the stack, we now move the reference table definitions and functions
into the file socket.c, and rename the functions accordingly.
There are no functional changes in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:18 +0000 (18:09 -0400)]
tipc: remove include file port.h
We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.
We move struct tipc_port and some macros to socket.h.
Finally, we remove the file port.h.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:17 +0000 (18:09 -0400)]
tipc: remove source file port.c
In this commit, we move the remaining functions in port.c to
socket.c, and give them new names that correspond to their new
location. We then remove the file port.c.
There are only cosmetic changes to the moved functions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:16 +0000 (18:09 -0400)]
tipc: remove port_lock
In previous commits we have reduced usage of port_lock to a minimum,
and complemented it with usage of bh_lock_sock() at the remaining
locations. The purpose has been to remove this lock altogether, since
it largely duplicates the role of bh_lock_sock. We are now ready to do
this.
However, we still need to protect the BH callers from inadvertent
release of the socket while they hold a reference to it. We do this by
replacing port_lock by a combination of a rw-lock protecting the
reference table as such, and updating the socket reference counter while
the socket is referenced from BH. This technique is more standard and
comprehensible than the previous approach, and turns out to have a
positive effect on overall performance.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 22 Aug 2014 22:09:15 +0000 (18:09 -0400)]
tipc: replace port pointer with socket pointer in registry
In order to make tipc_sock the only entity referencable from other
parts of the stack, we add a tipc_sock pointer instead of a tipc_port
pointer to the registry. As a consequence, we also let the function
tipc_port_lock() return a pointer to a tipc_sock instead of a tipc_port.
We keep the function's name for now, since the lock still is owned by
the port.
This is another step in the direction of eliminating port_lock, replacing
its usage with lock_sock() and bh_lock_sock().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>