git.openwrt.org Git - openwrt/staging/blogic.git/log

tools, bpf_asm: simplify parser rule for BPF extensions

We can already use yylval in the lexer for encoding the BPF extension
number, so that the parser rules can be further reduced to a single one
for each B/H/W case.

Signed-off-by: Ray Bellis <ray@isc.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netcp: use pointer to fix build fail

While building keystone_defconfig of arm we are getting build failure
with the error:

drivers/net/ethernet/ti/netcp_core.c:1846:31: error: invalid type argument of '->' (have 'struct tc_to_netdev')
  if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
                               ^
drivers/net/ethernet/ti/netcp_core.c:1851:35: error: invalid type argument of '->' (have 'struct tc_to_netdev')
      (dev->real_num_tx_queues < tc->tc))
                                   ^
drivers/net/ethernet/ti/netcp_core.c:1855:8: error: invalid type argument of '->' (have 'struct tc_to_netdev')
  if (tc->tc) {
        ^
drivers/net/ethernet/ti/netcp_core.c:1856:28: error: invalid type argument of '->' (have 'struct tc_to_netdev')
   netdev_set_num_tc(dev, tc->tc);
                            ^
drivers/net/ethernet/ti/netcp_core.c:1857:21: error: invalid type argument of '->' (have 'struct tc_to_netdev')
   for (i = 0; i < tc->tc; i++)
                     ^
drivers/net/ethernet/ti/netcp_core.c: At top level:
drivers/net/ethernet/ti/netcp_core.c:1879:2: warning: initialization from incompatible pointer type
  .ndo_setup_tc  = netcp_setup_tc,
  ^

The callback of ndo_setup_tc should be:
int (*ndo_setup_tc)(struct net_device *dev, u32 handle, __be16 protocol,
                    struct tc_to_netdev *tc);

But we missed marking the last argument as a pointer.

Fixes: 16e5cc647173 ("net: rework setup_tc ndo op to consume general tc operand")
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-next'

Yuval Mintz says:

====================
qed*: Driver updates

This contains various minor changes to driver - changing memory allocation,
fixing a small theoretical bug, as well as some mostly-semantic changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed,qede: Bump driver versions to 8.7.0.0

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Introduce DMA_REGPAIR_LE

FW hsi contains regpairs, mostly for 64-bit address representations.
Since same paradigm is applied each time a regpair is filled, this
introduces a new utility macro for setting such regpairs.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Change metadata needed for SPQ entries

Each configuration element send via ramrod requires a Slow Path Queue
entry. This slightly changes the way such an entry is configured, but
contains mostly semantic changes [where more parameters are gathered
in a sub-struct instead of being directly passed].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Handle possible race in SB config

Due to HW design, some of the memories are wide-bus and access to those
needs to be sequentialized on a per-HW-block level; Read/write to a
given HW-block might break other read/write to wide-bus memory done at
~same time.

Status blocks initialization in CAU is done into such a wide-bus memory.
This moves the initialization into using DMAE which is guaranteed to be
safe to use on such memories.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Turn most GFP_ATOMIC into GFP_KERNEL

Initial driver submission used GFP_ATOMIC almost inclusively when
allocating memory. We now remedy this point, using GFP_KERNEL where
it's possible.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipvlan-misc'

Mahesh Bandewar says:

====================
IPvlan misc patches

This is a collection of unrelated patches for IPvlan driver.
a. crub_skb() changes are added to ensure that the packets hit the
NF_HOOKS in masters' ns in L3 mode.
b. u16 change is bug fix while
c. the third patch is to group tx/rx variables in single cacheline
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: misc changes

1. scope correction for few functions that are used in single file.
2. Adjust variables that are used in fast-path to fit into single cacheline
3. Update rcv_frame() to skip shared check for frames coming over wire

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: mode is u16

The mode argument was erronusly defined as u32 but it has always
been u16. Also use ipvlan_set_mode() helper to set the mode instead
of assigning directly. This should avoid future erronus assignments /
updates.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipvlan: scrub skb before routing in L3 mode.

Scrub skb before hitting the iptable hooks to ensure packets hit
these hooks. Set the xnet param only when the packet is crossing the
ns boundry so if the IPvlan slave and master belong to the same ns,
the param will be set to false.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-4.6-20160220' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2016-02-20

this is a pull request of 9 patch for net-next/master.

The first 3 patches are from Damien Riegel, they add support for
Technologic Systems IP core to tje sja100 driver. The next patches 6 by
Marek Vasut (including one my me) first clean sort the CAN driver's
Kconfig and Makefiles and then add support for the IFI CANFD IP core.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-helper-improvements'

Daniel Borkmann says:

====================
BPF updates

This set contains various updates for eBPF, i.e. the addition of a
generic csum helper function and other misc bits that mostly improve
existing helpers and ease programming with eBPF on cls_bpf. For more
details, please see individual patches.

Set is rebased on top of http://patchwork.ozlabs.org/patch/584465/.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: don't emit mov A,A on return

While debugging with bpf_jit_disasm I noticed emissions of 'mov %eax,%eax',
and found that this comes from BPF_RET | BPF_A translations from classic
BPF. Emitting this is unnecessary as BPF_REG_A is mapped into BPF_REG_0
already, therefore only emit a mov when immediates are used as return value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix csum update in bpf_l4_csum_replace helper for udp

When using this helper for updating UDP checksums, we need to extend
this in order to write CSUM_MANGLED_0 for csum computations that result
into 0 as sum. Reason we need this is because packets with a checksum
could otherwise become incorrectly marked as a packet without a checksum.
Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
not turn packets without a checksum into ones with a checksum.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: try harder on clones when writing into skb

When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().

Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: remove artificial bpf_skb_{load, store}_bytes buffer limitation

We currently limit bpf_skb_store_bytes() and bpf_skb_load_bytes()
helpers to only store or load a maximum buffer of 16 bytes. Thus,
loading, rewriting and storing headers require several bpf_skb_load_bytes()
and bpf_skb_store_bytes() calls.

Also here we can use a per-cpu scratch buffer instead in order to not
pressure stack space any further. I do suspect that this limit was mainly
set in place for this particular reason. So, ease program development
by removing this limitation and make the scratchpad generic, so it can
be reused.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add generic bpf_csum_diff helper

For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
currently limited to handle 2 and 4 byte changes in a header and feeds the
from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
working with IPv6, for example, this makes it rather cumbersome to deal
with, similarly when editing larger parts of a header.

Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
add a case for header field mask of 0 to change the checksum at a given
offset through inet_proto_csum_replace_by_diff(), and provide a helper
bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
amounts of data.

This can be used in multiple ways: for the bpf_l4_csum_replace() only
part, this even provides us with the option to insert precalculated diffs
from user space f.e. from a map, or from bpf_csum_diff() during runtime.

bpf_csum_diff() has a optional from/to stack buffer input, so we can
calculate a diff by using a scratchbuffer for scenarios where we're
inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
don't need to be of equal size) data. Also, bpf_csum_diff() allows to
feed a previous csum into csum_partial(), so the function can also be
cascaded.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add new arg_type that allows for 0 sized stack buffer

Currently, when we pass a buffer from the eBPF stack into a helper
function, the function proto indicates argument types as ARG_PTR_TO_STACK
and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1>
must be of the latter type. Then, verifier checks whether the buffer
points into eBPF stack, is initialized, etc. The verifier also guarantees
that the constant value passed in R<X+1> is greater than 0, so helper
functions don't need to test for it and can always assume a non-NULL
initialized buffer as well as non-0 buffer size.

This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that
allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function.
Such helper functions, of course, need to be able to handle these cases
internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0
or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any
other combinations are not possible to load.

I went through various options of extending the verifier, and introducing
the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes
needed to the verifier.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'geneve-vxlan-outer-checksum'

Alexander Duyck says:

====================
GENEVE/VXLAN: Enable outer Tx checksum by default

This patch series makes it so that we enable the outer Tx checksum for IPv4
tunnels by default.  This makes the behavior consistent with how we were
handling this for IPv6.  In addition I have updated the internal flags for
these tunnels so that we use a ZERO_CSUM_TX flag for IPv4 which should
match up will with the ZERO_CSUM6_TX flag which was already in use for
IPv6.

For most network devices this should be a net gain in terms of performance
as having the outer header checksum present allows for devices to report
CHECKSUM_UNNECESSARY which we can then convert to CHECKSUM_COMPLETE in order
to determine if the inner header checksum is valid.

Below is some data I collected with ixgbe with an X540 that demonstrates
this.  I located two PFs connected back to back in two different name
spaces and then setup a pair of tunnels on each, one with checksum enabled
and one without.

Recv   Send    Send                          Utilization
Socket Socket  Message  Elapsed              Send
Size   Size    Size     Time     Throughput  local
bytes  bytes   bytes    secs.    10^6bits/s  % S

noudpcsum:
87380  16384  16384    30.00      8898.67   12.80
udpcsum:
87380  16384  16384    30.00      9088.47   5.69

The one spot where this may cause a performance regression is if the
environment contains devices that can parse the inner headers and a device
supports NETIF_F_GSO_UDP_TUNNEL but not NETIF_F_GSO_UDP_TUNNEL_CSUM.  In
the case of such a device we have to fall back to using GSO to segment the
tunnel instead of TSO and as a result we may take a performance hit as seen
below with i40e.

Recv   Send    Send                          Utilization
Socket Socket  Message  Elapsed              Send
Size   Size    Size     Time     Throughput  local
bytes  bytes   bytes    secs.    10^6bits/s  % S

noudpcsum:
87380  16384  16384    30.00      9085.21   3.32
udpcsum:
87380  16384  16384    30.00      9089.23   5.54

In addition it will be necessary to update iproute2 so that we don't
provide the checksum attribute unless specified.  This way on older kernels
which don't have local checksum offload we will default to disabling the
outer checksum, and on newer kernels that have LCO we can default to
enabling it.

I also haven't investigated the effect this will have on OVS.  However I
suspect the impact should be minimal as the worst case scenario should be
that Tx checksumming will become enabled by default which should be
consistent with the existing behavior for IPv6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

VXLAN: Support outer IPv4 Tx checksums by default

This change makes it so that if UDP CSUM is not specified we will default
to enabling it. The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for VXLAN
tunnels on devices that don't know how to parse tunnel headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

GENEVE: Support outer IPv4 Tx checksums by default

This change makes it so that if UDP CSUM is not specified we will default
to enabling it. The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for GENEVE
tunnels on hardware that doesn't know how to parse them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'lwt-autoload'

Robert Shearman says:

====================
lwtunnel: autoload of lwt modules

Changes since v1:
- remove "LWTUNNEL_ENCAP_" prefix for the string form of the encaps
   used when requesting the module to reduce duplication, and don't
   bother returning strings for lwt modules using netdevices, both
   suggested by Jiri.
- update commit message of first patch to clarify security
   implications, in response to Eric's comments.

The lwt implementations using net devices can autoload using the
existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
that lwt modules not using net devices can use.

Therefore, these patches add the ability to autoload modules
registering lwt operations for lwt implementations not using a net
device so that users don't have to manually load the modules.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ila: autoload module

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: autoload lwt module

Avoid users having to manually load the module by adding a module
alias allowing it to be autoloaded by the lwt infra.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: autoload of lwt modules

The lwt implementations using net devices can autoload using the
existing mechanism using IFLA_INFO_KIND. However, there's no mechanism
that lwt modules not using net devices can use.

Therefore, add the ability to autoload modules registering lwt
operations for lwt implementations not using a net device so that
users don't have to manually load the modules.

Only users with the CAP_NET_ADMIN capability can cause modules to be
loaded, which is ensured by rtnetlink_rcv_msg rejecting non-RTM_GETxxx
messages for users without this capability, and by
lwtunnel_build_state not being called in response to RTM_GETxxx
messages.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: turn on unicast filtering on vlan device

Currently vlan device inherits unicast filtering flag from underlying
device. If underlying device doesn't support unicast filter, this will
put vlan device into promiscuous mode when it's stacked.

Tun on IFF_UNICAST_FLT on the vlan device in any case so that it does
not go into promiscuous mode needlessly. If underlying device does not
support unicast filtering, that device will enter promiscuous mode.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: ifi: Add IFI CANFD IP support

The patch adds support for IFI CAN/FD controller [1]. This driver
currently supports sending and receiving both standard CAN and new
CAN/FD frames. Both ISO and BOSCH variant of CAN/FD is supported.

[1] http://www.ifi-pld.de/IP/CANFD/canfd.html

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ifi: Add DT bindings for ifi,canfd

Add device tree bindings for the I/F/I CANFD controller IP core.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

of: Add vendor prefix for I/F/I

Add vendor prefix for I/F/I, Ingenieurbüro Für IC-Technologie
http://www.ifi-pld.de/

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Makefile: Sort the Makefile

Just sort the drivers in the Makefile, no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Kconfig: sort drivers alphabetically

Sort the drivers that are directly listed in the Kconfig alphabetically, no
functional change.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: Kconfig: Sort the Kconfig includes

Sort the Kconfig includes, no functional change.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: of: add compatibility with Technologic Systems version

Technologic Systems provides an IP compatible with the SJA1000,
instantiated in an FPGA. Because of some bus widths issue, access to
registers is made through a "window" that works like this:

base + 0x0: address to read/write
base + 0x2: 8-bit register value

This commit adds a new compatible device, "technologic,sja1000", with
read and write functions using the window mechanism.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: add documentation for Technologic Systems version

This commit adds documentation for the Technologic Systems version of
SJA1000. The difference with the NXP version is in the way the registers
are accessed.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: sja1000: of: add per-compatible init hook

This commit adds the capability to allocate and init private data
embedded in the sja1000_priv structure on a per-compatible basis. The
device node is passed as a parameter of the init callback to allow
parsing of custom device tree properties.

Signed-off-by: Damien Riegel <damien.riegel@savoirfairelinux.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'bpf-get-stackid'

Alexei Starovoitov says:

====================
bpf_get_stackid() and stack_trace map

This patch set introduces new map type to store stack traces and
corresponding bpf_get_stackid() helper.
BPF programs already can walk the stack via unrolled loop
of bpf_probe_read()s which is ok for simple analysis, but it's
not efficient and limited to <30 frames after that the programs
don't fit into MAX_BPF_STACK. With bpf_get_stackid() helper
the programs can collect up to PERF_MAX_STACK_DEPTH both
user and kernel frames.
Using stack traces as a key in a map turned out to be very useful
for generating flame graphs, off-cpu graphs, waker and chain graphs.
Patch 3 is a simplified version of 'offwaketime' tool which is
described in detail here:
http://brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html

Earlier version of this patch were using save_stack_trace() helper,
but 'unreliable' frames add to much noise and two equiavlent
stack traces produce different 'stackid's.
Using lockdep style of storing frames with MAX_STACK_TRACE_ENTRIES is
great for lockdep, but not acceptable for bpf, since the stack_trace
map needs to be freed when user Ctrl-C the tool.
The ftrace style with per_cpu(struct ftrace_stack) is great, but it's
tightly coupled with ftrace ring buffer and has the same 'unreliable'
noise. perf_event's perf_callchain() mechanism is also very efficient
and it only needed minor generalization which is done in patch 1
to be used by bpf stack_trace maps.
Peter, please take a look at patch 1.
If you're ok with it, I'd like to take the whole set via net-next.

Patch 1 - generalization of perf_callchain()
Patch 2 - stack_trace map done as lock-less hashtable without link list
  to avoid spinlock on insertion which is critical path when
  bpf_get_stackid() helper is called for every task switch event
Patch 3 - offwaketime example

After the patch the 'perf report' for artificial 'sched_bench'
benchmark that doing pthread_cond_wait/signal and 'offwaketime'
example is running in the background:
16.35%  swapper      [kernel.vmlinux]    [k] intel_idle
  2.18%  sched_bench  [kernel.vmlinux]    [k] __switch_to
  2.18%  sched_bench  libpthread-2.12.so  [.] pthread_cond_signal@@GLIBC_2.3.2
  1.72%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_unlock
  1.53%  sched_bench  [kernel.vmlinux]    [k] bpf_get_stackid
  1.44%  sched_bench  [kernel.vmlinux]    [k] entry_SYSCALL_64
  1.39%  sched_bench  [kernel.vmlinux]    [k] __call_rcu.constprop.73
  1.13%  sched_bench  libpthread-2.12.so  [.] pthread_mutex_lock
  1.07%  sched_bench  libpthread-2.12.so  [.] pthread_cond_wait@@GLIBC_2.3.2
  1.07%  sched_bench  [kernel.vmlinux]    [k] hash_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] do_futex
  1.05%  sched_bench  [kernel.vmlinux]    [k] get_futex_key_refs.isra.13

The hotest part of bpf_get_stackid() is inlined jhash2, so we may consider
using some faster hash in the future, but it's good enough for now.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: offwaketime example

This is simplified version of Brendan Gregg's offwaketime:
This program shows kernel stack traces and task names that were blocked and
"off-CPU", along with the stack traces and task names for the threads that woke
them, and the total elapsed time from when they blocked to when they were woken
up. The combined stacks, task names, and total time is summarized in kernel
context for efficiency.

Example:
$ sudo ./offwaketime | flamegraph.pl > demo.svg
Open demo.svg in the browser as FlameGraph visualization.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: introduce BPF_MAP_TYPE_STACK_TRACE

add new map type to store stack traces and corresponding helper
bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
@ctx: struct pt_regs*
@map: pointer to stack_trace map
@flags: bits 0-7 - numer of stack frames to skip
        bit 8 - collect user stack instead of kernel
        bit 9 - compare stacks by hash only
        bit 10 - if two different stacks hash into the same stackid
                 discard old
        other bits - reserved
Return: >= 0 stackid on success or negative error

stackid is a 32-bit integer handle that can be further combined with
other data (including other stackid) and used as a key into maps.

Userspace will access stackmap using standard lookup/delete syscall commands to
retrieve full stack trace for given stackid.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf: generalize perf_callchain

. avoid walking the stack when there is no room left in the buffer
. generalize get_perf_callchain() to be called from bpf helper

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use skb_postpush_rcsum instead of own implementations

Replace individual implementations with the recently introduced
skb_postpush_rcsum() helper.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tom Herbert <tom@herbertland.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: marvell/micrel: Fix Unpossible condition

commit 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
from Dec 30, 2015, leads to the following static checker
warning:

        drivers/net/phy/micrel.c:609 kszphy_get_stat()
        warn: unsigned 'val' is never less than zero.

drivers/net/phy/micrel.c
   602  static u64 kszphy_get_stat(struct phy_device *phydev, int i)
   603  {
   604          struct kszphy_hw_stat stat = kszphy_hw_stats[i];
   605          struct kszphy_priv *priv = phydev->priv;
   606          u64 val;
   607
   608          val = phy_read(phydev, stat.reg);
   609          if (val < 0) {
                    ^^^^^^^
Unpossible!

   610                  val = UINT64_MAX;
   611          } else {
   612                  val = val & ((1 << stat.bits) - 1);
   613                  priv->stats[i] += val;
   614                  val = priv->stats[i];
   615          }
   616
   617          return val;
   618  }

The same problem exists in the Marvell driver. Fix both.

Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Julia.Lawall <julia.lawall@lip6.fr>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-perqueue-params'

Kan Liang says:

====================
ethtool per queue parameters support

Modern network interface controllers usually support multiple receive
and transmit queues. Each queue may have its own parameters. For
example, Intel XL710/X710 hardware supports per queue interrupt
moderation. However, current ethtool does not support per queue
parameters option. User has to set parameters for the whole NIC.
This series extends ethtool to support per queue parameters option.

Since the support of per queue parameters vary with different cards,
it is impossible to address all cards in one patch. This series only
supports per queue coalesce options on i40e driver. The framework used
in the patch can be easily extended to other cards and parameters.

The lib bitmap needs to be extended to facilitate exchanging queue bitmaps
between user space and kernel space. Two patches from David's latest V8
patch series are also cited in this series. You may refer to
https://lkml.org/lkml/2016/2/9/919 for more details.

Changes since V6:
- Rebase on commit 76d13b568776. Did minor change in patch 6.

Changes since V5:
- Add test_bitmap.c and bitmap.sh in the series. They are forgot
   to be added previously.
- Update the first two patches to David's latest V8 version. The changes
   include
      - bitmap u32 API returns number of bits copied, unit tests updated
      - module_exit in test_bitmap
- Also change the mode of bitmap.sh to 755 according to Ben's suggestion

Changes since V4:
- Modify set/get_per_queue_coalesce function description
- Change the queue number to be u32
- Correct an error of calculating coalesce backup buffer address
- Rename queue_num to n_queues
- Don't log error message in __i40e_get_coalesce

Changes since V3:
- Based on David's lib bitmap.
- ETHTOOL_PERQUEUE should be handled before the containing switch
- Make the rollback code unconditional
- some minor changes according to Ben's feedback

Changes since V2:
- Add queue-specific settings for interrupt moderation in i40e

Changes since V1:
- Checking the sub-command number to determine whether the command
   requires CAP_NET_ADMIN
- Refine the struct ethtool_per_queue_op and improve the comments
- Use bitmap functions to parse queue mask
- Improve comments
- Use bitmap functions to parse queue mask
- Improve comments
- Add rollback support
- Correct the way to find the vector for specific queue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e/ethtool: support coalesce setting by queue

This patch implements set_per_queue_coalesce for i40e driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e/ethtool: support coalesce getting by queue

This patch implements get_per_queue_coalesce for i40e driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: queue-specific settings for interrupt moderation

For i40e driver, each vector has its own ITR register. However, there
are no concept of queue-specific settings in the driver proper. Only
global variable is used to store ITR values. That will cause problems
especially when resetting the vector. The specific ITR values could be
lost.
This patch move rx_itr_setting and tx_itr_setting to i40e_ring to store
specific ITR register for each queue.
i40e_get_coalesce and i40e_set_coalesce are also modified accordingly to
support queue-specific settings. To make it compatible with old ethtool,
if user doesn't specify the queue number, i40e_get_coalesce will return
queue 0's value. While i40e_set_coalesce will apply value to all queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: support set coalesce per queue

This patch implements sub command ETHTOOL_SCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
set coalesce of each masked queue to device driver. The wanted coalesce
information are stored in "data" for each masked queue, which can copy
from userspace.
If it fails to set coalesce to device driver, the value which already
set to specific queue will be tried to rollback.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: support get coalesce per queue

This patch implements sub command ETHTOOL_GCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
get coalesce of each masked queue from device driver. Then the interrupt
coalescing parameters will be copied back to user space one by one.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: introduce a new ioctl for per queue setting

Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
The following patches will enable some SUB_COMMANDs for per queue
setting.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_bitmap: unit tests for lib/bitmap.c

This is mainly testing bitmap construction and conversion to/from u32[]
for now.

Tested:
qemu i386, x86_64, ppc, ppc64 BE and LE, ARM.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib/bitmap.c: conversion routines to/from u32 array

Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
way.

Tested:
unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
ARM.

Signed-off-by: David Decotigny <decot@googlers.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: add software transmit timestamp support

Enable skb_tx_timestamp in hyperv netvsc.

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: pass up EMSGSIZE msg for UDP socket in Ipv6

In ipv4, when the machine receives a ICMP_FRAG_NEEDED message, the
connected UDP socket will get EMSGSIZE message on its next read from the
socket.
However, this is not the case for ipv6.
This fix modifies the udp err handler in Ipv6 for ICMP6_PKT_TOOBIG to
make it similar to ipv4 behavior. That is when the machine gets an
ICMP6_PKT_TOOBIG message, the connected UDP socket will get EMSGSIZE
message on its next read from the socket.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: Fix pcie error recovery in case of NIC+RoCE adapters

Interrupts registered by RoCE driver are not unregistered when
msix interrupts are disabled during error recovery causing a
crash. Detach the adapter instance from RoCE driver when error
is detected to complete the cleanup. Attach the driver again after
the adapter is recovered from error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: make magic-packet property generic

As requested by Rob Herring on patch
https://patchwork.ozlabs.org/patch/580862/.

This is a new property that it's still in net-next and has never been
used in production, so we are not breaking anything with the
incompatible binding change.

Signed-off-by: Sergio Prado <sergio.prado@e-labworks.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge-mdb-attrs'

Nikolay Aleksandrov says:

====================
bridge: mdb: add support for extended attributes

This small set allows to extend the per mdb entry exported attributes,
before this set we had only a structure exported which couldn't be changed
because we would've broken user-space, after this we extend the attribute
that was used for the structure and add per-mdb entry attributes after the
struct has been added (see patch 02 for more details). Note that the reason
we can't simply add an attribute after MDBA_MDB_ENTRY_INFO is that current
users (e.g. iproute2) walk over the attribute list directly without
checking for the attribute type.
Patch 01 is a simple change to reduce one indentation level in order to
avoid over 80 char lines.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: add support for more attributes and export timer

Currently mdb entries are exported directly as a structure inside
MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without
breaking user-space. In order to export new mdb fields, I've converted
the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before
with struct br_mdb_entry (without header, as it's casted directly in
iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we
keep compatibility with older users and can export new data.
I've tested this with iproute2, both with and without support for the
added attribute and it works fine.
So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry
inside but it may contain also some additional MDBA_MDB_EATTR_ attributes
such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space.

So the new structure is:
[MDBA_MDB] = {
     [MDBA_MDB_ENTRY] = {
         [MDBA_MDB_ENTRY_INFO]
         [MDBA_MDB_ENTRY_INFO] { <- Nested attribute
             struct br_mdb_entry <- nla_put_nohdr()
             [MDBA_MDB_ENTRY attributes] <- normal netlink attributes
         }
     }
}

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: reduce the indentation level in br_mdb_fill_info

Switch the port check and skip if it's null, this allows us to reduce one
indentation level.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: grab rcu read lock for bpf_percpu_hash_update

bpf_percpu_hash_update() expects rcu lock to be held and warns if it's not,
which pointed out a missing rcu read lock.

Fixes: 15a07b338 ("bpf: add lookup/update support for per-cpu hash and array maps")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-02-19

This series contains updates to i40e/i40evf only.

Alex Duyck splits up the descriptor count function from the function that
stops the ring to have access to the descriptor count used for the data
portion of the frame.  The rewrites the logic for how we determine if we
can transmit the frame or if it needs to be linearized.  Place the checksum
close to TSO since they have a lot in common and it can help to reduce the
decision tree for how to handle the frame as the first check in TSO is to
see if checksumming is offloaded.

Carolyn adds functions to blink leds on devices using 10GBaseT PHY since
MAC registers used in other designs do not work in this device configuration.
Fixes an issue where a previously removed message has returned.

Kevin increases the timeout when checking GLGEN_RSTAT_DEVSTATE bit since
linking with particular PHY types, the amount of time it takes for the
GLGEN_RSTAT_DEVSTATE to be set increases greatly.

Neerav changes the receive queues to not wait to be disabled before DCB
has been reconfigured, like transmit queues.

Anjali adds new register definitions for programming the parser, flow
director and RSS blocks in the hardware.

Shannon adds the new opcodes and structures used for asking the firmware
to update receive control registers that need extra care when being
accessed while under heavy traffic.  Integrates the new AdminQ functions
for safely accessing the receive control registers that may be affected
by heavy small packet traffic.

Mitch provides another colorful patch description on letting go of
the stale local VSI pointer when the VF resets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e/i40evf: Bump i40e to 1.4.25 and i40evf to 1.4.15

Bump.

Change-ID: Ifa19aadaa892ad103f1b96fe2361fa690912c6a3
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: let go of the past

If we reset a VF, its VSI goes away, and it gets a new one. So don't
hang on to the now-stale local VSI pointer. It just leads to suffering
and kernel panics.

Change-ID: Ia8823b4e85893e95e963acee284968022b29177a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: suspend scheduling during driver unload

We need to suspend scheduling or any pending service task during driver
unload process, so that new task will not be scheduled. This patch sets
the suspend flag bit during reload which avoids service task execution.

Change-ID: I017c57b5d6656564556e3c5387da671369a572ac
Signed-off-by: Pandi Kumar Maharajan <pandi.maharajan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Use the new rx ctl register helpers. Don't use AQ calls from clear_hw.

Use the new AdminQ functions for safely accessing the Rx control
registers that may be affected by heavy small packet traffic.

We can't use AdminQ calls in i40e_clear_hw() because the HW is being
initialized and the AdminQ is not alive. We recently added an AQ
related replacement for reading PFLAN_QALLOC, and this patch puts
back the original register read.

Change-ID: Ib027168c954a5733299aa3a4ce5f8218c6bb5636
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: implement and use Rx CTL helper functions

Use the new AdminQ functions for safely accessing the Rx control
registers that may be affected by heavy small packet traffic.

Change-ID: Ibb00983e8dcba71f4b760222a609a5fcaa726f18
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add adminq commands for Rx CTL registers

Add the new opcodes and struct used for asking the firmware to update Rx
control registers that need extra care when being accessed while under
heavy traffic - e.g. sustained 64byte packets at line rate on all ports.
The firmware will take extra steps to be sure the register accesses
are successful.

The registers involved are:
PFQF_CTL_0
PFQF_HENA
PFQF_FDALLOC
PFQF_HREGION
PFLAN_QALLOC
VPQF_CTL
VFQF_HENA
VFQF_HREGION
VSIQF_CTL
VSILAN_QBASE
VSILAN_QTABLE
VSIQF_TCREGION
PFQF_HKEY
VFQF_HKEY
PRTQF_CTL_0
GLFCOE_RCTL
GLFCOE_RSOF
GLQF_CTL
GLQF_SWAP
GLQF_HASH_MSK
GLQF_HASH_INSET
GLQF_HSYM
GLQF_FC_MSK
GLQF_FC_INSET
GLQF_FD_MSK
PRTQF_FD_INSET
PRTQF_FD_FLXINSET
PRTQF_FD_MSK

Change-ID: I56c8144000da66ad99f68948d8a184b2ec2aeb3e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add check for null VSI

Return from i40e_vsi_reinit_setup() if vsi param is NULL.
This makes this code consistent with all the other code that
checks for NULL before using one of the VSI pointers accessed
with an indexed variable. (Indexed VSI pointers are
intentionally set to NULL in i40e_vsi_clear() and
i40e_remove().

Change-ID: I3bc8b909c70fd2439334eeae994d151f61480985
Signed-off-by: John Underwood <johnx.underwood@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Expose some registers to program parser, FD and RSS logic

This patch adds 7 new register definitions for programming the
parser, flow director and RSS blocks in the HW.

Change-ID: I31e76673125275f3c69a14c646361919d04dc987
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix for unexpected messaging

This fixes an issue where a previously removed message
has returned. Changing the message type to dev_dbg
leaves the info, if desired, but takes it out of normal
everyday usage. Also changed call to only provide port
data when its valid and not when its not (delete case).

Change-ID: Ief6f33b915f6364c24fa8e5789c2fc3168b5e2ed
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Do not wait for Rx queue disable in DCB reconfig

Just like Tx queues don't wait for Rx queues to be disabled before
DCB has been reconfigured.
Check the queues are disabled only after the DCB configuration has
been applied to the VSI(s) managed by the PF driver.

In case of any timeout issue a PF reset to recover.

Change-ID: Ic51e94c25baf9a5480cee983f35d15575a88642c
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Increase timeout when checking GLGEN_RSTAT_DEVSTATE bit

When linking with particular PHY types (ex: copper PHY), the amount of
time it takes for the GLGEN_RSTAT_DEVSTATE to be set increases greatly,
which can lead to a timeout and failure to load the driver.

Change-ID: If02be0dfcd7c57fdde2d5c81cd63651260cd2029
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix led blink capability for 10GBaseT PHY

This patch fixes a problem where the ethtool identify adapter
functionality did not work for some copper PHY's. Without this
patch, the blink led functionality fails on some parts. This
patch adds PHY write code to blink led's on parts where this
functionality is contained in the PHY rather than the MAC.

Change-ID: Iee7b3453f61d5ffd0b3d03f720ee4f17f919fcc2
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add functions to blink led on 10GBaseT PHY

This patch adds functions to blink led on devices using
10GBaseT PHY since MAC registers used in other designs
do not work in this device configuration.

Change-ID: Id4b88c93c649fd2b88073a00b42867a77c761ca3
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Move Tx checksum closer to TSO

On all of the other Intel drivers we place checksum close to TSO as they
have a significant amount in common and it can help to reduce the decision
tree for how to handle the frame as the first check in TSO is to see if
checksumming is offloaded, and if it is not we can skip _BOTH_ TSO and Tx
checksum offload based on a single check.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Rewrite logic for 8 descriptor per packet check

This patch is meant to rewrite the logic for how we determine if we can
transmit the frame or if it needs to be linearized.

The previous code for this function was using a mix of division and modulus
division as a part of computing if we need to take the slow path.  Instead
I have replaced this by simply working with a sliding window which will
tell us if the frame would be capable of causing a single packet to span
several descriptors.

The logic for the scan is fairly simple.  If any given group of 6 fragments
is less than gso_size - 1 then it is possible for us to have one byte
coming out of the first fragment, 6 fragments, and one or more bytes coming
out of the last fragment.  This gives us a total of 8 fragments
which exceeds what we can allow so we send such frames to be linearized.

Arguably the use of modulus might be more exact as the approach I propose
may generate some false positives.  However the likelihood of us taking much
of a hit for those false positives is fairly low, and I would rather not
add more overhead in the case where we are receiving a frame composed of 4K
pages.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Break up xmit_descriptor_count from maybe_stop_tx

In an upcoming patch I would like to have access to the descriptor count
used for the data portion of the frame. For this reason I am splitting up
the descriptor count function from the function that stops the ring.

Also in order to try and reduce unnecessary duplication of code I am moving
the slow-path portions of the code out of being inline calls so that we can
just jump to them and process them instead of having to build them into
each function that calls them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-02-18

This series contains updates to i40e and i40evf only.

Alex Duyck provides all the patches in the series to update and fix the
drivers.  Fixed the driver to drop the outer checksum offload on UDP
tunnels, since the issue is that the upper levels of the stack never
requested such an offload and it results in possible errors.  Updates the
TSO function to just use u64 values, so we do not have to end up casting
u32 values.  In the TSO path, factored out the L4 header offsets allowing
us to ignore the L4 header offsets when dealing with the L3 checksum and
length update.  Consolidates all of the spots where we were updating
either the TCP or IP checksums in the TSO and checksum path into the TSO
function.  Fixed two issues by adding support for IPv4 encapsulated in
IPv6, first issue was the fact that iphdr(skb)->protocol was being used to
test for the outer transport protocol which breaks IPv6 support.  The second
was that we cleared the flag for v4 going to v6, but we did not take care
of txflags going the other way.  Added support for IPv6 extension headers
in setting up the Tx checksum.  Added exception handling to the Tx
checksum path so that we can handle cases of TSO where the frame is bad,
or Tx checksum where we did not recognize a protocol.  Fixed a number of
issues to make certain that we are using the correct protocols when
parsing both the inner and outer headers of a frame that is mixed between
IPv4 and IPv6 for inner and outer.  Updated the feature flags to reflect
the newly enabled/added features.

Sorry, no witty patch descriptions this time around, probably should
let Mitch help in writing patch descriptions for Alex. :-)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add missing HSI for big-endian machines

Commit e5d3a51cefbb ("bnx2x: extend DCBx support") was missing HSI
changes for big-endian machine, breaking compilation on such
platforms.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Add support for ATR w/ IPv6 extension headers

This patch updates the code for determining the L4 protocol and L3 header
length so that when IPv6 extension headers are being used we can determine
the offset and type of the L4 protocol.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: Update feature flags to reflect newly enabled features

Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums.  As such we can update the
feature flags to reflect that.

In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.

I also found one spot where we were setting the same flag twice.  It looks
like it was probably a git merge error that resulted in the line being
duplicated.  As such I have dropped it in this patch.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Update feature flags to reflect newly enabled features

Recent changes should have enabled support for IPv6 based tunnels and
support for TSO with outer UDP checksums. As such we can update the
feature flags to reflect that.

In addition we can clean-up the flags that aren't needed such as SCTP and
RXCSUM since having the bits there doesn't add any value.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Do not drop support for IPv6 VXLAN or GENEVE tunnels

All of the documentation in the datasheets for the XL710 do not call out
any reason to exclude support for IPv6 based tunnels. As such I am
dropping the code that was excluding these tunnel types from having their
port numbers recognized. This way we can take advantage of things such as
checksum offload for inner headers over IPv6 based VXLAN or GENEVE
tunnels.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix ATR in relation to tunnels

This patch contains a number of fixes to make certain that we are using
the correct protocols when parsing both the inner and outer headers of a
frame that is mixed between IPv4 and IPv6 for inner and outer.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Enable support for SKB_GSO_UDP_TUNNEL_CSUM

The XL722 has support for providing the outer UDP tunnel checksum on
transmits. Make use of this feature to support segmenting UDP tunnels with
outer checksums enabled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Clean-up Rx packet checksum handling

This is mostly a minor clean-up for the Rx checksum path in order to avoid
some of the unnecessary conditional checks that were being applied.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'qed-vlan-filtering'

Yuval Mintz says:

====================
qed{,e}: Add vlan filtering offload

This series adds vlan filtering offload to qede.
First patch introduces small additional infrastructure needed in
qed to support it, while second contains the main bulk of driver changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Add vlan filtering offload support

Device would start receiving only vlan-tagged traffic with tags matching
that of one of the configured vlan IDs, unless:
- Device is expliicly placed in PROMISC mode.
- Device exhausts its vlan filter credits.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Lay infrastructure for vlan filtering offload

Today, interfaces are working in vlan-promisc mode; But once
vlan filtering offloaded would be supported, we'll need a method to
control it directly [e.g., when setting device to PROMISC, or when
running out of vlan credits].

This adds the necessary API for L2 client to manually choose whether to
accept all vlans or only those for which filters were configured.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83848: Fix sysfs naming collision warning

Files in sysfs are created using the name from the phy_driver struct,
when two names are the same we may get a duplicate filename warning,
fix this.

Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Optimize local checksum offload

This patch takes advantage of several assumptions we can make about the
headers of the frame in order to reduce overall processing overhead for
computing the outer header checksum.

First we can assume the entire header is in the region pointed to by
skb->head as this is what csum_start is based on.

Second, as a result of our first assumption, we can just call csum_partial
instead of making a call to skb_checksum which would end up having to
configure things so that we could walk through the frags list.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Annotate change of locking mechanism for np->opt

follows up commit 45f6fad84cc3 ("ipv6: add complete rcu protection around
np->opt") which added mixed rcu/refcount protection to np->opt.

Given the current implementation of rcu_pointer_handoff(), this has no
effect at runtime.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'iptunnel-pkt-scrub-consolidate'

Jiri Benc says:

====================
iptunnel: scrub packet in iptunnel_pull_header

As every IP tunnel has to scrub skb on decapsulation, iptunnel_pull_header
tried to do that and open coded part of skb_scrub_packet. Various tunneling
protocols (VXLAN, Geneve) then called full skb_scrub_packet on their own,
duplicating part of the scrubbing already done.

Consolidate the code, calling skb_scrub_packet from iptunnel_pull_header.
This will allow additional cleanups in VXLAN code, as the packet is scrubbed
early during rx processing after this patchset and VXLAN can start filling
out skb fields earlier.

The full picture of vxlan cleanup patches can be seen at:
https://github.com/jbenc/linux-vxlan/commits/master
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

iptunnel: scrub packet in iptunnel_pull_header

Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: move vxlan device lookup before iptunnel_pull_header

This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: move geneve device lookup before iptunnel_pull_header

This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: implement geneve_get_sk_family helper

Similarly to the existing vxlan_get_sk_family.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: log port STP state on change

Remove the shared br_log_state function and print the info directly in
br_set_state, where the net_bridge_port state is actually changed.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-addr-sync'

Hariprasad Shenai says:

====================
cxgb4: Use __dev_[um]c_[un]sync for MAC address syncing

This patch series adds support to use __dev_uc_sync/__dev_mc_sync to add
MAC address and __dev_uc_unsync/__dev_mc_unsync to delete MAC address.

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Use __dev_uc_sync/__dev_mc_sync to sync MAC address

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>