git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge branch 'macb-next'

Michael Grzeschik says:

====================
net: macb: add error handling on probe and

This series adds more error handling to the macb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add of_node_put to error paths

We add the call of_node_put(bp->phy_node) to all associated error
paths for memory clean up.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add of_phy_deregister_fixed_link to error paths

We add the call of_phy_deregister_fixed_link to all associated
error paths for memory clean up.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix GOP statistics loop start and stop conditions

GOP statistics from all ports of one instance of the driver are gathered
by a single work item that re-queues itself in a loop on a workqueue. The
loop is started when a port is up, and stopped when a port is down. This
last condition is obviously wrong: taking one port down also stops the
gathering for ports that are still up.

Fix this by having a work per port. This way, starting and stopping it
when the port is up or down will be fine, while minimizing unnecessary
CPU usage.
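
To illustrate why a single shared work item misbehaves, here is a toy
model in plain Python. It is not driver code: the class and method names
are invented for this sketch, and a boolean or a set stands in for the
scheduled work.

```python
class SharedStatsWork:
    """Old scheme as described: one polling loop shared by every port."""

    def __init__(self):
        self.running = False

    def port_up(self, port):
        self.running = True       # loop starts when any port comes up

    def port_down(self, port):
        self.running = False      # ...and stops when any port goes down

    def is_polled(self, port):
        return self.running


class PerPortStatsWork:
    """New scheme: each port owns its own work item."""

    def __init__(self):
        self.running = set()

    def port_up(self, port):
        self.running.add(port)

    def port_down(self, port):
        self.running.discard(port)

    def is_polled(self, port):
        return port in self.running


def bring_up_two_then_drop_one(work):
    """Bring ports 0 and 1 up, take port 1 down, report whether port 0 is polled."""
    work.port_up(0)
    work.port_up(1)
    work.port_down(1)
    return work.is_polled(0)
```

With the shared model, port 0 stops being polled as soon as port 1 goes
down; with the per-port model it keeps being polled.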

Fixes: 118d6298f6f0 ("net: mvpp2: add ethtool GOP statistics")
Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-bug-fixes'

Lipeng says:

====================
net: hns3: Bug fixes & Code improvements in HNS3 driver

This patch-set introduces some bug fixes and code improvements.
As [patch 1/2] depends on the patch {5392902 net: hns3: Consistently using
GENMASK in hns3 driver}, which exists in net-next but not in net, this
series is pushed to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: cleanup mac auto-negotiation state query in hclge_update_speed_duplex

When checking whether auto-negotiation is on, the driver only needs to
check the value of mac.autoneg (SW) directly and does not need to
query it from hardware, because this value is always synchronized
with the auto-negotiation state of the hardware.

This patch removes mac auto-negotiation state query in
hclge_update_speed_duplex().

Fixes: 46a3df9f9718 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix a bug when getting phy address from NCL_config file

The driver gets the phy address from the NCL_config file and uses it to
initialize phydev. The phy address field has 5 bits, and a C22 phy
address also has 5 bits, so 0-31 are all valid phy addresses. If there
is no phy, the driver crashes, because it always gets a valid phy address.

This patch widens the phy address to 8 bits and uses 0xff to indicate
an invalid phy address.
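
The difference between the two field widths can be sketched in a few
lines of Python. This is only an illustration: the function names are
invented, and treating every value above 31 as "no phy" is an assumption
of this sketch (the commit message only names 0xff).

```python
PHY_ADDR_INVALID = 0xFF   # sentinel named in the commit message
C22_MAX_ADDR = 31         # clause 22 addresses occupy 5 bits (0-31)


def decode_phy_addr_5bit(raw):
    """Old layout: every possible field value is a valid address."""
    return raw & 0x1F


def decode_phy_addr_8bit(raw):
    """New layout: return the address, or None when no phy is present.

    Assumption of this sketch: values above 31 other than 0xff are also
    treated as "no phy".
    """
    addr = raw & 0xFF
    if addr == PHY_ADDR_INVALID or addr > C22_MAX_ADDR:
        return None
    return addr
```

With 5 bits there is no spare value left to say "no phy"; with 8 bits,
0xff can carry that meaning.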

Fixes: 46a3df9f9718 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support)
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: Update attr validation to require exact length for some types

Attributes using NLA_U* and NLA_S* (where * is 8, 16, 32 and 64) are
expected to be an exact length. Split these data types from
nla_attr_minlen into nla_attr_len and update validate_nla to require
the attribute to have exact length for them.
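
A minimal Python model of the split between exact-length and
minimum-length checks might look like the following. The table names
mirror nla_attr_len and nla_attr_minlen only loosely; the NLA_MSECS
entry and the choice of -ERANGE as the error are assumptions made for
illustration.

```python
import errno

# Exact payload sizes in bytes for fixed-width integer attributes.
EXACT_LEN = {
    "NLA_U8": 1, "NLA_U16": 2, "NLA_U32": 4, "NLA_U64": 8,
    "NLA_S8": 1, "NLA_S16": 2, "NLA_S32": 4, "NLA_S64": 8,
}

# Types that keep a minimum-length check only (illustrative entry).
MIN_LEN = {"NLA_MSECS": 8}


def validate_attr(attr_type, payload_len):
    """Return 0 if the payload length is acceptable, else a negative errno."""
    if attr_type in EXACT_LEN:
        # Fixed-width integers must match their size exactly.
        return 0 if payload_len == EXACT_LEN[attr_type] else -errno.ERANGE
    if payload_len < MIN_LEN.get(attr_type, 0):
        return -errno.ERANGE
    return 0
```

Under this model an 8-byte payload for NLA_U32 is rejected, while an
oversized NLA_MSECS payload still passes the minimum-length check.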

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: sysctl to specify IPv6 ND traffic class

Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.

Currently this includes:

  - Router Solicitation (ICMPv6 type 133)
    ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Solicitation (ICMPv6 type 135)
    ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Neighbour Advertisement (ICMPv6 type 136)
    ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()

  - Redirect (ICMPv6 type 137)
    ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()

and if the kernel ever gets around to generating RA's,
it would presumably also include:

  - Router Advertisement (ICMPv6 type 134)
    (radvd daemon could pick up on the kernel setting and use it)

Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme.  An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:

    https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11

The expected primary use case is to properly prioritize ND over wifi.
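
For reference, the 8-bit Traffic Class carries the 6-bit DSCP in its
upper bits and the 2-bit ECN field in its lower bits. The Python sketch
below shows that split and the 0-255 range check; the function names are
invented for illustration and are not part of the patch.

```python
def check_ndisc_tclass(value):
    """Accept only 0..255, the range of an 8-bit Traffic Class."""
    if not 0 <= value <= 255:
        raise ValueError("Invalid argument")
    return value


def split_tclass(tclass):
    """Split the Traffic Class octet into (DSCP, ECN)."""
    return tclass >> 2, tclass & 0x3
```

For example, split_tclass(0xDC) gives DSCP 55 with ECN 0, and a value of
34 gives DSCP 8 with ECN 2.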

Testing:
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  0
  jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  -bash: echo: write error: Invalid argument
  jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  255
  jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  34

  jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
  jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
  tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
  IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
  jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
  length 24, tgt is jzem22.pgc, Flags [solicited]

(based on original change written by Erik Kline, with minor changes)

v2: fix 'suspicious rcu_dereference_check() usage'
    by explicitly grabbing the rcu_read_lock.

Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ncsi: Don't return error on normal response

Several response handlers return EBUSY if the data corresponding to the
command/response pair is already set. There is no reason to return an
error here; the channel is advertising something as enabled because we
told it to enable it, and it's possible that the feature has been
enabled previously.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ncsi: Improve general state logging

The NCSI driver is mostly silent which becomes a headache when trying to
determine what has occurred on the NCSI connection. This adds additional
logging in a few key areas such as state transitions and calling out
certain errors more visibly.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpftool-show-filenames-of-pinned-objects'

Prashant Bhole says:

====================
tools: bpftool: show filenames of pinned objects

This patchset adds support to show pinned objects in object details.

Patch1 adds functionality to open a path in bpf-fs regardless of its object
type.

Patch2 adds the actual functionality by scanning the bpf-fs once and adding
object information to a hash table, with the object id as the key. One object
may be associated with multiple paths because an object can be pinned multiple
times.

Patch3 adds a command line option to enable this functionality. It is optional
because scanning bpf-fs can be costly.
====================
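
The table described for Patch2 can be pictured with a short Python
sketch. It is a model only: the list of (path, object id) pairs stands
in for one walk over bpf-fs, and the second pinned path is hypothetical.

```python
from collections import defaultdict


def build_pinned_table(pinned_objects):
    """Map object id -> list of pinned paths.

    `pinned_objects` is an iterable of (path, object_id) pairs, standing
    in for the result of scanning bpf-fs once.
    """
    table = defaultdict(list)
    for path, obj_id in pinned_objects:
        table[obj_id].append(path)
    return table


scan = [
    ("/sys/fs/bpf/softirq_prog", 3),
    ("/sys/fs/bpf/softirq_prog_copy", 3),   # hypothetical second pin of id 3
]
table = build_pinned_table(scan)
```

Keying by object id lets each listed object look up all of its paths
without walking the filesystem again.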

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

tools: bpftool: optionally show filenames of pinned objects

Make showing file names of pinned objects optional, because it scans
the complete bpf-fs filesystem, which is costly.
Added option -f|--bpffs. Documentation updated.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: show filenames of pinned objects

Added support to show filenames of pinned objects.

For example:

root@test# ./bpftool prog
3: tracepoint  name tracepoint__irq  tag f677a7dd722299a3
    loaded_at Oct 26/11:39  uid 0
    xlated 160B  not jited  memlock 4096B  map_ids 4
    pinned /sys/fs/bpf/softirq_prog

4: tracepoint  name tracepoint__irq  tag ea5dc530d00b92b6
    loaded_at Oct 26/11:39  uid 0
    xlated 392B  not jited  memlock 4096B  map_ids 4,6

root@test# ./bpftool --json --pretty prog
[{
        "id": 3,
        "type": "tracepoint",
        "name": "tracepoint__irq",
        "tag": "f677a7dd722299a3",
        "loaded_at": "Oct 26/11:39",
        "uid": 0,
        "bytes_xlated": 160,
        "jited": false,
        "bytes_memlock": 4096,
        "map_ids": [4
        ],
        "pinned": ["/sys/fs/bpf/softirq_prog"
        ]
    },{
        "id": 4,
        "type": "tracepoint",
        "name": "tracepoint__irq",
        "tag": "ea5dc530d00b92b6",
        "loaded_at": "Oct 26/11:39",
        "uid": 0,
        "bytes_xlated": 392,
        "jited": false,
        "bytes_memlock": 4096,
        "map_ids": [4,6
        ],
        "pinned": []
    }
]

root@test# ./bpftool map
4: hash  name start  flags 0x0
    key 4B  value 16B  max_entries 10240  memlock 1003520B
    pinned /sys/fs/bpf/softirq_map1
5: hash  name iptr  flags 0x0
    key 4B  value 8B  max_entries 10240  memlock 921600B

root@test# ./bpftool --json --pretty map
[{
        "id": 4,
        "type": "hash",
        "name": "start",
        "flags": 0,
        "bytes_key": 4,
        "bytes_value": 16,
        "max_entries": 10240,
        "bytes_memlock": 1003520,
        "pinned": ["/sys/fs/bpf/softirq_map1"
        ]
    },{
        "id": 5,
        "type": "hash",
        "name": "iptr",
        "flags": 0,
        "bytes_key": 4,
        "bytes_value": 8,
        "max_entries": 10240,
        "bytes_memlock": 921600,
        "pinned": []
    }
]
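
A script consuming this JSON could collect the pinned paths per object
as sketched below. The sample string is a trimmed copy of the map output
above, and the helper name is invented for this example.

```python
import json

# Trimmed copy of the `bpftool --json --pretty map` output shown above.
SAMPLE = """
[{"id": 4, "name": "start", "pinned": ["/sys/fs/bpf/softirq_map1"]},
 {"id": 5, "name": "iptr", "pinned": []}]
"""


def pinned_paths_by_id(bpftool_json):
    """Return {object id: [pinned paths]} from bpftool JSON list output."""
    return {obj["id"]: obj.get("pinned", []) for obj in json.loads(bpftool_json)}


paths = pinned_paths_by_id(SAMPLE)
```

Objects that are not pinned simply map to an empty list.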

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: bpftool: open pinned object without type check

This was needed for opening any file in bpf-fs without knowing
its object type.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'BPF-directed-error-injection'

Josef Bacik says:

====================
Add the ability to do BPF directed error injection

I'm sending this through Dave since it'll conflict with other BPF changes in his
tree, but since it touches tracing as well Dave would like a review from
somebody on the tracing side.

v4->v5:
- disallow kprobe_override programs from being put in the prog map array so we
  don't tail call into something we didn't check.  This allows us to make the
  normal path still fast without a bunch of percpu operations.

v3->v4:
- fix a build error found by kbuild test bot (I didn't wait long enough
  apparently.)
- Added a warning message as per Daniel's suggestion.

v2->v3:
- added a ->kprobe_override flag to bpf_prog.
- added some sanity checks to disallow attaching bpf progs that have
  ->kprobe_override set that aren't for ftrace kprobes.
- added the trace_kprobe_ftrace helper to check if the trace_event_call is a
  ftrace kprobe.
- renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
  value in the kprobe path, and thus only write to it if we're overriding or
  clearing the override.

v1->v2:
- moved things around to make sure that bpf_override_return could really only be
  used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
  it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.

- Original message -

A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subsystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproducible results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test.  This patch gives us the tool to actually do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: add a test for bpf_override_return

This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add a bpf_override_function helper

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with its kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_function
helper.  This will modify the probed caller's return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
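
The mechanism can be modelled in Python as two register writes. This is
a toy: the Regs class, the stub address and the function name are all
invented for illustration, with ax and ip standing in for the x86
return-value register and program counter.

```python
from dataclasses import dataclass

ENOMEM = 12
STUB_ADDR = 0x1000  # hypothetical address of a function that simply returns


@dataclass
class Regs:
    ip: int   # program counter
    ax: int   # return-value register on x86


def override_return(regs, rc):
    """Set the return value, then point the PC at the do-nothing stub."""
    regs.ax = rc
    regs.ip = STUB_ADDR
    return regs


# The probed function would have started at 0xDEAD; it is never entered.
regs = override_return(Regs(ip=0xDEAD, ax=0), -ENOMEM)
```

Because the PC now points at the stub, the probed function body is
skipped and the caller sees only the injected return value.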

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix incorrect comment with regard to VLAN packet handling

The commit bcc6d4790361 ("net: vlan: make non-hw-accel rx path similar
to hw-accel") unified accel and non-accel path for VLAN RX. With that
fix we do not register any packet_type handler for VLANs anymore, so fix
the incorrect comment.

Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'act_vlan-rcu'

Manish Kurup says:

====================
net_sched actions: act_vlan now uses RCU

This commit consists of 3 patches:

patch1 (1/3):
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from those cores. Changed this to use
a per-CPU stats context instead.
This change will result in better performance.

patch2 (2/3):
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

patch3 (3/3):
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

As part of this review, there were some changes suggested by reviewers.
I have incorporated all the changes that were requested.

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
    suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
    instead of referencing the structure directly. Isolated this into a
    separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller
v7: Re-formatted more locals to reverse-christmas-tree - Pieter V
v8: Reverted reverse-christmas-tree(v7), not required when dependencies
    make it difficult to implement - Alexander D
v9: Cover letter subject change - Jamal
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

act_vlan: VLAN action rewrite to use RCU lock/unlock and update

Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.
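
The read-copy-update idea can be sketched in Python as publishing a new
immutable parameter object instead of locking a shared one. Python has
no RCU, so this only models the pointer swap; the class and field names
are invented and do not match the kernel structures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VlanParams:
    action: str
    vid: int
    prio: int = 0


class VlanAction:
    def __init__(self, params):
        self._params = params          # the published pointer

    def read(self):
        """Reader side: take one snapshot and use only that."""
        return self._params

    def update(self, **changes):
        """Writer side: copy, modify the copy, publish it, return the old one."""
        old = self._params
        self._params = replace(old, **changes)
        return old                     # the kernel would free this after a grace period


act = VlanAction(VlanParams("push", 10))
snapshot = act.read()
old = act.update(vid=20)
```

A reader holding the earlier snapshot keeps seeing a consistent set of
parameters, while new readers pick up the updated object.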

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp flower action: Modified to use VLAN helper functions

Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_vlan: Change stats update to use per-core stats

The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from those cores. Changed this to use
a per-CPU stats context instead.
This change will result in better performance.
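
The per-CPU approach can be pictured with a small Python model, assuming
one counter slot per CPU that is summed only when stats are read. The
class is invented for illustration and ignores real concurrency.

```python
class PerCpuStats:
    def __init__(self, ncpus):
        self._packets = [0] * ncpus
        self._bytes = [0] * ncpus

    def update(self, cpu, nbytes):
        """Fast path: touch only this CPU's slot, no shared lock."""
        self._packets[cpu] += 1
        self._bytes[cpu] += nbytes

    def totals(self):
        """Slow path: fold all slots together when stats are dumped."""
        return sum(self._packets), sum(self._bytes)


stats = PerCpuStats(4)
stats.update(0, 100)
stats.update(3, 60)
stats.update(0, 40)
```

The cost of combining the slots moves to the rare read side, so the
packet path no longer contends on one lock.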

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Manish Kurup <manish.kurup@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: don't warn on successful change of MAC

Fixes: 535a61777f44e ("sfc: suppress handled MCDI failures when changing the MAC address")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: vxge: remove redundant assignments and pointers

There are several pointers that are being assigned but never read
so remove these as they are redundant. Also remove an assignment
to function_mode that is never read. Cleans up several clang
warnings:

vxge-main.c:1139:2: warning: Value stored to 'hldev' is never read
vxge-main.c:1294:2: warning: Value stored to 'hldev' is never read
vxge-main.c:2188:2: warning: Value stored to 'dev' is never read
vxge-main.c:2188:2: warning: Value stored to 'dev' is never read
vxge-main.c:2723:2: warning: Value stored to 'function_mode' is
never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ip_gre-flags-update'

Xin Long says:

====================
ip_gre: add support for i/o_flags update

ip_gre is using as many ip_tunnel apis as possible, newlink works
fine as gre would do its own part in .ndo_init. But when changing
link, ip_tunnel_changelink doesn't even update i/o_flags, and also
the update of these flags requires some other gre properties to be
updated or recalculated.

These two patches add the i/o_flags update and then adjust some gre
properties according to the new i/o_flags.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: add the support for i/o_flags update via ioctl

As patch 'ip_gre: add the support for i/o_flags update via netlink'
did for netlink, we also need to do the same for these updates
via ioctl.

This patch is to update i/o_flags and call ipgre_link_update to
recalculate these gre properties after ip_tunnel_ioctl does the
common update.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: add the support for i/o_flags update via netlink

Now ip_gre is using ip_tunnel_changelink to update its properties, but
ip_tunnel_changelink in ip_tunnel doesn't update i/o_flags as a common
function.

An o_flags update requires tunnel->tun_hlen / hlen and dev->mtu /
needed_headroom to be recalculated, and dev->(hw_)features to be
updated as well.

Therefore, we can't just add the update into ip_tunnel_update called
in ip_tunnel_changelink, and it's also better not to touch the ip_tunnel
code.

This patch updates i/o_flags and calls ipgre_link_update to recalculate
these gre properties after ip_tunnel_changelink does the common update.

Note that since ipgre_link_update doesn't know the lower dev, it will
update gre->hlen, dev->mtu and dev->needed_headroom with the value of
'new tun_hlen - old tun_hlen'. In this way, we can avoid a lot of
redundant code, unlike ip6_gre.
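
The delta-based update can be sketched as follows. This is a Python
model, not the kernel function: the flag bits are hypothetical values,
the device is a plain dict, and shrinking the mtu when the header grows
is an assumption about the sign of the adjustment.

```python
# Hypothetical flag bits for this sketch; the kernel has its own TUNNEL_* constants.
FLAG_CSUM, FLAG_KEY, FLAG_SEQ = 0x1, 0x2, 0x4


def gre_hlen(o_flags):
    """4-byte base GRE header plus 4 bytes per optional field."""
    return 4 + 4 * sum(bool(o_flags & f) for f in (FLAG_CSUM, FLAG_KEY, FLAG_SEQ))


def link_update(dev, new_o_flags):
    """Shift hlen, needed_headroom and mtu by the tun_hlen delta."""
    delta = gre_hlen(new_o_flags) - dev["tun_hlen"]
    dev["tun_hlen"] += delta
    dev["hlen"] += delta
    dev["needed_headroom"] += delta
    dev["mtu"] -= delta            # assumption: a larger header leaves less payload room
    return dev


dev = {"tun_hlen": 4, "hlen": 24, "needed_headroom": 24, "mtu": 1476}
link_update(dev, FLAG_KEY)
```

Only the difference between the old and new header length is applied,
so nothing about the lower device needs to be known.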

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-ns-rmem-wmem'

Eric Dumazet says:

====================
net: Namespace-ify sysctl_tcp_rmem and sysctl_tcp_wmem

We need to get per netns sysctl for sysctl_[proto]_rmem and sysctl_[proto]_wmem

This patch series adds the basic infrastructure allowing per proto
conversion, and takes care of TCP.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_rmem and sysctl_tcp_wmem

Note that when a new netns is created, it inherits its
sysctl_tcp_rmem and sysctl_tcp_wmem from initial netns.

This change is needed so that we can refine TCP rcvbuf autotuning,
to take RTT into consideration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow per netns sysctl_rmem and sysctl_wmem for protos

As we want to gradually implement per netns sysctl_rmem and sysctl_wmem
on per protocol basis, add two new fields in struct proto,
and two new helpers : sk_get_wmem0() and sk_get_rmem0()

First user will be TCP. Then UDP and SCTP can be easily converted,
while DECNET probably won't get this support.
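
The lookup performed by such a helper can be modelled in Python as a
per-netns value with a fallback to the protocol-wide one. Everything
here is a stand-in: the classes, the netns_wmem_field attribute and the
numbers are invented, and the real helpers take a socket rather than a
namespace object.

```python
class NetNs:
    """Per-namespace settings; only the TCP write-buffer triple is modelled."""

    def __init__(self, tcp_wmem):
        self.tcp_wmem = list(tcp_wmem)


class Proto:
    def __init__(self, name, global_wmem, netns_wmem_field=None):
        self.name = name
        self.global_wmem = list(global_wmem)
        # None means the protocol has not been converted to per-netns values.
        self.netns_wmem_field = netns_wmem_field


def get_wmem0(net, proto):
    """Return the minimum send-buffer size for this protocol in this netns."""
    if proto.netns_wmem_field is not None:
        return getattr(net, proto.netns_wmem_field)[0]
    return proto.global_wmem[0]


net = NetNs([8192, 16384, 4194304])
tcp = Proto("tcp", [4096, 16384, 4194304], netns_wmem_field="tcp_wmem")
udp = Proto("udp", [4096, 16384, 4194304])
```

A converted protocol reads its value from the namespace, while an
unconverted one keeps using its global setting, which allows the gradual
per-protocol conversion described above.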

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Don't add vlans when vlan filtering is disabled

The software bridge can be built with vlan filtering support
included. However, by default it is turned off. In its turned off
state, it still passes VLANs via switchdev, even though they are not to
be used. Don't pass these VLANs to the hardware. Only do so when vlan
filtering is enabled.

This fixes at least one corner case. There are still issues in other
corners, such as when vlan_filtering is later enabled.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2017-11-09' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-11-09

This series introduces vlan offloads related improvements for mlx5
ethernet netdev driver, from Gal Pressman.

- Add support for 802.1ad vlan filter
- Add support for 802.1ad vlan insertion
- Add vlan offloads statistics to ethtool (inserted/stripped vlans)
- CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'IGMP-snooping-for-local-traffic'

Andrew Lunn says:

====================
IGMP snooping for local traffic

The linux bridge supports IGMP snooping. It will listen to IGMP
reports on bridge ports and keep track of which groups have been
joined on an interface. It will then forward multicast based on this
group membership.

When the bridge adds or removes groups from an interface, it uses
switchdev to request the hardware add an mdb to a port, so the
hardware can perform the selective forwarding between ports.

What is not covered by the current bridge code is IGMP joins/leaves
from the host on the brX interface. These are not reported via
switchdev, so the hardware does not know that the local host is
interested in the multicast frames.

Luckily, the bridge does track joins/leaves on the brX interface. The
code is obfuscated, which is why I missed it with my first attempt.
So the first patch tries to remove this obfuscation. Currently,
there are no notifications sent when the bridge interface joins a
group. The second patch adds them. bridge monitor then shows
joins/leaves in the same way as for other ports of the bridge.

Then starts the work of passing down to the hardware that the host has
joined/left a group. The existing switchdev mdb object cannot be used,
since the semantics are different. The existing
SWITCHDEV_OBJ_ID_PORT_MDB is used to indicate a specific multicast
group should be forwarded out that port of the switch. However here we
require the exact opposite. We want multicast frames for the group
received on the port to be forwarded to the host. Hence add a new
object SWITCHDEV_OBJ_ID_HOST_MDB, a multicast database entry to
forward to the host. This new object is then propagated through the
DSA layers. No DSA driver changes should be needed, this should just
work...

This version fixes up the nitpick from Nikolay, removes an unrelated
white space change, and adds in a patch adding a few const attributes
to a couple of functions taking a port parameter, in order to stop the
following patch from producing warnings.
====================

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: switch: Don't add CPU port to an mdb by default

Now that the host indicates when a multicast group should be forwarded
from the switch to the host, don't do it by default.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add more const attributes

The notify mechanism does not need to modify the port it is notifying.
So make the parameter const.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: slave: Handle switchdev host mdb add/del

Add code to handle switchdev host mdb add/del. Since DSA uses one of
the switch ports as a transport to the host, we just need to add an
MDB on this port.
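
As a rough model, a host mdb entry reduces to a port mdb entry on the
CPU port. The Python class below is invented for illustration; the
multicast database is a dict from group to a set of egress ports, and
the port numbers are arbitrary.

```python
class Switch:
    def __init__(self, cpu_port):
        self.cpu_port = cpu_port
        self.mdb = {}                      # group -> set of egress ports

    def port_mdb_add(self, port, group):
        """Port mdb object: forward the group out of `port`."""
        self.mdb.setdefault(group, set()).add(port)

    def port_mdb_del(self, port, group):
        self.mdb.get(group, set()).discard(port)

    def host_mdb_add(self, group):
        """Host mdb object: forward the group to the host via the CPU port."""
        self.port_mdb_add(self.cpu_port, group)

    def host_mdb_del(self, group):
        self.port_mdb_del(self.cpu_port, group)


sw = Switch(cpu_port=5)
sw.port_mdb_add(1, "239.1.1.1")
before_join = set(sw.mdb["239.1.1.1"])
sw.host_mdb_add("239.1.1.1")
after_join = set(sw.mdb["239.1.1.1"])
sw.host_mdb_del("239.1.1.1")
after_leave = set(sw.mdb["239.1.1.1"])
```

The CPU port appears in the group's port set only between the host join
and the host leave.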

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Add/del switchdev object on host join/leave

When the host joins or leaves a multicast group, use switchdev to add
an object to the hardware to forward traffic for the group to the
host.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Send notification when host join/leaves a group

The host can join or leave a multicast group on the brX interface, as
indicated by IGMP snooping. This is tracked within the bridge
multicast code. Send a notification when this happens, in the same way
a notification is sent when a port of the bridge joins/leaves a group
because of IGMP snooping.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: Rename mglist to host_joined

The boolean mglist indicates the host has joined a particular
multicast group on the bridge interface. It is badly named, obscuring
what it means. Rename it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/davem/net

Simple cases of overlapping changes in the packet scheduler.

Much easier to resolve this time.

Which probably means that I screwed it up somehow.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'pm-final-4.14' of git://git./linux/kernel/git/rafael/linux-pm

Pull final power management fixes from Rafael Wysocki:
"These fix a regression in the schedutil cpufreq governor introduced by
  a recent change and blacklist Dell XPS13 9360 from using the Low Power
  S0 Idle _DSM interface which triggers serious problems on one of these
  machines.

  Specifics:

   - Prevent the schedutil cpufreq governor from using the utilization
     of a wrong CPU in some cases which started to happen after one of
     the recent changes in it (Chris Redpath).

   - Blacklist Dell XPS13 9360 from using the Low Power S0 Idle _DSM
     interface as that causes serious issues (related to NVMe) to appear
     on one of these machines, even though other Dell XPS13 9360s in
     somewhat different HW configurations behave correctly (Rafael
     Wysocki)"

* tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360
  cpufreq: schedutil: Examine the correct CPU when we update util

Merge tag 'sound-4.14' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"The amount of the changes isn't quite as small as wished, nevertheless
  they are straight fixes that deserve merging to 4.14 final.

  Most of the fixes are about ALSA core bugs spotted by fuzzer: a follow-up
  fix for the previous nested rwsem patch, a fix to avoid the resource
  hogs due to too many concurrent ALSA timer invocations, and a fix for
  a crash with SYSEX MIDI transfer over OSS sequencer emulation that is
  used by none but fuzzer.

  The rest are usual HD-audio and USB-audio device-specific quirks,
  which are safe to apply"

* tag 'sound-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - fix headset mic problem for Dell machines with alc274
  ALSA: seq: Fix OSS sysex delivery in OSS emulation
  ALSA: seq: Avoid invalid lockdep class warning
  ALSA: timer: Limit max instances per timer
  ALSA: usb-audio: support new Amanero Combo384 firmware version

Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix use-after-free in IPSEC input parsing, destination address
    pointer was loaded before pskb_may_pull() which can change the SKB
    data pointers. From Florian Westphal.

2) Stack out-of-bounds read in xfrm_state_find(), from Steffen
    Klassert.

3) IPVS state of SKB is not properly reset when moving between
    namespaces, from Ye Yin.

4) Fix crash in asix driver suspend and resume, from Andrey Konovalov.

5) Don't deliver ipv6 l2tp tunnel packets to ipv4 l2tp tunnels, and
    vice versa, from Guillaume Nault.

6) Fix DSACK undo on non-dup ACKs, from Priyaranjan Jha.

7) Fix regression in bond_xmit_hash()'s behavior after the TCP port
    selection changes back in 4.2, from Hangbin Liu.

8) Two divide by zero bugs in USB networking drivers when parsing
    descriptors, from Bjorn Mork.

9) Fix bonding slaves being stuck in BOND_LINK_FAIL state, from Jay
    Vosburgh.

10) Missing skb_reset_mac_header() in qmi_wwan, from Kristian Evensen.

11) Fix the destruction of tc action object races properly, from Cong
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  cls_u32: use tcf_exts_get_net() before call_rcu()
  cls_tcindex: use tcf_exts_get_net() before call_rcu()
  cls_rsvp: use tcf_exts_get_net() before call_rcu()
  cls_route: use tcf_exts_get_net() before call_rcu()
  cls_matchall: use tcf_exts_get_net() before call_rcu()
  cls_fw: use tcf_exts_get_net() before call_rcu()
  cls_flower: use tcf_exts_get_net() before call_rcu()
  cls_flow: use tcf_exts_get_net() before call_rcu()
  cls_cgroup: use tcf_exts_get_net() before call_rcu()
  cls_bpf: use tcf_exts_get_net() before call_rcu()
  cls_basic: use tcf_exts_get_net() before call_rcu()
  net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()
  Revert "net_sched: hold netns refcnt for each action"
  net: usb: asix: fill null-ptr-deref in asix_suspend
  Revert "net: usb: asix: fill null-ptr-deref in asix_suspend"
  qmi_wwan: Add missing skb_reset_mac_header-call
  bonding: fix slave stuck in BOND_LINK_FAIL state
  qrtr: Move to postcore_initcall
  net: qmi_wwan: fix divide by 0 on bad descriptors
  net: cdc_ether: fix divide by 0 on bad descriptors
  ...

ALSA: hda - fix headset mic problem for Dell machines with alc274

Confirmed with Kailang of Realtek, the pin 0x19 is for Headset Mic, and
the pin 0x1a is for Headphone Mic, he suggested to apply
ALC269_FIXUP_DELL1_MIC_NO_PRESENCE to fix this problem. And we
verified applying this FIXUP can fix this problem.

Cc: <stable@vger.kernel.org>
Cc: Kailang Yang <kailang@realtek.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets

When the VLAN tag is present in the packet buffer (i.e. VLAN stripping disabled, QinQ)
the driver will currently report CHECKSUM_UNNECESSARY.
Instead of using CHECKSUM_COMPLETE offload for packets with first
ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to
cover the former cases as well.

The checksum field present in the CQE is calculated from the IP header
until the end of the packet. When the first ethertype is different than
IPv4/6 (e.g. 802.1Q VLAN) a checksum of the VLAN header/s should be
added. The small header/s checksum calculation will allow us to use
CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY.
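
The adjustment described above works because the Internet checksum is a
ones' complement sum: partial sums over adjacent even-length regions can be
combined into the sum of the whole. The following is a minimal user-space
sketch of that property, not the driver code; csum16() and csum16_add() are
hypothetical helper names, and the buffer is assumed to be short enough that
the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of an even-length buffer, folded to 16 bits. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];

	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Combine two folded partial sums, e.g. VLAN header(s) plus the rest. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

With a 4-byte VLAN header followed by the bytes the hardware already summed,
csum16_add(csum16(vlan, 4), hw_sum) gives the same value as summing the whole
region in one pass.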

Testing bandwidth of one and 8 TCP streams to a single RQ,
LRO and VLAN stripping offloads disabled:
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

Before:
+--------------+--------------------+---------------------+----------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] |   Checksum offload   |
+--------------+--------------------+---------------------+----------------------+
| Untagged     |          28,247.35 |           24,716.88 | CHECKSUM_COMPLETE    |
| VLAN         |          27,516.69 |           23,752.26 | CHECKSUM_UNNECESSARY |
| QinQ         |           6,961.30 |           20,667.04 | CHECKSUM_UNNECESSARY |
+--------------+--------------------+---------------------+----------------------+

Now:
+--------------+--------------------+---------------------+-------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload  |
+--------------+--------------------+---------------------+-------------------+
| Untagged     |          28,521.28 |           24,926.32 | CHECKSUM_COMPLETE |
| VLAN         |          27,389.37 |           23,715.34 | CHECKSUM_COMPLETE |
| QinQ         |           6,901.77 |           20,845.73 | CHECKSUM_COMPLETE |
+--------------+--------------------+---------------------+-------------------+

No performance degradation observed.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add VLAN offloads statistics

The following counters are now exposed through ethtool -S:
rx[i]_removed_vlan_packets (per channel)
rx_removed_vlan_packets
tx[i]_added_vlan_packets (per channel)
tx_added_vlan_packets

rx_removed_vlan_packets: The number of packets that had their
outer VLAN header stripped to the CQE by the hardware.
tx_added_vlan_packets: The number of packets that had their
outer VLAN header inserted by the hardware.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add 802.1ad VLAN insertion support

Report VLAN insertion support for S-tagged packets and add support by
choosing the correct VLAN type in the WQE.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add 802.1ad VLAN filter steering rules

When a user chooses to use 802.1ad VLAN the proper steering rules will
be added to the VLAN flow table (matching the specific S-tag VID).
Due to current hardware limitation, when using 802.1ad, we must disable
C-tag VLAN stripping on the RQs.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Declare bitmap using kernel macro

Replace explicit declaration of bitmap with DECLARE_BITMAP kernel macro.
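
For readers unfamiliar with the macro, here is a user-space sketch of what
the replacement amounts to. The macro definitions mirror the kernel's
DECLARE_BITMAP and BITS_TO_LONGS; the struct, field name and
DEMO_VLAN_N_VID are hypothetical stand-ins chosen for illustration.

```c
#include <limits.h>

#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nbits)	(((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

#define DEMO_VLAN_N_VID 4096	/* hypothetical: one bit per VLAN ID */

struct demo_vlan_table {
	/* Explicit form: unsigned long active_vlans[BITS_TO_LONGS(DEMO_VLAN_N_VID)]; */
	DECLARE_BITMAP(active_vlans, DEMO_VLAN_N_VID);
};
```

Both forms declare the same array; the macro only states the intent more
directly.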

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: Introduce netdev_*_once functions

Extend the net device error logging with netdev_*_once macros.
netdev_*_once are the equivalents of the dev_*_once macros which are
useful for messages that should only be logged once.

Also add netdev_WARN_ONCE, which is the "once" extension for the already
existing netdev_WARN macro.
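
The "once" behaviour can be sketched in user space with a static flag per
call site. This is only an illustration of the idea, not the kernel macros:
demo_warn_once(), demo_rx_error() and demo_log_count are hypothetical names,
and unlike the kernel helpers the sketch takes no net_device argument.

```c
#include <stdbool.h>
#include <stdio.h>

static int demo_log_count;	/* number of messages actually emitted */

/* Each expansion gets its own static flag, so each call site logs once. */
#define demo_warn_once(...)				\
	do {						\
		static bool warned_;			\
		if (!warned_) {				\
			warned_ = true;			\
			demo_log_count++;		\
			fprintf(stderr, __VA_ARGS__);	\
		}					\
	} while (0)

/* Hypothetical hot path that may hit the same error many times. */
static void demo_rx_error(int queue)
{
	demo_warn_once("rx queue %d: bad descriptor\n", queue);
}
```

Calling demo_rx_error() repeatedly prints the message only on the first
call, which is the property that makes such macros suitable for data-path
errors.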

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add rollback on add VLAN failure

When adding a VLAN rule fails, the active VLAN bit should be cleared.
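
A minimal user-space sketch of this kind of rollback, under stated
assumptions: all demo_* names are hypothetical, and demo_add_rule() simply
pretends that rule installation fails for odd VLAN IDs.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_N_VID	4096
#define DEMO_BPL	(sizeof(unsigned long) * CHAR_BIT)

static unsigned long demo_active_vlans[DEMO_N_VID / DEMO_BPL];

static void demo_set_bit(unsigned int vid)
{
	demo_active_vlans[vid / DEMO_BPL] |= 1UL << (vid % DEMO_BPL);
}

static void demo_clear_bit(unsigned int vid)
{
	demo_active_vlans[vid / DEMO_BPL] &= ~(1UL << (vid % DEMO_BPL));
}

static bool demo_test_bit(unsigned int vid)
{
	return demo_active_vlans[vid / DEMO_BPL] & (1UL << (vid % DEMO_BPL));
}

/* Hypothetical hardware hook: fails for odd VIDs. */
static int demo_add_rule(unsigned int vid)
{
	return (vid & 1) ? -ENOSPC : 0;
}

static int demo_vlan_add(unsigned int vid)
{
	int err;

	demo_set_bit(vid);
	err = demo_add_rule(vid);
	if (err)
		demo_clear_bit(vid);	/* roll back so state matches hardware */

	return err;
}
```

Without the rollback, a failed add would leave the bit set and the software
state would claim a VLAN that has no steering rule behind it.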

Fixes: afb736e9330a ("net/mlx5: Ethernet resource handling files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Rename VLAN related variables and functions

Rename VLAN related symbols to better reflect the fact that they
are associated to C-tag VLAN.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2017-11-09

1) Fix a use after free due to a reallocated skb head.
   From Florian Westphal.

2) Fix sporadic lookup failures on labeled IPSEC.
   From Florian Westphal.

3) Fix a stack out of bounds when a socket policy is applied
   to an IPv6 socket that sends IPv4 packets.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-sched-race-fix'

Cong Wang says:

====================
net_sched: close the race between call_rcu() and cleanup_net()

This patchset tries to fix the race between call_rcu() and
cleanup_net() again. Without holding the netns refcnt the
tc_action_net_exit() in netns workqueue could be called before
filter destroy works in tc filter workqueue. This patchset
moves the netns refcnt from tc actions to tcf_exts, without
breaking per-netns tc actions.

Patch 1 reverts the previous fix, patch 2 introduces two new
APIs to help address the bug, and the remaining patches switch
to the new APIs. Please see each patch for details.

I was not able to reproduce this bug, but now after adding
some delay in filter destroy work I manage to trigger the
crash. After this patchset, the crash is not reproducible
any more and the debugging printk's show the order is expected
too.
====================

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates <lucasb@mojatatu.com>
Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_u32: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_tcindex: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_rsvp: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_route: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_matchall: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_fw: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_flower: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_flow: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_cgroup: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_bpf: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_basic: use tcf_exts_get_net() before call_rcu()

Hold netns refcnt before call_rcu() and release it after
the tcf_exts_destroy() is done.

Note, on ->destroy() path we have to respect the return value
of tcf_exts_get_net(), on other paths it should always return
true, so we don't need to care.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()

Instead of holding netns refcnt in tc actions, we can minimize
the holding time by saving it in struct tcf_exts instead. This
means we can just hold netns refcnt right before call_rcu() and
release it after tcf_exts_destroy() is done.

However, because on netns cleanup path we call tcf_proto_destroy()
too, obviously we can not hold netns for a zero refcnt, in this
case we have to do cleanup synchronously. It is fine for RCU too,
the caller cleanup_net() already waits for a grace period.

For other cases, refcnt is non-zero and we can safely grab it as
normal and release it after we are done.

This patch provides two new APIs for each filter to use:
tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
use the following pattern:

void __destroy_filter() {
  tcf_exts_destroy();
  tcf_exts_put_net();  // <== release netns refcnt
  kfree();
}
void some_work() {
  rtnl_lock();
  __destroy_filter();
  rtnl_unlock();
}
void some_rcu_callback() {
  tcf_queue_work(some_work);
}

if (tcf_exts_get_net())  // <== hold netns refcnt
  call_rcu(some_rcu_callback);
else
  __destroy_filter();
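
The pattern above can be modelled in user space to show the two paths. This
is a sketch under stated assumptions, not kernel code: all demo_* names are
hypothetical, the refcount is a plain int rather than an atomic, and the RCU
callback plus workqueue are reduced to a "deferred" flag that the caller
runs later.

```c
#include <stdbool.h>

struct demo_net {
	int refcnt;		/* 0 means the netns is already being torn down */
};

struct demo_filter {
	struct demo_net *net;
	bool holds_net;
	bool deferred;
	bool destroyed;
};

/* Like tcf_exts_get_net(): only succeeds while the refcnt is non-zero. */
static bool demo_get_net(struct demo_filter *f)
{
	if (f->net->refcnt == 0)
		return false;
	f->net->refcnt++;
	f->holds_net = true;
	return true;
}

/* Like tcf_exts_put_net(): drops the reference only if one was taken. */
static void demo_put_net(struct demo_filter *f)
{
	if (f->holds_net) {
		f->net->refcnt--;
		f->holds_net = false;
	}
}

static void demo_destroy_filter(struct demo_filter *f)
{
	f->destroyed = true;	/* stands in for tcf_exts_destroy() + kfree() */
	demo_put_net(f);
}

static void demo_delete(struct demo_filter *f)
{
	if (demo_get_net(f))
		f->deferred = true;	/* real code: call_rcu(some_rcu_callback) */
	else
		demo_destroy_filter(f);	/* netns going away: clean up synchronously */
}

/* Stands in for the work item that eventually runs after the grace period. */
static void demo_run_deferred(struct demo_filter *f)
{
	if (f->deferred) {
		f->deferred = false;
		demo_destroy_filter(f);
	}
}
```

On a live netns the reference is held from demo_delete() until the deferred
destroy has run; on a netns whose refcnt is already zero, nothing is deferred
and no reference is taken.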

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net_sched: hold netns refcnt for each action"

This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by the user. This breaks our netns design,
in which we expect all actions to be destroyed when we destroy the
whole netns.

Cc: Lucas Bates <lucasb@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-setup-stage'

Vivien Didelot says:

====================
net: dsa: setup stage

When probing a DSA switch, there are basically two stages.

The first stage is the parsing of the switch device, from either device
tree or platform data. It fetches the DSA tree to which it belongs, and
validates its ports. The switch device is then added to the tree, and
the second stage is called if this was the last switch of the tree.

The second stage is the setup of the tree, which validates that the tree
is complete, sets up the routing tables, the default CPU port for user
ports, sets up the switch drivers and finally the master interfaces,
which makes the whole switch fabric functional.

This patch series covers the second setup stage. The setup and teardown
of a switch tree have been separated into logical steps, and the probing
of a switch now simply parses and adds a switch to a tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: rename probe and remove switch functions

This commit brings no functional changes. It gets rid of the underscore
prefixed _dsa_register_switch and _dsa_unregister_switch functions in
favor of dsa_switch_probe() which parses and adds a switch to a tree and
dsa_switch_remove() which removes a switch from a tree.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup a tree when adding a switch to it

Now that the tree setup is centralized, we can simplify the code a bit
more by setting up or tearing down the tree directly when adding or
removing a switch to/from it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup routing table

The *_complete() functions take too many arguments to do only one thing:
they try to fetch the dsa_port structures corresponding to device nodes
under the "link" list property of DSA ports, and use them to setup the
routing table of switches.

This patch simplifies them by providing instead simpler
dsa_{port,switch,tree}_setup_routing_table functions which return a
boolean value, true if the tree is complete.

dsa_tree_setup_routing_table is called inside dsa_tree_setup which
simplifies the switch registering function as well.

A switch's routing table is now initialized before its setup.

This also makes dsa_port_is_valid obsolete, remove it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: use of_for_each_phandle

The OF code provides a of_for_each_phandle() helper to iterate over
phandles. Use it instead of arbitrary iterating ourselves over the list
of phandles hanging to the "link" property of the port's device node.

The of_phandle_iterator_next() helper calls of_node_put() itself on
it.node. Thus we must only do it ourselves if we break the loop.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add find port by node helper

Instead of having two dsa_ds_find_port_dn (which returns a bool) and
dsa_dst_find_port_dn (which returns a switch) functions, provide a more
explicit dsa_tree_find_port_by_node function which returns a matching
port.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup and teardown ports

The dsa_dsa_port_apply and dsa_cpu_port_apply functions do exactly the
same thing. The dsa_user_port_apply function does not try to register a
fixed link but tries to create a slave.

This commit factorizes and scopes all that in two convenient
dsa_port_setup and dsa_port_teardown functions.

It won't hurt to register a devlink_port for unused port as well.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup and teardown switches

This patch brings no functional changes. It removes the unused dst
argument from the dsa_ds_apply and dsa_ds_unapply functions and renames
them to dsa_switch_setup and dsa_switch_teardown for a more explicit scope.

This clarifies the steps of the setup or teardown of a switch fabric.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup and teardown tree

This commit provides better scope for the DSA tree setup and teardown
functions. It renames the "applied" bool to "setup" and prints a message
when the tree is set up, as it is done during teardown.

At the same time, check dst->setup in dsa_tree_setup, where it is set to
true.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup and teardown master device

Add DSA helpers to setup and teardown a master net device wired to its
CPU port. This centralizes the dsa_ptr assignment.

This also makes the master ethtool helpers static at the same time.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: setup and teardown default CPU port

The dsa_dst_parse function called just before dsa_dst_apply does not
parse the tree but does only one thing: it assigns the default CPU port
to dst->cpu_dp and to each user port.

This patch simplifies this by calling a dsa_tree_setup_default_cpu
function at the beginning of dsa_dst_apply directly.

A dsa_port_is_user helper is added for convenience.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: constify cpu_dp member of dsa_port

A DSA port has a dedicated CPU port assigned to it, stored in the cpu_dp
member. It is not meant to be modified by a port, thus make it const.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: usb: asix: fill null-ptr-deref in asix_suspend

When asix_suspend() is called dev->driver_priv might not have been
assigned a value, so we need to check that it's not NULL.

Similar issue is present in asix_resume(), this patch fixes it as well.

Found by syzkaller.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
Modules linked in:
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc4-43422-geccacdd69a8c #400
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
task: ffff88006bb36300 task.stack: ffff88006bba8000
RIP: 0010:asix_suspend+0x76/0xc0 drivers/net/usb/asix_devices.c:629
RSP: 0018:ffff88006bbae718 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff880061ba3b80 RCX: 1ffff1000c34d644
RDX: 0000000000000001 RSI: 0000000000000402 RDI: 0000000000000008
RBP: ffff88006bbae738 R08: 1ffff1000d775cad R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800630a8b40
R13: 0000000000000000 R14: 0000000000000402 R15: ffff880061ba3b80
FS: 0000000000000000(0000) GS:ffff88006c600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff33cf89000 CR3: 0000000061c0a000 CR4: 00000000000006f0
Call Trace:
usb_suspend_interface drivers/usb/core/driver.c:1209
usb_suspend_both+0x27f/0x7e0 drivers/usb/core/driver.c:1314
usb_runtime_suspend+0x41/0x120 drivers/usb/core/driver.c:1852
__rpm_callback+0x339/0xb60 drivers/base/power/runtime.c:334
rpm_callback+0x106/0x220 drivers/base/power/runtime.c:461
rpm_suspend+0x465/0x1980 drivers/base/power/runtime.c:596
__pm_runtime_suspend+0x11e/0x230 drivers/base/power/runtime.c:1009
pm_runtime_put_sync_autosuspend ./include/linux/pm_runtime.h:251
usb_new_device+0xa37/0x1020 drivers/usb/core/hub.c:2487
hub_port_connect drivers/usb/core/hub.c:4903
hub_port_connect_change drivers/usb/core/hub.c:5009
port_event drivers/usb/core/hub.c:5115
hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
worker_thread+0x221/0x1850 kernel/workqueue.c:2253
kthread+0x3a1/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
Code: 8d 7c 24 20 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5b 48 b8 00 00
00 00 00 fc ff df 4d 8b 6c 24 20 49 8d 7d 08 48 89 fa 48 c1 ea 03 <80>
3c 02 00 75 34 4d 8b 6d 08 4d 85 ed 74 0b e8 26 2b 51 fd 4c
RIP: asix_suspend+0x76/0xc0 RSP: ffff88006bbae718
---[ end trace dfc4f5649284342c ]---

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: usb: asix: fill null-ptr-deref in asix_suspend"

This reverts commit baedf68a068ca29624f241426843635920f16e1d.

There is an updated version of this fix which covers
the problem more thoroughly.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pm-cpufreq-sched'

* pm-cpufreq-sched:
cpufreq: schedutil: Examine the correct CPU when we update util

x86/mm: Unbreak modules that rely on external PAGE_KERNEL availability

Commit 7744ccdbc16f0 ("x86/mm: Add Secure Memory Encryption (SME)
support") as a side-effect made PAGE_KERNEL all of a sudden unavailable
to modules which can't make use of EXPORT_SYMBOL_GPL() symbols.

This is because once SME is enabled, sme_me_mask (which is introduced as
EXPORT_SYMBOL_GPL) makes its way to PAGE_KERNEL through _PAGE_ENC,
causing imminent build failure for all the modules which make use of all
the EXPORT-SYMBOL()-exported API (such as vmap(), __vmalloc(),
remap_pfn_range(), ...).

Exporting (as EXPORT_SYMBOL()) interfaces (and having done so for ages)
that take pgprot_t argument, while making it impossible to -- all of a
sudden -- pass PAGE_KERNEL to it, feels rather inconsistent.

Restore the original behavior and make it possible to pass PAGE_KERNEL
to all its EXPORT_SYMBOL() consumers.

[ This is all so not wonderful. We shouldn't need that "sme_me_mask"
  access at all in all those places that really don't care about that
  level of detail, and just want _PAGE_KERNEL or whatever.

  We have some similar issues with _PAGE_CACHE_WP and _PAGE_NOCACHE,
  both of which hide a "cachemode2protval()" call, and which also ends
  up using another EXPORT_SYMBOL(), but at least that only triggers for
  the much more rare cases.

  Maybe we could move these dynamic page table bits to be generated much
  deeper down in the VM layer, instead of hiding them in the macros that
  everybody uses.

  So this all would merit some cleanup. But not today.   - Linus ]

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Despised-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'fixes-v4.14-rc8' of git://git./linux/kernel/git/jmorris/linux-security

Pull key handling fix from James Morris:
"Fix by Eric Biggers for the keys subsystem"

* 'fixes-v4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]

apparmor: fix off-by-one comparison on MAXMAPPED_SIG

This came in yesterday, and I have verified our regression tests
were missing this and it can cause an oops. Please apply.

There is an off-by-one comparison on sig against MAXMAPPED_SIG
that can lead to a read outside the sig_map array if sig
is MAXMAPPED_SIG. Fix this.
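
A minimal sketch of this class of bug, assuming a lookup table with exactly
MAXMAPPED_SIG entries; the demo_* names and table contents are hypothetical
and not taken from the apparmor source.

```c
#define DEMO_MAXMAPPED_SIG	4
#define DEMO_SIGUNKNOWN		0

static const int demo_sig_map[DEMO_MAXMAPPED_SIG] = { 0, 10, 20, 30 };

static int demo_map_signal(int sig)
{
	/*
	 * Valid indices are 0 .. DEMO_MAXMAPPED_SIG - 1. Comparing with ">"
	 * instead of ">=" would let sig == DEMO_MAXMAPPED_SIG through and
	 * read one element past the end of the array.
	 */
	if (sig < 0 || sig >= DEMO_MAXMAPPED_SIG)
		return DEMO_SIGUNKNOWN;

	return demo_sig_map[sig];
}
```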

Verified that the check is an out of bounds case that can cause an oops.

Revised: add comparison fix to second case
Fixes: cd1dbf76b23d ("apparmor: add the ability to mediate signals")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]

syzkaller reported a NULL pointer dereference in asn1_ber_decoder().  It
can be reproduced by the following command, assuming
CONFIG_PKCS7_TEST_KEY=y:

        keyctl add pkcs7_test desc '' @s

The bug is that if the data buffer is empty, an integer underflow occurs
in the following check:

        if (unlikely(dp >= datalen - 1))
                goto data_overrun_error;

This results in the NULL data pointer being dereferenced.

Fix it by checking for 'datalen - dp < 2' instead.

Also fix the similar check for 'dp >= datalen - n' later in the same
function.  That one possibly could result in a buffer overread.
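
The underflow can be reproduced in user space. The sketch below assumes dp
and datalen are unsigned (size_t), which is what makes 'datalen - 1' wrap
around for an empty buffer; the demo_* function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Original check: with datalen == 0, datalen - 1 wraps to SIZE_MAX. */
static bool demo_overrun_old(size_t dp, size_t datalen)
{
	return dp >= datalen - 1;
}

/* Fixed check from the text above; it relies on dp <= datalen. */
static bool demo_overrun_new(size_t dp, size_t datalen)
{
	return datalen - dp < 2;
}
```

For an empty buffer (dp = 0, datalen = 0) the old check reports no overrun
and decoding proceeds, while the new check rejects it. For non-empty buffers
with dp <= datalen the two checks agree.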

The NULL pointer dereference was reproducible using the "pkcs7_test" key
type but not the "asymmetric" key type because the "asymmetric" key type
checks for a 0-length payload before calling into the ASN.1 decoder but
the "pkcs7_test" key type does not.

The bug report was:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233
    PGD 7b708067 P4D 7b708067 PUD 7b6ee067 PMD 0
    Oops: 0000 [#1] SMP
    Modules linked in:
    CPU: 0 PID: 522 Comm: syz-executor1 Not tainted 4.14.0-rc8 #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.3-20171021_125229-anatol 04/01/2014
    task: ffff9b6b3798c040 task.stack: ffff9b6b37970000
    RIP: 0010:asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233
    RSP: 0018:ffff9b6b37973c78 EFLAGS: 00010216
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000021c
    RDX: ffffffff814a04ed RSI: ffffb1524066e000 RDI: ffffffff910759e0
    RBP: ffff9b6b37973d60 R08: 0000000000000001 R09: ffff9b6b3caa4180
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    FS:  00007f10ed1f2700(0000) GS:ffff9b6b3ea00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000007b6f3000 CR4: 00000000000006f0
    Call Trace:
     pkcs7_parse_message+0xee/0x240 crypto/asymmetric_keys/pkcs7_parser.c:139
     verify_pkcs7_signature+0x33/0x180 certs/system_keyring.c:216
     pkcs7_preparse+0x41/0x70 crypto/asymmetric_keys/pkcs7_key_type.c:63
     key_create_or_update+0x180/0x530 security/keys/key.c:855
     SYSC_add_key security/keys/keyctl.c:122 [inline]
     SyS_add_key+0xbf/0x250 security/keys/keyctl.c:62
     entry_SYSCALL_64_fastpath+0x1f/0xbe
    RIP: 0033:0x4585c9
    RSP: 002b:00007f10ed1f1bd8 EFLAGS: 00000216 ORIG_RAX: 00000000000000f8
    RAX: ffffffffffffffda RBX: 00007f10ed1f2700 RCX: 00000000004585c9
    RDX: 0000000020000000 RSI: 0000000020008ffb RDI: 0000000020008000
    RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000216 R12: 00007fff1b2260ae
    R13: 00007fff1b2260af R14: 00007f10ed1f2700 R15: 0000000000000000
    Code: dd ca ff 48 8b 45 88 48 83 e8 01 4c 39 f0 0f 86 a8 07 00 00 e8 53 dd ca ff 49 8d 46 01 48 89 85 58 ff ff ff 48 8b 85 60 ff ff ff <42> 0f b6 0c 30 89 c8 88 8d 75 ff ff ff 83 e0 1f 89 8d 28 ff ff
    RIP: asn1_ber_decoder+0x17f/0xe60 lib/asn1_decoder.c:233 RSP: ffff9b6b37973c78
    CR2: 0000000000000000

Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v3.7+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>

openvswitch: enable NSH support

v16->17
- Fixed disputed check code: keep them in nsh_push and nsh_pop
   but also add them in __ovs_nla_copy_actions

v15->v16
- Add csum recalculation for nsh_push, nsh_pop and set_nsh
   pointed out by Pravin
- Move nsh key into the union with ipv4 and ipv6 and add
   check for nsh key in match_validate pointed out by Pravin
- Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
- Check size in nsh_hdr_from_nlattr
- Fixed four small issues pointed out by Jiri and Eric

v13->v14
- Rename skb_push_nsh to nsh_push per Dave's comment
- Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
- Fix NSH header length check in set_nsh

v11->v12
- Fix missing changes that old comments pointed out
- Fix new comments for v11

v10->v11
- Fix the three remaining disputed comments for v9
   that were not fixed in v10.

v9->v10
- Change struct ovs_key_nsh to
       struct ovs_nsh_key_base base;
       __be32 context[NSH_MD1_CONTEXT_SIZE];
- Fix new comments for v9

v8->v9
- Fix build error reported by daily intel build
   because nsh module isn't selected by openvswitch

v7->v8
- Rework nested value and mask for OVS_KEY_ATTR_NSH
- Change pop_nsh to adapt to nsh kernel module
- Fix many issues per comments from Jiri Benc

v6->v7
- Remove NSH GSO patches in v6 because Jiri Benc
   reworked it as another patch series and they have
   been merged.
- Change it to adapt to nsh kernel module added by NSH
   GSO patch series

v5->v6
- Fix the remaining comments for v4.
- Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
- Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
- Add new NSH match field ttl
- Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
- Fix comments for v3.

v2->v3
- Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attribute more flexibly.
- Remove struct ovs_action_push_nsh completely
- Add code to handle nested attribute for SET_MASKED
- Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
- Fix comments and coding style issues by Jiri and Eric

v1->v2
- Change encap_nsh and decap_nsh to push_nsh and pop_nsh
- Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

The OVS master and 2.8 branches have merged the NSH userspace
patch series. This patch enables NSH support in the kernel
data path so that OVS can support NSH in compat mode by
porting it.

Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Eric Garver <e@erig.me>
Acked-by: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qmi_wwan: Add missing skb_reset_mac_header-call

When we receive a packet on a QMI device in raw IP mode, we should call
skb_reset_mac_header() to ensure that skb->mac_header contains a valid
offset in the packet. While it shouldn't really matter (the packets have
no MAC header and the interface is configured as such), it seems certain
parts of the network stack expect a "good" value in skb->mac_header.

Without the skb_reset_mac_header() call added in this patch, for example
shaping traffic (using tc) triggers the following oops on the first
received packet:

[  303.642957] skbuff: skb_under_panic: text:8f137918 len:177 put:67 head:8e4b0f00 data:8e4b0eff tail:0x8e4b0fb0 end:0x8e4b1520 dev:wwan0
[  303.655045] Kernel bug detected[#1]:
[  303.658622] CPU: 1 PID: 1002 Comm: logd Not tainted 4.9.58 #0
[  303.664339] task: 8fdf05e0 task.stack: 8f15c000
[  303.668844] $ 0   : 00000000 00000001 0000007a 00000000
[  303.674062] $ 4   : 8149a2fc 8149a2fc 8149ce20 00000000
[  303.679284] $ 8   : 00000030 3878303a 31623465 20303235
[  303.684510] $12   : ded731e3 2626a277 00000000 03bd0000
[  303.689747] $16   : 8ef62b40 00000043 8f137918 804db5fc
[  303.694978] $20   : 00000001 00000004 8fc13800 00000003
[  303.700215] $24   : 00000001 8024ab10
[  303.705442] $28   : 8f15c000 8fc19cf0 00000043 802cc920
[  303.710664] Hi    : 00000000
[  303.713533] Lo    : 74e58000
[  303.716436] epc   : 802cc920 skb_panic+0x58/0x5c
[  303.721046] ra    : 802cc920 skb_panic+0x58/0x5c
[  303.725639] Status: 11007c03 KERNEL EXL IE
[  303.729823] Cause : 50800024 (ExcCode 09)
[  303.733817] PrId  : 0001992f (MIPS 1004Kc)
[  303.737892] Modules linked in: rt2800pci rt2800mmio rt2800lib qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 mt76x2i
Process logd (pid: 1002, threadinfo=8f15c000, task=8fdf05e0, tls=77b3eee4)
[  303.962509] Stack : 00000000 80408990 8f137918 000000b1 00000043 8e4b0f00 8e4b0eff 8e4b0fb0
[  303.970871]         8e4b1520 8fec1800 00000043 802cd2a4 6e000045 00000043 00000000 8ef62000
[  303.979219]         8eef5d00 8ef62b40 8fea7300 8f137918 00000000 00000000 0002bb01 793e5664
[  303.987568]         8ef08884 00000001 8fea7300 00000002 8fc19e80 8eef5d00 00000006 00000003
[  303.995934]         00000000 8030ba90 00000003 77ab3fd0 8149dc80 8004d1bc 8f15c000 8f383700
[  304.004324]         ...
[  304.006767] Call Trace:
[  304.009241] [<802cc920>] skb_panic+0x58/0x5c
[  304.013504] [<802cd2a4>] skb_push+0x78/0x90
[  304.017783] [<8f137918>] 0x8f137918
[  304.021269] Code: 00602825  0c02a3b4  24842888 <000c000d> 8c870060  8c8200a0  0007382b  00070336  8c88005c
[  304.031034]
[  304.032805] ---[ end trace b778c482b3f0bda9 ]---
[  304.041384] Kernel panic - not syncing: Fatal exception in interrupt
[  304.051975] Rebooting in 3 seconds..

While the oops is for a 4.9-kernel, I was able to trigger the same oops with
net-next as of yesterday.

Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix slave stuck in BOND_LINK_FAIL state

The bonding miimon logic has a flaw, in that a failure of the
rtnl_trylock can cause a slave to become permanently stuck in
BOND_LINK_FAIL state.

The sequence of events to cause this is as follows:

1) bond_miimon_inspect finds that a slave's link is down, and so
calls bond_propose_link_state, setting slave->new_link_state to
BOND_LINK_FAIL, then sets slave->new_link to BOND_LINK_DOWN and returns
non-zero.

2) In bond_mii_monitor, the rtnl_trylock fails, and the timer is
rescheduled.  No change is committed.

3) bond_miimon_inspect is called again, but this time the slave
from step 1 has recovered.  slave->new_link is reset to NOCHANGE, and, as
slave->link was never changed, the switch enters the BOND_LINK_UP case,
and does nothing.  The pending BOND_LINK_FAIL state from step 1 remains
pending, as new_link_state is not reset.

4) The state from step 3 persists until another slave changes link
state and causes bond_miimon_inspect to return non-zero.  At this point,
the BOND_LINK_FAIL state change on the slave from steps 1-3 is committed,
and the slave will remain stuck in BOND_LINK_FAIL state even though it
is actually link up.

The remedy for this is to initialize new_link_state on each entry
to bond_miimon_inspect, as is already done with new_link.
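
The four steps above can be replayed in a small user-space model. This
is only a sketch: the enum, struct and function names are hypothetical
stand-ins rather than the bonding driver's code, and the commit step is
reduced to applying whatever is still pending on the slave.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the BOND_LINK_* states. */
enum link_state { LINK_UP, LINK_FAIL, LINK_DOWN, LINK_NOCHANGE };

struct slave_model {
	enum link_state link;            /* committed state */
	enum link_state new_link;        /* proposed final state */
	enum link_state new_link_state;  /* proposed intermediate state */
};

/* Model of one inspect pass; returns non-zero if a commit is wanted. */
static int inspect(struct slave_model *s, bool carrier_ok, bool init_state)
{
	int commit = 0;

	s->new_link = LINK_NOCHANGE;
	if (init_state)                  /* the remedy described above */
		s->new_link_state = LINK_NOCHANGE;

	if (s->link == LINK_UP && !carrier_ok) {
		s->new_link_state = LINK_FAIL;
		s->new_link = LINK_DOWN;
		commit++;
	}
	return commit;
}

/* Model of the commit step: apply whatever is pending on this slave. */
static void commit_pending(struct slave_model *s)
{
	if (s->new_link_state != LINK_NOCHANGE)
		s->link = s->new_link_state;
	if (s->new_link != LINK_NOCHANGE)
		s->link = s->new_link;
}

/* Replays steps 1-4 and returns the slave's committed state afterwards. */
enum link_state replay_miimon(bool init_state)
{
	struct slave_model s = { LINK_UP, LINK_NOCHANGE, LINK_NOCHANGE };
	int wanted;

	wanted = inspect(&s, false, init_state); /* 1) link down seen */
	(void)wanted;                            /* 2) trylock fails, no commit */
	wanted = inspect(&s, true, init_state);  /* 3) slave has recovered */
	(void)wanted;
	commit_pending(&s);                      /* 4) another slave forces a commit */
	return s.link;
}
```

In this model replay_miimon(false) leaves the slave in LINK_FAIL, while
replay_miimon(true), which clears the pending state on every inspect
pass, leaves it in LINK_UP.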

Fixes: fb9eb899a6dc ("bonding: handle link transition from FAIL to UP correctly")
Reported-by: Alex Sidorenko <alexandre.sidorenko@hpe.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: document 32-bit timestamp overflow

Timestamps in pktgen are currently retrieved using the deprecated
do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
(on 32-bit architectures) and requires a division operation to calculate
microseconds.

The pktgen header is also defined with the same limitations, hardcoding
to a 32-bit seconds field that can be interpreted as unsigned to produce
times that only wrap in 2106. Whatever code reads the timestamps should
be aware of that problem in general, but probably doesn't care too
much as we are mostly interested in the time passing between packets,
and that is correctly represented.

Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
monotonic times would also make this unambiguous by avoiding the overflow,
but would make it harder to correlate the times with those on remote
machines. Either approach would require adding a new runtime flag and
implementing the same thing on the remote side, which we probably don't
want to do unless someone sees it as a real problem. Also, this should
be coordinated with other pktgen implementations and might need a new
magic number.

For the moment, I'm documenting the overflow in the source code, and
changing the implementation over to an open-coded ktime_get_real_ts64()
plus division, so we don't have to look at it again while scanning for
deprecated time interfaces.
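
To illustrate the wrap arithmetic, here is a user-space sketch. It is
not the pktgen implementation: struct wire_stamp, to_wire() and
wire_delta_usec() are hypothetical names, the layout (32-bit seconds
plus 32-bit microseconds) is assumed from the description above, and
byte order conversion is left out.

```c
#include <stdint.h>

/* Hypothetical model of a header timestamp with 32-bit fields. */
struct wire_stamp {
	uint32_t sec;   /* seconds, low 32 bits only */
	uint32_t usec;  /* microseconds within the second */
};

/* Truncate a 64-bit seconds + nanoseconds time to the 32-bit fields. */
struct wire_stamp to_wire(int64_t sec, long nsec)
{
	struct wire_stamp w;

	w.sec = (uint32_t)sec;            /* wraps every 2^32 seconds */
	w.usec = (uint32_t)(nsec / 1000); /* the division mentioned above */
	return w;
}

/*
 * Time from a to b in microseconds. The modular subtraction stays
 * correct across a wrap as long as the two stamps are less than
 * 2^32 seconds apart.
 */
int64_t wire_delta_usec(struct wire_stamp a, struct wire_stamp b)
{
	uint32_t dsec = b.sec - a.sec;

	return (int64_t)dsec * 1000000 + ((int64_t)b.usec - (int64_t)a.usec);
}
```

Read as unsigned, the seconds field reaches 2^32 in 2106; read as
signed, it overflows at 2^31 in 2038. The difference between two nearby
stamps comes out the same either way, which is why the time passing
between packets is still represented correctly.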

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

qrtr: Move to postcore_initcall

Registering qrtr with module_init makes the ability of typical platform
code to create AF_QIPCRTR socket during probe a matter of link order
luck. Moving qrtr to postcore_initcall() avoids this.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git./linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree, they are:

1) Speed up table replacement on busy systems with large tables
   (and many cores) in x_tables. Now xt_replace_table() synchronizes by
   itself by waiting until all cpus had an even seqcount and we no longer
   use seqlock when fetching old counters, from Florian Westphal.

2) Add nf_l4proto_log_invalid() and nf_ct_l4proto_log_invalid() to speed
   up packet processing in the fast path when logging is not enabled, from
   Florian Westphal.

3) Precompute masked address from configuration plane in xt_connlimit,
   from Florian.

4) Don't use explicit size for set selection if performance set policy
   is selected.

5) Allow to get elements from an existing set in nf_tables.

6) Fix incorrect check in nft_hash_deactivate(), from Florian.

7) Cache netlink attribute size result in l4proto->nla_size, from
   Florian.

8) Handle NFPROTO_INET in nf_ct_netns_get() from conntrack core.

9) Use power efficient workqueue in conntrack garbage collector, from
   Vincent Guittot.

10) Remove unnecessary parameter, in conntrack l4proto functions, also
    from Florian.

11) Constify struct nf_conntrack_l3proto definitions, from Florian.

12) Remove all typedefs in nf_conntrack_h323 via coccinelle semantic
    patch, from Harsha Sharma.

13) Don't store address in the rbtree nodes in xt_connlimit, they are
    never used, from Florian.

14) Fix out of bound access in the conntrack h323 helper, patch from
    Eric Sesterhenn.

15) Print symbols for the address returned with %pS in IPVS, from
    Helge Deller.

16) Proc output should only display its own netns in IPVS, from
    KUWAZAWA Takuya.

17) Small clean up in size_entry_mwt(), from Colin Ian King.

18) Use test_and_clear_bit from nf_nat_proto_clean() instead of separated
    non-atomic test and then clear bit, from Florian Westphal.

19) Consolidate prefix length maps in ipset, from Aaron Conole.

20) Fix sparse warnings in ipset, from Jozsef Kadlecsik.

21) Simplify list_set_memsize(), from simran singhal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add ethtool GOP statistics

Add ethtool statistics support by reading the GOP statistics from the
hardware counters. Also implement a workqueue to gather the statistics
every second, since otherwise some 32-bit counters could overflow.
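
A rough sketch of why periodic gathering matters, under stated
assumptions: the counter is modelled as a free-running 32-bit value
that wraps (the hardware may behave differently, for example by
clearing on read), and struct soft_counter, counter_update() and
wrap_seconds() are hypothetical names, not driver code.

```c
#include <stdint.h>

/* Hypothetical 64-bit software mirror of a 32-bit hardware counter. */
struct soft_counter {
	uint32_t last_raw;  /* raw value seen at the previous poll */
	uint64_t total;     /* accumulated 64-bit total */
};

/*
 * Fold a new raw reading into the 64-bit total. The modular
 * subtraction is right only if fewer than 2^32 increments happened
 * since the previous poll, hence the need to poll often enough.
 */
void counter_update(struct soft_counter *c, uint32_t raw)
{
	uint32_t delta = raw - c->last_raw;

	c->total += delta;
	c->last_raw = raw;
}

/* Seconds until a 32-bit byte counter wraps at the given bit rate. */
double wrap_seconds(double bits_per_second)
{
	return 4294967296.0 / (bits_per_second / 8.0);
}
```

With a line rate of 10 Gbit/s taken purely as an example, a 32-bit byte
counter would wrap after roughly 3.4 seconds, so a one second polling
period leaves some margin.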

Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fsl-fman-next'

Christophe JAILLET says:

====================
fsl/fman: Fix some error handling code in mac_probe

Commit c6e26ea8c893 ("dpaa_eth: change device used") generated some
conflicts in my patches waiting for submission. So I took a closer look at
it.

So here is a series of 4 patches.

The 1st one is just about a spurious call to 'dev_set_drvdata()', which is
done in only 1 error handling path in the function.

The 2nd one removes some devm_iounmap/release/kfree functions which look
useless to me.

The 3rd one fixes a missing of_node_put.

The 4th one is just cosmetic and removes a useless message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: Remove a useless 'dev_err()' call

Memory allocation functions already display some information in case of
memory allocation failure. There is no need to add an extra 'dev_err' here.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: Add a missing 'of_node_put()' call in an error handling path

If 'of_phy_find_device()' fails, we must undo the previous 'of_node_get()'
call, as done in the following error handling code.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: Remove some useless code

There is no need to release explicitly some devm_ allocated resources.
If the 'mac_probe()' probe function fails, they will be released
automatically, as already done in the other error handling paths of
this function.

Also goto '_return_of_get_parent' as in the other error handling paths.
This is useless (priv->fixed_link is NULL at this point), but at least
it is consistent.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: Remove a useless call to 'dev_set_drvdata()'

Commit c6e26ea8c893 ("dpaa_eth: change device used") has removed usage of
'dev_set_drvdata()' in the 'mac_probe()' function.

This call should also be axed.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: fix missing size for IFLA_IF_NETNSID

The size for IFLA_IF_NETNSID is missing from the size calculation
because the preceding semicolon was not removed. Fix this by removing
the semicolon.
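
The class of bug can be reproduced in a few lines of user-space C. This
is a sketch, not the rtnetlink code: attr_size() is a hypothetical
stand-in for the per-attribute size helper, and the attribute payload
sizes are made up for the example.

```c
#include <stddef.h>

/* Hypothetical helper: 4-byte header plus payload padded to 4 bytes. */
static size_t attr_size(size_t payload)
{
	return 4 + ((payload + 3) & ~(size_t)3);
}

/* Buggy form: the stray ';' ends the return statement too early. */
size_t msg_size_buggy(void)
{
	return attr_size(1)
	     + attr_size(1);   /* statement ends here */
	     + attr_size(4);   /* dead code: unary plus, result discarded */
}

/* Fixed form: the last term is part of the sum again. */
size_t msg_size_fixed(void)
{
	return attr_size(1)
	     + attr_size(1)
	     + attr_size(4);
}
```

The buggy version still compiles because "+ attr_size(4);" is a valid
expression statement, which is why a static checker flags it as
structurally dead code rather than the compiler rejecting it.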

Detected by CoverityScan, CID#1461135 ("Structurally dead code")

Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>