Harald Welte [Sun, 12 Nov 2017 22:21:34 +0000 (07:21 +0900)]
net: Mention net-next status web page in netdev-FAQ.txt
According to
https://www.mail-archive.com/netdev@vger.kernel.org/msg177411.html
there is a status page available at
http://vger.kernel.org/~davem/net-next.html
to obtain the current status of the net-next tree. Let's add this
information to the netdev FAQ.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Sun, 12 Nov 2017 22:18:45 +0000 (07:18 +0900)]
net: Extend Kernel GTP-U tunneling documentation
* clarify specification references for v0/v1
* add section "APN vs. Network device"
* add section "Local GTP-U entity and tunnel identification"
Signed-off-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Nov 2017 07:38:46 +0000 (16:38 +0900)]
Merge branch 'net-devname_alloc_cleanups'
Rasmus Villemoes says:
====================
net: core: devname allocation cleanups
It's somewhat confusing to have both dev_alloc_name and
dev_get_valid_name. I can't see why the former is less strict than the
latter, so make them (or rather dev_alloc_name_ns and
dev_get_valid_name) equivalent, hardening dev_alloc_name() a little.
Obvious follow-up patches would be to only export one function, and
make dev_alloc_name a static inline wrapper for that (whichever name
is chosen for the exported interface). But maybe there is a good
reason the two exported interfaces do different checking, so I'll
refrain from including the trivial but tree-wide renaming in this
series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:10 +0000 (00:15 +0100)]
net: core: dev_get_valid_name is now the same as dev_alloc_name_ns
If name contains a %, it's easy to see that this patch doesn't change
anything (other than eliminate the duplicate dev_valid_name
call). Otherwise, we'll now just spend a little time in snprintf()
copying name to the stack buffer allocated in dev_alloc_name_ns, and do
the __dev_get_by_name using that buffer rather than name.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:09 +0000 (00:15 +0100)]
net: core: maybe return -EEXIST in __dev_alloc_name
If we're given format string with no %d, -EEXIST is a saner error code.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:08 +0000 (00:15 +0100)]
net: core: check dev_valid_name in __dev_alloc_name
We currently only exclude non-sysfs-friendly names via
dev_get_valid_name; there doesn't seem to be a reason to allow such
names when we're called via dev_alloc_name.
This does duplicate the dev_valid_name check in the dev_get_valid_name()
case; we'll fix that shortly.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:07 +0000 (00:15 +0100)]
net: core: drop pointless check in __dev_alloc_name
The only caller passes a stack buffer as buf, so it won't equal the
passed-in name. Moreover, we're already using buf as a scratch buffer
inside the if (p) {} block, so if buf and name were the same, that
snprintf() call would be overwriting its own format string.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:06 +0000 (00:15 +0100)]
net: core: eliminate dev_alloc_name{,_ns} code duplication
dev_alloc_name contained a BUG_ON(), which I moved to dev_alloc_name_ns;
the only other caller of that already has the same BUG_ON.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:05 +0000 (00:15 +0100)]
net: core: move dev_alloc_name_ns a little higher
No functional change.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rasmus Villemoes [Sun, 12 Nov 2017 23:15:04 +0000 (00:15 +0100)]
net: core: improve sanity checking in __dev_alloc_name
__dev_alloc_name is called from the public (and exported)
dev_alloc_name(), so we don't have a guarantee that strlen(name) is at
most IFNAMSIZ. If somebody manages to get __dev_alloc_name called with a
% char beyond the 31st character, we'd be making a snprintf() call that
will very easily crash the kernel (using an appropriate %p extension,
we'll likely dereference some completely bogus pointer).
In the normal case where strlen() is sane, we don't even save anything
by limiting to IFNAMSIZ, so just use strchr().
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Nov 2017 07:26:35 +0000 (16:26 +0900)]
Merge branch 'tls-misc-fixes'
Ilya Lesokhin says:
====================
tls: Miscellaneous fixes
Here's a set of miscellaneous fix patches.
Patch 1 makes sure aead_request is initailized properly.
Patches 2-3 Fix a memory leak we've encountered.
patch 4 moves tls_make_aad to allow sharing it in the future.
Patch 5 fixes a TOCTOU issue reported here:
https://www.spinics.net/lists/kernel/msg2608603.html
Patch 6 Avoids callback overriding when tls_set_sw_offload fails.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:49 +0000 (10:22 +0200)]
tls: don't override sk_write_space if tls_set_sw_offload fails.
If we fail to enable tls in the kernel we shouldn't override
the sk_write_space callback
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:48 +0000 (10:22 +0200)]
tls: Avoid copying crypto_info again after cipher_type check.
Avoid copying crypto_info again after cipher_type check
to avoid a TOCTOU exploits.
The temporary array on the stack is removed as we don't really need it
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:47 +0000 (10:22 +0200)]
tls: Move tls_make_aad to header to allow sharing
move tls_make_aad as it is going to be reused
by the device offload code and rx path.
Remove unused recv parameter.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:46 +0000 (10:22 +0200)]
tls: Fix TLS ulp context leak, when TLS_TX setsockopt is not used.
Previously the TLS ulp context would leak if we attached a TLS ulp
to a socket but did not use the TLS_TX setsockopt,
or did use it but it failed.
This patch solves the issue by overriding prot[TLS_BASE_TX].close
and fixing tls_sk_proto_close to work properly
when its called with ctx->tx_conf == TLS_BASE_TX.
This patch also removes ctx->free_resources as we can use ctx->tx_conf
to obtain the relevant information.
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:45 +0000 (10:22 +0200)]
tls: Add function to update the TLS socket configuration
The tx configuration is now stored in ctx->tx_conf.
And sk->sk_prot is updated trough a function
This will simplify things when we add rx
and support for different possible
tx and rx cross configurations.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Lesokhin [Mon, 13 Nov 2017 08:22:44 +0000 (10:22 +0200)]
tls: Use kzalloc for aead_request allocation
Use kzalloc for aead_request allocation as
we don't set all the bits in the request.
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Nov 2017 07:20:04 +0000 (16:20 +0900)]
Merge branch 'bpf-improve-verifier-ARG_CONST_SIZE_OR_ZERO-semantics'
Yonghong Song says:
====================
bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
This patch set intends to change verifier ARG_CONST_SIZE_OR_ZERO
semantics so that simpler bpf programs can be written with verifier
acceptance. Patch #1 comment provided the detailed examples and
the patch itself implements the new semantics. Patch #2
changes bpf_probe_read helper arg2 type from
ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO. Patch #3 fixed a few
test cases and added some for better coverage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Sun, 12 Nov 2017 22:49:11 +0000 (14:49 -0800)]
bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change
Fix a few test cases to allow non-NULL map/packet/stack pointer
with size = 0. Change a few tests using bpf_probe_read to use
bpf_probe_write_user so ARG_CONST_SIZE arg can still be properly
tested. One existing test case already covers size = 0 with non-NULL
packet pointer, so add additional tests so all cases of
size = 0 and 0 <= size <= legal_upper_bound with non-NULL
map/packet/stack pointer are covered.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Sun, 12 Nov 2017 22:49:10 +0000 (14:49 -0800)]
bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO
The helper bpf_probe_read arg2 type is changed
from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit
size-0 buffer. Together with newer ARG_CONST_SIZE_OR_ZERO
semantics which allows non-NULL buffer with size 0,
this allows simpler bpf programs with verifier acceptance.
The previous commit which changes ARG_CONST_SIZE_OR_ZERO semantics
has details on examples.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Sun, 12 Nov 2017 22:49:09 +0000 (14:49 -0800)]
bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
access size to be 0 when accessing the previous argument (arg).
Right now, it requires the arg needs to be NULL when size passed
is 0 or could be 0. It also requires a non-NULL arg when the size
is proved to be non-0.
This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior
such that for size-0 or possible size-0, it is not required
the arg equal to NULL.
There are a couple of reasons for this semantics change, and
all of them intends to simplify user bpf programs which
may improve user experience and/or increase chances of
verifier acceptance. Together with the next patch which
changes bpf_probe_read arg2 type from ARG_CONST_SIZE to
ARG_CONST_SIZE_OR_ZERO, the following two examples, which
fail the verifier currently, are able to get verifier acceptance.
Example 1:
unsigned long len = pend - pstart;
len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
len &= MAX_PAYLOAD_LEN;
bpf_probe_read(data->payload, len, pstart);
It does not have test for "len > 0" and it failed the verifier.
Users may not be aware that they have to add this test.
Converting the bpf_probe_read helper to have
ARG_CONST_SIZE_OR_ZERO helps the above code get
verifier acceptance.
Example 2:
Here is one example where llvm "messed up" the code and
the verifier fails.
......
unsigned long len = pend - pstart;
if (len > 0 && len <= MAX_PAYLOAD_LEN)
bpf_probe_read(data->payload, len, pstart);
......
The compiler generates the following code and verifier fails:
......
39: (79) r2 = *(u64 *)(r10 -16)
40: (1f) r2 -= r8
41: (bf) r1 = r2
42: (07) r1 += -1
43: (25) if r1 > 0xffe goto pc+3
R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
R8=inv(id=0) R9=inv0 R10=fp0
44: (bf) r1 = r6
45: (bf) r3 = r8
46: (85) call bpf_probe_read#45
R2 min value is negative, either use unsigned or 'var &= const'
......
The compiler optimization is correct. If r1 = 0,
r1 - 1 = 0xffffffffffffffff > 0xffe. If r1 != 0, r1 - 1 will not wrap.
r1 > 0xffe at insn #43 can actually capture
both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
This however causes an issue in verifier as the value range of arg2
"r2" does not properly get refined and lead to verification failure.
Relaxing bpf_prog_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
allows the following simplied code:
unsigned long len = pend - pstart;
if (len <= MAX_PAYLOAD_LEN)
bpf_probe_read(data->payload, len, pstart);
The llvm compiler will generate less complex code and the
verifier is able to verify that the program is okay.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 11 Nov 2017 23:54:12 +0000 (15:54 -0800)]
tcp: allow drivers to tweak TSQ logic
I had many reports that TSQ logic breaks wifi aggregation.
Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.
But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.
This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.
Initial value is 10, meaning that this patch does not change current
behavior.
We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)
They would have to use following template :
if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/
1670041
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Kir Kolyshkin <kir@openvz.org>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Nov 2017 07:17:38 +0000 (16:17 +0900)]
Merge tag 'rxrpc-next-
20171111' of git://git./linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fixes
Here are some patches that fix some things in AF_RXRPC:
(1) Prevent notifications from being passed to a kernel service for a call
that it has ended.
(2) Fix a null pointer deference that occurs under some circumstances when an
ACK is generated.
(3) Fix a number of things to do with call expiration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Sat, 11 Nov 2017 15:42:03 +0000 (10:42 -0500)]
bnx2x: fix slowpath null crash
When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task,
bnx2x_nic_unload and bnx2x_nic_load are executed to shutdown and open
NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem allocates dma
failure. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't
allocate memory" pops out. The variable slowpath is set to NULL.
When shutdown the NIC, the function bnx2x_nic_unload is called. In
the function bnx2x_nic_unload, the following functions are executed.
bnx2x_chip_cleanup
bnx2x_set_storm_rx_mode
bnx2x_set_q_rx_mode
bnx2x_set_q_rx_mode
bnx2x_config_rx_mode
bnx2x_set_rx_mode_e2
In the function bnx2x_set_rx_mode_e2, the variable slowpath is operated.
Then the crash occurs.
To fix this crash, the variable slowpath is checked. And in the function
bnx2x_sp_rtnl_task, after dma memory allocation fails, another shutdown
and open NIC is executed.
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Nov 2017 07:14:07 +0000 (16:14 +0900)]
Merge branch 'cxgb4-collect-LE-TCAM-and-SGE-queue-contexts'
Rahul Lakkireddy says:
====================
cxgb4: collect LE-TCAM and SGE queue contexts
Collect hardware dumps via ethtool --get-dump facility.
Patch 1 collects LE-TCAM dump.
Patch 2 collects SGE queue context dumps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Sat, 11 Nov 2017 14:18:16 +0000 (19:48 +0530)]
cxgb4: collect SGE queue context dump
Collect SGE freelist queue and congestion manager contexts.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Sat, 11 Nov 2017 14:18:15 +0000 (19:48 +0530)]
cxgb4: collect LE-TCAM dump
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:58:50 +0000 (19:58 +0800)]
vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
Commit
f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
header offset") removed icmp6_code and icmp6_type check before calling
neigh_reduce when doing neigh proxy.
It means all icmpv6 packets would be blocked by this, not only ns packet.
In Jianlin's env, even ping6 couldn't work through it.
This patch is to bring the icmp6_code and icmp6_type check back and also
removed the same check from neigh_reduce().
Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:34:03 +0000 (22:34 +0300)]
xfrm6_tunnel: exit_net cleanup check added
Be sure that spi_byaddr and spi_byspi arrays initialized in net_init hook
were return to initial state
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:33:22 +0000 (22:33 +0300)]
ppp: exit_net cleanup checks added
Be sure that lists initialized in net_init hook were return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:32:47 +0000 (22:32 +0300)]
phonet: exit_net cleanup check added
Be sure that pndevs.list initialized in net_init hook was return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:30:31 +0000 (22:30 +0300)]
l2tp: exit_net cleanup check added
Be sure that l2tp_session_hlist array initialized in net_init hook
was return to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:30:01 +0000 (22:30 +0300)]
fib_rules: exit_net cleanup check added
Be sure that rules_ops list initialized in net_init hook was return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:29:33 +0000 (22:29 +0300)]
fib_notifier: exit_net cleanup check added
Be sure that fib_notifier_ops list initilized in net_init hook was return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:28:46 +0000 (22:28 +0300)]
netdev: exit_net cleanup check added
Be sure that dev_base_head list initialized in net_init hook was return
to initial state
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:28:10 +0000 (22:28 +0300)]
vxlan: exit_net cleanup checks added
Be sure that sock_list array initialized in net_init hook was return
to initial state
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:27:49 +0000 (22:27 +0300)]
packet: exit_net cleanup check added
Be sure that packet.sklist initialized in net_init hook was return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:27:19 +0000 (22:27 +0300)]
geneve: exit_net cleanup check added
Be sure that sock_list initialized in net_init hook was return
to initial state.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Averin [Sun, 12 Nov 2017 19:26:53 +0000 (22:26 +0300)]
af_key: replace BUG_ON on WARN_ON in net_exit hook
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 11 Nov 2017 15:29:41 +0000 (16:29 +0100)]
net: dsa: Fix dependencies on bridge
DSA now uses one of the symbols exported by the bridge,
br_vlan_enabled(). This has a stub, if the bridge is not
enabled. However, if the bridge is enabled, we cannot have DSA built
in and the bridge as a module, otherwise we get undefined symbols at
link time:
net/dsa/port.o: In function `dsa_port_vlan_add':
net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
net/dsa/port.o: In function `dsa_port_vlan_del':
net/dsa/port.c:270: undefined reference to `br_vlan_enabled'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Nov 2017 01:44:06 +0000 (10:44 +0900)]
Merge branch 'net-improve-the-process-of-redirect-and-toobig-for-ipv6-tunnels'
Xin Long says:
====================
net: improve the process of redirect and toobig for ipv6 tunnels
Now let's say there are 3 kinds of icmp packets to process for tunnels,
toobig(needfrag), redirect, others, their process should be:
- toobig(needfrag)
update the lower dst's pmtu by route cache, also update sk dst's pmtu
if possible, or it will be fine if sk dst pmtu will get updated on tx
path.
- redirect
update the lower dst's gw by route cache and return, no need to send
this redirect packet to user sk.
- others
send the packet to user's sk, or it will also be fine to use err_count
to count it and report fail link on tx path.
All ipv4 tunnels basically follow this while some of ipv6 tunnels are
doing in different ways, like ip6gre and ip6_tunnels update tnl dev's
mtu instead of updating lower dst pmtu, no redirect process on their
err_handlers, which doesn't make any sense and even causes performance
problems.
This patchset is to improve the process of redirect and toobig for ip6gre
ip4ip6, ip6ip6 tunnels, as in ipv4 tunnels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:06:53 +0000 (19:06 +0800)]
ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers
This patch is to remove some useless codes of redirect and fix some
indents on ip4ip6 and ip6ip6's err_handlers.
Note that redirect icmp packet is already processed in ip6_tnl_err,
the old redirect codes in ip4ip6_err actually never worked even
before this patch. Besides, there's no need to send redirect to
user's sk, it's for lower dst, so just remove it in this patch.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:06:52 +0000 (19:06 +0800)]
ip6_tunnel: process toobig in a better way
The same improvement in "ip6_gre: process toobig in a better way"
is needed by ip4ip6 and ip6ip6 as well.
Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
err_handlers. Like I said before, gre6 could not do this as it's
inner proto is not certain. But for all of them, sk dst pmtu will
be updated in tx path if in need.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:06:51 +0000 (19:06 +0800)]
ip6_tunnel: add the process for redirect in ip6_tnl_err
The same process for redirect in "ip6_gre: add the process for redirect
in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:06:50 +0000 (19:06 +0800)]
ip6_gre: process toobig in a better way
Now ip6gre processes toobig icmp packet by setting gre dev's mtu in
ip6gre_err, which would cause few things not good:
- It couldn't set mtu with dev_set_mtu due to it's not in user context,
which causes route cache and idev->cnf.mtu6 not to be updated.
- It has to update sk dst pmtu in tx path according to gredev->mtu for
ip6gre, while it updates pmtu again according to lower dst pmtu in
ip6_tnl_xmit.
- To change dev->mtu by toobig icmp packet is not a good idea, it should
only work on pmtu.
This patch is to process toobig by updating the lower dst's pmtu, as later
sk dst pmtu will be updated in ip6_tnl_xmit, the same way as in ip4gre.
Note that gre dev's mtu will not be updated any more, it doesn't make any
sense to change dev's mtu after receiving a toobig packet.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 11 Nov 2017 11:06:49 +0000 (19:06 +0800)]
ip6_gre: add the process for redirect in ip6gre_err
This patch is to add redirect icmp packet process for ip6gre by
calling ip6_redirect() in ip6gre_err(), as in vti6_err.
Prior to this patch, there's even no route cache generated after
receiving redirect.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Sat, 11 Nov 2017 02:10:00 +0000 (21:10 -0500)]
forcedeth: remove redudant assignments in xmit
In xmit process, the variables are set many times. In fact,
it is enough for these variables to be set once.
After a long time test, the throughput performance is better
than before.
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Nov 2017 01:39:12 +0000 (10:39 +0900)]
Merge tag 'nfc-next-4.15-1' of git://git./linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.15 pull request
This is the NFC pull request for 4.15. We have:
- A new netlink command for explicitly deactivating NFC targets
- i2c constification for all NFC drivers
- One NFC device allocation error path fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Nov 2017 01:37:08 +0000 (10:37 +0900)]
Merge branch 'Openvswitch-meter-action'
Andy Zhou says:
====================
Openvswitch meter action
This patch series is the first attempt to add openvswitch
meter support. We have previously experimented with adding
metering support in nftables. However 1) It was not clear
how to expose a named nftables object cleanly, and 2)
the logic that implements metering is quite small, < 100 lines
of code.
With those two observations, it seems cleaner to add meter
support in the openvswitch module directly.
---
v1(RFC)->v2: remove unused code improve locking
and other review comments
v2 -> v3: rebase
v3 -> v4: fix undefined "__udivdi3" references on 32 bit builds.
use div_u64() instead.
v4 -> v5: rebase
====================
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Fri, 10 Nov 2017 20:09:43 +0000 (12:09 -0800)]
openvswitch: Add meter action support
Implements OVS kernel meter action support.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Fri, 10 Nov 2017 20:09:42 +0000 (12:09 -0800)]
openvswitch: Add meter infrastructure
OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Fri, 10 Nov 2017 20:09:41 +0000 (12:09 -0800)]
openvswitch: export get_dp() API.
Later patches will invoke get_dp() outside of datapath.c. Export it.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Zhou [Fri, 10 Nov 2017 20:09:40 +0000 (12:09 -0800)]
openvswitch: Add meter netlink definitions
Meter has its own netlink family. Define netlink messages and attributes
for communicating with the user space programs.
Signed-off-by: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Nov 2017 01:34:55 +0000 (10:34 +0900)]
Merge branch 'dsa-b53-Support-prepended-Broadcom-tags'
Florian Fainelli says:
====================
net: dsa: b53: Support prepended Broadcom tags
This patch series adds support for prepended 4-bytes Broadcom tags that we
already support. This type of tag will typically be used when interfaced to
a SoC like BCM58xx (NorthStar Plus) which supports a Flow Accelerator (WIP).
In that case, we need to support a slightly different tagging format.
The first patch does a bit of re-factoring and passes a port index to
the get_tag_protocol() function since at least two different drivers need
that type of information (mt7530, b53) to support tagging or not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 23:22:55 +0000 (15:22 -0800)]
net: dsa: b53: Support prepended Broadcom tags
On BCM58xx devices (Northstar Plus), there is an accelerator attached to
port 8 which would only work if we use prepended Broadcom tags. Resolve
that difference in our get_tag_protocol() function by setting the
appropriate tagging protocol in that case. We need to change
b53_brcm_hdr_setup() a little bit now since we can deal with two types
of Broadcom tags.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 23:22:54 +0000 (15:22 -0800)]
net: dsa: Support prepended Broadcom tag
Add a new type: DSA_TAG_PROTO_PREPEND which allows us to support for the
4-bytes Broadcom tag that we already support, but in a format where it
is pre-pended to the packet instead of located between the MAC SA and
the Ethertyper (DSA_TAG_PROTO_BRCM).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 23:22:53 +0000 (15:22 -0800)]
net: dsa: tag_brcm: Prepare for supporting prepended tag
In preparation for supporting the same Broadcom tag format, but instead
of inserted between the MAC SA and EtherType, prepended to the Ethernet
frame, restructure the code a little bit to make that possible and take
an offset parameter.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 23:22:52 +0000 (15:22 -0800)]
net: dsa: Pass a port to get_tag_protocol()
A number of drivers want to check whether the configured CPU port is a
possible configuration for enabling tagging, pass down the CPU port
number so they verify that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Morton [Fri, 10 Nov 2017 23:09:53 +0000 (15:09 -0800)]
net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issue
gcc-4.4.4 (at lest) has issues with initializers and anonymous unions:
net/sched/sch_red.c: In function 'red_dump_offload':
net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer
net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast
net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer
net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast
net/sched/sch_red.c: In function 'red_dump_stats':
net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer
net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast
Work around this.
Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Cc: Nogah Frankel <nogahf@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Simon Horman <simon.horman@netronome.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slava Shwartsman [Fri, 10 Nov 2017 07:10:29 +0000 (09:10 +0200)]
net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devices
Since Mellanox focus is on newer adapters, we would like to have the
ability to disable the support for old gen2 adapters.
This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
We keep it turned on by default.
Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Thu, 9 Nov 2017 04:04:44 +0000 (13:04 +0900)]
af_netlink: ensure that NLMSG_DONE never fails in dumps
The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.
However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:
nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.
In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.
This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Nov 2017 01:15:47 +0000 (10:15 +0900)]
Merge branch 'netem-add-nsec-scheduling-and-slot-feature'
Dave Taht says:
====================
netem: add nsec scheduling and slot feature
This patch series converts netem away from the old "ticks" interface and
userspace API, and adds support for a new "slot" feature intended to
emulate bursty macs such as WiFi and LTE better.
Changes since v2:
Use u64 for packet_len_sched_time()
Use simpler max(time_to_send,q->slot.slot_next)
Changes since v1:
Always pass new nanosecond APIs to userspace
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Wed, 8 Nov 2017 23:12:28 +0000 (15:12 -0800)]
netem: support delivering packets in delayed time slots
Slotting is a crude approximation of the behaviors of shared media such
as cable, wifi, and LTE, which gather up a bunch of packets within a
varying delay window and deliver them, relative to that, nearly all at
once.
It works within the existing loss, duplication, jitter and delay
parameters of netem. Some amount of inherent latency must be specified,
regardless.
The new "slot" parameter specifies a minimum and maximum delay between
transmission attempts.
The "bytes" and "packets" parameters can be used to limit the amount of
information transferred per slot.
Examples of use:
tc qdisc add dev eth0 root netem delay 200us \
slot 800us 10ms bytes 64k packets 42
A more correct example, using stacked netem instances and a packet limit
to emulate a tail drop wifi queue with slots and variable packet
delivery, with a 200Mbit isochronous underlying rate, and 20ms path
delay:
tc qdisc add dev eth0 root handle 1: netem delay 20ms rate 200mbit \
limit 10000
tc qdisc add dev eth0 parent 1:1 handle 10:1 netem delay 200us \
slot 800us 10ms bytes 64k packets 42 limit 512
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Wed, 8 Nov 2017 23:12:27 +0000 (15:12 -0800)]
netem: add uapi to express delay and jitter in nanoseconds
netem userspace has long relied on a horrible /proc/net/psched hack
to translate the current notion of "ticks" to nanoseconds.
Expressing latency and jitter instead, in well defined nanoseconds,
increases the dynamic range of emulated delays and jitter in netem.
It will also ease a transition where reducing a tick to nsec
equivalence would constrain the max delay in prior versions of
netem to only 4.3 seconds.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Taht [Wed, 8 Nov 2017 23:12:26 +0000 (15:12 -0800)]
netem: convert to qdisc_watchdog_schedule_ns
Upgrade the internal netem scheduler to use nanoseconds rather than
ticks throughout.
Convert to and from the std "ticks" userspace api automatically,
while allowing for finer grained scheduling to take place.
Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Francesco Ruggeri [Wed, 8 Nov 2017 19:23:46 +0000 (11:23 -0800)]
ipv6: try not to take rtnl_lock in ip6mr_sk_done
Avoid traversing the list of mr6_tables (which requires the
rtnl_lock) in ip6mr_sk_done(), when we know in advance that
a match will not be found.
This can happen when rawv6_close()/ip6mr_sk_done() is invoked
on non-mroute6 sockets.
This patch helps reduce rtnl_lock contention when destroying
a large number of net namespaces, each having a non-mroute6
raw socket.
v2: same patch, only fixed subject line and expanded comment.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 8 Nov 2017 13:23:23 +0000 (13:23 +0000)]
net: realtek: r8169: remove redundant assignment to giga_ctrl
The variable giga_ctrl is being assigned to zero however this is
never read and hence the assignment is redundant, so remove it.
Cleans up clang warning:
drivers/net/ethernet/realtek/r8169.c:1978:3: warning: Value stored
to 'giga_ctrl' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Wed, 8 Nov 2017 10:55:14 +0000 (11:55 +0100)]
net: dsa: lan9303: Documentation: Add missing word "Mbps"
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Wed, 8 Nov 2017 10:44:36 +0000 (11:44 +0100)]
net: dsa: lan9303: Fix lan9303_alr_del_port()
Fix embarrassing bug in lan9303_alr_del_port(): Instead of zeroing
entr->mac_addr, I destroyed the next cache entry. Affected .port_fdb_del and
.port_mdb_del.
Fixes: 0620427ea0d6 ("net: dsa: lan9303: Add fdb/mdb manipulation")
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 12 Nov 2017 00:17:05 +0000 (09:17 +0900)]
Merge git://git./linux/kernel/git/davem/net
Linus Torvalds [Sat, 11 Nov 2017 17:10:39 +0000 (09:10 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Use after free in vlan, from Cong Wang.
2) Handle NAPI poll with a zero budget properly in mlx5 driver, from
Saeed Mahameed.
3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar
Karmy.
4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
Bertelsmann.
5) Missing return in mdb and vlan prepare phase of DSA layer, from
Vivien Didelot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a use-after-free in vlan_device_event()
net: dsa: return after vlan prepare phase
net: dsa: return after mdb prepare phase
can: ifi: Fix transmitter delay calculation
tcp: fix tcp_fastretrans_alert warning
tcp: gso: avoid refcount_t warning from tcp_gso_segment()
can: peak: Add support for new PCIe/M2 CAN FD interfaces
can: sun4i: handle overrun in RX FIFO
can: c_can: don't indicate triple sampling support for D_CAN
net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
net/mlx5e: Set page to null in case dma mapping fails
net/mlx5e: Fix napi poll with zero budget
net/mlx5: Cancel health poll before sending panic teardown command
net/mlx5: Loop over temp list to release delay events
rds: ib: Fix NULL pointer dereference in debug code
David S. Miller [Sat, 11 Nov 2017 13:37:22 +0000 (22:37 +0900)]
Merge tag 'wireless-drivers-next-for-davem-2017-11-11' of git://git./linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.15
Last minute patches before the merge window. Not really anything
special standing out, mostly fixes or cleanup and some minor new
features.
Major changes:
iwlwifi
* some new PCI IDs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mat Martineau [Fri, 10 Nov 2017 22:03:51 +0000 (14:03 -0800)]
net: Remove unused skb_shared_info member
ip6_frag_id was only used by UFO, which has been removed.
ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
in-tree callers.
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 13:08:24 +0000 (22:08 +0900)]
Merge branch 'l2tp-avoid-aliasing-tunnels-socket-pointer'
Guillaume Nault says:
====================
l2tp: avoid aliasing tunnels socket pointer
We don't need to copy the tunnel's socket pointer in the pseudo-wire
specific session structures. This uselessly complicates the code
and hampers evolution.
This series was part of an effort to protect tunnels socket pointer
with RCU. But since it provides nice cleanup, I submit it separately.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 10 Nov 2017 21:06:37 +0000 (06:06 +0900)]
l2tp: remove the .tunnel_sock field from struct pppol2tp_session
The last user of .tunnel_sock is pppol2tp_connect() which defensively
uses it to verify internal data consistency.
This check isn't necessary: l2tp_session_get() guarantees that the
returned session belongs to the tunnel passed as parameter. And
.tunnel_sock is never updated, so checking that it still points to
the parent tunnel socket is useless; that test can never fail.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 10 Nov 2017 21:06:31 +0000 (06:06 +0900)]
l2tp: avoid using ->tunnel_sock for getting session's parent tunnel
Sessions don't need to use l2tp_sock_to_tunnel(xxx->tunnel_sock) for
accessing their parent tunnel. They have the .tunnel field in the
l2tp_session structure for that. Furthermore, in all these cases, the
session is registered, so we're guaranteed that .tunnel isn't NULL and
that the session properly holds a reference on the tunnel.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 10 Nov 2017 21:06:29 +0000 (06:06 +0900)]
l2tp: remove .tunnel_sock from struct l2tp_eth
This field has never been used.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 12:55:16 +0000 (21:55 +0900)]
Merge branch 'dsa-b53-Turn-on-Broadcom-tags'
Florian Fainelli says:
====================
net: dsa: b53: Turn on Broadcom tags
This was long overdue, with this patch series, the b53 driver now
turns on Broadcom tags except for 5325 and 5365 which use an older
format that we do not support yet (TBD).
First patch is necessary in order for bgmac, used on BCM5301X and Northstar
Plus to work correctly and successfully send ARP packets back to the requsester.
Second patch is actually a bug fix, but because net/master and net-next/master
diverge in that area, I am targeting net-next/master here.
Finally, the last patch enables Broadcom tags after checking that the CPU port
selected is either, 5, 7 or 8, since those are the only valid combinations
given currently supported HW.
Changes in v3:
- guarded padding with netdev_uses_dsa() to let the non-DSA use cases
not have a performance hit for smaller packets
- added missing select NET_DSA_TAG_BRCM to drivers/net/dsa/b53/Kconfig
Changes in v2:
- moved a hunk between patch 2 and patch 3 to avoid a bisectability issue
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 19:33:27 +0000 (11:33 -0800)]
net: dsa: b53: Turn on Broadcom tags
Enable Broadcom tags for b53 devices, except 5325 and 5365 which use a
different Broadcom tag format not yet supported by net/dsa/tag_brcm.c.
We also make sure that we can turn on Broadcom tags on a CPU port number
that is capable of that: 5, 7 or 8.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 19:33:26 +0000 (11:33 -0800)]
net: dsa: b53: Stop using dev->cpu_port incorrectly
dev->cpu_port is the driver local information that should only be used
to look up register offsets for a particular port, when they differ
(e.g: IMP port override), but it should certainly not be used in place
of the DSA configured CPU port.
Since the DSA switch layer calls port_vlan_{add,del}() on the CPU port
as well, we can remove the specific setting of the CPU port within
port_vlan_{add,del}.
Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support")
Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 10 Nov 2017 19:33:25 +0000 (11:33 -0800)]
net: bgmac: Pad packets to a minimum size
In preparation for enabling Broadcom tags with b53, pad packets to a
minimum size of 64 bytes (sans FCS) in order for the Broadcom switch to
accept ingressing frames. Without this, we would typically be able to
DHCP, but not resolve with ARP because packets are too small and get
rejected by the switch.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 12:52:01 +0000 (21:52 +0900)]
Merge tag 'linux-can-fixes-for-4.14-
20171110' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-11-10
this is a pull request for net/master.
The first patch by Richard Schütz for the c_can driver removes the false
indication to support triple sampling for d_can. Gerhard Bertelsmann's
patch for the sun4i driver improves the RX overrun handling. The patch
by Stephane Grosjean for the peak_canfd driver adds the PCI ids for
various new PCIe/M2 interfaces. Marek Vasut's patch for the ifi driver
fix transmitter delay calculation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 12:50:15 +0000 (21:50 +0900)]
Merge branch 'dsa-lan9303-IGMP-handling'
Egil Hjelmeland says:
====================
net: dsa: lan9303: IGMP handling
Set up the HW switch to trap IGMP packets to CPU port.
And make sure skb->offload_fwd_mark is cleared for incoming IGMP packets.
skb->offload_fwd_mark calculation is a candidate for consolidation into the
DSA core. The calculation can probably be more polished when done at a point
where DSA has updated skb.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Fri, 10 Nov 2017 11:54:35 +0000 (12:54 +0100)]
net: dsa: lan9303: Clear offload_fwd_mark for IGMP
Now that IGMP packets no longer is flooded in HW, we want the SW bridge to
forward packets based on bridge configuration. To make that happen,
IGMP packets must have skb->offload_fwd_mark = 0.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Fri, 10 Nov 2017 11:54:34 +0000 (12:54 +0100)]
net: dsa: lan9303: Set up trapping of IGMP to CPU port
IGMP packets should be trapped to the CPU port. The SW bridge knows
whether to forward to other ports.
With "IGMP snooping for local traffic" merged, IGMP trapping is also
required for stable IGMPv2 operation.
LAN9303 does not trap IGMP packets by default.
Enable IGMP trapping in lan9303_setup.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Fri, 10 Nov 2017 07:33:37 +0000 (13:03 +0530)]
cxgb4: collect vpd info directly from hardware
Collect vpd information directly from hardware instead of software
adapter context. Move EEPROM physical address to virtual address
translation logic to t4_hw.c and update relevant files.
Fixes: 6f92a6544f1a ("cxgb4: collect hardware misc dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 10:40:05 +0000 (19:40 +0900)]
Merge tag 'mlx5-fixes-2017-11-08' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2017-11-08
The following series includes some fixes for mlx5 core and etherent
driver.
Sorry for the late submission but as you can see i have some very
critical fixes below that i would like them merged into this RC.
Please pull and let me know if there is any problem.
For -stable:
('net/mlx5e: Set page to null in case dma mapping fails') kernels >= 4.13
('net/mlx5: FPGA, return -EINVAL if size is zero') kernels >= 4.13
('net/mlx5: Cancel health poll before sending panic teardown command') kernels >= 4.13
V1->V2:
- Fix Reviewed-by tag of the 2nd patch.
- Drop the FPGA 0 size fix, it needs some more change log info.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 10 Nov 2017 00:43:13 +0000 (16:43 -0800)]
vlan: fix a use-after-free in vlan_device_event()
After refcnt reaches zero, vlan_vid_del() could free
dev->vlan_info via RCU:
RCU_INIT_POINTER(dev->vlan_info, NULL);
call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
However, the pointer 'grp' still points to that memory
since it is set before vlan_vid_del():
vlan_info = rtnl_dereference(dev->vlan_info);
if (!vlan_info)
goto out;
grp = &vlan_info->grp;
Depends on when that RCU callback is scheduled, we could
trigger a use-after-free in vlan_group_for_each_dev()
right following this vlan_vid_del().
Fix it by moving vlan_vid_del() before setting grp. This
is also symmetric to the vlan_vid_add() we call in
vlan_device_event().
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 23:36:41 +0000 (00:36 +0100)]
net: dsa: mv88e6xxx: Fix stats histogram mode
The statistics histogram mode was not being explicitly initialized on
devices other than the 6390 family. Clearing the statistics then
overwrote the default setting, setting the histogram to a reserved
mode.
Explicitly set the histogram mode for all devices. Change the
statistics clear into a read/modify/write, and since it is now more
complex, move it into global1.c.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 10:33:11 +0000 (19:33 +0900)]
Merge branch 'mv88e6xxx-broadcast-flooding-in-hardware'
Andrew Lunn says:
====================
mv88e6xxx broadcast flooding in hardware
This patchset makes the mv88e6xxx driver perform flooding in hardware,
rather than let the software bridge perform the flooding. This is a
prerequisite for IGMP snooping on the bridge interface.
In order to make hardware broadcasting work, a few other issues need
fixing or improving. SWITCHDEV_ATTR_ID_PORT_PARENT_ID is broken, which
is apparent when testing on the ZII devel board with multiple
switches.
Some of these patches are taken from a previous RFC patchset of IGMP
support.
Rebased onto net-next, with fixup for Vivien's refactoring.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:56 +0000 (22:29 +0100)]
net: dsa: mv88e6xxx: Flood broadcast frames in hardware
By default, the switch does not flood broadcast frames. Instead the
broadcast address is unknown in the ATU, so the frame gets forwarded
out the cpu port. The software bridge then floods it back to the
individual switch ports which are members of the bridge.
Add an ATU entry in the switch so that it floods broadcast frames out
ports, rather than have the software bridge do it. Also, send a copy
out the cpu port and any dsa ports. Rely on the port vectors to
prevent broadcast frames leaking between bridges, and separated ports.
Additionally, when a VLAN is added, a new FID is allocated. This
represents a new table of ATU entries. A broadcast entry is added to
the new FID.
With offload_fwd_mark being set, the software bridge will not flood
the frames it receives back to the switch.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:55 +0000 (22:29 +0100)]
net: dsa: mv88e6xxx: Move mv88e6xxx_port_db_load_purge()
This function is going to be needed by a soon to be added new
function. Move it earlier so we can avoid a forward declaration.
No functional changes.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:54 +0000 (22:29 +0100)]
net: dsa: mv88e6xxx: Print offending port when vlan check fails
When testing if a VLAN is one more than one bridge, we print an error
message that the VLAN is already in use somewhere else. Print both the
new port which would like the VLAN, and the port which already has it,
to aid debugging.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:53 +0000 (22:29 +0100)]
net: dsa: mv88e6xxx: Fixed port netdev check for VLANs
Having the same VLAN on multiple bridges is currently unsupported as
an offload. mv88e6xxx_port_check_hw_vlan() is used to ensure that a
VLAN is not on multiple bridges when adding a VLAN range to a port. It
loops the ports and checks to see if there are ports in a different
bridge with the same VLAN.
While walking all switch ports, the code was checking if the new port
has a netdev slave attached to it. If not, skip checking the port
being walked. This seems like a typ0. If the new port does not have a
slave, how has a VLAN been added to it in the first place, requiring
this check be performed at all? More likely, we should be checking if
the port being walked has a slave. Without the port having a slave, it
cannot have a VLAN on it, so there is no need to check further for
that particular port.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:52 +0000 (22:29 +0100)]
net: dsa: {e}dsa: set offload_fwd_mark on received packets
The software bridge needs to know if a packet has already been bridged
by hardware offload to ports in the same hardware offload, in order
that it does not re-flood them, causing duplicates. This is
particularly true for broadcast and multicast traffic which the host
has requested.
By setting offload_fwd_mark in the skb the bridge will only flood to
ports in other offloads and other netifs. Set this flag in the DSA and
EDSA tag driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 9 Nov 2017 21:29:51 +0000 (22:29 +0100)]
net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
SWITCHDEV_ATTR_ID_PORT_PARENT_ID is used by the software bridge when
determining which ports to flood a packet out. If the packet
originated from a switch, it assumes the switch has already flooded
the packet out the switches ports, so the bridge should not flood the
packet itself out switch ports. Ports on the same switch are expected
to return the same parent ID when SWITCHDEV_ATTR_ID_PORT_PARENT_ID is
called.
DSA gets this wrong with clusters of switches. As far as the software
bridge is concerned, the cluster is all one switch. A packet from any
switch in the cluster can be assumed to have been flooded as needed
out of all ports of the cluster, not just the switch it originated
from. Hence all ports of a cluster should return the same parent. The
old implementation did not, each switch in the cluster had its own ID.
Also wrong was that the ID was not unique if multiple DSA instances
are in operation.
Use the tree ID as the parent ID, which is the same for all switches
in a cluster and unique across switch clusters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 11 Nov 2017 10:30:16 +0000 (19:30 +0900)]
Merge branch 'ieee802154-for-davem-2017-11-09' of git://git./linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:
====================
pull-request: net-next: ieee802154 2017-11-09
A small update on ieee802154 patches for net-next. Nothing dramatic, but simply
housekeeping this time around.
A fix for the correct mask to be applied in the mrf24j40 driver by Gustavo A. R. Silva
Removal of a non existing email user for the ca8210 driver by Harry Morris
A bunch of checkpatch cleanups across the subsystem from myself
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Thu, 9 Nov 2017 17:09:26 +0000 (18:09 +0100)]
bindings: net: stmmac: correctify note about LPI interrupt
There are two different combined signal for various interrupt events:
In EQOS-CORE and EQOS-MTL configurations, mci_intr_o is the interrupt
signal.
In EQOS-DMA, EQOS-AHB and EQOS-AXI configurations, these interrupt events
are combined with the events in the DMA on the sbd_intr_o signal.
Depending on configuration, the device tree irq "macirq" will refer to
either mci_intr_o or sbd_intr_o.
The databook states:
"The MAC generates the LPI interrupt when the Tx or Rx side enters or exits
the LPI state. The interrupt mci_intr_o (sbd_intr_o in certain
configurations) is asserted when the LPI interrupt status is set.
When the MAC exits the Rx LPI state, then in addition to the mci_intr_o
(sbd_intr_o in certain configurations), the sideband signal lpi_intr_o is
asserted.
If you do not want to gate-off the application clock during the Rx LPI
state, you can leave the lpi_intr_o signal unconnected and use the
mci_intr_o (sbd_intr_o in certain configurations) signal to detect Rx LPI
exit."
Since the "macirq" is always raised when Tx or Rx enters/exits the LPI
state, "eth_lpi" must therefore refer to lpi_intr_o, which is only raised
when Rx exits the LPI state. Update the DT binding description to reflect
reality.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keefe Liu [Thu, 9 Nov 2017 12:09:31 +0000 (20:09 +0800)]
ipvlan: fix ipv6 outbound device
When process the outbound packet of ipv6, we should assign the master
device to output device other than input device.
Signed-off-by: Keefe Liu <liuqifa@huawei.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aleksey Makarov [Thu, 9 Nov 2017 11:58:57 +0000 (14:58 +0300)]
net: thunderx: fix double free error
This patch fixes an error in memory allocation/freeing in
ThunderX PF driver.
I moved the allocation to the probe() function and made it managed.
>From the Colin's email:
While running static analysis on linux-next with CoverityScan I found 3
double free errors in the Cavium thunder driver.
The issue occurs on the err_disable_device: label of function nic_probe
when nic_free_lmacmem(nic) is called and a double free occurs on
nic->duplex, nic->link and nic->speed. This occurs when nic_init_hw()
fails:
/* Initialize hardware */
err = nic_init_hw(nic);
if (err)
goto err_release_regions;
nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
if any of the allocations fail. This free'ing occurs again by the call
to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>