Joe Stringer [Mon, 26 Oct 2015 03:21:48 +0000 (20:21 -0700)]
openvswitch: Fix double-free on ip_defrag() errors
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
freed. When handle_fragments() passes this back up to
do_execute_actions(), it will be freed again. Prevent this double free
by never freeing the skb in do_execute_actions() for errors returned by
ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 27 Oct 2015 22:06:45 +0000 (15:06 -0700)]
fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode. As a result
we were getting incorrect index values that would cause us to skip over
some children.
To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.
Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Brian Rak <brak@gameservers.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Oct 2015 05:08:22 +0000 (22:08 -0700)]
Merge branch 'net_of_node_put'
Julia Lawall says:
====================
add missing of_node_put
The various for_each device_node iterators performs an of_node_get on each
iteration, so a break out of the loop requires an of_node_put.
The complete semantic patch that fixes this problem is
(http://coccinelle.lip6.fr):
// <smpl>
@r@
local idexpression n;
expression e1,e2;
iterator name for_each_node_by_name, for_each_node_by_type,
for_each_compatible_node, for_each_matching_node,
for_each_matching_node_and_match, for_each_child_of_node,
for_each_available_child_of_node, for_each_node_with_property;
iterator i;
statement S;
expression list [n1] es;
@@
(
(
for_each_node_by_name(n,e1) S
|
for_each_node_by_type(n,e1) S
|
for_each_compatible_node(n,e1,e2) S
|
for_each_matching_node(n,e1) S
|
for_each_matching_node_and_match(n,e1,e2) S
|
for_each_child_of_node(e1,n) S
|
for_each_available_child_of_node(e1,n) S
|
for_each_node_with_property(n,e1) S
)
&
i(es,n,...) S
)
@@
local idexpression r.n;
iterator r.i;
expression e;
expression list [r.n1] es;
@@
i(es,n,...) {
...
(
of_node_put(n);
|
e = n
|
return n;
|
+ of_node_put(n);
? return ...;
)
...
}
@@
local idexpression r.n;
iterator r.i;
expression e;
expression list [r.n1] es;
@@
i(es,n,...) {
...
(
of_node_put(n);
|
e = n
|
+ of_node_put(n);
? break;
)
...
}
... when != n
@@
local idexpression r.n;
iterator r.i;
expression e;
identifier l;
expression list [r.n1] es;
@@
i(es,n,...) {
...
(
of_node_put(n);
|
e = n
|
+ of_node_put(n);
? goto l;
)
...
}
...
l: ... when != n// </smpl>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:07 +0000 (14:57 +0100)]
net: mv643xx_eth: add missing of_node_put
for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
expression root,e;
local idexpression child;
@@
for_each_available_child_of_node(root, child) {
... when != of_node_put(child)
when != e = child
(
return child;
|
+ of_node_put(child);
? return ...;
)
...
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:06 +0000 (14:57 +0100)]
ath6kl: add missing of_node_put
for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
expression e;
local idexpression n;
@@
for_each_compatible_node(n,...) {
... when != of_node_put(n)
when != e = n
(
return n;
|
+ of_node_put(n);
? return ...;
)
...
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:03 +0000 (14:57 +0100)]
net: phy: mdio: add missing of_node_put
for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
expression root,e;
local idexpression child;
@@
for_each_available_child_of_node(root, child) {
... when != of_node_put(child)
when != e = child
(
return child;
|
+ of_node_put(child);
? return ...;
)
...
}
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:02 +0000 (14:57 +0100)]
netdev/phy: add missing of_node_put
for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
local idexpression r.n;
expression r,e;
@@
for_each_available_child_of_node(r,n) {
...
(
of_node_put(n);
|
e = n
|
+ of_node_put(n);
? break;
)
...
}
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:01 +0000 (14:57 +0100)]
net: netcp: add missing of_node_put
for_each_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
local idexpression r.n;
expression r,e;
@@
for_each_child_of_node(r,n) {
...
(
of_node_put(n);
|
e = n
|
+ of_node_put(n);
? break;
)
...
}
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Sun, 25 Oct 2015 13:57:00 +0000 (14:57 +0100)]
net: thunderx: add missing of_node_put
for_each_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.
A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):
// <smpl>
@@
local idexpression r.n;
expression r,e;
@@
for_each_child_of_node(r,n) {
...
(
of_node_put(n);
|
e = n
|
+ of_node_put(n);
? break;
)
...
}
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 24 Oct 2015 12:47:44 +0000 (05:47 -0700)]
ipv6: gre: support SIT encapsulation
gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.
Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 27 Oct 2015 01:32:22 +0000 (18:32 -0700)]
Merge branch 'sh_eth-fixes'
Sergei Shtylyov says:
====================
sh_eth: RX buffer alignment fixes
Here's a set of 2 patches against DaveM's 'net.git' repo which are the
fixes to the RX buffer size calculation.
[1/2] sh_eth: fix RX buffer size alignment
[2/2] sh_eth: fix RX buffer size calculation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 23 Oct 2015 21:46:40 +0000 (00:46 +0300)]
sh_eth: fix RX buffer size calculation
The RX buffer size calulation failed to account for the length granularity
(which is now 32 bytes)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 23 Oct 2015 21:46:03 +0000 (00:46 +0300)]
sh_eth: fix RX buffer size alignment
Both Renesas R-Car and RZ/A1 manuals state that RX buffer length must be
a multiple of 32 bytes, while the driver only uses 16 byte granularity...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Oct 2015 01:28:23 +0000 (18:28 -0700)]
Merge branch 'gianfar-fixes'
Claudiu Manoil says:
====================
gianfar: Misc. fixes and updates
Various fixes for some older issues, including having a
MAINTAINERS entry for this driver.
I'd recommend applying them on top of net, thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:42:01 +0000 (11:42 +0300)]
MAINTAINERS: Add entry for gianfar ethernet driver
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:42:00 +0000 (11:42 +0300)]
gianfar: Fix Rx BSY error handling
The Rx BSY error interrupt indicates that a frame was
received and discarded due to lack of buffers, so it's
a rx ring overflow condition and has nothing to do with
with bad rx packets. Use the right counter.
BSY conditions happen when the SoC is under performance
stress. Doing *more* work in stress situations by trying
to schedule NAPI is not a good idea as the stressed system
becomes still more stressed. The Rx interrupt is already
at work making sure the NAPI is scheduled.
So calling gfar_receive() here does not help. This issue
was present since day 1.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:41:59 +0000 (11:41 +0300)]
gianfar: Don't enable the Filer w/o the Parser
Under one unusual circumstance it's possible to wrongly set
FILREN without enabling PRSDEP as well in the RCTRL register,
against the hardware specifications. With the default config
this does not happen because the default Rx offloads (Rx csum
and Rx VLAN) properly enable PRSDEP. But if anyone disables
all these offloads (via ethtool), we get a wrong configuration
were the Rx flow classification and hashing, and other Filer
based features (e.g. wake-on-filer interrupt) won't work.
This patch fixes the issue.
Also, account for Rx FCB insertion which happens every time
PRSDEP is set.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:41:58 +0000 (11:41 +0300)]
gianfar: Remove duplicated argument to bitwise OR
RQFCR_AND is duplicated.
Add missing space as well.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Oct 2015 01:13:37 +0000 (18:13 -0700)]
Merge branch 'thunderx-fixes'
David Daney says:
====================
net: thunderx: Support pass-2 revision hardware.
With the availability of a new revision of the ThunderX NIC hardware a
few changes to the driver are required. With these, the driver works
on all currently available hardware revisions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanneeru Srinivasulu [Sat, 24 Oct 2015 00:14:10 +0000 (17:14 -0700)]
net: thunderx: Incorporate pass2 silicon CPI index configuration changes
Add support for ThunderX pass2 CPI and MPI configuration changes.
MPI_ALG is not enabled i.e MCAM parsing is disabled.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Sat, 24 Oct 2015 00:14:09 +0000 (17:14 -0700)]
net: thunderx: Rewrite silicon revision tests.
The test for pass-1 silicon was incorrect, it should be for all
revisions less than 8. Also the revision is already present in the
pci_dev, so there is no need to read and keep a private copy.
Remove rev_id and code to read it from struct nicpf. Create new
static inline function pass1_silicon() to be used to testing the
silicon version. Use pass1_silicon() for revision checks, this will
be more widely used in follow on patches.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sat, 24 Oct 2015 00:14:08 +0000 (17:14 -0700)]
net: thunderx: Fix incorrect subsystem devid of VF on pass2 silicon
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sat, 24 Oct 2015 00:14:07 +0000 (17:14 -0700)]
net: thunderx: Remove PF soft reset.
In some silicon revisions, the soft reset clobbers PCI config space,
so quit doing the reset.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 23 Oct 2015 12:59:49 +0000 (20:59 +0800)]
net: sysctl: fix a kmemleak warning
the returned buffer of register_sysctl() is stored into net_header
variable, but net_header is not used after, and compiler maybe
optimise the variable out, and lead kmemleak reported the below warning
comm "swapper/0", pid 1, jiffies
4294937448 (age 267.270s)
hex dump (first 32 bytes):
90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
ffffffc00020f134>] create_object+0x10c/0x2a0
[<
ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0
[<
ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8
[<
ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0
[<
ffffffc00028eef0>] register_sysctl+0x30/0x40
[<
ffffffc00099c304>] net_sysctl_init+0x20/0x58
[<
ffffffc000994dd8>] sock_init+0x10/0xb0
[<
ffffffc0000842e0>] do_one_initcall+0x90/0x1b8
[<
ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0
[<
ffffffc00070ed6c>] kernel_init+0x1c/0xe8
[<
ffffffc000083bfc>] ret_from_fork+0xc/0x50
[<
ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>>
Before fix, the objdump result on ARM64:
0000000000000000 <net_sysctl_init>:
0:
a9be7bfd stp x29, x30, [sp,#-32]!
4:
90000001 adrp x1, 0 <net_sysctl_init>
8:
90000000 adrp x0, 0 <net_sysctl_init>
c:
910003fd mov x29, sp
10:
91000021 add x1, x1, #0x0
14:
91000000 add x0, x0, #0x0
18:
a90153f3 stp x19, x20, [sp,#16]
1c:
12800174 mov w20, #0xfffffff4 // #-12
20:
94000000 bl 0 <register_sysctl>
24:
b4000120 cbz x0, 48 <net_sysctl_init+0x48>
28:
90000013 adrp x19, 0 <net_sysctl_init>
2c:
91000273 add x19, x19, #0x0
30:
9101a260 add x0, x19, #0x68
34:
94000000 bl 0 <register_pernet_subsys>
38:
2a0003f4 mov w20, w0
3c:
35000060 cbnz w0, 48 <net_sysctl_init+0x48>
40:
aa1303e0 mov x0, x19
44:
94000000 bl 0 <register_sysctl_root>
48:
2a1403e0 mov w0, w20
4c:
a94153f3 ldp x19, x20, [sp,#16]
50:
a8c27bfd ldp x29, x30, [sp],#32
54:
d65f03c0 ret
After:
0000000000000000 <net_sysctl_init>:
0:
a9bd7bfd stp x29, x30, [sp,#-48]!
4:
90000000 adrp x0, 0 <net_sysctl_init>
8:
910003fd mov x29, sp
c:
a90153f3 stp x19, x20, [sp,#16]
10:
90000013 adrp x19, 0 <net_sysctl_init>
14:
91000000 add x0, x0, #0x0
18:
91000273 add x19, x19, #0x0
1c:
f90013f5 str x21, [sp,#32]
20:
aa1303e1 mov x1, x19
24:
12800175 mov w21, #0xfffffff4 // #-12
28:
94000000 bl 0 <register_sysctl>
2c:
f9002260 str x0, [x19,#64]
30:
b40001a0 cbz x0, 64 <net_sysctl_init+0x64>
34:
90000014 adrp x20, 0 <net_sysctl_init>
38:
91000294 add x20, x20, #0x0
3c:
9101a280 add x0, x20, #0x68
40:
94000000 bl 0 <register_pernet_subsys>
44:
2a0003f5 mov w21, w0
48:
35000080 cbnz w0, 58 <net_sysctl_init+0x58>
4c:
aa1403e0 mov x0, x20
50:
94000000 bl 0 <register_sysctl_root>
54:
14000004 b 64 <net_sysctl_init+0x64>
58:
f9402260 ldr x0, [x19,#64]
5c:
94000000 bl 0 <unregister_sysctl_table>
60:
f900227f str xzr, [x19,#64]
64:
2a1503e0 mov w0, w21
68:
f94013f5 ldr x21, [sp,#32]
6c:
a94153f3 ldp x19, x20, [sp,#16]
70:
a8c37bfd ldp x29, x30, [sp],#48
74:
d65f03c0 ret
Add the possible error handle to free the net_header to remove the
kmemleak warning
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 22 Oct 2015 14:57:10 +0000 (16:57 +0200)]
ppp: fix pppoe_dev deletion condition in pppoe_release()
We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.
Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 22 Oct 2015 03:35:05 +0000 (11:35 +0800)]
af_key: fix two typos
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 21 Oct 2015 20:37:05 +0000 (15:37 -0500)]
amd-xgbe: Use wmb before updating current descriptor count
The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.
Using the wmb barrier insures that the descriptor is updated before the
descriptor counter preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current decriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Sullivan [Wed, 21 Oct 2015 19:17:04 +0000 (14:17 -0500)]
net/phy: micrel: Add workaround for bad autoneg
Very rarely, the KSZ9031 will appear to complete autonegotiation, but
will drop all traffic afterwards. When this happens, the idle error
count will read 0xFF after autonegotiation completes. Reset the PHY
when in that state.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Oct 2015 09:49:41 +0000 (02:49 -0700)]
Merge branch 'ipv6-overflow-arith'
Hannes Frederic Sowa says:
====================
overflow-arith: begin to add support for overflow builtins functions
I add a new header, linux/overflow-arith.h, as the central place to add
overflow and wrap-around checking functions. The reason I am doing so
is that it can make use of compiler supported builtin functions which
can leverage hardware.
As I need this for a fix in the ipv6 stack, which is also included in
this series, I propose to add it sooner than later over Davem's net
tree. This is also the reason why I start slowly with only the one
function I need at this time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 16 Oct 2015 09:32:43 +0000 (11:32 +0200)]
ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.
Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Fri, 16 Oct 2015 09:32:42 +0000 (11:32 +0200)]
overflow-arith: begin to add support for overflow builtin functions
The idea of the overflow-arith.h header is to collect overflow checking
functions in one central place.
If gcc compiler supports the __builtin_overflow_* builtins we use them
because they might give better performance, otherwise the code falls
back to normal overflow checking functions.
The builtin_overflow functions are supported by gcc-5 and clang. The
matter of supporting clang is to just provide a corresponding
CC_HAVE_BUILTIN_OVERFLOW, because the specific overflow checking builtins
don't differ between gcc and clang.
I just provide overflow_usub function here as I intend this to get merged
into net, more functions will definitely follow as they are needed.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Shewmaker [Mon, 19 Oct 2015 04:59:08 +0000 (21:59 -0700)]
tcp: allow dctcp alpha to drop to zero
If alpha is strictly reduced by alpha >> dctcp_shift_g and if alpha is less
than 1 << dctcp_shift_g, then alpha may never reach zero. For example,
given shift_g=4 and alpha=15, alpha >> dctcp_shift_g yields 0 and alpha
remains 15. The effect isn't noticeable in this case below cwnd=137, but
could gradually drive uncongested flows with leftover alpha down to
cwnd=137. A larger dctcp_shift_g would have a greater effect.
This change causes alpha=15 to drop to 0 instead of being decrementing by 1
as it would when alpha=16. However, it requires one less conditional to
implement since it doesn't have to guard against subtracting 1 from 0U. A
decay of 15 is not unreasonable since an equal or greater amount occurs at
alpha >= 240.
Signed-off-by: Andrew G. Shewmaker <agshew@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
lucien [Fri, 23 Oct 2015 07:36:53 +0000 (15:36 +0800)]
ipv6: fix the incorrect return value of throw route
The error condition -EAGAIN, which is signaled by throw routes, tells
the rules framework to walk on searching for next matches. If the walk
ends and we stop walking the rules with the result of a throw route we
have to translate the error conditions to -ENETUNREACH.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 23 Oct 2015 04:57:05 +0000 (00:57 -0400)]
macvtap: unbreak receiving of gro skb with frag list
We don't have fraglist support in TAP_FEATURES. This will lead
software segmentation of gro skb with frag list. Fixes by having
frag list support in TAP_FEATURES.
With this patch single session of netperf receiving were restored from
about 5Gb/s to about 12Gb/s on mlx4.
Fixes
a567dd6252 ("macvtap: simplify usage of tap_features")
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Fri, 23 Oct 2015 01:17:16 +0000 (18:17 -0700)]
openvswitch: Fix egress tunnel info.
While transitioning to netdev based vport we broke OVS
feature which allows user to retrieve tunnel packet egress
information for lwtunnel devices. Following patch fixes it
by introducing ndo operation to get the tunnel egress info.
Same ndo operation can be used for lwtunnel devices and compat
ovs-tnl-vport devices. So after adding such device operation
we can remove similar operation from ovs-vport.
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 23 Oct 2015 02:02:08 +0000 (19:02 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-22
This series contains fixes to i40e only.
Jesse provides two small fixes for i40e, first fixes counters that were
being displayed incorrectly due to indexing beyond the array of strings
when printing stats. Then fixed the fact that the driver was printing
a message about not being able to assign VMDq because a lack of MSI-X
vectors, when it was not true. It was due to a line missing that
initialized a variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jorgen Hansen [Thu, 22 Oct 2015 15:25:25 +0000 (08:25 -0700)]
VSOCK: Fix lockdep issue.
The recent fix for the vsock sock_put issue used the wrong
initializer for the transport spin_lock causing an issue when
running with lockdep checking.
Testing: Verified fix on kernel with lockdep enabled.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Sat, 3 Oct 2015 00:57:21 +0000 (17:57 -0700)]
i40e: fix annoying message
The driver was printing a message about not being able
to assign VMDq because of a lack of MSI-X vectors.
This was because a line was missing that initialized a variable,
simply a merge error.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 3 Oct 2015 02:09:34 +0000 (19:09 -0700)]
i40e: fix stats offsets
The code was setting up stats that were not being initialized.
This caused several counters to be displayed incorrectly, due
to indexing beyond the array of strings when printing stats.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bjørn Mork [Thu, 22 Oct 2015 12:15:58 +0000 (14:15 +0200)]
qmi_wwan: add Sierra Wireless MC74xx/EM74xx
New device IDs shamelessly lifted from the vendor driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 14:46:05 +0000 (07:46 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2015-10-22
1) Fix IPsec pre-encap fragmentation for GSO packets.
From Herbert Xu.
2) Fix some header checks in _decode_session6.
We skip the header informations if the data pointer points
already behind the header in question for some protocols.
This is because we call pskb_may_pull with a negative value
converted to unsigened int from pskb_may_pull in this case.
Skipping the header informations can lead to incorrect policy
lookups. From Mathias Krause.
3) Allow to change the replay threshold and expiry timer of a
state without having to set other attributes like replay
counter and byte lifetime. Changing these other attributes
may break the SA. From Michael Rossberg.
4) Fix pmtu discovery for local generated packets.
We may fail dispatch to the inner address family.
As a reault, the local error handler is not called
and the mtu value is not reported back to userspace.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 21 Oct 2015 15:42:22 +0000 (08:42 -0700)]
net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
adds the RT6_LOOKUP_F_IFACE flag to make device index mismatch fatal if
oif is given. Hajime reported that this change breaks the Mobile IPv6
use case that wants to force the message through one interface yet use
the source address from another interface. Handle this case by only
adding the flag if oif is set and saddr is not set.
Fixes: 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
Cc: Hajime Tazaki <thehajime@gmail.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 14:23:25 +0000 (07:23 -0700)]
Merge branch 'isdn-null-deref'
Karsten Keil says:
====================
Fix potential NULL pointer access and memory leak in ISDN layer2 functions
Insu Yun did brinup the issue with not checking the skb_clone() return
value in the layer2 I-frame ull functions.
This series fix the issue in a way which avoid protocol violations/data loss
on a temporary memory shortage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Keil [Wed, 21 Oct 2015 12:18:39 +0000 (14:18 +0200)]
mISDN: fix OOM condition for sending queued I-Frames
The old code did not check the return value of skb_clone().
The extra skb_clone() is not needed at all, if using skb_realloc_headroom()
instead, which gives us a private copy with enough headroom as well.
We need to requeue the original skb if the call failed, because we cannot
inform upper layers about the data loss. Restructure the code to minimise
rollback effort if it happens.
This fix kernel bug #86091
Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karsten Keil [Wed, 21 Oct 2015 12:18:38 +0000 (14:18 +0200)]
ISDN: fix OOM condition for sending queued I-Frames
The skb_clone() return value was not checked and the skb_realloc_headroom()
usage was wrong, the old skb was not freed. It turned out, that the
skb_clone is not needed at all, the skb_realloc_headroom() will create a
private copy with enough headroom and the original SKB can be used for the
ACK queue.
We need to requeue the original skb if the call failed, since the upper
layer cannot be informed about memory shortage.
Thanks to Insu Yun <wuninsu@gmail.com> to remind me on this issue.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jorgen Hansen [Wed, 21 Oct 2015 11:53:56 +0000 (04:53 -0700)]
VSOCK: sock_put wasn't safe to call in interrupt context
In the vsock vmci_transport driver, sock_put wasn't safe to call
in interrupt context, since that may call the vsock destructor
which in turn calls several functions that should only be called
from process context. This change defers the callling of these
functions to a worker thread. All these functions were
deallocation of resources related to the transport itself.
Furthermore, an unused callback was removed to simplify the
cleanup.
Multiple customers have been hitting this issue when using
VMware tools on vSphere 2015.
Also added a version to the vmci transport module (starting from
1.0.2.0-k since up until now it appears that this module was
sharing version with vsock that is currently at 1.0.1.0-k).
Reviewed-by: Aditya Asarwade <asarwade@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Herrmann [Wed, 21 Oct 2015 09:47:43 +0000 (11:47 +0200)]
netlink: fix locking around NETLINK_LIST_MEMBERSHIPS
Currently, NETLINK_LIST_MEMBERSHIPS grabs the netlink table while copying
the membership state to user-space. However, grabing the netlink table is
effectively a write_lock_irq(), and as such we should not be triggering
page-faults in the critical section.
This can be easily reproduced by the following snippet:
int s = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
void *p = mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
int r = getsockopt(s, 0x10e, 9, p, (void*)((char*)p + 4092));
This should work just fine, but currently triggers EFAULT and a possible
WARN_ON below handle_mm_fault().
Fix this by reducing locking of NETLINK_LIST_MEMBERSHIPS to a read-side
lock. The write-lock was overkill in the first place, and the read-lock
allows page-faults just fine.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew F. Davis [Tue, 20 Oct 2015 21:28:57 +0000 (16:28 -0500)]
net: phy: dp83848: Add TI DP83848 Ethernet PHY
Add support for the TI DP83848 Ethernet PHY device.
The DP83848 is a highly reliable, feature rich, IEEE 802.3 compliant
single port 10/100 Mb/s Ethernet Physical Layer Transceiver supporting
the MII and RMII interfaces.
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Tue, 20 Oct 2015 08:42:24 +0000 (10:42 +0200)]
net: sun4i-emac: Properly free resources on probe failure and remove
Fix sun4i-emac not releasing the following resources:
-iomapped memory not released on probe-failure nor on remove
-clock not getting disabled on probe-failure nor on remove
-sram not being released on remove
And while at it also add error checking to the clk_prepare_enable call
done on probe.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Tue, 20 Oct 2015 02:19:00 +0000 (19:19 -0700)]
openvswitch: Serialize nested ct actions if provided
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Tue, 20 Oct 2015 02:18:59 +0000 (19:18 -0700)]
openvswitch: Mark connections new when not confirmed.
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct).
Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Tue, 20 Oct 2015 02:18:58 +0000 (19:18 -0700)]
openvswitch: Clarify conntrack COMMIT behaviour
The presence of this attribute does not modify the ct_state for the
current packet, only future packets. Make this more clear in the header
definition.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Tue, 20 Oct 2015 02:18:57 +0000 (19:18 -0700)]
openvswitch: Reject ct_state masks for unknown bits
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renato Westphal [Mon, 19 Oct 2015 20:51:34 +0000 (18:51 -0200)]
tcp: remove improper preemption check in tcp_xmit_probe_skb()
Commit
e520af48c7e5a introduced the following bug when setting the
TCP_REPAIR sockoption:
[ 2860.657036] BUG: using __this_cpu_add() in preemptible [
00000000] code: daemon/12164
[ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20
[ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1
[ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012
[ 2860.657054]
ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002
[ 2860.657058]
0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18
[ 2860.657062]
ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38
[ 2860.657065] Call Trace:
[ 2860.657072] [<
ffffffff8185d765>] dump_stack+0x4f/0x7b
[ 2860.657075] [<
ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0
[ 2860.657078] [<
ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20
[ 2860.657082] [<
ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100
[ 2860.657085] [<
ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30
[ 2860.657089] [<
ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830
[ 2860.657093] [<
ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30
[ 2860.657097] [<
ffffffff81767b74>] sock_common_setsockopt+0x14/0x20
[ 2860.657100] [<
ffffffff817669e1>] SyS_setsockopt+0x71/0xc0
[ 2860.657104] [<
ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75
Since tcp_xmit_probe_skb() can be called from process context, use
NET_INC_STATS() instead of NET_INC_STATS_BH().
Fixes: e520af48c7e5 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters")
Signed-off-by: Renato Westphal <renatow@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 02:26:17 +0000 (19:26 -0700)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains four Netfilter fixes for net, they are:
1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6.
2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking
--accept-local, from Xin Long.
3) Wait for RCU grace period after dropping the pending packets in the
nfqueue, from Florian Westphal.
4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 19 Oct 2015 15:43:11 +0000 (11:43 -0400)]
tipc: conditionally expand buffer headroom over udp tunnel
In commit
d999297c3dbbe ("tipc: reduce locking scope during packet reception")
we altered the packet retransmission function. Since then, when
restransmitting packets, we create a clone of the original buffer
using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of
the area we want to have copied, but also the smallest possible TIPC
packet size. The value of MIN_H_SIZE is 24.
Unfortunately, __pskb_copy() also has the effect that the headroom
of the cloned buffer takes the size MIN_H_SIZE. This is too small
for carrying the packet over the UDP tunnel bearer, which requires
a minimum headroom of 28 bytes. A change to just use pskb_copy()
lets the clone inherit the original headroom of 80 bytes, but also
assumes that the copied data area is of at least that size, something
that is not always the case. So that is not a viable solution.
We now fix this by adding a check for sufficient headroom in the
transmit function of udp_media.c, and expanding it when necessary.
Fixes: commit d999297c3dbbe ("tipc: reduce locking scope during packet reception")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Schwab [Mon, 19 Oct 2015 15:37:13 +0000 (17:37 +0200)]
net: cavium: change NET_VENDOR_CAVIUM to bool
CONFIG_NET_VENDOR_CAVIUM is only used to hide/show config options and to
include subdirectories in the build, so it doesn't make sense to make it
tristate.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 19 Oct 2015 15:33:00 +0000 (11:33 -0400)]
tipc: allow non-linear first fragment buffer
The current code for message reassembly is erroneously assuming that
the the first arriving fragment buffer always is linear, and then goes
ahead resetting the fragment list of that buffer in anticipation of
more arriving fragments.
However, if the buffer already happens to be non-linear, we will
inadvertently drop the already attached fragment list, and later
on trig a BUG() in __pskb_pull_tail().
We see this happen when running fragmented TIPC multicast across UDP,
something made possible since
commit
d0f91938bede ("tipc: add ip/udp media type")
We fix this by not resetting the fragment list when the buffer is non-
linear, and by initiatlizing our private fragment list tail pointer to
the tail of the existing fragment list.
Fixes: commit d0f91938bede ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Morse [Mon, 19 Oct 2015 15:31:55 +0000 (16:31 +0100)]
openvswitch: Allocate memory for ovs internal device stats.
"openvswitch: Remove vport stats" removed the per-vport statistics, in
order to use the netdev's statistics fields.
"openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
to user-space, by using the provided netdev_ops to collate them - but ovs
internal devices still use an unallocated dev->tstats field to count
packets, which are no longer exported by this api.
Allocate the dev->tstats field for ovs internal devices, and wire up
ndo_get_stats64 with the original implementation of
ovs_vport_get_stats().
On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
unmasking a full-on panic on arm64:
=============%<==============
[<
ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch]
[<
ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch]
[<
ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch]
[<
ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch]
[<
ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
[<
ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310
[<
ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4
[<
ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc
[<
ffffffc0005e8a94>] genl_rcv+0x38/0x50
[<
ffffffc0005e76c0>] netlink_unicast+0x164/0x210
[<
ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368
[<
ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c
[SNIP]
Kernel panic - not syncing: Fatal exception in interrupt
=============%<==============
Fixes: 8c876639c985 ("openvswitch: Remove vport stats.")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 19 Oct 2015 15:26:05 +0000 (08:26 -0700)]
net: Really fix vti6 with oif in dst lookups
6e28b000825d ("net: Fix vti use case with oif in dst lookups for IPv6")
is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.
Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Mon, 19 Oct 2015 13:21:37 +0000 (09:21 -0400)]
tipc: extend broadcast link window size
The default fix broadcast window size is currently set to 20 packets.
This is a very low value, set at a time when we were still testing on
10 Mb/s hubs, and a change to it is long overdue.
Commit
7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure")
revealed a problem with this low value. For messages of importance LOW,
the backlog queue limit will be calculated to 30 packets, while a
single, maximum sized message of 66000 bytes, carried across a 1500 MTU
network consists of 46 packets.
This leads to the following scenario (among others leading to the same
situation):
1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26
packets to the backlog queue.
2: Msg 2 of 46 packets is attempted sent, but rejected because there is
no more space in the backlog queue at this level. The sender is added
to the wakeup queue with a "pending packets chain size" number of 46.
3: Some packets in the transmit queue are acked and released. We try to
wake up the sender, but the pending size of 46 is bigger than the LOW
wakeup limit of 30, so this doesn't happen.
5: Subsequent acks releases all the remaining buffers. Each time we test
for the wakeup criteria and find that 46 still is larger than 30,
even after both the transmit and the backlog queues are empty.
6: The sender is never woken up and given a chance to send its message.
He is stuck.
We could now loosen the wakeup criteria (used by link_prepare_wakeup())
to become equal to the send criteria (used by tipc_link_xmit()), i.e.,
by ignoring the "pending packets chain size" value altogether, or we can
just increase the queue limits so that the criteria can be satisfied
anyway. There are good reasons (potentially multiple waiting senders) to
not opt for the former solution, so we choose the latter one.
This commit fixes the problem by giving the broadcast link window a
default value of 50 packets. We also introduce a new minimum link
window size BCLINK_MIN_WIN of 32, which is enough to always avoid the
described situation. Finally, in order to not break any existing users
which may set the window explicitly, we enforce that the window is set
to the new minimum value in case the user is trying to set it to
anything lower.
Fixes: 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 19 Oct 2015 10:16:49 +0000 (13:16 +0300)]
irda: precedence bug in irlmp_seq_hb_idx()
This is decrementing the pointer, instead of the value stored in the
pointer. KASan detects it as an out of bounds reference.
Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Jin [Mon, 19 Oct 2015 05:37:17 +0000 (13:37 +0800)]
xen-netfront: update num_queues to real created
Sometimes xennet_create_queues() may failed to created all requested
queues, we need to update num_queues to real created to avoid NULL
pointer dereference.
Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Sun, 18 Oct 2015 15:35:56 +0000 (23:35 +0800)]
vsock: fix missing cleanup when misc_register failed
reset transport and unlock if misc_register failed.
Signed-off-by: Gao feng <omarapazanadi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Oct 2015 14:36:51 +0000 (07:36 -0700)]
Merge branch 'mv643xx-fixes'
Philipp Kirchhofer says:
====================
net: mv643xx_eth: TSO TX data corruption fixes
as previously discussed [1] the mv643xx_eth driver has some
issues with data corruption when using TCP segmentation offload (TSO).
The following patch set improves this situation by fixing two data
corruption bugs in the TSO TX path.
Before applying the patches repeatedly accessing large files located on
a SMB share on my NSA325 NAS with TSO enabled resulted in different
hash sums, which confirmed that data corruption is happening during
file transfer. After applying the patches the hash sums were the same.
As this is my first patch submission please feel free to point out any
issues with the patch set.
[1] http://thread.gmane.org/gmane.linux.network/336530
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Philipp Kirchhofer [Sun, 18 Oct 2015 14:02:44 +0000 (16:02 +0200)]
net: mv643xx_eth: Defer writing the first TX descriptor when using TSO
To prevent a race between the TX DMA engine and the CPU the writing of the
first transmit descriptor must be deferred until all following descriptors
have been updated. The network card may otherwise start transmitting before
all packet descriptors are set up correctly, which leads to data corruption
or an aborted transmit operation.
This deferral is already done in the non-TSO TX path, implement it also in
the TSO TX path.
Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philipp Kirchhofer [Sun, 18 Oct 2015 14:02:43 +0000 (16:02 +0200)]
net: mv643xx_eth: Ensure proper data alignment in TSO TX path
The TX DMA engine requires that buffers with a size of 8 bytes or smaller
must be 64 bit aligned. This requirement may be violated when doing TSO,
as in this case larger skb frags can be broken up and transmitted in small
parts with then inappropriate alignment.
Fix this by checking for proper alignment before handing a buffer to the
DMA engine. If the data is misaligned realign it by copying it into the
TSO header data area.
Signed-off-by: Philipp Kirchhofer <philipp@familie-kirchhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Oct 2015 13:41:51 +0000 (06:41 -0700)]
Merge branch 'smsc-energy-detect'
Heiko Schocher says:
====================
net, phy, smsc: add posibility to disable energy detect mode
On some boards the energy enable detect mode leads in
trouble with some switches, so make the enabling of
this mode configurable through DT.
Therefore the property "smsc,disable-energy-detect" is
introduced.
Patch 1 introduces phy-handle support for the ti,cpsw
driver. This is needed now for the smsc phy.
Patch 2 adds the disable energy mode functionality
to the smsc phy
Changes in v2:
- add comments from Florian Fainelli
- I did not change disable property name into enable
because I fear to break existing behaviour
- add smsc vendor prefix
- remove CONFIG_OF and use __maybe_unused
- introduce "phy-handle" ability into ti,cpsw
driver, so I can remove bogus:
if (!of_node && dev->parent->of_node)
of_node = dev->parent->of_node;
construct. Therefore new patch for the ti,cpsw
driver is necessary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Schocher [Sat, 17 Oct 2015 04:04:36 +0000 (06:04 +0200)]
net: phy: smsc: disable energy detect mode
On some boards the energy enable detect mode leads in
trouble with some switches, so make the enabling of
this mode configurable through DT.
Signed-off-by: Heiko Schocher <hs@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Schocher [Sat, 17 Oct 2015 04:04:35 +0000 (06:04 +0200)]
drivers: net: cpsw: add phy-handle parsing
add the ability to parse "phy-handle". This
is needed for phys, which have a DT node, and
need to parse DT properties.
Signed-off-by: Heiko Schocher <hs@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Arlott [Thu, 15 Oct 2015 20:00:22 +0000 (21:00 +0100)]
bcm63xx_enet: check 1000BASE-T advertisement configuration
If a gigabit ethernet PHY is connected to a fast ethernet MAC,
then it can detect 1000 support from the partner but not use it.
This results in a forced speed of 1000 and RX/TX failure.
Check for 1000BASE-T support and then check the advertisement
configuration before setting the MAC speed to 1000mbit.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 19 Oct 2015 16:55:40 +0000 (09:55 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Account for extra headroom in ath9k driver, from Felix Fietkau.
2) Fix OOPS in pppoe driver due to incorrect socket state transition,
from Guillaume Nault.
3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang.
4) Power management fixes for iwlwifi, from Johannes Berg.
5) Fix races in reqsk_queue_unlink(), from Eric Dumazet.
6) Fix dst_entry usage in ARP replies, from Jiri Benc.
7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann.
8) Missing allocation failure check in amd-xgbe, from Tom Lendacky.
9) Various resource allocation/freeing cures in DSA< from Neil
Armstrong.
10) A series of bug fixes in the openvswitch conntrack support, from
Joe Stringer.
11) Fix two cases (BPF and act_mirred) where we have to clean the sender
cpu stored in the SKB before transmitting. From WANG Cong and
Alexei Starovoitov.
12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from
Achiad Shochat.
13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this
configuration via ethtool. From Yuval Mintz.
14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when
'dev' is NULL, from Eric Biederman.
15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy.
16) kcalloc() gstrings ethtool buffer before having driver fill it in,
in order to prevent kernel memory leaking. From Joe Perches.
17) Fix mixxing rt6_info initialization for blackhole routes, from
Martin KaFai Lau.
18) Kill VLAN regression in via-rhine, from Andrej Ota.
19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet.
20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad.
21) Scrube SKBs when pushing them between namespaces in openvswitch,
from Joe Stringer.
22) bcmgenet enables link interrupts too early, fix from Florian
Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net: bcmgenet: Fix early link interrupt enabling
tunnels: Don't require remote endpoint or ID during creation.
openvswitch: Scrub skb between namespaces
xen-netback: correctly check failed allocation
net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter
netlink: Trim skb to alloc size to avoid MSG_TRUNC
net: add pfmemalloc check in sk_add_backlog()
via-rhine: fix VLAN receive handling regression.
ipv6: Initialize rt6_info properly in ip6_blackhole_route()
ipv6: Move common init code for rt6_info to a new function rt6_info_init()
Bluetooth: Fix initializing conn_params in scan phase
Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup
Bluetooth: Fix remove_device behavior for explicit connects
Bluetooth: Fix LE reconnection logic
Bluetooth: Fix reference counting for LE-scan based connections
Bluetooth: Fix double scan updates
mlxsw: core: Fix race condition in __mlxsw_emad_transmit
tipc: move fragment importance field to new header position
ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
tipc: eliminate risk of stalled link synchronization
...
Steffen Klassert [Mon, 19 Oct 2015 08:30:05 +0000 (10:30 +0200)]
xfrm: Fix pmtu discovery for local generated packets.
Commit
044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.
We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.
Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Florian Fainelli [Sat, 17 Oct 2015 21:22:46 +0000 (14:22 -0700)]
net: bcmgenet: Fix early link interrupt enabling
Link interrupts are enabled in init_umac(), which is too early for us to
process them since we do not yet have a valid PHY device pointer. On
BCM7425 chips for instance, we will crash calling phy_mac_interrupt()
because phydev is NULL.
Fix this by moving the link interrupts enabling in
bcmgenet_netif_start(), under a specific function:
bcmgenet_link_intr_enable() and while at it, update the comments
surrounding the code.
Fixes: 6cc8e6d4dcb36 ("net: bcmgenet: Delay PHY initialization to bcmgenet_open()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 06:05:56 +0000 (23:05 -0700)]
Merge tag 'wireless-drivers-for-davem-2015-10-17' of git://git./linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
iwlwifi:
* mvm: flush fw_dump_wk when mvm fails to start
* mvm: init card correctly on ctkill exit check
* pci: add a few more PCI subvendor IDs for the 7265 series
* fix firmware filename for 3160
* mvm: clear csa countdown when AP is stopped
* mvm: fix D3 firmware PN programming
* dvm: fix D3 firmware PN programming
* mvm: fix D3 CCMP TX PN assignment
rtlwifi:
* rtl8821ae: Fix system lockups on boot
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 16 Oct 2015 23:36:00 +0000 (16:36 -0700)]
tunnels: Don't require remote endpoint or ID during creation.
Before lightweight tunnels existed, it really didn't make sense to
create a tunnel that was not fully specified, such as without a
destination IP address - the resulting packets would go nowhere.
However, with lightweight tunnels, the opposite is true - it doesn't
make sense to require this information when it will be provided later
on by the route. This loosens the requirements for this information.
An alternative would be to allow the relaxed version only when
COLLECT_METADATA is enabled. However, since there are several
variations on this theme (such as NBMA tunnels in GRE), just dropping
the restrictions seems the most consistent across tunnels and with
the existing configuration.
CC: John Linville <linville@tuxdriver.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Fri, 16 Oct 2015 18:08:18 +0000 (11:08 -0700)]
openvswitch: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.
This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 05:23:33 +0000 (22:23 -0700)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-10-16
First of all, sorry for the late set of patches for the 4.3 cycle. We
just finished an intensive week of testing at the Bluetooth UnPlugFest
and discovered (and fixed) issues there. Unfortunately a few issues
affect 4.3-rc5 in a way that they break existing Bluetooth LE mouse and
keyboard support.
The regressions result from supporting LE privacy in conjunction with
scanning for Resolvable Private Addresses before connecting. A feature
that has been tested heavily (including automated unit tests), but sadly
some regressions slipped in. The UnPlugFest with its multitude of test
platforms is a good battle testing ground for uncovering every corner
case.
The patches in this pull request focus only on fixing the regressions in
4.3-rc5. The patches look a bit larger since we also added comments in
the critical sections of the fixes to improve clarity.
I would appreciate if we can get these regression fixes to Linus
quickly. Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Insu Yun [Thu, 15 Oct 2015 18:02:28 +0000 (18:02 +0000)]
xen-netback: correctly check failed allocation
Since vzalloc can be failed in memory pressure,
writes -ENOMEM to xenstore to indicate error.
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chia-Sheng Chang [Thu, 15 Oct 2015 18:00:21 +0000 (02:00 +0800)]
net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter
Just another AX88178-based 10/100/1000 USB-to-Ethernet dongle. This one
shows up in lsusb as: "ID 08dd:0114 Billionton Systems, Inc".
Signed-off-by: Chia-Sheng Chang <changchias@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Luca Ceresoli <luca@lucaceresoli.net>
Cc: Christoph Jaeger <cj@linux.com>
Cc: "Woojung.Huh@microchip.com" <Woojung.Huh@microchip.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Markus Elfring <elfring@users.sourceforge.net>
Cc: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Cc: netdev@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Arad, Ronen [Thu, 15 Oct 2015 08:55:17 +0000 (01:55 -0700)]
netlink: Trim skb to alloc size to avoid MSG_TRUNC
netlink_dump() allocates skb based on the calculated min_dump_alloc or
a per socket max_recvmsg_len.
min_alloc_size is maximum space required for any single netdev
attributes as calculated by rtnl_calcit().
max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
It is capped at 16KiB.
The intention is to avoid small allocations and to minimize the number
of calls required to obtain dump information for all net devices.
netlink_dump packs as many small messages as could fit within an skb
that was sized for the largest single netdev information. The actual
space available within an skb is larger than what is requested. It could
be much larger and up to near 2x with align to next power of 2 approach.
Allowing netlink_dump to use all the space available within the
allocated skb increases the buffer size a user has to provide to avoid
truncaion (i.e. MSG_TRUNG flag set).
It was observed that with many VLANs configured on at least one netdev,
a larger buffer of near 64KiB was necessary to avoid "Message truncated"
error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
min_alloc_size was only little over 32KiB.
This patch trims skb to allocated size in order to allow the user to
avoid truncation with more reasonable buffer size.
Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 18 Oct 2015 23:08:42 +0000 (16:08 -0700)]
Linux 4.3-rc6
Linus Torvalds [Sun, 18 Oct 2015 19:07:48 +0000 (12:07 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Here are some bugfixes for the I2C subsystem.
Kieran found a flaw in the recently renewed wake irq handling. Mika
handled a user bug report where the ACPI info turned out to be
unusable. I updated MAINTAINERS so that such bug reports will sooner
get to the right people. Geert pointed me to a problem of some i2c
drivers regarding PM which I fixed"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348
MAINTAINERS: add maintainers for Synopsis Designware I2C drivers
i2c: designware-platdrv: enable RuntimePM before registering to the core
i2c: s3c2410: enable RuntimePM before registering to the core
i2c: rcar: enable RuntimePM before registering to the core
i2c: return probe deferred status on dev_pm_domain_attach
Mika Westerberg [Thu, 24 Sep 2015 09:06:54 +0000 (12:06 +0300)]
i2c: designware: Do not use parameters from ACPI on Dell Inspiron 7348
ACPI SSCN/FMCN methods were originally added because then the platform can
provide the most accurate HCNT/LCNT values to the driver. However, this
seems not to be true for Dell Inspiron 7348 where using these causes the
touchpad to fail in boot:
i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
i2c_designware INT3433:00: i2c_dw_handle_tx_abort: lost arbitration
i2c_hid i2c-DLL0675:00: failed to retrieve report from device.
i2c_designware INT3433:00: controller timed out
The values received from ACPI are (in fast mode):
HCNT: 72
LCNT: 160
this translates to following timings (input clock is 100MHz on Broadwell):
tHIGH: 720 ns (spec min 600 ns)
tLOW: 1600 ns (spec min 1300 ns)
Bus period: 2920 ns (assuming 300 ns tf and tr)
Bus speed: 342.5 kHz
Both tHIGH and tLOW are within the I2C specification.
The calculated values when ACPI parameters are not used are (in fast mode):
HCNT: 87
LCNT: 159
which translates to:
tHIGH: 870 ns (spec min 600 ns)
tLOW: 1590 ns (spec min 1300 ns)
Bus period 3060 ns (assuming 300 ns tf and tr)
Bus speed 326.8 kHz
These values are also within the I2C specification.
Since both ACPI and calculated values meet the I2C specification timing
requirements it is hard to say why the touchpad does not function properly
with the ACPI values except that the bus speed is higher in this case (but
still well below the max 400kHz).
Solve this by adding DMI quirk to the driver that disables using ACPI
parameters on this particulare machine.
Reported-by: Pavel Roskin <plroskin@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Pavel Roskin <plroskin@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Linus Torvalds [Sat, 17 Oct 2015 15:47:27 +0000 (08:47 -0700)]
Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq/timer fixes from Thomas Gleixner:
"irq: a fix for the new hierarchical MSI interrupt handling which
unbreaks PCI=n configurations.
timers: a fix for the new hrtimer clock offset update mechanism to
ensure that the boot time offset is respected"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/msi: Do not use pci_msi_[un]mask_irq as default methods
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Increment clock_was_set_seq in timekeeping_init()
Eric Dumazet [Wed, 30 Sep 2015 01:52:25 +0000 (18:52 -0700)]
net: add pfmemalloc check in sk_add_backlog()
Greg reported crashes hitting the following check in __sk_backlog_rcv()
BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
The pfmemalloc bit is currently checked in sk_filter().
This works correctly for TCP, because sk_filter() is ran in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
For UDP or other protocols, this does not work, because the sk_filter()
is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if socket is owned by user by the time packet is processed by
softirq handler.
Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Borisov [Fri, 16 Oct 2015 06:40:28 +0000 (09:40 +0300)]
netfilter: ipset: Fix sleeping memory allocation in atomic context
Commit
00590fdd5be0 introduced RCU locking in list type and in
doing so introduced a memory allocation in list_set_add, which
is done in an atomic context, due to the fact that ipset rcu
list modifications are serialised with a spin lock. The reason
why we can't use a mutex is that in addition to modifying the
list with ipset commands, it's also being modified when a
particular ipset rule timeout expires aka garbage collection.
This gc is triggered from set_cleanup_entries, which in turn
is invoked from a timer thus requiring the lock to be bh-safe.
Concretely the following call chain can lead to "sleeping function
called in atomic context" splat:
call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
And since GFP_KERNEL allows initiating direct reclaim thus
potentially sleeping in the allocation path.
To fix the issue change the allocation type to GFP_ATOMIC, to
correctly reflect that it is occuring in an atomic context.
Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Sat, 17 Oct 2015 00:39:27 +0000 (17:39 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Just two small fixups to ads7846 touchscreen controller driver and
Cypress touchpad driver"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cyapa - fix the copy paste error on electrodes_rx value
Input: ads7846 - correct the value got from SPI
Linus Torvalds [Sat, 17 Oct 2015 00:11:14 +0000 (17:11 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"Just one revert for Armada XP devices: the conversion to
of_clk_get_parent_name() wasn't a direct translation, so we
revert back to of_clk_get() + __clk_get_name().
We could make of_clk_get_parent_name() more robust, but that
may have unintended side-effects, so we'll do that in the
next version"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
Partially revert "clk: mvebu: Convert to clk_hw based provider APIs"
Linus Torvalds [Fri, 16 Oct 2015 20:03:05 +0000 (13:03 -0700)]
Merge tag 'dm-4.3-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Two DM target error path cleanup fixes (one for stable in DM thinp and
one for a v4.3-rc5 thinko in DM snapshot)"
* tag 'dm-4.3-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: fix missing pool reference count decrement in pool_ctr error path
dm snapshot persistent: fix missing cleanup in persistent_ctr error path
Linus Torvalds [Fri, 16 Oct 2015 19:55:34 +0000 (12:55 -0700)]
Merge branch 'for-linus-4.3' of git://git./linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I have two more bug fixes for btrfs.
My commit fixes a bug we hit last week at FB, a combination of lots of
hard links and an admin command to resolve inode numbers.
Dave is adding checks to make sure balance on current kernels ignores
filters it doesn't understand. The penalty for being wrong is just
doing more work (not crashing etc), but it's a good fix"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix use after free iterating extrefs
btrfs: check unsupported filters in balance arguments
Linus Torvalds [Fri, 16 Oct 2015 19:47:02 +0000 (12:47 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"Just two small items from Ilya:
The first patch fixes the RBD readahead to grab full objects. The
second fixes the write ops to prevent undue promotion when a cache
tier is configured on the server side"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use writefull op for object size writes
rbd: set max_sectors explicitly
Linus Torvalds [Fri, 16 Oct 2015 19:25:54 +0000 (12:25 -0700)]
Merge tag 'pm+acpi-4.3-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These fix two recent regressions (ACPICA, the generic power domains
framework) and one crash that may happen on specific hardware
supported since 4.1 (intel_pstate).
Specifics:
- Fix a regression introduced by a recent ACPICA cleanup that
uncovered a latent bug (Lv Zheng).
- Fix a recent regression in the generic power domains framework that
may cause it to violate PM QoS latency constraints in some cases
(Ulf Hansson).
- Fix an intel_pstate driver crash on the Knights Landing chips that
do not update the MPERF counter as often as expected by the driver
which may result in a divide by 0 (Srinivas Pandruvada)"
* tag 'pm+acpi-4.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix divide by zero on Knights Landing (KNL)
ACPICA: Tables: Fix FADT dependency regression
PM / Domains: Fix validation of latency constraints in genpd governor
Linus Torvalds [Fri, 16 Oct 2015 19:19:11 +0000 (12:19 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nothing too crazy or exciting:
- two MAINTAINERS entries that I didn't see the point in delaying.
- one drm mst fix to stop sending uninitialised data to monitors
- two amdgpu fixes
- one radeon mst tiling fix
- one vmwgfx regression fix
- one virtio warning fix.
I have found one locking problem that needs a bit of reorg to fix, but
I'm not sure it's worth putting in -fixes as I don't think we've seen
it hit in the real world ever, I just found it using the virtio-gpu
driver when working on it. I'll possibly send it next week once I've
time to discuss with Daniel"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/virtio: use %llu format string form atomic64_t
MAINTAINERS: Add myself as maintainer for the gma500 driver
MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver
drm/amdgpu: Keep the pflip interrupts always enabled v7
drm/amdgpu: adjust default dispclk (v2)
drm/dp/mst: make mst i2c transfer code more robust.
drm/radeon: attach tile property to mst connector
drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware
Linus Torvalds [Fri, 16 Oct 2015 19:07:43 +0000 (12:07 -0700)]
Merge tag 'powerpc-4.3-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Re-enable CONFIG_SCSI_DH in our defconfigs
- Remove unused os_area_db_id_video_mode
- cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew
- cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew
- cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew
- cxl: Workaround malformed pcie packets on some cards from Philippe
- cxl: Fix number of allocated pages in SPA from Christophe Lombard
- Fix checkstop in native_hpte_clear() with lockdep from Cyril
- Panic on unhandled Machine Check on powernv from Daniel
- selftests/powerpc: Fix build failure of load_unaligned_zeropad test
* tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix build failure of load_unaligned_zeropad test
powerpc/powernv: Panic on unhandled Machine Check
powerpc: Fix checkstop in native_hpte_clear() with lockdep
cxl: Fix number of allocated pages in SPA
cxl: Workaround malformed pcie packets on some cards
cxl: fix leak of ctx->mapping when releasing kernel API contexts
cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API
cxl: fix leak of IRQ names in cxl_free_afu_irqs()
powerpc/ps3: Remove unused os_area_db_id_video_mode
powerpc/configs: Re-enable CONFIG_SCSI_DH
Linus Torvalds [Fri, 16 Oct 2015 18:42:37 +0000 (11:42 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
sh: add copy_user_page() alias for __copy_user()
lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE
mm, dax: fix DAX deadlocks
memcg: convert threshold to bytes
builddeb: remove debian/files before build
mm, fs: obey gfp_mapping for add_to_page_cache()
Ross Zwisler [Thu, 15 Oct 2015 22:28:38 +0000 (15:28 -0700)]
sh: add copy_user_page() alias for __copy_user()
copy_user_page() is needed by DAX. Without this we get a compile error
for DAX on SH:
fs/dax.c:280:2: error: implicit declaration of function `copy_user_page' [-Werror=implicit-function-declaration]
copy_user_page(vto, (void __force *)vfrom, vaddr, to);
^
This was done with a random config that happened to include DAX support.
This patch has only been compile tested.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 15 Oct 2015 22:28:35 +0000 (15:28 -0700)]
lib/Kconfig: ZLIB_DEFLATE must select BITREVERSE
lib/built-in.o: In function `__bitrev32':
deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table'
Anything which uses bitrevX() has to select BITREVERSE, to grab
lib/bitrev.o.
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Thu, 15 Oct 2015 22:28:32 +0000 (15:28 -0700)]
mm, dax: fix DAX deadlocks
The following two locking commits in the DAX code:
commit
843172978bb9 ("dax: fix race between simultaneous faults")
commit
46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")
introduced a number of deadlocks and other issues which need to be fixed
for the v4.3 kernel. The list of issues in DAX after these commits
(some newly introduced by the commits, some preexisting) can be found
here:
https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault").
This undoes most of the changes introduced by those two commits,
essentially returning us to the DAX locking scheme that was used in
v4.2.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Thu, 15 Oct 2015 22:28:29 +0000 (15:28 -0700)]
memcg: convert threshold to bytes
page_counter_memparse() returns pages for the threshold, while
mem_cgroup_usage() returns bytes for memory usage. Convert the
threshold to bytes.
Fixes: 3e32cb2e0a12b6915 ("memcg: rename cgroup_event to mem_cgroup_event").
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>