Herbert Xu [Mon, 23 Mar 2015 13:50:19 +0000 (00:50 +1100)]
rhashtable: Add barrier to ensure we see new tables in walker
The walker is a lockless reader so it too needs an smp_rmb before
reading the future_tbl field in order to see any new tables that
may contain elements that we should have walked over.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Mar 2015 02:03:43 +0000 (22:03 -0400)]
Merge tag 'linux-can-next-for-4.1-
20150323' of git://git./linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2015-03-23
this is a pull request of 6 patches for net-next/master.
A patch by Florian Westphal, converts the skb->destructor to use
sock_efree() instead of own destructor. Ahmed S. Darwish's patch
converts the kvaser_usb driver to use unregister_candev(). A patch by
me removes a return from a void function in the m_can driver. Yegor
Yefremov contributes a patch for combined rx/tx LED trigger support. A
sparse warning in the esd_usb2 driver was fixes by Thomas Körper. Ben
Dooks converts the at91_can driver to use endian agnostic IO accessors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 24 Mar 2015 02:02:46 +0000 (22:02 -0400)]
Merge git://git./linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next.
Basically, more incremental updates for br_netfilter from Florian
Westphal, small nf_tables updates (including one fix for rb-tree
locking) and small two-liner to add extra validation for the REJECT6
target.
More specifically, they are:
1) Use the conntrack status flags from br_netfilter to know that DNAT is
happening. Patch for Florian Westphal.
2) nf_bridge->physoutdev == NULL already indicates that the traffic is
bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian.
3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc,
from Joe Perches.
4) Consolidation of nf_tables_newtable() error path.
5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(),
from Florian Westphal.
6) Access rb-tree root node inside the lock and remove unnecessary
locking from the get path (we already hold nfnl_lock there), from
Patrick McHardy.
7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't
support interval, also from Patrick.
8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is
actually restricting matches to TCP.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Drozdov [Mon, 23 Mar 2015 06:11:13 +0000 (09:11 +0300)]
af_packet: pass checksum validation status to the user
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.
For now, the flag may be set for incoming packets only.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Drozdov [Mon, 23 Mar 2015 06:11:12 +0000 (09:11 +0300)]
af_packet: make tpacket_rcv to not set status value before run_filter
It is just an optimization. We don't need the value of status variable
if the packet is filtered.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fan Du [Mon, 23 Mar 2015 22:00:41 +0000 (15:00 -0700)]
inet: fix double request socket freeing
Eric Hugne reported following error :
I'm hitting this warning on latest net-next when i try to SSH into a machine
with eth0 added to a bridge (but i think the problem is older than that)
Steps to reproduce:
node2 ~ # brctl addif br0 eth0
[ 223.758785] device eth0 entered promiscuous mode
node2 ~ # ip link set br0 up
[ 244.503614] br0: port 1(eth0) entered forwarding state
[ 244.505108] br0: port 1(eth0) entered forwarding state
node2 ~ # [ 251.160159] ------------[ cut here ]------------
[ 251.160831] WARNING: CPU: 0 PID: 3 at include/net/request_sock.h:102 tcp_v4_err+0x6b1/0x720()
[ 251.162077] Modules linked in:
[ 251.162496] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.0.0-rc3+ #18
[ 251.163334] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 251.164078]
ffffffff81a8365c ffff880038a6ba18 ffffffff8162ace4 0000000000009898
[ 251.165084]
0000000000000000 ffff880038a6ba58 ffffffff8104da85 ffff88003fa437c0
[ 251.166195]
ffff88003fa437c0 ffff88003fa74e00 ffff88003fa43bb8 ffff88003fad99a0
[ 251.167203] Call Trace:
[ 251.167533] [<
ffffffff8162ace4>] dump_stack+0x45/0x57
[ 251.168206] [<
ffffffff8104da85>] warn_slowpath_common+0x85/0xc0
[ 251.169239] [<
ffffffff8104db65>] warn_slowpath_null+0x15/0x20
[ 251.170271] [<
ffffffff81559d51>] tcp_v4_err+0x6b1/0x720
[ 251.171408] [<
ffffffff81630d03>] ? _raw_read_lock_irq+0x3/0x10
[ 251.172589] [<
ffffffff81534e20>] ? inet_del_offload+0x40/0x40
[ 251.173366] [<
ffffffff81569295>] icmp_socket_deliver+0x65/0xb0
[ 251.174134] [<
ffffffff815693a2>] icmp_unreach+0xc2/0x280
[ 251.174820] [<
ffffffff8156a82d>] icmp_rcv+0x2bd/0x3a0
[ 251.175473] [<
ffffffff81534ea2>] ip_local_deliver_finish+0x82/0x1e0
[ 251.176282] [<
ffffffff815354d8>] ip_local_deliver+0x88/0x90
[ 251.177004] [<
ffffffff815350f0>] ip_rcv_finish+0xf0/0x310
[ 251.177693] [<
ffffffff815357bc>] ip_rcv+0x2dc/0x390
[ 251.178336] [<
ffffffff814f5da3>] __netif_receive_skb_core+0x713/0xa20
[ 251.179170] [<
ffffffff814f7fca>] __netif_receive_skb+0x1a/0x80
[ 251.179922] [<
ffffffff814f97d4>] process_backlog+0x94/0x120
[ 251.180639] [<
ffffffff814f9612>] net_rx_action+0x1e2/0x310
[ 251.181356] [<
ffffffff81051267>] __do_softirq+0xa7/0x290
[ 251.182046] [<
ffffffff81051469>] run_ksoftirqd+0x19/0x30
[ 251.182726] [<
ffffffff8106cc23>] smpboot_thread_fn+0x153/0x1d0
[ 251.183485] [<
ffffffff8106cad0>] ? SyS_setgroups+0x130/0x130
[ 251.184228] [<
ffffffff8106935e>] kthread+0xee/0x110
[ 251.184871] [<
ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
[ 251.185690] [<
ffffffff81631108>] ret_from_fork+0x58/0x90
[ 251.186385] [<
ffffffff81069270>] ? kthread_create_on_node+0x1b0/0x1b0
[ 251.187216] ---[ end trace
c947fc7b24e42ea1 ]---
[ 259.542268] br0: port 1(eth0) entered forwarding state
Remove the double calls to reqsk_put()
[edumazet] :
I got confused because reqsk_timer_handler() _has_ to call
reqsk_put(req) after calling inet_csk_reqsk_queue_drop(), as
the timer handler holds a reference on req.
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Mon, 23 Mar 2015 19:23:12 +0000 (16:23 -0300)]
vxlan: simplify if clause in dev_close
Dan Carpenter's static checker warned that in vxlan_stop we are checking
if 'vs' can be NULL while later we simply derreference it.
As after commit
56ef9c909b40 ("vxlan: Move socket initialization to
within rtnl scope") 'vs' just cannot be NULL in vxlan_stop() anymore, as
the interface won't go up if the socket initialization fails. So we are
good to just remove the check and make it consistent.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 23 Mar 2015 18:51:53 +0000 (11:51 -0700)]
fib_trie: Fix regression in handling of inflate/halve failure
When I updated the code to address a possible null pointer dereference in
resize I ended up reverting an exception handling fix for the suffix length
in the event that inflate or halve failed. This change is meant to correct
that by reverting the earlier fix and instead simply getting the parent
again after inflate has been completed to avoid the possible null pointer
issue.
Fixes: ddb4b9a13 ("fib_trie: Address possible NULL pointer dereference in resize")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Mon, 23 Mar 2015 11:35:37 +0000 (12:35 +0100)]
bgmac: implement scatter/gather support
Always use software checksumming, since the hardware does not have any
checksum offload support.
This significantly improves local TCP tx performance.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Mon, 23 Mar 2015 11:35:36 +0000 (12:35 +0100)]
bgmac: implement GRO and use build_skb
This improves performance for routing and local rx
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Fietkau [Mon, 23 Mar 2015 11:35:35 +0000 (12:35 +0100)]
bgmac: fix descriptor frame start/end definitions
The start-of-frame and end-of-frame bits were accidentally swapped.
In the current code it does not make any difference, since they are
always used together.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki/吉藤英明 [Mon, 23 Mar 2015 09:04:13 +0000 (18:04 +0900)]
net: Move the comment about unsettable socket-level options to default clause and update its reference.
We implement the SO_SNDLOWAT etc not to be settable and return
ENOPROTOOPT per 1003.1g 7. Move the comment to appropriate
position and update the reference.
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 23 Mar 2015 20:52:34 +0000 (16:52 -0400)]
Merge branch 'listener_refactor_part_15'
Eric Dumazet says:
====================
tcp listener refactoring part 15
I am trying to make the final patch pushing request socks into ehash
as small as possible. In this patch series, I made various adjustments
for the SYNACK generation, allowing me to reach 1 Mpps SYNACK in my
stress test (still hitting LISTENER spinlock of course, and the syn_wait
spinlock)
I also converted the ICMP handlers a bit ahead of time :
They no longer need to get the LISTENER socket, and can use
only a lookup in ehash table. No big deal if we ignore ICMP
for requests socks before the final steps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:25 +0000 (10:22 -0700)]
ipv6: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets
dccp_v6_err() can restrict lookups to ehash table, and not to listeners.
Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:24 +0000 (10:22 -0700)]
ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets
dccp_v4_err() can restrict lookups to ehash table, and not to listeners.
Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
New dccp_req_err() helper is exported so that we can use it in IPv6
in following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:23 +0000 (10:22 -0700)]
ipv6: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets
tcp_v6_err() can restrict lookups to ehash table, and not to listeners.
Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:22 +0000 (10:22 -0700)]
ipv4: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request sockets
tcp_v4_err() can restrict lookups to ehash table, and not to listeners.
Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.
New tcp_req_err() helper is exported so that we can use it in IPv6
in following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:21 +0000 (10:22 -0700)]
net: convert syn_wait_lock to a spinlock
This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually.
We hold syn_wait_lock for such small sections, that it makes no sense to use
a read/write lock. A spin lock is simply faster.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:20 +0000 (10:22 -0700)]
inet: remove some sk_listener dependencies
listener can be source of false sharing. request sock has some
useful information like : ireq->ir_iif, ireq->ir_num, ireq->ireq_net
This patch does not solve the major problem of having to read
sk->sk_protocol which is sharing a cache line with sk->sk_wmem_alloc.
(This same field is read later in ip_build_and_send_pkt())
One idea would be to move sk_protocol close to sk_family
(using 8 bits instead of 16 for sk_family seems enough)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:19 +0000 (10:22 -0700)]
inet: remove sk_listener parameter from syn_ack_timeout()
It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 22 Mar 2015 17:22:18 +0000 (10:22 -0700)]
inet: cache listen_sock_qlen() and read rskq_defer_accept once
Cache listen_sock_qlen() to limit false sharing, and read
rskq_defer_accept once as it might change under us.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 23 Mar 2015 20:47:29 +0000 (16:47 -0400)]
Merge branch 'gigaset_modem_response'
Tilman Schmidt says:
====================
isdn/gigaset: restructure modem response parser
This series of patches restructures the Gigaset ISDN driver's
modem response parser to improve code readability and conform
better to the device's specification and actual behaviour.
Could you please merge these through net-next?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 21 Mar 2015 19:15:32 +0000 (20:15 +0100)]
isdn/gigaset: restructure modem response parser (4)
Restructure the control structure of the modem response parser
to improve readability and error handling.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 21 Mar 2015 19:15:32 +0000 (20:15 +0100)]
isdn/gigaset: restructure modem response parser (3)
Separate CID detection from main parser loop.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 21 Mar 2015 19:15:32 +0000 (20:15 +0100)]
isdn/gigaset: restructure modem response parser (2)
Separate literal string handling from main parser loop.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tilman Schmidt [Sat, 21 Mar 2015 19:15:32 +0000 (20:15 +0100)]
isdn/gigaset: restructure modem response parser (1)
Factor out queueing of modem response events into helper function
add_cid_event().
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Sat, 21 Mar 2015 17:27:28 +0000 (10:27 -0700)]
switchdev: fix stp update API to work with layered netdevices
make it same as the netdev_switch_port_bridge_setlink/dellink
api (ie traverse lowerdevs to get to the switch port).
removes "WARN_ON(!ops->ndo_switch_parent_id_get)" because
direct bridge ports can be stacked netdevices (like bonds
and team of switch ports) which may not implement this ndo.
v2 to v3:
- remove changes to bond and team. Bring back the
transparently following lowerdevs like i initially
had for setlink/getlink
(http://www.spinics.net/lists/netdev/msg313436.html)
dave and scott feldman also seem to prefer it be that
way and move to non-transparent way of doing things
if we see a problem down the lane.
v3 to v4:
- fix ret initialization
v4 to v5:
- return err on first failure (scott feldman)
v5 to v6:
- change variable name (err) and initialize to
-EOPNOTSUPP (scott feldman).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 20 Mar 2015 21:29:09 +0000 (14:29 -0700)]
net: clear skb->priority when forwarding to another netns
skb->priority can be set for two purposes:
1) With respect to IP TOS field, which is computed by a mask.
Ususally used for priority qdisc's (pfifo, prio etc.), on TX
side (we only have ingress qdisc on RX side).
2) Used as a classid or flowid, works in the same way with tc
classid. What's more, this can even override the classid
of tc filters.
For case 1), it has been respected within its netns, I don't
see any point of keeping it for another netns, especially
when packets will be forwarded to Rx path (no matter from TX
path or RX path).
For case 2) we care, our applications run inside a netns,
and we classify the packets by our own filters outside,
If some application sets this priority, it could bypass
our filters, therefore clear it when moving out of a netns,
it makes no sense to bypass tc filters out of its netns.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 23 Mar 2015 20:41:44 +0000 (16:41 -0400)]
Merge branch 'crypto_async'
Tadeusz Struk says:
====================
Add support for async socket operations
After the iocb parameter has been removed from sendmsg() and recvmsg() ops
the socket layer, and the network stack no longer support async operations.
This patch set adds support for asynchronous operations on sockets back.
Changes in v3:
* As sugested by Al Viro instead of adding new functions aio_sendmsg
and aio_recvmsg, added a ptr to iocb into the kernel-side msghdr structure.
This way no change to aio.c is required.
Changes in v2:
* removed redundant total_size param from aio_sendmsg and aio_recvmsg functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tadeusz Struk [Thu, 19 Mar 2015 19:31:40 +0000 (12:31 -0700)]
crypto: algif - change algif_skcipher to be asynchronous
The way the algif_skcipher works currently is that on sendmsg/sendpage it
builds an sgl for the input data and then on read/recvmsg it sends the job
for encryption putting the user to sleep till the data is processed.
This way it can only handle one job at a given time.
This patch changes it to be asynchronous by adding AIO support.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tadeusz Struk [Thu, 19 Mar 2015 19:31:30 +0000 (12:31 -0700)]
crypto: af_alg - Allow to link sgl
Allow to link af_alg sgls.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tadeusz.struk@intel.com [Thu, 19 Mar 2015 19:31:25 +0000 (12:31 -0700)]
net: socket: add support for async operations
Add support for async operations.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Wed, 18 Mar 2015 15:53:10 +0000 (15:53 +0000)]
can: at91_can: use endian agnostic IO accessors
Change __raw accesors to endian agnostic versions to allow the driver
to work properly on big endian systems.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Thomas Körper [Mon, 16 Mar 2015 08:48:29 +0000 (09:48 +0100)]
can: esd_usb2: Fix sparse warnings
The hnd field of the structs does not need to be __le32: the
device just returns the value without using it itself.
Signed-off-by: Thomas Körper <thomas.koerper@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Yegor Yefremov [Mon, 16 Mar 2015 08:38:13 +0000 (09:38 +0100)]
can: add combined rx/tx LED trigger support
Add <ifname>-rxtx trigger, that will be activated both for tx
as rx events. This trigger mimics "activity" LED for Ethernet
devices.
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Marc Kleine-Budde [Sat, 14 Mar 2015 14:39:09 +0000 (15:39 +0100)]
can: m_cam: m_can_fifo_write(): remove return from void function
This patch removes the return from the void function m_can_fifo_write().
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Ahmed S. Darwish [Sat, 14 Mar 2015 13:11:47 +0000 (09:11 -0400)]
can: kvaser_usb: Use can-dev unregistration mechanism
Use can-dev's unregister_candev() instead of directly calling
networking unregister_netdev(). While both are functionally
equivalent, unregister_candev() might do extra stuff in the
future than just calling networking layer unregistration code.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Florian Westphal [Tue, 10 Mar 2015 03:48:20 +0000 (04:48 +0100)]
can: use sock_efree instead of own destructor
It is identical to the can destructor.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Pablo Neira Ayuso [Sat, 21 Mar 2015 19:20:23 +0000 (20:20 +0100)]
netfilter: ip6t_REJECT: check for IP6T_F_PROTO
Make sure IP6T_F_PROTO is set to enforce layer 4 protocol matching from
the ip6_tables core.
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 21 Mar 2015 15:19:16 +0000 (15:19 +0000)]
netfilter: nf_tables: reject NFT_SET_ELEM_INTERVAL_END flag for non-interval sets
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sat, 21 Mar 2015 15:19:14 +0000 (15:19 +0000)]
netfilter: nft_rbtree: fix locking
Fix a race condition and unnecessary locking:
* the root rb_node must only be accessed under the lock in nft_rbtree_lookup()
* the lock is not needed in lookup functions in netlink context
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Wed, 18 Mar 2015 19:55:31 +0000 (20:55 +0100)]
netfilter: bridge: kill nf_bridge_pad
The br_netfilter frag output function calls skb_cow_head() so in
case it needs a larger headroom to e.g. re-add a previously stripped PPPOE
or VLAN header things will still work (at cost of reallocation).
We can then move nf_bridge_encap_header_len to br_netfilter.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Herbert Xu [Sat, 21 Mar 2015 03:14:03 +0000 (14:14 +1100)]
netlink: Remove netlink_compare_arg.trailer
Instead of computing the offset from trailer, this patch computes
netlink_compare_arg_len from the offset of portid and then adds 4
to it. This allows trailer to be removed.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Mar 2015 02:03:15 +0000 (22:03 -0400)]
Merge branch 'netcp-next'
Murali Karicheri says:
====================
NetCP: Add support for version 1.5
NetCP 1.5 is used in newer K2 SoCs from Texas Instruments
such as K2E, K2L etc. This patch series add support for Ethss
driver for this version of NetCP. While at it, fix couple of
bugs in the original driver.
One of the earlier patch "net: netcp: select davinci_mdio driver
by default" is folded onto this series.
Please review and let me know your comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
WingMan Kwok [Fri, 20 Mar 2015 20:11:25 +0000 (16:11 -0400)]
net: netcp: ethss: enhancement to support NetCP 1.5 ethss
NetCP 1.5 available on newer K2 SoCs such as K2E and K2L introduced 3
variants of the ethss subsystem, 9 port, 5 port and 2 port. These have
one host port towards the CPU and N external slave ports.
To customize the driver for these new ethss sub systems, multiple
compatibility strings are introduced. Currently some of parameters that
are different on different variants such as number of ALE ports, stats
modules and number of ports are defined through constants. These are now
changed to variables in gbe_priv data that get set based on the
compatibility string. This is required as there are no hardware
identification registers available to distinguish among the variants
of NetCP 1.5 ethss. However there is identification register available
to differentiate between NetCP 1.4 vs NetCP 1.5 and the same is made use
of in the code to differentiate them.
For more reading on the details of this peripheral, please refer to the
User Guide available at http://www.ti.com/lit/pdf/spruhz3
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Mugunthan V N <mugunthanvnm@ti.com>
CC: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: Christoph Jaeger <cj@linux.com>
CC: Lokesh Vutla <lokeshvutla@ti.com>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Kumar Gala <galak@codeaurora.org>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karicheri, Muralidharan [Fri, 20 Mar 2015 20:11:24 +0000 (16:11 -0400)]
net: netcp: enclose macros in parentheses
Fix following checkpatch error. It seems to have passed checkpatch
last time when original code was introduced.
ERROR: Macros with complex values should be enclosed in parentheses
#172: FILE: drivers/net/ethernet/ti/netcp_ethss.c:869:
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Mugunthan V N <mugunthanvnm@ti.com>
CC: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: Christoph Jaeger <cj@linux.com>
CC: Lokesh Vutla <lokeshvutla@ti.com>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Kumar Gala <galak@codeaurora.org>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karicheri, Muralidharan [Fri, 20 Mar 2015 20:11:23 +0000 (16:11 -0400)]
net: netcp: select davinci_mdio driver by default
Keystone netcp driver re-uses davinci mdio driver. So enable it
by default for keystone netcp driver.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Mugunthan V N <mugunthanvnm@ti.com>
CC: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: Christoph Jaeger <cj@linux.com>
CC: Lokesh Vutla <lokeshvutla@ti.com>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Kumar Gala <galak@codeaurora.org>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karicheri, Muralidharan [Fri, 20 Mar 2015 20:11:22 +0000 (16:11 -0400)]
net: netcp: use separate reg region for individual ethss modules
Ethss has multiple modules within the sub system
- switch sub system
- sgmii
- mdio
- switch module
NetCP driver re-uses existing davinci mdio driver. It requires to
have its own register region to map the reg space. So restructure
the code to use separate reg region for the individual modules it
manages. Use range property to define register space of NetCP and
use reg property to define individual reg spaces. So MDIO will have
its own reg space to map. This is a pre-requisite to enable MDIO
driver for NetCP.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Mugunthan V N <mugunthanvnm@ti.com>
CC: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: Christoph Jaeger <cj@linux.com>
CC: Lokesh Vutla <lokeshvutla@ti.com>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Kumar Gala <galak@codeaurora.org>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Karicheri, Muralidharan [Fri, 20 Mar 2015 20:11:21 +0000 (16:11 -0400)]
net: netcp: fix forward port number usage for 10G ethss
10G switch requires forward port number in the taginfo field,
where as it should be in packet_info field for necp 1.4 Ethss. So
fill this value correctly in the knav dma descriptor.
Also rename dma_psflags field in struct netcp_tx_pipe to switch_to_port
as it contain no flag, but the switch port number for forwarding the
packet. Add a flag to hold the new flag, SWITCH_TO_PORT_IN_TAGINFO which
will be set for 10G. This can also used in the future for other flags for
the tx_pipe.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Mugunthan V N <mugunthanvnm@ti.com>
CC: "Lad, Prabhakar" <prabhakar.csengg@gmail.com>
CC: Grygorii Strashko <grygorii.strashko@ti.com>
CC: Christoph Jaeger <cj@linux.com>
CC: Lokesh Vutla <lokeshvutla@ti.com>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Kumar Gala <galak@codeaurora.org>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki/吉藤英明 [Thu, 19 Mar 2015 13:42:04 +0000 (22:42 +0900)]
net: neighbour: Document {mcast, ucast}_solicit, mcast_resolicit.
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
YOSHIFUJI Hideaki/吉藤英明 [Thu, 19 Mar 2015 13:41:46 +0000 (22:41 +0900)]
net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state.
We send unicast neighbor (ARP or NDP) solicitations ucast_probes
times in PROBE state. Zhu Yanjun reported that some implementation
does not reply against them and the entry will become FAILED, which
is undesirable.
We had been dealt with such nodes by sending multicast probes mcast_
solicit times after unicast probes in PROBE state. In 2003, I made
a change not to send them to improve compatibility with IPv6 NDP.
Let's introduce per-protocol per-interface sysctl knob "mcast_
reprobe" to configure the number of multicast (re)solicitation for
reconfirmation in PROBE state. The default is 0, since we have
been doing so for 10+ years.
Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com>
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Fri, 20 Mar 2015 22:14:32 +0000 (23:14 +0100)]
bgmac: allow enabling on ARCH_BCM_5301X
Home routers based on ARM SoCs like BCM4708 also have bcma bus with core
supported by bgmac.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Fri, 20 Mar 2015 22:14:31 +0000 (23:14 +0100)]
bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets
On ARM SoCs with bgmac Ethernet hardware we don't have any normal PHY.
There is always a switch attached but it's not even controlled over MDIO
like in case of MIPS devices.
We need a fixed PHY to be able to send/receive packets from the switch.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 21 Mar 2015 01:40:47 +0000 (21:40 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-03-20
This series contains updates to ixgb, e1000e, igb and igbvf.
Eliezer and Todd provide patches to fix a potential issue found during code
inspection. When bringing down an interface netif_carrier_off() should
be one of the first things we do, since this will prevent the stack from
queueing more packets to this interface.
Yanir provides a fix for e1000e that was found in validating i219,
where the call to e1000e_write_protect_nvm_ich8lan() is no longer
supported in newer hardware. Access to these registers causes a
system freeze in early steppings and is ignored in later steppings.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mathieu Olivari [Sat, 21 Mar 2015 01:31:03 +0000 (18:31 -0700)]
net: dsa: make NET_DSA manually selectable from the config
Change
bd76a116707bd2381da36cf7c3183df11293f1d6 made all DSA drivers
depend on NET_DSA rather than selecting them. However, as the only way
to select this option was to actually select a driver, it made DSA
impossible to enable at all.
This patch adds an explicit entry which the user will have to enable
prior selecting a driver.
Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Sat, 21 Mar 2015 00:42:46 +0000 (17:42 -0700)]
switchdev: kernel-doc cleanup on swithdev ops
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 21 Mar 2015 00:15:19 +0000 (17:15 -0700)]
Revert "selinux: add a skb_owned_by() hook"
This reverts commit
ca10b9e9a8ca7342ee07065289cbe74ac128c169.
No longer needed after commit
eb8895debe1baba41fcb62c78a16f0c63c21662a
("tcp: tcp_make_synack() should use sock_wmalloc")
When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc
Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Todd Fujinaka [Sat, 21 Mar 2015 00:41:54 +0000 (17:41 -0700)]
igbvf: use netif_carrier_off earlier when bringing if down
Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.
Reported-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Todd Fujinaka [Sat, 21 Mar 2015 00:41:53 +0000 (17:41 -0700)]
igb: use netif_carrier_off earlier when bringing if down
Use netif_carrier_off() first, since that will prevent the stack from
queuing more packets to this IF. This operation is fast, and should
behave much nicer when trying to bring down an interface under load.
Reported-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Yanir Lubetkin [Sat, 21 Mar 2015 00:41:53 +0000 (17:41 -0700)]
e1000e: NVM write protect access removed from SPT HW
The call to e1000e_write_protect_nvm_ich8lan() is no longer supported by HW.
Access to these registers causes a system freeze in A step hardware and is
ignored in B step hardware. This function must not be called in hardware
newer than LPT.
Signed-off-by: Yanir Lubetkin <yanirx.lubetkin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eliezer Tamir [Sat, 21 Mar 2015 00:41:52 +0000 (17:41 -0700)]
e1000e: call netif_carrier_off early on down
When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.
Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.
Move netif_carrier_off as early as possible.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eliezer Tamir [Sat, 21 Mar 2015 00:41:52 +0000 (17:41 -0700)]
ixgb: call netif_carrier_off early on down
When bringing down an interface netif_carrier_off() should be
one the first things we do, since this will prevent the stack
from queuing more packets to this interface.
This operation is very fast, and should make the device behave
much nicer when trying to bring down an interface under load.
Also, this would Do The Right Thing (TM) if this device has some
sort of fail-over teaming and redirect traffic to the other IF.
Move netif_carrier_off as early as possible.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Fri, 20 Mar 2015 23:10:50 +0000 (19:10 -0400)]
Merge branch 'ebpf-next'
Daniel Borkmann says:
====================
This set adds native eBPF support also to act_bpf and thus covers tc
with eBPF in the classifier *and* action part.
A link to iproute2 preview has been provided in patch 2 and the code
will be pushed out after Stephen has processed the classifier part
and helper bits for tc.
This set depends on
ced585c83b27 ("act_bpf: allow non-default TC_ACT
opcodes as BPF exec outcome"), so a net into net-next merge would be
required first. Hope that's fine by you, Dave. ;)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 20 Mar 2015 14:11:12 +0000 (15:11 +0100)]
act_bpf: add initial eBPF support for actions
This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!
Together with commit
e2e9b6541dd4 ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.
Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.
For the remaining details on the workflow and integration, see the cls_bpf
commit
e2e9b6541dd4. Preliminary iproute2 part can be found under [1].
[1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 20 Mar 2015 14:11:11 +0000 (15:11 +0100)]
ebpf: add sched_act_type and map it to sk_filter's verifier ops
In order to prepare eBPF support for tc action, we need to add
sched_act_type, so that the eBPF verifier is aware of what helper
function act_bpf may use, that it can load skb data and read out
currently available skb fields.
This is bascially analogous to
96be4325f443 ("ebpf: add sched_cls_type
and map it to sk_filter's verifier ops").
BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
separate since both will have a different set of functionality in
future (classifier vs action), thus we won't run into ABI troubles
when the point in time comes to diverge functionality from the
classifier.
The future plan for act_bpf would be that it will be able to write
into skb->data and alter selected fields mirrored in struct __sk_buff.
For an initial support, it's sufficient to map it to sk_filter_ops.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 22:51:09 +0000 (18:51 -0400)]
Merge git://git./linux/kernel/git/davem/net
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
net/core/sysctl_net_core.c
net/ipv4/inet_diag.c
The be_main.c conflict resolution was really tricky. The conflict
hunks generated by GIT were very unhelpful, to say the least. It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().
So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged. And this worked beautifully.
The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 22:18:45 +0000 (18:18 -0400)]
rhashtable: Fix undeclared EEXIST build error on ia64
We need to include linux/errno.h in rhashtable.h since it doesn't
always get included otherwise.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 20 Mar 2015 17:41:43 +0000 (17:41 +0000)]
net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom
Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 20:34:03 +0000 (16:34 -0400)]
Merge branch 'amd-xgbe-next'
Tom Lendacky says:
====================
amd-xgbe: AMD XGBE driver updates 2015-03-19
The following series of patches includes functional updates and changes
to the driver.
- Use the phydev->advertising field instead of the phydev->supported
field when configuring for auto-negotiation, etc.
- Use the phy_driver flags field for setting the transceiver type
instead of hardcoding it in the ethtool support.
- Provide an auto-negotiation timeout check
- Clarify the Tx/Rx queue information messages
- Use the new DMA memory barrier operations
- Set the device DMA mask based on what the hardware reports
- Remove the software implementation of Tx coalescing
- Fix the reporting of the Rx coalescing value
- Use napi_alloc_skb when allocating an SKB in softirq
This patch series is based on net-next.
Changes from v2:
- Use jiffies instead of timespec for the auto-negotiation timeout check
- Remove the Rx path SKB allocation re-work patch since we should only
inline the headers and the current code guards better against any
hardware bugs
Changes from v1:
- Default to 32-bit DMA width (minimum supported) if hardware returns
an unexpected DMA width value
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:41 +0000 (11:50 -0500)]
amd-xgbe: Use napi_alloc_skb when allocating skb in softirq
Use the napi_alloc_skb function to allocate an skb when running within
the softirq context to avoid calls to local_irq_save/restore.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:34 +0000 (11:50 -0500)]
amd-xgbe: Fix Rx coalescing reporting
The Rx coalescing value is internally converted from usecs to a value
that the hardware can use. When reporting the Rx coalescing value, this
internal value is converted back to usecs. During the conversion from
and back to usecs some rounding occurs. So, for example, when setting an
Rx usec of 30, it will be reported as 29. Fix this reporting issue by
keeping the original usec value and using that during reporting.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:28 +0000 (11:50 -0500)]
amd-xgbe: Remove Tx coalescing
The Tx coalescing support in the driver was a software implementation
for something lacking in the hardware. Using hrtimers, the idea was to
trigger a timer interrupt after having queued a packet for transmit.
Unfortunately, as the timer value was lowered, the timer expired before
the hardware actually did the transmit and so it was racey and resulted
in unnecessary interrupts.
Remove the Tx coalescing support and hrtimer and replace with a Tx timer
that is used as a reclaim timer in case of inactivity.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:22 +0000 (11:50 -0500)]
amd-xgbe: Set DMA mask based on hardware register value
The hardware supplies a value that indicates the DMA range that it
is capable of using. Use this value rather than hard-coding it in
the driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:16 +0000 (11:50 -0500)]
amd-xgbe: Use the new DMA memory barriers where appropriate
Use the new lighter weight memory barriers when working with the device
descriptors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:10 +0000 (11:50 -0500)]
amd-xgbe: Clarify output message about queues
Clarify that the queues referred to in a message when the device is
brought up are hardware queues and not necessarily related to the
Linux network queues.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:50:04 +0000 (11:50 -0500)]
amd-xgbe-phy: Provide support for auto-negotiation timeout
Currently, there is no interrupt code that indicates auto-negotiation
has timed out. If the auto-negotiation has timed out then the start of
a new auto-negotiation will begin again with a new base page being
received. The state machine could be in a state that is not expecting
this interrupt code which results in an error during auto-negotiation.
Update the code to timestamp when the auto-negotiation starts. Should
another page received interrupt code occur before auto-negotiation has
completed but after the auto-negotiation timeout, then reset the state
machine to allow the auto-negotiation to continue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:53 +0000 (11:49 -0500)]
amd-xgbe-phy: Use the phy_driver flags field
Remove the setting of the transceiver type when retrieving the device
settings using ethtool and instead set the transceiver type in the
phy_driver structure flags field. Change the transceiver type to be
internal, also.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 20 Mar 2015 16:49:42 +0000 (11:49 -0500)]
amd-xgbe-phy: Use phydev advertising field vs supported
With ethtool being able to control what is advertised, the advertising
field is what should be used for priming the auto-negotiation registers
and for various other checks, instead of the supported field.
Also, move the initial setting of the supported and advertising fields
into the probe function so that they are not reset each time the device
is brought up, thus allowing the user to set as desired before bringing
the device up.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Catalin Marinas [Fri, 20 Mar 2015 16:48:13 +0000 (16:48 +0000)]
net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
Commit
db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit
dbb490b96584 (net: socket: error on a negative msg_namelen).
In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (
6a2a2b3ae075 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by
08adb7dabd48 (fold verify_iovec() into
copy_msghdr_from_user()).
This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().
Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 20:16:32 +0000 (16:16 -0400)]
Merge branch 'rhashtable-inlined-interface'
Herbert Xu says:
====================
rhashtable: Introduce inlined interface
This series of patches introduces the inlined rhashtable interface.
The idea is to make all the function pointers visible to the compiler
by providing the rhashtable_params structure explicitly to each
inline rhashtable function. For example, instead of doing
obj = rhashtable_lookup(ht, key);
you would now do
obj = rhashtable_lookup_fast(ht, key, params);
Where params is the same data that you would give to rhashtable_init.
In particular, within rhashtable.c itself we would simply supply
ht->p.
So to convert users over, you simply have to make params globally
accessible, e.g., by placing it in a static const variable, which
can then be used at each inlined call site, as well as by the
rhashtable_init call.
The only ticky bit is that some users (i.e., netfilter) has a
dynamic key length. This is dealt with by using params.key_len
in the inline functions when it is non-zero, and otherwise falling
back on ht->p.key_len.
Note that I've only tested this on one compiler, gcc 4.7.2. So
please test this with your compilers as well and make sure that
the code is actually inlined without indirect function calls.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:06 +0000 (21:57 +1100)]
rhashtable: Rip out obsolete out-of-line interface
Now that all rhashtable users have been converted over to the
inline interface, this patch removes the unused out-of-line
interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:05 +0000 (21:57 +1100)]
tipc: Use inlined rhashtable interface
This patch converts tipc to the inlined rhashtable interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:04 +0000 (21:57 +1100)]
test_rhashtable: Use inlined rhashtable interface
This patch converts test_rhashtable to the inlined rhashtable
interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:02 +0000 (21:57 +1100)]
netfilter: Convert nft_hash to inlined rhashtable
This patch converts nft_hash to the inlined rhashtable interface.
This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).
Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects. The current side-effect code can
simply be moved into the nft_hash_get.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:01 +0000 (21:57 +1100)]
netlink: Move namespace into hash key
Currently the name space is a de facto key because it has to match
before we find an object in the hash table. However, it isn't in
the hash value so all objects from different name spaces with the
same port ID hash to the same bucket.
This is bad as the number of name spaces is unbounded.
This patch fixes this by using the namespace when doing the hash.
Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.
This patch also uses the new inlined rhashtable interface where
possible.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:57:00 +0000 (21:57 +1100)]
rhashtable: Allow hash/comparison functions to be inlined
This patch deals with the complaint that we make indirect function
calls on the fast paths unnecessarily in rhashtable. We resolve
it by moving the fast paths into inline functions that take struct
rhashtable_param (which obviously must be the same set of parameters
supplied to rhashtable_init) as an argument.
The only remaining indirect call is to obj_hashfn (or key_hashfn it
obj_hashfn is unset) on the rehash as well as the insert-during-
rehash slow path.
This patch also extends the support of vairable-length keys to
include those where the key is fixed but scattered in the object.
For example, in netlink we want to key off the namespace and the
portid but they're not next to each other.
This patch does this by directly using the object hash function
as the indicator of whether the key is accessible or not. It
also adds a new function obj_cmpfn to compare a key against an
object. This means that the caller no longer needs to supply
explicit compare functions.
All this is done in a backwards compatible manner so no existing
users are affected until they convert to the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 20 Mar 2015 10:56:59 +0000 (21:56 +1100)]
rhashtable: Make rhashtable_init params argument const
This patch marks the rhashtable_init params argument const as
there is no reason to modify it since we will always make a copy
of it in the rhashtable.
This patch also fixes a bug where we don't actually round up the
value of min_size unless it is less than HASH_MIN_SIZE.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 19 Mar 2015 18:38:27 +0000 (19:38 +0100)]
ebpf, filter: do not convert skb->protocol to host endianess during runtime
Commit
c24973957975 ("bpf: allow BPF programs access 'protocol' and 'vlan_tci'
fields") has added support for accessing protocol, vlan_present and vlan_tci
into the skb offset map.
As referenced in the below discussion, accessing skb->protocol from an eBPF
program should be converted without handling endianess.
The reason for this is that an eBPF program could simply do a check more
naturally, by f.e. testing skb->protocol == htons(ETH_P_IP), where the LLVM
compiler resolves htons() against a constant automatically during compilation
time, as opposed to an otherwise needed run time conversion.
After all, the way of programming both from a user perspective differs quite
a lot, i.e. bpf_asm ["ld proto"] versus a C subset/LLVM.
Reference: https://patchwork.ozlabs.org/patch/450819/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 20 Mar 2015 14:37:17 +0000 (11:37 -0300)]
ipv6: invert join/leave anycast rtnl/socket locking order
Commit
baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.
As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.
Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner [Fri, 20 Mar 2015 13:26:21 +0000 (10:26 -0300)]
vxlan: fix possible use of uninitialized in vxlan_igmp_{join, leave}
Test robot noticed that we check the return of vxlan_igmp_join and leave
but inside them there was a path that it could be used initialized.
It's not really possible because those if() inside these igmp functions
would always match as we can't have sockets of other type in there, but
this way we keep the compiler happy.
Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 17:25:56 +0000 (13:25 -0400)]
Merge branch 'be2net'
Sathya Perla says:
====================
be2net: patch set
Hi David, this patch set includes 3 bug fixes to the be2net driver.
Patch 1 fixes a vlan isolation issue with VFs. When a VF is placed in
promiscous mode, it could receive packets belonging to any vlan, as
the PF driver grants vlan promisc capability to VFs. The PF
driver now disables the vlan promisc capability for VFs to fix this
problem.
Patch 2 fixes the call to MODIFY_EQ_DELAY FW cmd to not include more
than 8 EQs per cmd. The FW is not capable of handling more than 8 EQs
per cmd.
Patch 3 fixes an EEH error detection issue. On Power platforms,
when an EEH error occurs, the slot disconnect state is more reliably
detected via an MMIO read compared to a config read. So, the error
register reads that occur every second are now done via MMIO.
Pls apply this patch set to the "net" tree. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Fri, 20 Mar 2015 10:28:25 +0000 (06:28 -0400)]
be2net: use PCI MMIO read instead of config read for errors
When an EEH error occurs, the device/slot is disconnected. This condition
is more reliably detected (i.e., returns all ones) with an MMIO read rather
than a config read -- especially on power platforms.
Hence, this patch fixes EEH error detection by replacing config reads with
MMIO reads for reading the error registers. The error registers in
Skyhawk-R/BE2/BE3 are accessible both via the config space and the
PCICFG (BAR0) memory space.
Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suresh Reddy [Fri, 20 Mar 2015 10:28:24 +0000 (06:28 -0400)]
be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs
Issuing this cmd for more than 8 EQs does not have the intended effect
even on BEx and Skyhawk-R.
This patch fixes this by issuing this cmd for upto 8 EQs at a time.
Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasundhara Volam [Fri, 20 Mar 2015 10:28:23 +0000 (06:28 -0400)]
be2net: Prevent VFs from enabling VLAN promiscuous mode
Currently, a PF does not restrict its VF interface from enabling vlan
promiscuous mode. This breaks vlan isolation when a vlan
(transparent tagging) is configured on a VF.
This patch fixes this problem by disabling the vlan promisc capability
for VFs.
Reported-by: Yoann Juet <veilletechno-irts@univ-nantes.fr>
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Hunt [Thu, 19 Mar 2015 23:19:30 +0000 (19:19 -0400)]
tcp: fix tcp fin memory accounting
tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:
ss example output (ss -amn | grep -B1 f4294):
tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080
skmem:(r0,rb87380,t0,tb87380,
f4294966016,w1280,o0,bl0)
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Barth [Thu, 19 Mar 2015 15:16:04 +0000 (16:16 +0100)]
ipv6: fix backtracking for throw routes
for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4
A simple testcase for verification is:
ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: Steven Barth <cyrus@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Markos Chandras [Thu, 19 Mar 2015 10:28:14 +0000 (10:28 +0000)]
net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.
Cc: <netdev@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Don Fry <pcnet32@frontier.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Thu, 19 Mar 2015 10:22:32 +0000 (11:22 +0100)]
ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment
Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.
ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb. This approach was suggested by Vladislav
Yasevich.
Fixes: 0508c07f5e0c ("ipv6: Select fragment id during UFO segmentation if not set.")
Cc: Vladislav Yasevich <vyasevic@redhat.com>
Reported-by: Matt Grant <matt@mattgrant.net.nz>
Tested-by: Matt Grant <matt@mattgrant.net.nz>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Palik, Imre [Thu, 19 Mar 2015 10:05:42 +0000 (11:05 +0100)]
xen-netback: making the bandwidth limiter runtime settable
With the current netback, the bandwidth limiter's parameters are only
settable during vif setup time. This patch register a watch on them, and
thus makes them runtime changeable.
When the watch fires, the timer is reset. The timer's mutex is used for
fencing the change.
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Mar 2015 16:40:33 +0000 (12:40 -0400)]
Merge branch 'listener_refactor_part_14'
Eric Dumazet says:
====================
inet: tcp listener refactoring part 14
OK, we have serious patches here.
We get rid of the central timer handling SYNACK rtx,
which is killing us under even medium SYN flood.
We still use the listener specific hash table.
This will be done in next round ;)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>