Jon Maloy [Mon, 19 Feb 2018 11:48:21 +0000 (12:48 +0100)]
tipc: fix bug on error path in tipc_topsrv_kern_subscr()
In commit
cc1ea9ffadf7 ("tipc: eliminate struct tipc_subscriber") we
re-introduced an old bug on the error path in the function
tipc_topsrv_kern_subscr(). We now re-introduce the correction too.
Reported-by: syzbot+f62e0f2a0ef578703946@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Feb 2018 19:19:12 +0000 (14:19 -0500)]
Merge branch 'pernet_ops-conversions-part-2'
Kirill Tkhai says:
====================
Converting pernet_operations (part #2)
This patchset continues to review and to convert pernet_operations
to async. There are mostly ipv6, also some regular used netfilter
pernet_operations are involved. One more converted is cfg80211_pernet_ops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:54 +0000 (11:50 +0300)]
net: Convert iptable_filter_net_ops
These pernet_operations register and unregister
net::ipv4.iptable_filter table. Since there are
no packets in-flight at the time of exit method
is working, iptables rules should not be touched.
Also, pernet_operations should not send ipv4
packets each other. So, it's safe to mark them
async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:45 +0000 (11:50 +0300)]
net: Convert ip_tables_net_ops, udplite6_net_ops and xt_net_ops
ip_tables_net_ops and udplite6_net_ops create and destroy /proc entries.
xt_net_ops does nothing.
So, we are able to mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:37 +0000 (11:50 +0300)]
net: Convert ip6_frags_ops
Exit methods calls inet_frags_exit_net() with global ip6_frags
as argument. So, after we make the pernet_operations async,
a pair of exit methods may be called to iterate this hash table.
Since there is inet_frag_worker(), which already may work
in parallel with inet_frags_exit_net(), and it can make the same
cleanup, that inet_frags_exit_net() does, it's safe. So we may
mark these pernet_operations as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:28 +0000 (11:50 +0300)]
net: Convert fib6_net_ops, ipv6_addr_label_ops and ip6_segments_ops
These pernet_operations register and unregister tables
and lists for packets forwarding. All of the entities
are per-net. Init methods makes simple initializations,
and since net is not visible for foreigners at the time
it is working, it can't race with anything. Exit method
is executed when there are only local devices, and there
mustn't be packets in-flight. Also, it looks like no one
pernet_operations want to send ipv6 packets to foreign
net. The same reasons are for ipv6_addr_label_ops and
ip6_segments_ops. So, we are able to mark all them as
async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:18 +0000 (11:50 +0300)]
net: Convert xfrm6_net_ops
These pernet_operations create sysctl tables and
initialize net::xfrm.xfrm6_dst_ops used for routing.
It doesn't look like another pernet_operations send
ipv6 packets to foreign net namespaces, so it should
be safe to mark the pernet_operations as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:50:09 +0000 (11:50 +0300)]
net: Convert ip6_flowlabel_net_ops
These pernet_operations create and destroy /proc entries.
ip6_fl_purge() makes almost the same actions as timer
ip6_fl_gc_timer does, and as it can be executed in parallel
with ip6_fl_purge(), two parallel ip6_fl_purge() may be
executed. So, we can mark it async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:59 +0000 (11:49 +0300)]
net: Convert ping_v6_net_ops
These pernet_operations only register and unregister /proc
entries, so it's possible to mark them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:49 +0000 (11:49 +0300)]
net: Convert ipv6_sysctl_net_ops
These pernet_operations create and destroy sysctl tables.
They are not touched by another net pernet_operations.
So, it's possible to execute them in parallel with others.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:40 +0000 (11:49 +0300)]
net: Convert tcpv6_net_ops
These pernet_operations create and destroy net::ipv6.tcp_sk
socket, which is used in tcp_v6_send_response() only. It looks
like foreign pernet_operations don't want to set ipv6 connection
inside destroyed net, so this socket may be created in destroyed
in parallel with anything else. inet_twsk_purge() is also safe
for that, as described in patch for tcp_sk_ops. So, it's possible
to mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:31 +0000 (11:49 +0300)]
net: Convert fib6_rules_net_ops
These pernet_operations register and unregister
net::ipv6.fib6_rules_ops, which are used for
routing. It looks like there are no pernet_operations,
which send ipv6 packages to another net, so we
are able to mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:20 +0000 (11:49 +0300)]
net: Convert ipv6_inetpeer_ops
net->ipv6.peers is dereferenced in three places via inet_getpeer_v6(),
and it's used to handle skb. All the users of inet_getpeer_v6() do not
look like be able to be called from foreign net pernet_operations, so
we may mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:49:10 +0000 (11:49 +0300)]
net: Convert raw6_net_ops, udplite6_net_ops, ipv6_proc_ops, if6_proc_net_ops and ip6_route_net_late_ops
These pernet_operations create and destroy /proc entries
and safely may be converted and safely may be mark as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:48:57 +0000 (11:48 +0300)]
net: Convert icmpv6_sk_ops, ndisc_net_ops and igmp6_net_ops
These pernet_operations create and destroy net::ipv6.icmp_sk
socket, used to send ICMP or error reply.
Nobody can dereference the socket to handle a packet before
net is initialized, as there is no routing; nobody can do
that in parallel with exit, as all of devices are moved
to init_net or destroyed and there are no packets it-flight.
So, it's possible to mark these pernet_operations as async.
The same for ndisc_net_ops and for igmp6_net_ops. The last
one also creates and destroys /proc entries.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:48:37 +0000 (11:48 +0300)]
net: Convert ip6mr_net_ops
These pernet_operations create and destroy /proc entries,
populate and depopulate net::rules_ops and multiroute table.
All the structures are pernet, and they are not touched
by foreign net pernet_operations. So, it's possible to mark
them async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:48:14 +0000 (11:48 +0300)]
net: Convert cfg80211_pernet_ops
This patch finishes converting pernet_operations
registered in net/wireless directory.
These pernet_operations have only exit method,
which moves devices to init_net. This action
is not pernet_operations-specific, and function
cfg80211_switch_netns() may be called all time
during the system life. All necessary protection
against concurrent cfg80211_pernet_exit() is made
by rtnl_lock(). So, cfg80211_pernet_ops is able
to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 19 Feb 2018 08:48:04 +0000 (11:48 +0300)]
net: Convert inet6_net_ops
init method initializes sysctl defaults, allocates
percpu arrays and creates /proc entries.
exit method reverts the above.
There are no pernet_operations, which are interested
in the above entities of foreign net namespace, so
inet6_net_ops are able to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jake Moroni [Sun, 18 Feb 2018 20:26:04 +0000 (15:26 -0500)]
dpaa_eth: fix pause capability advertisement logic
The ADVERTISED_Asym_Pause bit was being improperly set when both
rx and tx pause were enabled. When rx and tx are both enabled, only
the ADVERTISED_Pause bit is supposed to be set.
Signed-off-by: Jake Moroni <mail@jakemoroni.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 18 Feb 2018 15:21:05 +0000 (18:21 +0300)]
sh_eth: simplify sh_eth_check_reset()
The *while* loop in this function can be turned into a normal *for* loop.
And getting rid of the single return point saves us a few more LoCs...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Feb 2018 16:26:32 +0000 (11:26 -0500)]
Merge branch 'dwmac-meson8b-small-cleanup'
Martin Blumenstingl says:
====================
dwmac-meson8b: small cleanup
This is a follow-up to my previous series "dwmac-meson8b: clock fixes for
Meson8b" from [0].
during the review of that series it was found that the clock registration
could be simplified. now that the previous series has landed we can start
cleaning up the clock registration.
the goal of this series is to simplify the code in the dwmac-meson8b
driver. no functional changes are intended.
I have tested this on my Khadas VIM2 (GXM SoC, with RGMII PHY) and my
Endless Mini (EC-100, Meson8b SoC with RMII PHY, .dts support is not part
of mainline yet). no problems were found.
[0] http://lists.infradead.org/pipermail/linux-amlogic/2018-January/006143.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:20 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: make the clock configurations private
The common clock framework needs access to the "clock configuration"
structs during runtime.
However, only the common clock framework should access these. Ensure
this by moving the configuration structs out of struct meson8b_dwmac,
so only meson8b_init_rgmii_tx_clk() and the common clock framework know
about these configurations.
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:19 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: only keep struct device around
Nothing in the dwmac-meson8b driver (except .probe itself) requires the
platform_device anymore after .probe has finished. Replace it with a
pointer to struct device since this is what the functions inside the
driver are actually accessing.
No functional changes.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl [Sat, 17 Feb 2018 14:08:18 +0000 (15:08 +0100)]
net: stmmac: dwmac-meson8b: simplify clock registration
To goal of this patch is to simplify the registration of the RGMII TX
clock (and it's parent clocks). This is achieved by:
- introducing the meson8b_dwmac_register_clk helper-function to remove
code duplication when registering a single clock (this saves a few
lines since we have 4 clocks internally)
- using devm_add_action_or_reset to disable the RGMII TX clock
automatically when needed. This also allows us to re-use the standard
stmmac_pltfr_remove function.
- devm_kasprintf() and devm_kstrdup() are not used anymore to generate
the clock name (these are replaced by a variable on the stack) because
the common clock framework already uses kstrdup() internally.
No functional changes intended.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gustavo A. R. Silva [Fri, 16 Feb 2018 17:47:39 +0000 (11:47 -0600)]
net: dsa: mv88e6xxx: hwtstamp: remove unnecessary range checking tests
_port_ is already known to be a valid index in the callers [1]. So
these checks are unnecessary.
[1] https://lkml.org/lkml/2018/2/16/469
Addresses-Coverity-ID:
1465287
Addresses-Coverity-ID:
1465291
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Parri [Fri, 16 Feb 2018 11:06:13 +0000 (12:06 +0100)]
ptr_ring: Remove now-redundant smp_read_barrier_depends()
Because READ_ONCE() now implies smp_read_barrier_depends(), the
smp_read_barrier_depends() in __ptr_ring_consume() is redundant;
this commit removes it and updates the comments.
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <linux-kernel@vger.kernel.org>
Cc: <netdev@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 16 Feb 2018 10:03:07 +0000 (11:03 +0100)]
tun: export flags, uid, gid, queue information over netlink
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 16 Feb 2018 19:03:03 +0000 (11:03 -0800)]
net: Only honor ifindex in IP_PKTINFO if non-0
Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings
if the index is actually set in the message.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 16 Feb 2018 16:55:05 +0000 (16:55 +0000)]
net: dsa: mv88e6xxx: avoid unintended sign extension on a 16 bit shift
The shifting of timehi by 16 bits to the left will be promoted to
a 32 bit signed int and then sign-extended to an u64. If the top bit
of timehi is set then all then all the upper bits of ns end up as also
being set because of the sign-extension. Fix this by making timehi and
timelo u64. Also move the declaration of ns.
Detected by CoverityScan, CID#
1465288 ("Unintended sign extension")
Fixes: c6fe0ad2c349 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Söderlund [Fri, 16 Feb 2018 16:10:08 +0000 (17:10 +0100)]
ravb: add support for changing MTU
Allow for changing the MTU within the limit of the maximum size of a
descriptor (2048 bytes). Add the callback to change MTU from user-space
and take the configurable MTU into account when configuring the
hardware.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 21:24:24 +0000 (16:24 -0500)]
Merge branch 'nfp-whitespace-sync-and-flower-TCP-flags'
Jakub Kicinski says:
====================
nfp: whitespace sync and flower TCP flags
Whitespace cleanup from Michael and flower offload support for matching
on TCP flags from Pieter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pieter Jansen van Vuuren [Fri, 16 Feb 2018 04:19:09 +0000 (20:19 -0800)]
nfp: flower: implement tcp flag match offload
Implement tcp flag match offloading. Current tcp flag match support include
FIN, SYN, RST, PSH and URG flags, other flags are unsupported. The PSH and
URG flags are only set in the hardware fast path when used in combination
with the SYN, RST and PSH flags.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Rapson [Fri, 16 Feb 2018 04:19:08 +0000 (20:19 -0800)]
nfp: standardize FW header whitespace
The nfp_net_ctrl.h file used spaces for indentation in the past but
tabs have crept in. Host driver files use tabs for indentation by
default, so let's convert to tabs for consistency across the file
and our drivers.
Signed-off-by: Michael Rapson <michael.rapson@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 21:05:59 +0000 (16:05 -0500)]
Merge branch 'net-sched-act-add-extack-support'
Alexander Aring says:
====================
net: sched: act: add extack support
this patch series adds extack support for the TC action subsystem.
As example I for the extack support in a TC action I choosed mirred
action.
- Alex
Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- adapt recommended changes from Davide Caratti, please check if
I catch everything. Thanks.
changes since v2:
- remove newline in extack of generic walker handling
Thanks to Davide Caratti
- add kernel@mojatatu.com in cc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:55:00 +0000 (10:55 -0500)]
net: sched: act: mirred: add extack support
This patch adds extack support for TC mirred action.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:59 +0000 (10:54 -0500)]
net: sched: act: handle extack in tcf_generic_walker
This patch adds extack handling for a common used TC act function
"tcf_generic_walker()" to add an extack message on failures.
The tcf_generic_walker() function can fail if get a invalid command
different than DEL and GET. The naming "action" here is wrong, the
correct naming would be command.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:58 +0000 (10:54 -0500)]
net: sched: act: add extack for walk callback
This patch adds extack support for act walker callback api. This
prepares to handle extack support inside each specific act
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:57 +0000 (10:54 -0500)]
net: sched: act: add extack for lookup callback
This patch adds extack support for act lookup callback api. This
prepares to handle extack support inside each specific act
implementation.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:56 +0000 (10:54 -0500)]
net: sched: act: add extack to init callback
This patch adds extack support for act init callback api. This
prepares to handle extack support inside each specific act
implementation.
Based on work by David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:55 +0000 (10:54 -0500)]
net: sched: act: handle generic action errors
This patch adds extack support for generic act handling. The extack
will be set deeper to each called function which is not part of netdev
core api.
Based on work by David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:54 +0000 (10:54 -0500)]
net: sched: act: add extack to init
This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.
Based on work by David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:53 +0000 (10:54 -0500)]
net: sched: act: fix code style
This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 21:04:18 +0000 (16:04 -0500)]
Merge branch 'RDS-zerocopy-support'
Sowmini Varadhan says:
====================
RDS: zerocopy support
This is version 3 of the series, following up on review comments for
http://patchwork.ozlabs.org/project/netdev/list/?series=28530
Review comments addressed
Patch 4
- fix fragile use of skb->cb[], do not set ee_code incorrectly.
Patch 5:
- remove needless bzero of skb->cb[], consolidate err cleanup
A brief overview of this feature follows.
This patch series provides support for MSG_ZERCOCOPY
on a PF_RDS socket based on the APIs and infrastructure added
by Commit
f214f915e7db ("tcp: enable MSG_ZEROCOPY")
For single threaded rds-stress testing using rds-tcp with the
ixgbe driver using 1M message sizes (-a 1M -q 1M) preliminary
results show that there is a significant reduction in latency: about
90 usec with zerocopy, compared with 200 usec without zerocopy.
This patchset modifies the above for zerocopy in the following manner.
- if the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
- if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg are pinned. The pinning
uses the accounting infrastructure added by
a91dbff551a6 ("sock: ulimit
on MSG_ZEROCOPY pages"). The message is unpinned when all references
to the message go down to 0, and the message is freed by rds_message_purge.
A multithreaded application using this infrastructure must send down
a unique 32 bit cookie as ancillary data with each sendmsg invocation.
The format of this ancillary data is described in Patch 5 of the series.
The cookie is passed up to the application on the sk_error_queue when
the message is unpinned, indicating to the application that it is now
safe to free/reuse the message buffer. The details of the completion
notification are provided in Patch 4 of this series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:38 +0000 (10:49 -0800)]
selftests/net: add zerocopy support for PF_RDS test case
Send a cookie with sendmsg() on PF_RDS sockets, and process the
returned batched cookies in do_recv_completion()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:37 +0000 (10:49 -0800)]
selftests/net: add support for PF_RDS sockets
Add support for basic PF_RDS client-server testing in msg_zerocopy
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:36 +0000 (10:49 -0800)]
rds: zerocopy Tx support.
If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
application pages sent down with rds_sendmsg() are pinned.
The pinning uses the accounting infrastructure added by
Commit
a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")
The payload bytes in the message may not be modified for the
duration that the message has been pinned. A multi-threaded
application using this infrastructure may thus need to be notified
about send-completion so that it can free/reuse the buffers
passed to rds_sendmsg(). Notification of send-completion will
identify each message-buffer by a cookie that the application
must specify as ancillary data to rds_sendmsg().
The ancillary data in this case has cmsg_level == SOL_RDS
and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:35 +0000 (10:49 -0800)]
rds: support for zcopy completion notification
RDS removes a datagram (rds_message) from the retransmit queue when
an ACK is received. The ACK indicates that the receiver has queued
the RDS datagram, so that the sender can safely forget the datagram.
When all references to the rds_message are quiesced, rds_message_purge
is called to release resources used by the rds_message
If the datagram to be removed had pinned pages set up, add
an entry to the rs->rs_znotify_queue so that the notifcation
will be sent up via rds_rm_zerocopy_callback() when the
rds_message is eventually freed by rds_message_purge.
rds_rm_zerocopy_callback() attempts to batch the number of cookies
sent with each notification to a max of SO_EE_ORIGIN_MAX_ZCOOKIES.
This is achieved by checking the tail skb in the sk_error_queue:
if this has room for one more cookie, the cookie from the
current notification is added; else a new skb is added to the
sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will
trigger a ->sk_error_report to notify the application.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:34 +0000 (10:49 -0800)]
sock: permit SO_ZEROCOPY on PF_RDS socket
allow the application to set SO_ZEROCOPY on the underlying sk
of a PF_RDS socket
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:33 +0000 (10:49 -0800)]
rds: hold a sock ref from rds_message to the rds_sock
The existing model holds a reference from the rds_sock to the
rds_message, but the rds_message does not itself hold a sock_put()
on the rds_sock. Instead the m_rs field in the rds_message is
assigned when the message is queued on the sock, and nulled when
the message is dequeued from the sock.
We want to be able to notify userspace when the rds_message
is actually freed (from rds_message_purge(), after the refcounts
to the rds_message go to 0). At the time that rds_message_purge()
is called, the message is no longer on the rds_sock retransmit
queue. Thus the explicit reference for the m_rs is needed to
send a notification that will signal to userspace that
it is now safe to free/reuse any pages that may have
been pinned down for zerocopy.
This patch manages the m_rs assignment in the rds_message with
the necessary refcount book-keeping.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 15 Feb 2018 18:49:32 +0000 (10:49 -0800)]
skbuff: export mm_[un]account_pinned_pages for other modules
RDS would like to use the helper functions for managing pinned pages
added by Commit
a91dbff551a6 ("sock: ulimit on MSG_ZEROCOPY pages")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 21:03:39 +0000 (16:03 -0500)]
net: Revert sched action extack support series.
It was mis-applied and the changes had rejects.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 20:44:42 +0000 (15:44 -0500)]
Merge branch 'net-sched-act-add-extack-support'
Alexander Aring says:
====================
net: sched: act: add extack support
this patch series adds extack support for the TC action subsystem.
As example I for the extack support in a TC action I choosed mirred
action.
- Alex
Cc: David Ahern <dsahern@gmail.com>
changes since v3:
- adapt recommended changes from Davide Caratti, please check if
I catch everything. Thanks.
changes since v2:
- remove newline in extack of generic walker handling
Thanks to Davide Caratti
- add kernel@mojatatu.com in cc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:54 +0000 (10:54 -0500)]
net: sched: act: add extack to init
This patch adds extack to tcf_action_init and tcf_action_init_1
functions. These are necessary to make individual extack handling in
each act implementation.
Based on work by David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 15 Feb 2018 15:54:53 +0000 (10:54 -0500)]
net: sched: act: fix code style
This patch is used by subsequent patches. It fixes code style issues
caught by checkpatch.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 15 Feb 2018 14:50:57 +0000 (15:50 +0100)]
net: sched: fix unbalance in the error path of tca_action_flush()
When tca_action_flush() calls the action walk() and gets an error,
a successful call to nla_nest_start() is not followed by a call to
nla_nest_cancel(). It's harmless, as the skb is freed in the error
path - but it's worth to fix this unbalance.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 20:37:10 +0000 (15:37 -0500)]
Merge branch 'dsa-mv88e6xxx-Improve-PTP-access-latency'
Andrew Lunn says:
====================
net: dsa: mv88e6xxx: Improve PTP access latency
PTP needs to retrieve the hardware timestamps from the switch device
in a low latency manor. However ethtool -S and bridge fdb show can
hold the switch register access mutex for a long time. These patches
changes the reading the statistics and the ATU so that the mutex is
released and taken again between each statistic or ATU entry. The PTP
code can then interleave its access to the hardware, keeping its
latency low.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 15 Feb 2018 13:38:35 +0000 (14:38 +0100)]
net: dsa: mv88e6xxx: Release mutex between each ATU read
The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the ATU entries in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual MAC
address in the ATU, allowing the PTP thread jump in between.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 15 Feb 2018 13:38:34 +0000 (14:38 +0100)]
net: dsa: mv88e6xxx: Release mutex between each statistics read
The PTP code needs low latency access to the PTP hardware timestamps.
Reading all the statistics in one go adds a lot of latency to the PTP
code. So take and release the reg_lock mutex for each individual
statistics, allowing the PTP thread jump in between.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Feb 2018 20:26:35 +0000 (15:26 -0500)]
Merge branch 'tipc-de-generealize-topology-server'
Jon Maloy says:
====================
tipc: de-generealize topology server
The topology server is partially based on a template that is much
more generic than what we need. This results in a code that is
unnecessarily hard to follow and keeping bug free.
We now take the consequence of the fact that we only have one such
server in TIPC, - with no prospects for introducing any more, and
adapt the code to the specialized task is really is doing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:51 +0000 (10:40 +0100)]
tipc: rename tipc_server to tipc_topsrv
We rename struct tipc_server to struct tipc_topsrv. This reflect its now
specialized role as topology server. Accoringly, we change or add function
prefixes to make it clearer which functionality those belong to.
There are no functional changes in this commit.
Acked-by: Ying.Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:50 +0000 (10:40 +0100)]
tipc: separate topology server listener socket from subcsriber sockets
We move the listener socket to struct tipc_server and give it its own
work item. This makes it easier to follow the code, and entails some
simplifications in the reception code in subscriber sockets.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:49 +0000 (10:40 +0100)]
tipc: make struct tipc_server private for server.c
In order to narrow the interface and dependencies between the topology
server and the subscription/binding table functionality we move struct
tipc_server inside the file server.c. This requires some code
adaptations in other files, but those are mostly minor.
The most important change is that we have to move the start/stop
functions for the topology server to server.c, where they logically
belong anyway.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:48 +0000 (10:40 +0100)]
tipc: some prefix changes
Since we now have removed struct tipc_subscriber from the code, and
only struct tipc_subscription remains, there is no longer need for long
and awkward prefixes to distinguish between their pertaining functions.
We now change all tipc_subscrp_* prefixes to tipc_sub_*. This is
a purely cosmetic change.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:47 +0000 (10:40 +0100)]
tipc: collapse subscription creation functions
After the previous changes it becomes logical to collapse the two-level
creation of subscription instances into one. We do that here.
We also rename the creation and deletion functions for more consistency.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:46 +0000 (10:40 +0100)]
tipc: simplify endianness handling in topology subscriber
Because of the requirement for total distribution transparency, users
send subscriptions and receive topology events in their own host format.
It is up to the topology server to determine this format and do the
correct conversions to and from its own host format when needed.
Until now, this has been handled in a rather non-transparent way inside
the topology server and subscriber code, leading to unnecessary
complexity when creating subscriptions and issuing events.
We now improve this situation by adding two new macros, tipc_sub_read()
and tipc_evt_write(). Both those functions calculate the need for
conversion internally before performing their respective operations.
Hence, all handling of such conversions become transparent to the rest
of the code.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:45 +0000 (10:40 +0100)]
tipc: simplify interaction between subscription and topology connection
The message transmission and reception in the topology server is more
generic than is currently necessary. By basing the funtionality on the
fact that we only send items of type struct tipc_event and always
receive items of struct tipc_subcr we can make several simplifications,
and also get rid of some unnecessary dynamic memory allocations.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:44 +0000 (10:40 +0100)]
tipc: eliminate struct tipc_subscriber
It is unnecessary to keep two structures, struct tipc_conn and struct
tipc_subscriber, with a one-to-one relationship and still with different
life cycles. The fact that the two often run in different contexts, and
still may access each other via direct pointers constitutes an additional
hazard, something we have experienced at several occasions, and still
see happening.
We have identified at least two remaining problems that are easier to
fix if we simplify the topology server data structure somewhat.
- When there is a race between a subscription up/down event and a
timeout event, it is fully possible that the former might be delivered
after the latter, leading to confusion for the receiver.
- The function tipc_subcrp_timeout() is executing in interrupt context,
while the following call chain is at least theoretically possible:
tipc_subscrp_timeout()
tipc_subscrp_send_event()
tipc_conn_sendmsg()
conn_put()
tipc_conn_kref_release()
sock_release(sock)
I.e., we end up calling a function that might try to sleep in
interrupt context. To eliminate this, we need to ensure that the
tipc_conn structure and the socket, as well as the subscription
instances, only are deleted in work queue context, i.e., after the
timeout event really has been sent out.
We now remove this unnecessary complexity, by merging data and
functionality of the subscriber structure into struct tipc_conn
and the associated file server.c. We thereafter add a spinlock and
a new 'inactive' state to the subscription structure. Using those,
both problems described above can be easily solved.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:43 +0000 (10:40 +0100)]
tipc: remove unnecessary function pointers
Interaction between the functionality in server.c and subscr.c is
done via function pointers installed in struct server. This makes
the code harder to follow, and doesn't serve any obvious purpose.
Here, we replace the function pointers with direct function calls.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Thu, 15 Feb 2018 09:40:42 +0000 (10:40 +0100)]
tipc: remove redundant code in topology server
The socket handling in the topology server is unnecessarily generic.
It is prepared to handle both SOCK_RDM, SOCK_DGRAM and SOCK_STREAM
type sockets, as well as the only socket type which is really used,
SOCK_SEQPACKET.
We now remove this redundant code to make the code more readable.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Feb 2018 20:43:49 +0000 (15:43 -0500)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:
====================
pull request: bluetooth-next 2018-02-15
Here's the first bluetooth-next pull request targetting the 4.17 kernel
release.
- Fixes & cleanups to Atheros and Marvell drivers
- Support for two new Realtek controllers
- Support for new Intel Bluetooth controller
- Fix for supporting multiple slave-role Bluetooth LE connections
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Thu, 15 Feb 2018 00:19:26 +0000 (09:19 +0900)]
selftests/net: fixes psock_fanout eBPF test case
eBPF test fails due to verifier failure because log_buf is too small.
Fixed by increasing log_buf size
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 14 Feb 2018 22:24:28 +0000 (14:24 -0800)]
net/ipv4: Remove fib table id from rtable
Remove rt_table_id from rtable. It was added for getroute to return the
table id that was hit in the lookup. With the changes for fibmatch the
table id can be extracted from the fib_info returned in the fib_result
so it no longer needs to be in rtable directly.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Feb 2018 20:38:33 +0000 (15:38 -0500)]
Merge branch 'tc-testing-plugin-architecture'
Brenda J. Butler says:
====================
tools: tc-testing: Plugin Architecture
To make tdc.py more general, we are introducing a plugin architecture.
This patch set first organizes the command line parameters, then
introduces the plugin architecture and some example plugins.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:25 +0000 (14:09 -0500)]
tools: tc-testing: Update README and TODO
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:24 +0000 (14:09 -0500)]
tools: tc-testing: valgrindPlugin
Run the command under test under valgrind. Produce an extra set of
tap output for the memory check on each test.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:23 +0000 (14:09 -0500)]
tools: tc-testing: nsPlugin
Move the functionality of creating a namespace before the test suite
and destroying it afterwards to a plugin.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:22 +0000 (14:09 -0500)]
tools: tc-testing: rootPlugin
Move the functionality that checks for root permissions into a plugin.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:21 +0000 (14:09 -0500)]
tools: tc-testing: Introduce plugin architecture
This should be a general test architecture, and yet allow specific
tests to be done. Introduce a plugin architecture.
An individual test has 4 stages, setup/execute/verify/teardown. Each
plugin gets a chance to run a function at each stage, plus one call
before all the tests are called ("pre" suite) and one after all the
tests are called ("post" suite). In addition, just before each
command is executed, the plugin gets a chance to modify the command
using the "adjust_command" hook. This makes the test suite quite
flexible.
Future patches will take some functionality out of the tdc.py script and
place it in plugins.
To use the plugins, place the implementation in the plugins directory
and run tdc.py. It will notice the plugins and use them.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:20 +0000 (14:09 -0500)]
tools: tc-testing: Refactor test-runner
Split the test_runner function into the loop part (test_runner)
and the contents (run_one_test) for maintainability.
It makes it a little easier to catch exceptions
in an individual test, and keep going (and flush a bunch
of tap results for the skipped tests).
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 14 Feb 2018 19:09:19 +0000 (14:09 -0500)]
tools: tc-testing: Command line parms
Separate the functionality of the command line parameters into "selection"
parameters, "action" parameters and other parameters.
"Selection" parameters are for choosing which tests on which to act.
"Action" parameters are for choosing what to do with the selected tests.
"Other" parameters are for global effect (like "help" or "verbose").
With this commit, we add the ability to name a directory as another
selection mechanism. We can accumulate a number of tests by directory,
file, category, or even by test id, instead of being constrained to
run all tests in one collection or just one test.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Acked-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 15 Feb 2018 20:34:42 +0000 (15:34 -0500)]
Merge branch 'tunchr-get-netns'
Kirill Tkhai says:
====================
net: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
Currently, it's not possible to get or check net namespace,
which was used to create tun socket. User may have two tun
devices with the same names in different nets, and there
is no way to differ them each other.
The patchset adds support for ioctl() cmd SIOCGSKNS for tun
devices. It will allow people to obtain net namespace file
descriptor like we allow to do that for sockets in general.
v2: Add new patch [2/3] to export open_related_ns().
====================
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 14 Feb 2018 13:40:14 +0000 (16:40 +0300)]
tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device
This patch adds possibility to get tun device's net namespace fd
in the same way we allow to do that for sockets.
Socket ioctl numbers do not intersect with tun-specific, and there
is already SIOCSIFHWADDR used in tun code. So, SIOCGSKNS number
is choosen instead of custom-made for this functionality.
Note, that open_related_ns() uses plain get_net_ns() and it's safe
(net can't be already dead at this moment):
tun socket is allocated via sk_alloc() with zero last arg (kern = 0).
So, each alive socket increments net::count, and the socket is definitely
alive during ioctl syscall.
Also, common variable net is introduced, so small cleanup in TUNSETIFF
is made.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 14 Feb 2018 13:40:05 +0000 (16:40 +0300)]
net: Export open_related_ns()
This function will be used to obtain net of tun device.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 14 Feb 2018 13:39:56 +0000 (16:39 +0300)]
net: Make extern and export get_net_ns()
This function will be used to obtain net of tun device.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Feb 2018 20:44:05 +0000 (15:44 -0500)]
Merge branch '40GbE' of git://git./linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-02-14
This patch series enables the new mqprio hardware offload mechanism
creating traffic classes on VFs for XL710 devices. The parameters
needed to configure these traffic classes/queue channels are provides
by the user via the tc tool. A maximum of four traffic classes can be
created on each VF. This patch series also enables application of cloud
filters to each of these traffic classes. The cloud filters are applied
using the tc-flower classifier.
Example:
1. tc qdisc add dev vf0 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev vf0 ingress
3. ethtool -K vf0 hw-tc-offload on
4. ip link set eth0 vf 0 spoofchk off
5. tc filter add dev vf0 protocol ip parent ffff: prio 1 flower dst_ip\
192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 14 Feb 2018 17:22:42 +0000 (09:22 -0800)]
kcm: Call strp_stop before strp_done in kcm_attach
In kcm_attach strp_done is called when sk_user_data is already
set to fail the attach. strp_done needs the strp to be stopped and
warns if it isn't. Call strp_stop in this case to eliminate the
warning message.
Reported-by: syzbot+88dfb55e4c8b770d86e3@syzkaller.appspotmail.com
Fixes: e5571240236c5652f ("kcm: Check if sk_user_data already set in kcm_attach"
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wadim Egorov [Wed, 14 Feb 2018 16:07:12 +0000 (17:07 +0100)]
net: phy: dp83867: Add documentation for CLK_OUT pin muxing
Add documentation of ti,clk-output-sel which can be used to select
a specific clock for CLK_OUT.
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wadim Egorov [Wed, 14 Feb 2018 16:07:11 +0000 (17:07 +0100)]
net: phy: dp83867: Add binding for the CLK_OUT pin muxing option
The DP83867 has a muxing option for the CLK_OUT pin. It is possible
to set CLK_OUT for different channels.
Create a binding to select a specific clock for CLK_OUT pin.
Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Wed, 14 Feb 2018 12:34:39 +0000 (13:34 +0100)]
tipc: apply bearer link tolerance on running links
Currently, the default link tolerance set in struct tipc_bearer only
has effect on links going up after that moment. I.e., a user has to
reset all the node's links across that bearer to have the new value
applied. This is too limiting and disturbing on a running cluster to
be useful.
We now change this so that also already existing links are updated
dynamically, without any need for a reset, when the bearer value is
changed. We leverage the already existing per-link functionality
for this to achieve the wanted effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Feb 2018 20:01:52 +0000 (15:01 -0500)]
Merge branch 'cxgb4-speed-up-reading-on-chip-memory'
Rahul Lakkireddy says:
====================
cxgb4: speed up reading on-chip memory
This series of patches speed up reading on-chip memory (EDC and MC)
by reading 64-bits at a time.
Patch 1 reworks logic to read EDC and MC.
Patch 2 adds logic to read EDC and MC 64-bits at a time.
v2:
- Dropped AVX CPU intrinsic instructions.
- Use readq() to read 64-bits at a time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 14 Feb 2018 07:26:28 +0000 (12:56 +0530)]
cxgb4: speed up on-chip memory read
Use readq() (via t4_read_reg64()) to read 64-bits at a time.
Read residual in 32-bit multiples.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Lakkireddy [Wed, 14 Feb 2018 07:26:27 +0000 (12:56 +0530)]
cxgb4: rework on-chip memory read
Rework logic to read EDC and MC. Do 32-bit reads at a time.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 14 Feb 2018 04:32:04 +0000 (20:32 -0800)]
net: Move ipv4 set_lwt_redirect helper to lwtunnel
IPv4 uses set_lwt_redirect to set the lwtunnel redirect functions as
needed. Move it to lwtunnel.h as lwtunnel_set_redirect and change
IPv6 to also use it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 14 Feb 2018 19:33:38 +0000 (14:33 -0500)]
Merge branch 'PTP-support-for-DSA-and-mv88e6xxx-driver'
Andrew Lunn says:
====================
PTP support for DSA and mv88e6xxx driver.
This patchset adds support for using the PTP hardware in switches
supported by the mv88e6xxx driver. The code was produces in
collaboration with Brandon Streiff doing the initial implementation,
and then Richard Cochran and Andrew Lunn making further changes and
cleanups.
The code is sufficient to use ptp4l on a single DSA interface, either
as a master or a slave. Due to the use of an MDIO bus to access the
switch, reading hardware timestamps is slower than what ptp4l
expects. Thus it is necessary to use the option
--tx_timestamp_timeout=32. Heavy use of ethtool -S, or bridge fdb show
can also upset ptp4l. Patches to address this will follow.
Further work is requires to support bridges using Boundary Clock or
Transparent Clock mode.
Since the RFC, an overflow bug has been fixed. Brandon Streiff
has also Acked-by: the updates to his initial patchset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:51 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add workaround for 6341 timestamping
88E6341 devices default to timestamping at the PHY, but due to a
hardware issue, timestamps via this component are unreliable. For
this family, configure the PTP hardware to force the timestamping
to occur at the MAC.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:50 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add rx/tx timestamping support
This patch implements RX/TX timestamping support.
The Marvell PTP hardware supports RX timestamping individual message
types, but for simplicity we only support the EVENT receive filter since
few if any clients bother with the more specific filter types.
checkpatch and reverse Christmas tree changes by Andrew Lunn.
Re-factor duplicated code paths and avoid IfOk anti-pattern, use the
common ptp worker thread from the class layer and time stamp UDP/IPv4
frames as well as Layer-2 frame by Richard Cochran.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:49 +0000 (01:07 +0100)]
net: dsa: forward timestamping callbacks to switch drivers
Forward the rx/tx timestamp machinery from the dsa infrastructure to the
switch driver.
On the rx side, defer delivery of skbs until we have an rx timestamp.
This mimicks the behavior of skb_defer_rx_timestamp.
On the tx side, identify PTP packets, clone them, and pass them to the
underlying switch driver before we transmit. This mimicks the behavior
of skb_tx_timestamp.
Adjusted txstamp API to keep the allocation and freeing of the clone
in the same central function by Richard Cochran
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:48 +0000 (01:07 +0100)]
net: dsa: forward hardware timestamping ioctls to switch driver
This patch adds support to the dsa slave network device so that
switch drivers can implement the SIOC[GS]HWTSTAMP ioctls and the
ethtool timestamp-info interface.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:47 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add support for event capture
This patch adds support for configuring mv88e6xxx GPIO lines as PTP
pins, so that they may be used for time stamping external events or for
periodic output.
Checkpatch and reverse Christmas tree fixes by Andrew Lunn
Periodic output removed by Richard Cochran, until a better abstraction
of a VCO is added to Linux in general.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brandon Streiff [Wed, 14 Feb 2018 00:07:46 +0000 (01:07 +0100)]
net: dsa: mv88e6xxx: add support for GPIO configuration
MV88E6352 and later switches support GPIO control through the "Scratch
& Misc" global2 register. (Older switches do too, though with a slightly
different register interface. Only the 6352-style is implemented here.)
Add a new file, global2_scratch.c, for operations in the Scratch & Misc
space. Additionally, add a GPIO operations structure to present an
abstract view over GPIO manipulation.
Reverse Christmas tree and unsigned has been replaced with unsigned
int by Andrew Lunn.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>