Pablo Neira Ayuso [Tue, 27 Mar 2018 09:53:05 +0000 (11:53 +0200)]
netfilter: nf_tables: rename struct nf_chain_type
Use the nft_ prefix. Back when I added chain types, I forgot to use the
nftables prefix. Rename enum nft_chain_type to enum nft_chain_types too,
otherwise there is an overlap.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Joe Perches [Wed, 21 Mar 2018 11:03:22 +0000 (04:03 -0700)]
netfilter: ebt_stp: Use generic functions for comparisons
Instead of unnecessary const declarations, use the generic functions to
save a little object space.
$ size net/bridge/netfilter/ebt_stp.o*
   text    data     bss     dec     hex filename
   1250     144       0    1394     572 net/bridge/netfilter/ebt_stp.o.new
   1344     144       0    1488     5d0 net/bridge/netfilter/ebt_stp.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Wed, 28 Mar 2018 13:00:43 +0000 (15:00 +0200)]
netfilter: add flowtable documentation
This patch adds initial documentation for the Netfilter flowtable
infrastructure.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Bernie Harris [Wed, 21 Mar 2018 02:42:16 +0000 (15:42 +1300)]
netfilter: ebtables: Add string filter
This patch is part of a proposal to add a string filter to
ebtables, which would be similar to the string filter in
iptables. Like iptables, the ebtables filter uses the xt_string
module.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Bernie Harris [Wed, 21 Mar 2018 02:42:15 +0000 (15:42 +1300)]
netfilter: ebtables: Add support for specifying match revision
Currently ebtables assumes that the revision number of all match
modules is 0, which is an issue when trying to use existing
xtables matches with ebtables. The solution is to modify ebtables
to allow extensions to specify a revision number, similar to
iptables. This gets passed down to the kernel, which is then able
to find the match module correctly.
To maintain binary backwards compatibility, the size of the ebt_entry
structures is not changed, only the size of the name field is
decreased by 1 byte to make room for the revision field.
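The layout trick described above can be sketched as follows. This is a minimal illustration, not the actual ebtables header: the struct names, the `DEMO_` constants, and the assumed 32-byte name field are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NAME_TOTAL 32			/* assumed size of the old name field */
#define DEMO_NAME_LEN   (DEMO_NAME_TOTAL - 1)	/* name shrinks by one byte */

/* Old layout: the whole field holds the extension name. */
struct demo_match_old {
	char name[DEMO_NAME_TOTAL];
	unsigned int match_size;
};

/* New layout: the last byte of the old name field carries the revision. */
struct demo_match_new {
	char name[DEMO_NAME_LEN];
	uint8_t revision;
	unsigned int match_size;
};

/* The overall size and the offset of the following member are unchanged. */
_Static_assert(sizeof(struct demo_match_old) == sizeof(struct demo_match_new),
	       "struct size must not change");
_Static_assert(offsetof(struct demo_match_old, match_size) ==
	       offsetof(struct demo_match_new, match_size),
	       "following members must not move");
```

Because the revision byte reuses space that previously belonged to the name, a name that is NUL-terminated within the shorter field reads the same in both layouts.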
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Matthias Schiffer [Sun, 4 Mar 2018 08:28:54 +0000 (09:28 +0100)]
netfilter: ebtables: add support for matching IGMP type
We already have ICMPv6 type/code matches (which can be used to distinguish
different types of MLD packets). Add support for IPv4 IGMP matches in the
same way.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Matthias Schiffer [Sun, 4 Mar 2018 08:28:53 +0000 (09:28 +0100)]
netfilter: ebtables: add support for matching ICMP type and code
We already have ICMPv6 type/code matches. This adds support for IPv4 ICMP
matches in the same way.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Tue, 20 Mar 2018 11:33:51 +0000 (12:33 +0100)]
netfilter: ctnetlink: synproxy support
This patch exposes synproxy information per-conntrack. Moreover, send
sequence adjustment events once the server sends us the SYN,ACK packet, so
we can also synchronize the sequence adjustment for packets sent as
replies from the server, as part of the synproxy logic.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Arushi Singhal [Mon, 12 Mar 2018 13:06:29 +0000 (18:36 +0530)]
netfilter: Replace printk() with pr_*() and define pr_fmt()
Using pr_<loglevel>() is more concise than printk(KERN_<LOGLEVEL>).
This patch:
* Replace printks having a log level with the appropriate
pr_*() macros.
* Define pr_fmt() to include relevant name.
* Remove redundant prefixes from pr_*() calls.
* Indent the code where possible.
* Remove the useless output messages.
* Remove periods from messages.
Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jack Ma [Sun, 18 Mar 2018 20:41:59 +0000 (09:41 +1300)]
netfilter: xt_conntrack: Support bit-shifting for CONNMARK & MARK targets.
This patch introduces a new feature that allows bitshifting (left
and right) operations to co-operate with existing iptables options.
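As a rough sketch of what a left or right shift on a 32-bit mark looks like, assuming a hypothetical `demo_shift_mark()` helper and direction enum (the real option names and the order in which masks and shifts are applied are not given in this message):

```c
#include <stdint.h>

/* Hypothetical direction selector, for illustration only. */
enum demo_shift_dir { DEMO_SHIFT_LEFT, DEMO_SHIFT_RIGHT };

/* Shift a 32-bit mark left or right by 'bits' positions. */
uint32_t demo_shift_mark(uint32_t mark, enum demo_shift_dir dir,
			 unsigned int bits)
{
	/* Shifting a 32-bit value by 32 or more is undefined in C. */
	if (bits >= 32)
		return 0;

	return dir == DEMO_SHIFT_RIGHT ? mark >> bits : mark << bits;
}
```

For example, shifting the mark 0x0000ff00 right by 8 moves the byte into the low position, which is the kind of transformation that lets two marks with different bit layouts share a value.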
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jack Ma <jack.ma@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Taehee Yoo [Wed, 14 Mar 2018 14:36:53 +0000 (23:36 +0900)]
netfilter: ebtables: use ADD_COUNTER macro
xtables uses the ADD_COUNTER macro to increase the
packet and byte counts. ebtables can use it too.
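A minimal userspace sketch of such a counter macro, assuming a hypothetical `demo_counters` struct with packet and byte fields (the names are illustrative and not copied from the kernel headers):

```c
#include <stdint.h>

/* Hypothetical per-rule counters: packets and bytes seen. */
struct demo_counters {
	uint64_t pcnt;
	uint64_t bcnt;
};

/* Add 'b' bytes and 'p' packets to counter 'c' in one statement. */
#define DEMO_ADD_COUNTER(c, b, p) \
	do { (c).bcnt += (b); (c).pcnt += (p); } while (0)

/* Account one packet of 'len' bytes; returns the new packet count. */
uint64_t demo_account(struct demo_counters *c, unsigned int len)
{
	DEMO_ADD_COUNTER(*c, len, 1);
	return c->pcnt;
}
```

Wrapping both updates in one macro keeps the packet and byte counts from drifting apart when call sites are edited.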
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Gustavo A. R. Silva [Tue, 13 Mar 2018 03:16:17 +0000 (22:16 -0500)]
netfilter: nf_tables: remove VLA usage
In preparation to enabling -Wvla, remove VLA and replace it
with dynamic memory allocation.
From a security viewpoint, the use of Variable Length Arrays can be
a vector for stack overflow attacks. Also, in general, as the code
evolves it is easy to lose track of how big a VLA can get. Thus, we
can end up having segfaults that are hard to debug.
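The transformation these VLA patches apply can be sketched in userspace C. Here malloc() and free() stand in for the kernel allocators, and `demo_sum_copy()` is a hypothetical function used only for illustration:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Before: "unsigned char buf[len];" put 'len' bytes on the stack,
 * so a large 'len' could overflow it.
 * After: the buffer is heap-allocated and the failure case is handled.
 */
int demo_sum_copy(const unsigned char *src, size_t len)
{
	unsigned char *buf;
	int sum = 0;

	if (len == 0)
		return 0;

	buf = malloc(len);
	if (!buf)
		return -1;	/* allocation failure is now explicit */

	memcpy(buf, src, len);
	for (size_t i = 0; i < len; i++)
		sum += buf[i];

	free(buf);
	return sum;
}
```

The heap version costs an allocation per call, but the size is checked at one place and a failure is reported instead of silently overrunning the stack.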
Also, fixed as part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Gustavo A. R. Silva [Tue, 13 Mar 2018 00:21:38 +0000 (19:21 -0500)]
netfilter: nfnetlink_cthelper: Remove VLA usage
In preparation to enabling -Wvla, remove VLA and replace it
with dynamic memory allocation.
From a security viewpoint, the use of Variable Length Arrays can be
a vector for stack overflow attacks. Also, in general, as the code
evolves it is easy to lose track of how big a VLA can get. Thus, we
can end up having segfaults that are hard to debug.
Also, fixed as part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Gustavo A. R. Silva [Mon, 12 Mar 2018 23:14:42 +0000 (18:14 -0500)]
netfilter: cttimeout: remove VLA usage
In preparation to enabling -Wvla, remove VLA and replace it
with dynamic memory allocation.
From a security viewpoint, the use of Variable Length Arrays can be
a vector for stack overflow attacks. Also, in general, as the code
evolves it is easy to lose track of how big a VLA can get. Thus, we
can end up having segfaults that are hard to debug.
Also, fixed as part of the directive to remove all VLAs from
the kernel: https://lkml.org/lkml/2018/3/7/621
While at it, remove the likely() annotation, which is not necessary in
control plane code.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 9 Mar 2018 10:57:20 +0000 (11:57 +0100)]
netfilter: nft_ct: add NFT_CT_{SRC,DST}_{IP,IP6}
All existing keys, except the NFT_CT_SRC and NFT_CT_DST are assumed to
have strict datatypes. This is causing problems with sets and
concatenations given the specific length of these keys is not known.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
Yi-Hung Wei [Sun, 4 Mar 2018 23:29:52 +0000 (15:29 -0800)]
netfilter: conncount: Support count only use case
Currently, nf_conncount_count() counts the number of connections that
match key and inserts a conntrack 'tuple' with the same key into the
accounting data structure. This patch supports another use case that only
counts the number of connections, where 'tuple' is not provided. Therefore,
nf_conncount_count() is changed to support the case where
'tuple' is NULL. This could be useful for querying statistics or
for debugging purposes.
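The count-only mode can be sketched with a toy table. Everything here (`demo_table`, `demo_tuple`, `demo_count()`, the fixed capacity) is hypothetical and far simpler than the real accounting structure:

```c
#include <stddef.h>

#define DEMO_MAX_ENTRIES 8

struct demo_tuple {
	unsigned int id;	/* stands in for a conntrack tuple */
};

struct demo_entry {
	unsigned int key;
	struct demo_tuple tuple;
};

struct demo_table {
	struct demo_entry entries[DEMO_MAX_ENTRIES];
	size_t used;
};

/*
 * Count the entries matching 'key'. When 'tuple' is non-NULL it is also
 * recorded under 'key' and included in the result; when 'tuple' is NULL
 * the table is left untouched (count-only mode).
 */
unsigned int demo_count(struct demo_table *t, unsigned int key,
			const struct demo_tuple *tuple)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < t->used; i++)
		if (t->entries[i].key == key)
			count++;

	if (tuple && t->used < DEMO_MAX_ENTRIES) {
		t->entries[t->used].key = key;
		t->entries[t->used].tuple = *tuple;
		t->used++;
		count++;
	}
	return count;
}
```

Passing NULL makes the call a pure query, which is what a statistics or debugging caller wants.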
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Yi-Hung Wei [Sun, 4 Mar 2018 23:29:51 +0000 (15:29 -0800)]
netfilter: Refactor nf_conncount
Remove parameter 'family' in nf_conncount_count() and count_tree().
It is because the parameter is not useful after commit 625c556118f3
("netfilter: connlimit: split xt_connlimit into front and backend").
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Gustavo A. R. Silva [Mon, 5 Mar 2018 21:35:57 +0000 (15:35 -0600)]
ipvs: use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 6 Mar 2018 07:26:00 +0000 (08:26 +0100)]
netfilter: x_tables: fix build with CONFIG_COMPAT=n
I placed the helpers within CONFIG_COMPAT section, move them
outside.
Fixes: 472ebdcd15ebdb ("netfilter: x_tables: check error target size too")
Fixes: 07a9da51b4b6ae ("netfilter: x_tables: check standard verdicts in core")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Geert Uytterhoeven [Fri, 2 Mar 2018 13:59:54 +0000 (14:59 +0100)]
netfilter: xt_limit: Spelling s/maxmum/maximum/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cong Wang [Fri, 2 Mar 2018 02:58:38 +0000 (18:58 -0800)]
netfilter: make xt_rateest hash table per net
As suggested by Eric, we need to make the xt_rateest
hash table and its lock per netns to reduce lock
contention.
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:37 +0000 (19:42 +0100)]
netfilter: x_tables: ensure last rule in base chain matches underflow/policy
Harmless from kernel point of view, but again iptables assumes that
this is true when decoding ruleset coming from kernel.
If a (syzkaller generated) ruleset doesn't have the underflow/policy
stored as the last rule in the base chain, then iptables will abort()
because it doesn't find the chain policy.
libiptc assumes that the policy is the last rule in the basechain, which
is only true for iptables-generated rulesets.
Unfortunately this needs code duplication -- the functions need the
struct layout of the rule head, but that is different for
ip/ip6/arptables.
NB: pr_warn could be pr_debug, but in case this breaks rulesets somehow it's
useful to know why the blob was rejected.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:36 +0000 (19:42 +0100)]
netfilter: x_tables: make sure compat af mutex is held
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:35 +0000 (19:42 +0100)]
netfilter: compat: reject huge allocation requests
There is no need to even try allocating huge compat offset arrays;
such a ruleset is rejected later on anyway because we refuse to allocate
overly large rule blobs.
However, compat translation happens before blob allocation, so we should
add a check there too.
This is supposed to help with fuzzing by avoiding oom-killer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:34 +0000 (19:42 +0100)]
netfilter: compat: prepare xt_compat_init_offsets to return errors
This should have no impact; the function still always returns 0.
This patch is only to ease review.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:33 +0000 (19:42 +0100)]
netfilter: x_tables: add counters allocation wrapper
This allows size checks to live in a single spot.
This is supposed to reduce oom situations when fuzz-testing xtables.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:32 +0000 (19:42 +0100)]
netfilter: x_tables: limit allocation requests for blob rule heads
This is a very conservative limit (134217728 rules), but good
enough to not trigger frequent oom from syzkaller.
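The figure of 134217728 is what a 512 MiB cap yields when each rule head offset takes 4 bytes. The sketch below works through that arithmetic; the `DEMO_` names are hypothetical and the 4-byte offset size is an assumption, not something this message states:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed overall cap on table allocations: 512 MiB. */
#define DEMO_MAX_TABLE_SIZE (512u * 1024u * 1024u)

/* Assumed: one 4-byte offset is stored per rule head. */
#define DEMO_MAX_RULE_HEADS (DEMO_MAX_TABLE_SIZE / sizeof(uint32_t))

/* Reject requests for more rule head offsets than the cap allows. */
bool demo_offsets_ok(size_t count)
{
	return count <= DEMO_MAX_RULE_HEADS;
}
```

Under those assumptions, 536870912 / 4 = 134217728, so the check refuses a request before any allocation is attempted.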
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:31 +0000 (19:42 +0100)]
netfilter: x_tables: cap allocations at 512 mbyte
Arbitrary limit, however, this still allows huge rulesets
(> 1 million rules). This helps with automated fuzzers as it prevents
oom-killer invocation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:30 +0000 (19:42 +0100)]
netfilter: x_tables: enforce unique and ascending entry points
Harmless from kernel point of view, but iptables assumes that this is
true when decoding a ruleset.
iptables walks the dumped blob from kernel, and, for each entry that
creates a new chain it prints out rule/chain information.
Base chains (hook entry points) are thus only shown when they appear
in the rule blob. One base chain that is referenced multiple times
in hook blob is then only printed once.
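The "unique and ascending" property can be sketched as a strict ordering check over the entry point offsets. The function name and the flat array of offsets are assumptions made for illustration; the real check works on the table's hook arrays:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if every entry point offset is strictly greater than the
 * previous one. Strict ordering implies uniqueness, so a base chain
 * cannot be referenced by two hooks.
 */
bool demo_hooks_ascending(const unsigned int *offsets, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (offsets[i] <= offsets[i - 1])
			return false;

	return true;
}
```

A blob whose hooks point at offsets 0, 152, 304 passes, while one that repeats 152 or lists the offsets out of order is rejected.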
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:29 +0000 (19:42 +0100)]
netfilter: x_tables: move hook entry checks into core
Allow a followup patch to change one location instead of three.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:28 +0000 (19:42 +0100)]
netfilter: x_tables: check error target size too
Check that the userspace ERROR target (custom user-defined chains) matches
the expected format, and that the chain name is null terminated.
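A null-termination check on a fixed-size name field can be sketched like this. `DEMO_NAME_MAX` and `demo_name_terminated()` are hypothetical; the real field size is not given in this message:

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_NAME_MAX 30	/* assumed size of the chain name field */

/*
 * Return true if the fixed-size name buffer contains a terminating NUL
 * somewhere within its DEMO_NAME_MAX bytes.
 */
bool demo_name_terminated(const char *name)
{
	return memchr(name, '\0', DEMO_NAME_MAX) != NULL;
}
```

Scanning only the bounded field means a name that fills every byte without a terminator is rejected rather than read past the end.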
This is irrelevant for kernel, but iptables itself relies on sane input
when it dumps rules from kernel.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Florian Westphal [Tue, 27 Feb 2018 18:42:27 +0000 (19:42 +0100)]
netfilter: x_tables: check standard verdicts in core
Userspace must provide a valid verdict to the standard target.
The verdict can be either a jump (signed int > 0), or a return code.
Allowed return codes are RETURN (pop from stack), NF_ACCEPT, NF_DROP
and NF_QUEUE (the latter is allowed for legacy reasons).
Jump offsets (verdict > 0) are checked in more detail later on when
loop-detection is performed.
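The verdict rules above can be sketched as a small validator. The `DEMO_` constants mirror the usual netfilter verdict numbering and the convention that userspace encodes a verdict v as -v - 1, but both are stated here as assumptions rather than taken from this message:

```c
#include <stdbool.h>

/* Assumed verdict numbering, for illustration. */
#define DEMO_NF_DROP	0
#define DEMO_NF_ACCEPT	1
#define DEMO_NF_QUEUE	3
#define DEMO_NF_REPEAT	4

/* Assumed encoding of RETURN: one past the last regular verdict. */
#define DEMO_XT_RETURN	(-DEMO_NF_REPEAT - 1)

/* Return true if 'verdict' is a jump offset or an allowed return code. */
bool demo_verdict_ok(int verdict)
{
	if (verdict > 0)
		return true;	/* jump offset, checked in detail later */

	if (verdict == DEMO_XT_RETURN)
		return true;

	if (verdict < 0) {
		/* -(verdict + 1) cannot overflow, unlike -verdict - 1. */
		switch (-(verdict + 1)) {
		case DEMO_NF_DROP:
		case DEMO_NF_ACCEPT:
		case DEMO_NF_QUEUE:
			return true;
		}
	}
	return false;
}
```

With this numbering, -1 (drop), -2 (accept), -4 (queue) and -5 (return) are accepted, while 0 and other negative values are refused.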
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Xin Long [Fri, 16 Feb 2018 10:04:56 +0000 (18:04 +0800)]
netfilter: unlock xt_table earlier in __do_replace
Currently cleanup_entry is done for oldinfo under the xt_table lock,
but that is not really necessary. After the replacement job is done
in xt_replace_table, oldinfo is not used anywhere else any more, and
it can safely be freed without the xt_table lock.
More importantly, rtnl_lock is taken in the destroy callback of some
xt_targets, which means rtnl_lock, a big lock, is acquired while holding
the xt_table lock, a smaller one. This kind of nesting is a common
cause of deadlocks.
Besides, all xt_target/match checkentry calls are made outside the xt_table
lock. It is better to also move all cleanup_entry calls outside the
xt_table lock, just as do_replace_finish does for ebtables.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Gustavo A. R. Silva [Tue, 13 Feb 2018 14:25:57 +0000 (08:25 -0600)]
netfilter: ipt_ah: return boolean instead of integer
Return statements in functions returning bool should use
true/false instead of 1/0.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Taehee Yoo [Sun, 11 Feb 2018 14:28:18 +0000 (23:28 +0900)]
netfilter: nf_conntrack_broadcast: remove useless parameter
The protoff parameter of nf_conntrack_broadcast_help is not used anywhere.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Taehee Yoo [Sun, 11 Feb 2018 13:57:29 +0000 (22:57 +0900)]
netfilter: xt_cluster: get rid of xt_cluster_ipv6_is_multicast
If we use ipv6_addr_is_multicast instead of xt_cluster_ipv6_is_multicast,
we can reduce code size.
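For reference, an IPv6 multicast test boils down to checking the ff00::/8 prefix. The sketch below is a userspace illustration with a hypothetical name, not the kernel helper itself:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * IPv6 multicast addresses occupy ff00::/8, so the test only needs to
 * look at the first byte of the 16-byte address.
 */
bool demo_ipv6_is_multicast(const uint8_t addr[16])
{
	return addr[0] == 0xff;
}
```

Since the check is this small, keeping a private copy in one module buys nothing over the shared helper.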
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Taehee Yoo [Sun, 11 Feb 2018 10:17:20 +0000 (19:17 +0900)]
netfilter: nfnetlink_acct: remove useless parameter
The skb parameter of nfnl_acct_overquota is not used anywhere.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
kbuild test robot [Fri, 19 Jan 2018 20:27:58 +0000 (04:27 +0800)]
netfilter: nf_tables: nf_tables_obj_lookup_byhandle() can be static
Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Mon, 5 Mar 2018 17:55:55 +0000 (12:55 -0500)]
Merge branch 'mvpp2-jumbo-frames-support'
Antoine Tenart says:
====================
net: mvpp2: jumbo frames support
This series enables jumbo frames support in the Marvell PPv2 driver. The
first 2 patches rework the buffer management, then two patches prepare for
the final patch which adds the jumbo frames support into the driver.
This is based on top of net-next, and was tested on a mcbin.
Thanks!
Antoine
Since v1:
- Improved the Tx FIFO initialization comment.
- Improved the pool sanity check in mvpp2_bm_pool_use().
- Fixed pool related comments.
- Cosmetic fixes (used BIT() whenever possible).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:54 +0000 (15:16 +0100)]
net: mvpp2: jumbo frames support
This patch adds the support for jumbo frames in the Marvell PPv2 driver.
A third buffer pool is added with 10KB buffers, which is used if the MTU
is higher than 1518B for packets larger than 1518B. Please note that only
port 0 supports hardware checksum offload due to the Tx FIFO size
limitation.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Mon, 5 Mar 2018 14:16:53 +0000 (15:16 +0100)]
net: mvpp2: enable UDP/TCP checksum over IPv6
This patch adds the NETIF_F_IPV6_CSUM to the driver's features to enable
UDP/TCP checksum over IPv6. No extra configuration of the engine is
needed on top of the IPv4 counterpart, which already is in the features
list (NETIF_F_IP_CSUM).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Markman [Mon, 5 Mar 2018 14:16:52 +0000 (15:16 +0100)]
net: mvpp2: use a data size of 10kB for Tx FIFO on port 0
This patch sets the Tx FIFO data size on port 0 to 10kB. This prepares
the PPv2 driver for the Jumbo frame support addition as the hardware
will need big enough Tx FIFO buffers when dealing with frames going
through an interface with an MTU of 9000.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message, small reworks.]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:51 +0000 (15:16 +0100)]
net: mvpp2: update the BM buffer free/destroy logic
The buffer free routine is updated to release only a given number of
buffers, and the destroy routine now checks the actual number of buffers
in the (BPPI and BPPE) HW counters before draining the pools. This
change helps in adding jumbo frames support.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:50 +0000 (15:16 +0100)]
net: mvpp2: use the same buffer pool for all ports
This patch configures the buffer manager long pool for all ports that are
part of the same CP. Long pool separation between ports is redundant since
there is no performance improvement when different pools are used.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Sun, 4 Mar 2018 02:29:53 +0000 (03:29 +0100)]
net: core: dst: Add kernel-doc for 'net' parameter
This fixes the following kernel-doc warning:
./include/net/dst.h:366: warning: Function parameter or member 'net' not described in 'skb_tunnel_rx'
Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Sun, 4 Mar 2018 02:29:52 +0000 (03:29 +0100)]
net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency
The other dst_cache_{get,set}_ip{4,6} functions, and the doc comment for
dst_cache_set_ip6 use 'saddr' for their source address parameter. Rename
the parameter to increase consistency.
This fixes the following kernel-doc warnings:
./include/net/dst_cache.h:58: warning: Function parameter or member 'addr' not described in 'dst_cache_set_ip6'
./include/net/dst_cache.h:58: warning: Excess function parameter 'saddr' description in 'dst_cache_set_ip6'
Fixes: 911362c70df5 ("net: add dst_cache support")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jonathan Neuschäfer [Sun, 4 Mar 2018 02:29:51 +0000 (03:29 +0100)]
net: core: dst_cache: Fix a typo in a comment
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Mar 2018 15:48:29 +0000 (10:48 -0500)]
Merge branch 'convert-pernet_operations-part4'
Kirill Tkhai says:
====================
Converting pernet_operations (part #4)
This series continues to review and convert pernet_operations
so that they can be executed in parallel for several
net namespaces at the same time. The patches mostly touch netfilter;
there are also a small number of changes in other places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:23 +0000 (14:32 +0300)]
net: Convert proto_gre_net_ops
These pernet_operations register and unregister sysctl.
nf_conntrack_l4proto_gre4->init_net is a simple memory
initializer. Also, the exit method removes the gre keymap_list,
which is per-net. This looks safe to be executed
in parallel with other pernet_operations.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:15 +0000 (14:32 +0300)]
net: Convert ctnetlink_net_ops
These pernet_operations register and unregister
two conntrack notifiers, and they seem to be safe
to be executed in parallel.
General note, not related to async pernet_operations (JFI):
the ctnetlink_net_exit_batch() actions are grouped in a batch,
which could look as if a synchronize_rcu() had been
forgotten. But there is a synchronize_rcu() on the module
exit path (in ctnetlink_exit()), so this batch may
be reworked as a simple .exit method.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:06 +0000 (14:32 +0300)]
net: Convert nf_conntrack_net_ops
These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:55 +0000 (14:31 +0300)]
net: Convert ip_set_net_ops
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to find the drivers that depend on this config.
All of them are in the net/netfilter/ipset directory,
except for net/netfilter/xt_set.c. There are no
other drivers that use ip_set, and none of
the above registers another pernet_operations.
Also, there are no indirect users, as the header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
#ifdef CONFIG_IP_SET
extern func(void);
#else
static inline func(void);
#endif
So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).
ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:47 +0000 (14:31 +0300)]
net: Convert fou_net_ops
These pernet_operations initialize and destroy
pernet net_generic(net, fou_net_id) list.
The rest of net_generic(net, fou_net_id) accesses
may happen after netlink message, and in-tree
pernet_operations do not send FOU_GENL_NAME messages.
So, these pernet_operations are safe to be marked
as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:37 +0000 (14:31 +0300)]
net: Convert dccp_v6_ops
These pernet_operations look similar to dccp_v4_ops,
and they are also safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:28 +0000 (14:31 +0300)]
net: Convert dccp_v4_ops
These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like other pernet_operations do not send
dccp packets to a net that is dying or being created. The batch method
is similar to that of ipv4/ipv6 sockets, and it has to be safe to be
executed in parallel with anything else. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:19 +0000 (14:31 +0300)]
net: Convert cangw_pernet_ops
These pernet_operations deal with cgw_list,
and the rest of the accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on a netlink request, and it does not seem that
foreign pernet_operations want to send such
messages to a net. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:10 +0000 (14:31 +0300)]
net: Convert caif_net_ops
The init method just allocates memory for a new cfg and
assigns net_generic(net, caif_net_id). Although there is a
synchronize_rcu() on the error path in cfcnfg_create(),
in reality this function does not use global lists,
so this synchronize_rcu() looks like a legacy
leftover. The exit method removes caif devices under
rtnl_lock().
There could be a problem if a foreign net's
pernet_operations dereferenced caif_net_id of this net.
It is dereferenced in get_cfcnfg() and caif_device_list().
get_cfcnfg() is used from netdevice notifiers, which
are called under rtnl_lock(). The notifiers can't
be called from foreign nets' pernet_operations. Also,
it is used from caif_disconnect_client() and from
caif_connect_client(). Both of these functions work
with a caif socket, and the only way to have a socket
when the net is dead is if the socket was created as
a kernel socket using sk_alloc().
Grepping for PF_CAIF shows we do not create kernel caif sockets,
so get_cfcnfg() is safe.
caif_device_list() is used in netdevice notifiers and in the exit
method under rtnl lock. It is also used from caif_get(), which is called
in the netdev notifiers and in caif_flow_cb(). The latter
is an skb destructor. Since there are no kernel caif sockets,
nobody can send a packet to the net in parallel with init/exit,
so this is also safe.
So, these pernet_operations are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:00 +0000 (14:31 +0300)]
net: Convert arp_tables_net_ops and ip6_tables_net_ops
These pernet_operations call xt_proto_init() and xt_proto_fini(),
which just register and unregister /proc entries.
They are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:30:50 +0000 (14:30 +0300)]
net: Convert log pernet_operations
These pernet_operations use nf_log_set() and nf_log_unset()
in their methods:
nf_log_bridge_net_ops
nf_log_arp_net_ops
nf_log_ipv4_net_ops
nf_log_ipv6_net_ops
nf_log_netdev_net_ops
Nobody can send such a packet to a net before it has become
registered, and nobody can send a packet after all netdevices
are unregistered. So, these pernet_operations can
be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:30:41 +0000 (14:30 +0300)]
net: Convert broute_net_ops, frame_filter_net_ops and frame_nat_net_ops
These pernet_operations use ebt_register_table() and
ebt_unregister_table() to act on the tables, which
are used as argument in ebt_do_table(), called from
ebtables hooks.
Since there are no net-related bridge packets in-flight,
when the init and exit methods are called, these
pernet_operations are safe to be executed in parallel
with any other.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 5 Mar 2018 01:37:47 +0000 (17:37 -0800)]
selftests: forwarding: Add support to create veth interfaces
For tests using veth interfaces, the test infrastructure can create
the netdevs if they do not exist. Arguably this is a preferred approach
since the tests require p$N and p$(N+1) to be pairs.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Samuel Mendoza-Jonas [Mon, 5 Mar 2018 00:39:05 +0000 (11:39 +1100)]
net/ncsi: Add generic netlink family
Add a generic netlink family for NCSI. This supports three commands:
NCSI_CMD_PKG_INFO which returns information on packages and their
associated channels, NCSI_CMD_SET_INTERFACE which allows a specific
package or package/channel combination to be set as the preferred
choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Sun, 4 Mar 2018 18:38:36 +0000 (10:38 -0800)]
tcp: add ca_state stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_CA_STATE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports the ca_state of the socket when the timestamp is generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Sun, 4 Mar 2018 18:38:35 +0000 (10:38 -0800)]
tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_SENDQ_SIZE stat into SCM_TIMESTAMPING_OPT_STATS.
It reports the number of bytes present in the send queue when the
timestamp is generated.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arkadi Sharshevsky [Sun, 4 Mar 2018 14:35:26 +0000 (16:35 +0200)]
selftests: Extend the tc action test for action mirror
Currently the tc action test is used only to test mirred redirect
action. This patch extends it for mirred mirror.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 4 Mar 2018 12:12:04 +0000 (14:12 +0200)]
net: Make RX-FCS and LRO mutually exclusive
LRO and RX-FCS offloads cannot be enabled at the same time since it is
not clear what should happen to the FCS of each coalesced packet.
The FCS is not really part of the TCP payload, hence cannot be merged
into one big packet. On the other hand, providing one big LRO packet
with one FCS contradicts the RX-FCS feature goal.
Use the fix features mechanism in order to prevent intersection of the
features and drop LRO in case RX-FCS is requested.
Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on
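The rule above can be modelled in a few lines. This is only a sketch of the
behaviour described in this message, not the kernel code: fix_features() here
is a hypothetical Python stand-in, and the feature names are taken from the
ethtool output above.

```python
# Illustrative model only; fix_features() is a hypothetical stand-in for
# the kernel's fix features step, using the names ethtool prints.

def fix_features(requested):
    """Return the feature set that would actually be applied."""
    actual = set(requested)
    # RX-FCS takes priority: when both are requested, LRO is dropped.
    if "rx-fcs" in actual and "large-receive-offload" in actual:
        actual.discard("large-receive-offload")
    return actual


if __name__ == "__main__":
    wanted = {"large-receive-offload", "rx-fcs"}
    print(sorted(fix_features(wanted)))
```

Requesting LRO on its own is left untouched; only the combination is resolved
in favour of RX-FCS.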
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Sat, 3 Mar 2018 02:29:04 +0000 (18:29 -0800)]
liquidio: Corrected Rx bytes counting
Corrected stats mismatch between Host Tx and its peer Rx stats
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Mashak [Sat, 3 Mar 2018 01:52:01 +0000 (20:52 -0500)]
net sched actions: corrected extack message
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:45:39 +0000 (18:45 -0500)]
Merge tag 'batadv-next-for-davem-20180302' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- bump copyright years, by Sven Eckelmann
- fix macro indentation for checkpatch, by Sven Eckelmann
- fix comparison operator for bool returning functions,
by Sven Eckelmann
- assume 2-byte packet alignments for all packet types,
by Matthias Schiffer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Mar 2018 15:03:32 +0000 (16:03 +0100)]
ipvlan: forbid vlan devices on top of ipvlan
Currently we allow the creation of 8021q devices on top of
ipvlan, but such devices are nonfunctional, as the underlying
ipvlan rx_handler hook can't match the relevant traffic.
Be explicit and forbid the creation of such nonfunctional devices.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 2 Mar 2018 09:29:14 +0000 (17:29 +0800)]
virtio-net: re-enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffers was removed by commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case") because we did not reserve enough tailroom for struct
skb_shared_info, which breaks an XDP assumption. This patch fixes that
by reserving enough tailroom and using a fixed rx buffer size.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Fri, 2 Mar 2018 02:22:20 +0000 (11:22 +0900)]
selftests: rtnetlink: remove testns on test fail
This patch removes testns after a test failure so that the next test can
continue with a clean ns.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:35:02 +0000 (18:35 -0500)]
Merge branch 'gre-seq-collect_md'
William Tu says:
====================
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native tunnel mode.
The first patch adds sequence number support for gre collect
metadata mode, and the second patch tests it using BPF.
RFC2890 defines GRE sequence number to be specific to the traffic
flow identified by the key. However, this patch does not implement
per-key seqno. The sequence number is shared in the same tunnel
device. That is, different tunnel keys using the same collect_md
tunnel share a single sequence number.
A new BPF uapi tunnel flag 'BPF_F_SEQ_NUMBER' is added.
--
v1->v2:
rename BPF_F_GRE_SEQ to BPF_F_SEQ_NUMBER suggested by Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 1 Mar 2018 21:49:58 +0000 (13:49 -0800)]
samples/bpf: add gre sequence number test.
The patch adds tests for GRE sequence number
support for metadata mode tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 1 Mar 2018 21:49:57 +0000 (13:49 -0800)]
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native
tunnel mode. This patch adds sequence number support for
gre collect metadata mode. RFC2890 defines GRE sequence
number to be specific to the traffic flow identified by the
key. However, this patch does not implement per-key seqno.
The sequence number is shared in the same tunnel device.
That is, different tunnel keys using the same collect_md
tunnel share a single sequence number.
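To make the difference concrete, here is a small model of the two behaviours.
It is a sketch only: both classes are hypothetical, and starting the counters
at 0 is an arbitrary choice for illustration. SharedSeqTunnel follows what this
patch describes; PerKeySeqTunnel shows the per-key numbering that RFC2890
defines and that this patch does not implement.

```python
from collections import defaultdict


class SharedSeqTunnel:
    """One sequence counter for the whole tunnel device."""

    def __init__(self):
        self._next = 0

    def xmit(self, key):
        # The key is ignored: every flow draws from the same counter.
        seq = self._next
        self._next += 1
        return seq


class PerKeySeqTunnel:
    """One sequence counter per tunnel key (for contrast only)."""

    def __init__(self):
        self._next = defaultdict(int)

    def xmit(self, key):
        seq = self._next[key]
        self._next[key] += 1
        return seq


if __name__ == "__main__":
    for tunnel in (SharedSeqTunnel(), PerKeySeqTunnel()):
        print(type(tunnel).__name__, [tunnel.xmit(k) for k in (1, 2, 1)])
```

With the shared counter, a receiver looking at a single key sees gaps whenever
other keys send traffic through the same device.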
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:19:26 +0000 (18:19 -0500)]
Merge branch 'enic-update'
Govindarajulu Varadarajan says:
====================
enic update
This series adds support for IPv6 vxlan offload and UDP rss along with a
bug fix in filling the rq ring.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:24 +0000 (11:07 -0800)]
enic: set IG desc cache flag in open
New adapters need the CMD_OPENF_IG_DESCCACHE flag to be set. If this flag
is not set, fw flushes the global IG desc cache. This flag is a no-op on
older adapters.
Also increment the driver version.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:23 +0000 (11:07 -0800)]
enic: enable rq before updating rq descriptors
rq should be enabled before posting the buffers to rq desc. If not, hw sees
a stale value and causes DMAR errors.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:22 +0000 (11:07 -0800)]
enic: set UDP rss flag
New hardware needs UDP flag set to enable UDP L4 rss hash. Add ethtool
get option to display supported rss flow hash.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:21 +0000 (11:07 -0800)]
enic: Check if hw supports multi wq with vxlan offload
Some adaptors do not support vxlan offload when multi wq is configured.
If hw supports multi wq, BIT(2) is set in a1.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:20 +0000 (11:07 -0800)]
enic: Add vxlan offload support for IPv6 pkts
New adaptors support vxlan offload for inner IPv6 and outer IPv6 vxlan
pkts.
Fw sets BIT(0) & BIT(1) in a1 if hw supports ipv6 inner & outer pkt
offload.
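A sketch of how such a capability word could be decoded. The constant names
are hypothetical, and mapping BIT(0) to inner and BIT(1) to outer is an
assumption read from the word order of the sentence above, not taken from the
driver source.

```python
def BIT(n):
    """Return an integer with only bit n set."""
    return 1 << n


# Hypothetical names; the bit-to-feature mapping is assumed.
VXLAN_IPV6_INNER = BIT(0)
VXLAN_IPV6_OUTER = BIT(1)


def decode_a1(a1):
    """Report which IPv6 vxlan offloads the firmware advertises in a1."""
    return {
        "ipv6_inner": bool(a1 & VXLAN_IPV6_INNER),
        "ipv6_outer": bool(a1 & VXLAN_IPV6_OUTER),
    }


if __name__ == "__main__":
    print(decode_a1(0b11))
    print(decode_a1(0b01))
```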
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan [Thu, 1 Mar 2018 19:07:19 +0000 (11:07 -0800)]
enic: Check inner ip proto for pseudo header csum
To compute pseudo IP header csum, we need to check the inner header for
encap pkt, not outer IP header.
Also add pseudo csum for IPv6 inner pkt.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Mar 2018 16:42:40 +0000 (16:42 +0000)]
net: amd8111e: remove redundant assignment to 'tx_index'
The variable tx_index is being initialized with a value that is never
read and re-assigned a little later, hence the initialization is redundant
and can be removed.
Cleans up clang warning:
drivers/net/ethernet/amd/amd8111e.c:652:6: warning: Value stored to
'tx_index' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:35 +0000 (13:27 +0200)]
r8169: switch to device-managed functions in probe (part 2)
This is a follow up to the commit
4c45d24a759d ("r8169: switch to device-managed functions in probe")
to move towards managed resources even more.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:34 +0000 (13:27 +0200)]
r8169: Dereference MMIO address immediately before use
There is no need to dereference struct rtl8169_private to get mmio_addr
in almost every function in the driver.
Replace it by using pointer to struct rtl8169_private directly.
No functional change intended.
Next step might be a conversion of RTL_Wxx() / RTL_Rxx() macros
to inline functions for sake of type checking.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Mar 2018 10:23:03 +0000 (10:23 +0000)]
net: phy: Fix spelling mistake: "advertisment"-> "advertisement"
Trivial fix to spelling mistake in comments and error message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Thu, 1 Mar 2018 09:31:04 +0000 (15:01 +0530)]
cxgb4vf: Forcefully link up virtual interfaces
The Virtual Interfaces are connected to an internal switch on the chip
which allows VIs attached to the same port to talk to each other even
when the port link is down. As a result, we generally want to always
report a VI's link as being "up".
Based on the original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 18:34:19 +0000 (13:34 -0500)]
Merge branch 'dsa-serdes-stats'
Andrew Lunn says:
====================
Export SERDES stats via ethtool -S
The mv88e6352 family has a SERDES interface which can be used for
example to connect to SFF/SFP modules. This interface has a couple of
statistics counters. Add support for including these counters in the
output of ethtool -S.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:31 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Get mv88e6352 SERDES statistics
Add support for reading the SERDES statistics of the mv88e6352, using
the standard ethtool -S option. The SERDES interface can be mapped to
either port 4 or 5, so only return statistics on those ports, if the
SERDES interface is in use.
The counters are reset on read, so need to be accumulated. Add a per
port structure to hold the stats counters. The 6352 only has a single
SERDES interface and so only one port will be using the newly added
array. However the 6390 family has as many SERDES interfaces as ports,
each with statistics counters. Also, PTP has a number of counters per
port which will also need accumulating.
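The accumulation requirement can be illustrated with a small model. This is a
sketch under stated assumptions: ClearOnReadCounter stands in for a hardware
register that resets when read, and PortStats for the per-port structure; both
names are hypothetical and neither reflects the driver's actual layout.

```python
class ClearOnReadCounter:
    """Hypothetical hardware counter that resets to zero when read."""

    def __init__(self):
        self._value = 0

    def tick(self, n=1):
        # Simulates the hardware counting n events.
        self._value += n

    def read(self):
        value, self._value = self._value, 0
        return value


class PortStats:
    """Per-port software total that survives the reset on read."""

    def __init__(self, hw):
        self._hw = hw
        self.total = 0

    def update(self):
        # Each read returns only the delta since the previous read,
        # so it has to be added to the running total.
        self.total += self._hw.read()
        return self.total


if __name__ == "__main__":
    hw = ClearOnReadCounter()
    stats = PortStats(hw)
    hw.tick(3)
    print(stats.update())
    hw.tick(2)
    print(stats.update())
```

Without the software total, the second read would report only the events seen
since the first one.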
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:30 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Add helper to determine if port has SERDES
Refactor the existing code. This helper will be used for SERDES
statistics.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:29 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics
When getting the number of statistics, the strings and the actual
statistics, call the SERDES ops if implemented. This means the stats
code needs to return the number of strings/stats they have placed into
the data, so that the SERDES strings/stats can follow on.
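The "follow on" pattern can be sketched as below. Everything here is
illustrative: the stat names are made up and the function names are
hypothetical, chosen only to show how a returned count becomes the offset for
the next writer.

```python
# Made-up stat names, for illustration only.
PORT_STATS = ["rx_example", "tx_example"]
SERDES_STATS = ["serdes_example"]


def sset_count(has_serdes):
    """Total number of stats for a port, with or without SERDES."""
    return len(PORT_STATS) + (len(SERDES_STATS) if has_serdes else 0)


def fill_port_strings(buf, offset):
    """Write the port stat names at offset and return how many were written."""
    buf[offset:offset + len(PORT_STATS)] = PORT_STATS
    return len(PORT_STATS)


def fill_serdes_strings(buf, offset):
    """Write the SERDES stat names at offset and return how many were written."""
    buf[offset:offset + len(SERDES_STATS)] = SERDES_STATS
    return len(SERDES_STATS)


def get_strings(has_serdes):
    """Fill one buffer, letting the SERDES names follow the port names."""
    buf = [None] * sset_count(has_serdes)
    written = fill_port_strings(buf, 0)
    if has_serdes:
        # The count returned above is where the SERDES names start.
        written += fill_serdes_strings(buf, written)
    return buf


if __name__ == "__main__":
    print(get_strings(True))
    print(get_strings(False))
```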
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:28 +0000 (02:02 +0100)]
net: dsa: mv88e6xxx: Hold mutex while doing stats operations
Until now, there has been no need to hold the reg mutex while getting
the count of statistics, or the strings, because the hardware was not
accessed. When adding support for SERDES statistics, it is necessary
to access the hardware, to determine if a port is using the SERDES
interface. So add mutex lock/unlocks.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 1 Mar 2018 01:02:27 +0000 (02:02 +0100)]
dsa: Pass the port to get_sset_count()
By passing the port, we allow different ports to have different
statistics. This is useful since some ports have SERDES interfaces
with their own statistic counters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Brenda J. Butler [Wed, 28 Feb 2018 20:36:19 +0000 (15:36 -0500)]
tools: tc-testing: Add notap option
Add a command line arg to suppress tap output. Handy in case
all the tap output is being supplied by the plugins.
Signed-off-by: Brenda J. Butler <bjb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 18:04:24 +0000 (13:04 -0500)]
Merge branch 'net-ipv6-Add-support-for-path-selection-using-hash-of-5-tuple'
David Ahern says:
====================
net/ipv6: Add support for path selection using hash of 5-tuple
Hardware supports multipath selection using the standard L4 5-tuple
instead of just L3 and the flow label. In addition, some network
operators prefer IPv6 path selection to use the 5-tuple. To that end,
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice").
The default is still L3 which covers source and destination addresses
along with flow label and IPv6 protocol. This gives users a choice in
hash algorithms if they believe L3 only and the IPv6 flow label are not
sufficient for their use case.
A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use
different algorithms if desired.
The first 3 patches modify the IPv4 variant so that at the end of the
patch set the ipv4 and ipv6 implementations are direct parallels.
Patch 4 refactors the existing rt6_multipath_hash in preparation for
adding the policy option.
Patch 5 renames the existing netevent to have IPv4 in the name so ipv4
changes can be distinguished from IPv6 if the netevent handler cares.
Patch 6 adds the skb as an argument through the FIB lookup functions
to the multipath selection. Needed for the forwarding case.
Patch 7 adds the L4 hash support.
Patch 8 adds the hook for the netevent to the spectrum driver to update
the ASIC.
Patch 9 removes no longer used code.
Patch 10 adds a testcase for IPv6 multipath with L4 hash.
v3
- comments from Ido:
- removed fib_info arg in patch 1; left by mistake on rebase to net-next
- removed __get_hash_from_flowi4 declaration
- line wrap change to spectrum_router.c to maintain 80 chars
v2
- rebased to top of tree
- added refactor of fib_multipath_hash following recent change
- plumb skb through lookup functions to multipath selection
- fix sysctl setting; was missing the data set in ipv6_sysctl_net_init
- added test case
RFC to v1:
- rebase to top of net-next
- fix addr_type in hash_keys and removed flow label as noticed by Ido
- added a comment to cover letter about choice in algorithms based on
use case per Or's comments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:21 +0000 (08:32 -0800)]
selftests: forwarding: Add multipath test for L4 hashing
Add IPv6 multipath test using L4 hashing. Created with inputs from
Ido Schimmel.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:20 +0000 (08:32 -0800)]
net: Remove unused get_hash_from_flow functions
__get_hash_from_flowi6 is still used for flowlabels, but the IPv4
variant and the wrappers to both are not used. Remove them.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:19 +0000 (08:32 -0800)]
mlxsw: spectrum_router: Add support for ipv6 hash policy update
Similar to
28678f07f127d ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:18 +0000 (08:32 -0800)]
net/ipv6: Add support for path selection using hash of 5-tuple
Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow label. To that end
add support to IPv6 for multipath hash policy similar to
bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.
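The difference between the two policies comes down to which fields feed the
hash. The sketch below is not the kernel implementation: the policy names and
function names are hypothetical, and SHA-256 from hashlib is used purely as a
convenient deterministic hash for illustration.

```python
import hashlib

# Hypothetical policy names for this sketch.
POLICY_L3 = "l3"
POLICY_L4 = "l4"


def hash_keys(policy, src, dst, proto, flow_label=0, sport=0, dport=0):
    """Pick the fields that feed the multipath hash for a policy."""
    if policy == POLICY_L4:
        # Standard 5-tuple: addresses, protocol, and both ports.
        return (src, dst, proto, sport, dport)
    # Default L3: addresses, flow label, and IPv6 protocol.
    return (src, dst, flow_label, proto)


def select_path(nexthops, keys):
    """Map the chosen keys onto one of the available nexthops."""
    digest = hashlib.sha256(repr(keys).encode()).digest()
    return nexthops[int.from_bytes(digest[:4], "big") % len(nexthops)]


if __name__ == "__main__":
    flow_a = dict(src="2001:db8::1", dst="2001:db8::2", proto=6,
                  sport=1000, dport=80)
    flow_b = dict(flow_a, sport=1001)
    for policy in (POLICY_L3, POLICY_L4):
        same = hash_keys(policy, **flow_a) == hash_keys(policy, **flow_b)
        print(policy, "same keys:", same)
```

Two connections between the same hosts with the same flow label always take
the same path under L3, while under L4 their differing ports give them
different hash inputs.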
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 2 Mar 2018 16:32:17 +0000 (08:32 -0800)]
net/ipv6: Pass skb to route lookup
IPv6 does path selection for multipath routes deep in the lookup
functions. The next patch adds L4 hash option and needs the skb
for the forward path. To get the skb to the relevant FIB lookup
functions it needs to go through the fib rules layer, so add a
lookup_data argument to the fib_lookup_arg struct.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>