openwrt/staging/blogic.git
David S. Miller [Thu, 12 Oct 2017 19:10:02 +0000 (12:10 -0700)]
Merge branch 'dsa-ACB-for-bcm_sf2-and-bcmsysport'

Florian Fainelli says:

====================
Enable ACB for bcm_sf2 and bcmsysport

This patch series enables Broadcom's Advanced Congestion Buffering mechanism
which requires cooperation between the CPU/Management Ethernet MAC controller
and the switch.

I took the notifier approach because ultimately the information we need to
carry to the master network device is DSA-specific, and I saw little room for
generalizing beyond what DSA requires. Chances are that this is highly specific
to Broadcom HW, as I don't know of any other HW that supports something
similar for the same or comparable needs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 11 Oct 2017 17:57:52 +0000 (10:57 -0700)]
net: systemport: Turn on ACB at the SYSTEMPORT level

Now that we have established the queue mapping between the switch port
egress queues and the SYSTEMPORT egress queues, we can turn on Advanced
Congestion Buffering (ACB) at the SYSTEMPORT level. This enables the
Ethernet MAC controller to get out-of-band flow control information
directly from the switch port and queue that it monitors, so that its
internal TDMA can be appropriately backpressured.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 11 Oct 2017 17:57:51 +0000 (10:57 -0700)]
net: dsa: bcm_sf2: Turn on ACB at the switch level

Turn on the out-of-band Advanced Congestion Buffering (ACB) mechanism at
the switch level now that we have properly established the queue mapping
between the switch egress queues and the SYSTEMPORT egress queues. This
allows the switch to correctly backpressure the host system when one of
its queues drops below the configured thresholds.

This also helps achieve so-called "lossless" behavior by adapting
the TX interrupt pacing to the actual speed and capacity of the switch
port.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 11 Oct 2017 17:57:50 +0000 (10:57 -0700)]
net: systemport: Establish lower/upper queue mapping

Establish a queue mapping between the queues created for the DSA slave
network devices, which correspond to switch port queues, and the transmit
queues that SYSTEMPORT manages.

We need to configure the SYSTEMPORT transmit queue with the switch port number
and switch port queue number in order for the switch and SYSTEMPORT hardware to
utilize the out-of-band congestion notification. This hardware mechanism works
by looking at the switch port egress queue and determining whether there are
enough buffers for this queue, with that class of service, for a successful
transmission; if not, it backpressures the SYSTEMPORT queue that is being used.

For this to work, we implement a notifier which looks at the
DSA_PORT_REGISTER event. When DSA network devices are registered, the
framework calls the DSA notifiers; our notifier extracts the number of
queues for these devices and their associated port number, remembers
that in the driver private structure and linearly maps those queues to
the TX rings/queues that we manage.

This scheme works because DSA slave network devices always transmit
through SYSTEMPORT, so when DSA slave network devices are
destroyed/brought down, the corresponding SYSTEMPORT queues are no
longer used. Also, by design of the DSA framework, the master network
device (SYSTEMPORT) is registered first.

For faster lookups we use an array of up to DSA_MAX_PORTS * number of
queues per port, and then map pointers to bcm_sysport_tx_ring such that
our ndo_select_queue() implementation can just index into that array to
locate the corresponding ring index.
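The flat lookup table described above can be modelled in a few lines of userspace C. This is a minimal sketch, not the bcmsysport code: `struct sysport_model`, `map_port_queues()` and `select_ring()` are hypothetical names, and the values of `DSA_MAX_PORTS`, `QUEUES_PER_PORT` and `NUM_TX_RINGS` are illustrative assumptions. It assumes every port exposes at most `QUEUES_PER_PORT` queues and that rings are handed out linearly in registration order.

```c
#include <assert.h>

#define DSA_MAX_PORTS   12  /* assumption: illustrative value */
#define QUEUES_PER_PORT 4   /* assumption: queues per switch port */
#define NUM_TX_RINGS    16  /* assumption: TX rings owned by the MAC */

struct tx_ring {
	unsigned int index;   /* ring number inside the MAC */
	unsigned int port;    /* switch port whose queue it mirrors */
	unsigned int queue;   /* switch port queue (class of service) */
};

struct sysport_model {
	struct tx_ring rings[NUM_TX_RINGS];
	/* Flat table indexed by port * QUEUES_PER_PORT + queue. */
	struct tx_ring *ring_map[DSA_MAX_PORTS * QUEUES_PER_PORT];
	unsigned int next_ring;  /* next unassigned ring */
};

/* Called on a (hypothetical) port-register event: map the port's queues
 * onto the next free rings, one ring per queue. Returns 0 on success,
 * -1 if there are not enough rings left. */
static int map_port_queues(struct sysport_model *p, unsigned int port,
			   unsigned int num_queues)
{
	assert(port < DSA_MAX_PORTS && num_queues <= QUEUES_PER_PORT);
	if (p->next_ring + num_queues > NUM_TX_RINGS)
		return -1;

	for (unsigned int q = 0; q < num_queues; q++) {
		unsigned int idx = p->next_ring++;
		struct tx_ring *ring = &p->rings[idx];

		ring->index = idx;
		ring->port = port;
		ring->queue = q;
		p->ring_map[port * QUEUES_PER_PORT + q] = ring;
	}
	return 0;
}

/* Fast path: constant-time lookup, as an ndo_select_queue() would do.
 * Returns the ring index, or -1 if the (port, queue) pair is unmapped. */
static int select_ring(const struct sysport_model *p, unsigned int port,
		       unsigned int queue)
{
	const struct tx_ring *ring;

	if (port >= DSA_MAX_PORTS || queue >= QUEUES_PER_PORT)
		return -1;
	ring = p->ring_map[port * QUEUES_PER_PORT + queue];
	return ring ? (int)ring->index : -1;
}
```

Indexing by `port * QUEUES_PER_PORT + queue` trades a small, mostly empty table for a constant-time lookup on every transmitted packet, which matches the motivation given above.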

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 11 Oct 2017 17:57:49 +0000 (10:57 -0700)]
net: dsa: tag_brcm: Indicate to master netdevice port + queue

We need to tell the DSA master network device doing the actual
transmission which switch port and queue number are desired, so that it
can resolve them to the internal transmit queue they are mapped to.
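One way to carry both values is to pack them into the single 16-bit transmit queue number that the master device already receives. The sketch below assumes the port goes in the upper byte and the queue in the lower byte; the macro and function names are hypothetical, and the encoding actually used by tag_brcm and bcmsysport should be checked against the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: port in bits 15..8, queue in bits 7..0. */
#define TAG_SET_PORT_QUEUE(p, q) ((uint16_t)(((p) << 8) | ((q) & 0xff)))
#define TAG_GET_PORT(v)          ((unsigned int)((v) >> 8))
#define TAG_GET_QUEUE(v)         ((unsigned int)((v) & 0xff))

/* Tagger side: record the desired switch port and queue in the value the
 * master device will see as the packet's queue mapping. */
static uint16_t tagger_encode(unsigned int port, unsigned int queue)
{
	assert(port <= 0xff && queue <= 0xff);
	return TAG_SET_PORT_QUEUE(port, queue);
}

/* Master side: split the value back into (port, queue) so it can be
 * resolved to an internal TX ring. */
static void master_decode(uint16_t mapping, unsigned int *port,
			  unsigned int *queue)
{
	*port = TAG_GET_PORT(mapping);
	*queue = TAG_GET_QUEUE(mapping);
}
```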

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 11 Oct 2017 17:57:48 +0000 (10:57 -0700)]
net: dsa: Add support for DSA specific notifiers

In preparation for communicating a given DSA network device's port
number and switch index, create a specialized DSA notifier and two
events, DSA_PORT_REGISTER and DSA_PORT_UNREGISTER, which communicate the
slave network device (slave_dev), the port number and the switch number
in the tree.

This will later be used by network device drivers like bcmsysport, which
need to cooperate with their DSA network devices to set up queue mapping
and scheduling.
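The notifier pattern can be modelled in userspace as a small chain of callbacks that each receive an event code and an info structure. This is only a sketch: the event names DSA_PORT_REGISTER and DSA_PORT_UNREGISTER come from the text above, while `struct port_info`, `dsa_chain_register()`, `dsa_chain_call()` and `count_ports()` are hypothetical stand-ins rather than the kernel notifier API. The kernel API also has priorities, locking and NOTIFY_* return codes, all of which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

enum dsa_event { DSA_PORT_REGISTER, DSA_PORT_UNREGISTER };

/* What each event carries, per the description above. */
struct port_info {
	const char *slave_dev;    /* stand-in for the slave net_device */
	unsigned int port_number;
	unsigned int switch_number;
};

typedef int (*dsa_notifier_fn)(enum dsa_event event,
			       const struct port_info *info, void *ctx);

#define MAX_NOTIFIERS 4

static struct {
	dsa_notifier_fn fn[MAX_NOTIFIERS];
	void *ctx[MAX_NOTIFIERS];
	size_t count;
} dsa_chain;

static int dsa_chain_register(dsa_notifier_fn fn, void *ctx)
{
	if (dsa_chain.count == MAX_NOTIFIERS)
		return -1;
	dsa_chain.fn[dsa_chain.count] = fn;
	dsa_chain.ctx[dsa_chain.count] = ctx;
	dsa_chain.count++;
	return 0;
}

/* Called by the framework when a slave device (un)registers. */
static void dsa_chain_call(enum dsa_event event, const struct port_info *info)
{
	for (size_t i = 0; i < dsa_chain.count; i++)
		dsa_chain.fn[i](event, info, dsa_chain.ctx[i]);
}

/* Example consumer, e.g. a master driver tracking active ports. */
static int count_ports(enum dsa_event event, const struct port_info *info,
		       void *ctx)
{
	int *active = ctx;

	(void)info;
	if (event == DSA_PORT_REGISTER)
		(*active)++;
	else if (event == DSA_PORT_UNREGISTER)
		(*active)--;
	assert(*active >= 0);
	return 0;
}
```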

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Thu, 12 Oct 2017 17:42:04 +0000 (12:42 -0500)]
Revert "net: qcom/emac: enforce DMA address restrictions"

This reverts commit df1ec1b9d0df57e96011f175418dc95b1af46821.

It turns out that memory allocated via dma_alloc_coherent is always
aligned to the size of the buffer, so there's no way the RRD and RFD
can ever be in separate 32-bit regions.
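The argument rests on a simple property: a buffer whose start is aligned to a power of two at least as large as the buffer, and which is no larger than 4 GiB, cannot straddle a 4 GiB boundary. Its first and last bytes therefore share the same upper 32 bits. The helpers below (`upper_32()`, `crosses_4g()`, `align_up()`) are hypothetical and only illustrate that arithmetic. They do not model dma_alloc_coherent() itself, and the conclusion about RRD and RFD assumes both rings sit inside one such allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t upper_32(uint64_t addr)
{
	return (uint32_t)(addr >> 32);
}

/* True if [start, start + size) spans two different 4 GiB regions. */
static bool crosses_4g(uint64_t start, uint64_t size)
{
	assert(size > 0);
	return upper_32(start) != upper_32(start + size - 1);
}

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```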

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 12 Oct 2017 03:45:40 +0000 (20:45 -0700)]
tcp: remove obsolete helpers

Remove three inline helpers that are no longer needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Oct 2017 10:56:23 +0000 (11:56 +0100)]
bpf: remove redundant variable old_flags

Variable old_flags is being assigned but is never read; it is redundant
and can be removed.

Cleans up clang warning: Value stored to 'old_flags' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Oct 2017 03:21:23 +0000 (20:21 -0700)]
Merge branch 'mlx4-XDP-TX-improvements'

Tariq Toukan says:

====================
mlx4_en XDP TX improvements

This patchset contains performance improvements
to the XDP_TX use case in the mlx4 Eth driver.

Patch 1 is a simple change in a function parameter type.
Patch 2 replaces a call to a generic function with the
  relevant parts inlined.
Patch 3 moves the write of descriptors' constant values
  from data path to control path.

Series generated against net-next commit:
833e0e2f24fd net: dst: move cpu inside ifdef to avoid compilation warning
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 11 Oct 2017 10:17:27 +0000 (13:17 +0300)]
net/mlx4_en: XDP_TX, assign constant values of TX descs on ring creation

In XDP_TX, some fields in tx_info and tx_desc are constant across
all entries of the different XDP_TX rings.
Assign values to these fields at ring creation time, rather than in
the data path.
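Hoisting loop-invariant descriptor fields out of the data path can be sketched as follows. The descriptor layout, field names and functions are hypothetical and far simpler than mlx4's tx_info/tx_desc. The point is only that fields identical for every XDP_TX entry are written once when the ring is created, and the per-packet path writes just the fields that vary.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8

struct tx_desc {
	uint32_t ctrl_flags;   /* constant for every XDP_TX entry */
	uint32_t ds_count;     /* constant: one data segment per packet */
	uint64_t addr;         /* varies per packet */
	uint32_t byte_count;   /* varies per packet */
};

struct tx_ring {
	struct tx_desc desc[RING_SIZE];
	unsigned int prod;
};

/* Control path: run once at ring creation. */
static void ring_init_constants(struct tx_ring *ring, uint32_t ctrl_flags)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		ring->desc[i].ctrl_flags = ctrl_flags;
		ring->desc[i].ds_count = 1;
	}
	ring->prod = 0;
}

/* Data path: only the per-packet fields are written. */
static struct tx_desc *xdp_xmit(struct tx_ring *ring, uint64_t addr,
				uint32_t len)
{
	struct tx_desc *d = &ring->desc[ring->prod++ % RING_SIZE];

	assert(len > 0);
	d->addr = addr;
	d->byte_count = len;
	return d;
}
```

This only holds if nothing else (hardware or a completion path) overwrites the constant fields between uses of a descriptor; the sketch assumes it does not.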

Patchset performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
------------------------------
Before    | After     | Gain |
13.7 Mpps | 14.0 Mpps | 2.2% |
------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 11 Oct 2017 10:17:26 +0000 (13:17 +0300)]
net/mlx4_en: Obsolete call to generic write_desc in XDP xmit flow

Function mlx4_en_tx_write_desc() is not optimized for the XDP xmit use case.
Use the relevant parts inline instead.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 11 Oct 2017 10:17:25 +0000 (13:17 +0300)]
net/mlx4_en: Replace netdev parameter with priv in XDP xmit function

The struct net_device parameter was passed only to extract
struct mlx4_en_priv out of it.
Here we pass the priv parameter directly.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Oct 2017 09:53:28 +0000 (10:53 +0100)]
net: mpls: make function ipgre_mpls_encap_hlen static

The function ipgre_mpls_encap_hlen is local to the source and
does not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol 'ipgre_mpls_encap_hlen' was not declared. Should it be static?

Fixes: bdc476413dcdb ("ip_tunnel: add mpls over gre support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Wed, 11 Oct 2017 10:17:57 +0000 (11:17 +0100)]
sctp: make array sctp_sched_ops static

The array sctp_sched_ops is local to the source and
does not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol 'sctp_sched_ops' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 11 Oct 2017 08:28:01 +0000 (10:28 +0200)]
ipv6: addrconf: don't use rtnl mutex in RTM_GETADDR

Similar to the previous patch, use the device lookup functions
that bump the device refcount, and flag this as DOIT_UNLOCKED to avoid
taking the rtnl mutex.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 11 Oct 2017 08:28:00 +0000 (10:28 +0200)]
ipv6: addrconf: don't use rtnl mutex in RTM_GETNETCONF

Instead of relying on the rtnl mutex, bump the device reference count.
After this change, the reported values can change in parallel, but that's
not much different from the current state, as anyone can change the settings
right after rtnl_unlock() (and before userspace has processed the reply).

While at it, switch to GFP_KERNEL allocation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Oct 2017 03:15:43 +0000 (20:15 -0700)]
Merge branch 'net-sched-get-rid-of-cls_flower-egress_dev'

Jiri Pirko says:

====================
net: sched: get rid of cls_flower->egress_dev

The introduction of cls_flower->egress_dev was a workaround that turned
out to be a rather ugly hack. So replace it with more generic and reusable
infrastructure.

This is a dependency of the shared block introduction, which will be sent
as a follow-up group of patchsets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 11 Oct 2017 07:41:10 +0000 (09:41 +0200)]
net: sched: remove unused tcf_exts_get_dev helper and cls_flower->egress_dev

The helper and the struct field are no longer used by any code,
so remove them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 11 Oct 2017 07:41:09 +0000 (09:41 +0200)]
net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra

The only user of cls_flower->egress_dev is mlx5. So convert it, along
with the code in the cls_flower function fl_hw_replace_filter() that
originates the call, to the newly introduced egress device callback
infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 11 Oct 2017 07:41:08 +0000 (09:41 +0200)]
net: sched: introduce per-egress action device callbacks

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload an inserted rule and the specified
device acts as the tc action egress device.
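A minimal registry of per-device callbacks illustrates the idea: drivers register a callback against the egress device they care about, and when a rule is offloaded only the callbacks registered for that rule's egress device are invoked. All names here (`egdev_cb_register()`, `egdev_offload()`, `struct egdev_entry`, `count_rules()`) are hypothetical, devices are represented by name strings, and this is not the tc_setup_cb_egdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*egdev_cb)(const char *rule, void *priv);

#define MAX_ENTRIES 8

struct egdev_entry {
	const char *dev;   /* egress device name, stand-in for net_device */
	egdev_cb cb;
	void *priv;
};

static struct egdev_entry entries[MAX_ENTRIES];
static size_t nr_entries;

static int egdev_cb_register(const char *dev, egdev_cb cb, void *priv)
{
	if (nr_entries == MAX_ENTRIES)
		return -1;
	entries[nr_entries++] = (struct egdev_entry){ dev, cb, priv };
	return 0;
}

/* Called when a rule is offloaded whose action egresses via 'dev'.
 * Returns how many callbacks accepted the rule (returned 0). */
static int egdev_offload(const char *dev, const char *rule)
{
	int ok = 0;

	for (size_t i = 0; i < nr_entries; i++)
		if (strcmp(entries[i].dev, dev) == 0 &&
		    entries[i].cb(rule, entries[i].priv) == 0)
			ok++;
	return ok;
}

/* Example driver callback: counts the rules it was asked to offload. */
static int count_rules(const char *rule, void *priv)
{
	assert(rule != NULL);
	(*(int *)priv)++;
	return 0;
}
```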

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 11 Oct 2017 07:41:07 +0000 (09:41 +0200)]
net: sched: make tc_action_ops->get_dev return dev and avoid passing net

Return dev directly, or NULL if not possible. That is enough.

It makes no sense to pass struct net * to the get_dev op, as there is only
one net possible: the one the action was created in. So just store it in
the mirred priv and use it directly.

Rename the mirred op callback function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Oct 2017 03:05:30 +0000 (20:05 -0700)]
Merge branch 'rmnet-Rewrite-some-existing-functionality'

Subash Abhinov Kasiviswanathan says:

====================
net: qualcomm: rmnet: Rewrite some existing functionality

This series fixes some of the broken rmnet functionality.
Bridge mode is rewritten and made usable, and muxed_ep is converted to an hlist.

Patches 1-5 are cleanups in preparation for these changes.
Patch 6 does the hlist conversion.
Patch 7 has the implementation of the rmnet bridge mode.

v1->v2: Fix the warning and code style issue in rmnet_rx_handler as
mentioned by David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:58 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Implement bridge mode

Add support to bridge two devices which can send multiplexing and
aggregation (MAP) data. This is done only when the data itself is
not going to be consumed in the stack but is being passed on to a
different endpoint. This is mainly used for testing.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:57 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Convert the muxed endpoint to hlist

Rather than using a static array, use an hlist to store the muxed
endpoints and use the mux id to look up the rmnet_device.
This is useful because usually only a few mux ids are in use.
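The change from a fixed array to a list can be sketched with a minimal singly linked list shaped like the kernel's hlist, where the head holds a single `first` pointer. The types and functions below are hypothetical. Kernel code would use `struct hlist_head`/`struct hlist_node` together with appropriate locking or RCU, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

struct mux_ep {
	unsigned char mux_id;
	const char *dev_name;   /* stand-in for the rmnet device */
	struct mux_ep *next;
};

struct mux_head {
	struct mux_ep *first;
};

/* Insert at the head, as hlist_add_head() would. */
static void mux_add(struct mux_head *h, struct mux_ep *ep)
{
	ep->next = h->first;
	h->first = ep;
}

/* Linear lookup by mux id: cheap because few ids are normally in use. */
static struct mux_ep *mux_find(const struct mux_head *h, unsigned char id)
{
	for (struct mux_ep *ep = h->first; ep; ep = ep->next)
		if (ep->mux_id == id)
			return ep;
	return NULL;
}

/* Unlink an endpoint; returns 0 if found, -1 otherwise. */
static int mux_del(struct mux_head *h, unsigned char id)
{
	for (struct mux_ep **pp = &h->first; *pp; pp = &(*pp)->next) {
		if ((*pp)->mux_id == id) {
			*pp = (*pp)->next;
			return 0;
		}
	}
	return -1;
}
```

Memory now grows with the number of mux ids actually in use rather than with the size of the id space, at the cost of a linear search that stays cheap while only a few ids are configured.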

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:56 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Remove duplicate setting of rmnet_devices

The rmnet_devices information is already stored in muxed_ep, so
storing this in rmnet_devices[] again is redundant.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:55 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Remove duplicate setting of rmnet private info

The endpoint is set twice: in local_ep, and as the mux_id and
real_dev in the rmnet private structure. Remove local_ep.
While these elements are equivalent, rmnet_endpoint will be
used only as part of the rmnet_port for muxed scenarios in VND mode.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:54 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Move rmnet_mode to rmnet_port

Mode information on the real device makes it easier to route packets
to rmnet device or bridged device based on the configuration.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:53 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Remove some unused defines

Most of these constants were used in the initial patchset where
custom netlink configuration was used and hence are no longer relevant.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subash Abhinov Kasiviswanathan [Thu, 12 Oct 2017 00:43:52 +0000 (18:43 -0600)]
net: qualcomm: rmnet: Remove existing logic for bridge mode

This will be rewritten in the following patches.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Oct 2017 23:01:57 +0000 (16:01 -0700)]
Merge branch 'qcom-emac-various-minor-fixes'

Timur Tabi says:

====================
net: qcom/emac: various minor fixes

A set of patches for 4.15 that clean up some code, apply minor fixes,
and so on.  Some of the code also prepares the driver for a future
version of the EMAC controller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 11 Oct 2017 19:52:26 +0000 (14:52 -0500)]
net: qcom/emac: clean up some TX/RX error messages

Some of the error messages that are printed by the interrupt handlers
are poorly written.  For example, many don't include a device prefix,
so there's no indication that they are EMAC errors.

Also use rate limiting for all messages that could be printed from
interrupt context.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 11 Oct 2017 19:52:25 +0000 (14:52 -0500)]
net: qcom/emac: enforce DMA address restrictions

The EMAC has a restriction that the upper 32 bits of the base addresses
for the RFD and RRD rings must be the same.  To ensure that restriction,
we allocate twice the space for the RRD and locate it at an appropriate
address.

We also re-arrange the allocations so that invalid addresses are even
less likely.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 11 Oct 2017 19:52:24 +0000 (14:52 -0500)]
net: qcom/emac: remove unused address arrays

The EMAC is capable of multiple TX and RX rings, but the driver only
supports one ring for each.  One function had some left-over unused
code that supports multiple rings, but all it did was make the code
harder to read.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi [Wed, 11 Oct 2017 19:52:23 +0000 (14:52 -0500)]
net: qcom/emac: specify the correct DMA mask

The 64/32-bit DMA mask hackery in the EMAC driver is not actually necessary,
and is technically not accurate.  The EMAC hardware is limited to a 45-bit
DMA address.  Although no EMAC-enabled system can have that much DDR,
an IOMMU could possibly provide a larger address.  Rather than play games
with the DMA mappings, the driver should provide a correct value and
trust the DMA/IOMMU layers to do the right thing.
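In the kernel, such a mask is normally built with the DMA_BIT_MASK() macro and passed to dma_set_mask_and_coherent(). The snippet below redefines that macro locally, with the same shape as the kernel definition, so it compiles on its own, and shows what a 45-bit limit means numerically. `fits_dma_mask()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK() from <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: can a device with this mask address the whole
 * buffer [addr, addr + len)? */
static bool fits_dma_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
	assert(len > 0);
	return addr <= mask && len - 1 <= mask - addr;
}
```

In a driver this would typically amount to a call like `dma_set_mask_and_coherent(dev, DMA_BIT_MASK(45))`, with the DMA/IOMMU layers then responsible for keeping mappings below that limit.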

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Oct 2017 22:28:39 +0000 (15:28 -0700)]
Merge branch 'qrtr-Fixes-and-support-receiving-version-2-packets'

Bjorn Andersson says:

====================
net: qrtr: Fixes and support receiving version 2 packets

On the latest Qualcomm platforms remote processors are sending packets with
version 2 of the message header. This series starts off with some fixes and
then refactors the qrtr code to support receiving messages of both version 1
and version 2.

As all remotes are backwards compatible, transmitted packets continue to be
sent as version 1, but some groundwork has been done to make this a per-link
property.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:23 +0000 (23:45 -0700)]
net: qrtr: Support decoding incoming v2 packets

Add the necessary logic for decoding incoming messages of version 2 as
well. Also make sure there's room for the bigger of version 1 and 2
headers in the code allocating skbs for outgoing messages.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:22 +0000 (23:45 -0700)]
net: qrtr: Use sk_buff->cb in receive path

Rather than parsing the header of incoming messages throughout the
implementation, do it once when we retrieve the message and store the
relevant information in the "cb" member of the sk_buff.

This allows us to, in a later commit, decode version 2 messages into
this same structure.
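A sketch of the "parse once, store in cb" approach: each incoming header is decoded into one small, version-independent struct, and later code reads only that struct, so supporting a second wire format touches only the decoder. The two layouts below are simplified stand-ins. Field order, widths and the version values 1 and 3 are assumptions for illustration, not an authoritative description of the QRTR wire format. `struct rx_cb` and `decode_header()` are hypothetical names, and little-endian encoding is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded, version-independent view of a header (what "cb" would hold). */
struct rx_cb {
	uint32_t type;
	uint32_t src_node, src_port;
	uint32_t dst_node, dst_port;
	uint32_t size;
};

#define HDR_V1_LEN 32   /* assumption: eight 32-bit little-endian words */
#define HDR_V2_LEN 16   /* assumption: 4 bytes, 32-bit size, four 16-bit ids */
#define VER_1 1         /* assumed version values */
#define VER_2 3

static uint32_t le32(const uint8_t *p)
{
	return p[0] | p[1] << 8 | p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Returns the header length consumed, or -1 on a short/unknown packet.
 * Both assumed layouts start with the version in the first byte. */
static int decode_header(const uint8_t *buf, size_t len, struct rx_cb *cb)
{
	assert(cb != NULL);
	if (len < 1)
		return -1;

	if (buf[0] == VER_1) {
		if (len < HDR_V1_LEN)
			return -1;
		/* words: version, type, src_node, src_port, confirm_rx,
		 * size, dst_node, dst_port */
		cb->type = le32(buf + 4);
		cb->src_node = le32(buf + 8);
		cb->src_port = le32(buf + 12);
		cb->size = le32(buf + 20);
		cb->dst_node = le32(buf + 24);
		cb->dst_port = le32(buf + 28);
		return HDR_V1_LEN;
	}
	if (buf[0] == VER_2) {
		if (len < HDR_V2_LEN)
			return -1;
		/* version, type, flags, optlen, size, src_node, src_port,
		 * dst_node, dst_port */
		cb->type = buf[1];
		cb->size = le32(buf + 4);
		cb->src_node = le16(buf + 8);
		cb->src_port = le16(buf + 10);
		cb->dst_node = le16(buf + 12);
		cb->dst_port = le16(buf + 14);
		return HDR_V2_LEN;
	}
	return -1;
}
```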

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:21 +0000 (23:45 -0700)]
net: qrtr: Clean up control packet handling

As the message header generation is deferred, the internal functions for
generating control packets can be simplified.

This patch modifies qrtr_alloc_ctrl_packet() to, in addition to the
sk_buff, return a reference to a struct qrtr_ctrl_pkt, which clarifies
and simplifies the helpers to the point that these functions can be
folded back into the callers.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:20 +0000 (23:45 -0700)]
net: qrtr: Pass source and destination to enqueue functions

Defer writing the message header to the skb until it's time to enqueue
the packet. As the receive path is reworked to decode the message header
as it's received from the transport and only pass around the payload in
the skb this change means that we do not have to fill out the full
message header just to decode it immediately in qrtr_local_enqueue().

In the future this change also makes it possible to prepend message
headers based on the version of each link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:19 +0000 (23:45 -0700)]
net: qrtr: Add control packet definition to uapi

The QMUX protocol specification defines the structure of the special control
packet messages being sent between handlers of the control port.

Add these to the uapi header, as this structure and the associated types
are shared between the kernel and all userspace handlers of control
messages.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:18 +0000 (23:45 -0700)]
net: qrtr: Move constants to header file

The constants are used by both the name server and clients, so clarify
their value and move them to the uapi header.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjorn Andersson [Wed, 11 Oct 2017 06:45:17 +0000 (23:45 -0700)]
net: qrtr: Invoke sk_error_report() after setting sk_err

Rather than manually waking up any context sleeping on the sock to
signal an error, we should call sk_error_report(). This has the added
benefit that in-kernel consumers can override this notification with
their own callback.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 11 Oct 2017 02:35:23 +0000 (02:35 +0000)]
net: hns3: make local functions static

Fixes the following sparse warnings:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:464:5: warning:
 symbol 'hns3_change_all_ring_bd_num' was not declared. Should it be static?
drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:477:5: warning:
 symbol 'hns3_set_ringparam' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Tue, 10 Oct 2017 19:25:48 +0000 (12:25 -0700)]
atm: idt77105: Drop needless setup_timer()

Calling setup_timer() is redundant when DEFINE_TIMER() has been used.

Cc: Chas Williams <3chas3@gmail.com>
Cc: linux-atm-general@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 10 Oct 2017 17:42:56 +0000 (12:42 -0500)]
net: phy: at803x: Change error to EINVAL for invalid MAC

Change the return error code to EINVAL if the MAC
address is not valid in the set_wol function.

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 10 Oct 2017 17:42:55 +0000 (12:42 -0500)]
net: phy: DP83822 initial driver submission

Add support for the TI DP83822 10/100 Mbit Ethernet PHY.

The DP83822 provides flexibility to connect to a MAC through a
standard MII, RMII or RGMII interface.

In addition the DP83822 needs to be removed from the DP83848 driver
as the WoL support is added here for this device.

Datasheet:
http://www.ti.com/product/DP83822I/datasheet

Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Andrew F. Davis <afd@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Oct 2017 20:53:21 +0000 (13:53 -0700)]
Merge branch 'lan9303-Add-basic-offloading-of-unicast-traffic'

Egil Hjelmeland says:

====================
lan9303: Add basic offloading of unicast traffic

This series adds basic offloading of unicast traffic to the lan9303
DSA driver.

Review welcome!

Changes v1 -> v2:
 - Patch 1: Codestyle linting.
 - Patch 2: Remember SWE_PORT_STATE while not bridged.
            Added constant LAN9303_SWE_PORT_MIRROR_DISABLED.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 10 Oct 2017 12:49:53 +0000 (14:49 +0200)]
net: dsa: lan9303: Add basic offloading of unicast traffic

When both user ports are joined to the same bridge, the normal
HW MAC learning is enabled. This means that unicast traffic is forwarded
in HW.

If one of the user ports leaves the bridge,
the ports go back to the initial separated operation.

Port separation relies on disabled HW MAC learning, hence the condition
that both ports must join the same bridge.

Add bridge methods port_bridge_join, port_bridge_leave and
port_stp_state_set.
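The offload condition above can be written as a small predicate: hardware MAC learning, and therefore hardware forwarding of unicast traffic, is only allowed when both user ports belong to the same bridge. The `struct user_port` model and `hw_offload_allowed()` are hypothetical; the actual driver programs lan9303 registers from its port_bridge_join and port_bridge_leave methods.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: entries 0 and 1 are the two user ports; br points
 * to the bridge a port belongs to, or is NULL when the port is standalone. */
struct user_port {
	const void *br;
};

/* HW MAC learning may only be enabled when both user ports share a
 * bridge, since port separation relies on learning being disabled. */
static bool hw_offload_allowed(const struct user_port ports[2])
{
	assert(ports != NULL);
	return ports[0].br != NULL && ports[0].br == ports[1].br;
}
```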

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Tue, 10 Oct 2017 12:49:52 +0000 (14:49 +0200)]
net: dsa: lan9303: Move tag setup to new lan9303_setup_tagging

Prepare for the next patch:
Move tag setup from lan9303_separate_ports() to the new function
lan9303_setup_tagging().

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 11 Oct 2017 20:27:29 +0000 (13:27 -0700)]
tcp: fix tcp_unlink_write_queue()

Yury reported a crash with this signature:

[  554.034021] [<ffff80003ccd5a58>] 0xffff80003ccd5a58
[  554.034156] [<ffff00000888fd34>] skb_release_all+0x14/0x30
[  554.034288] [<ffff00000888fd64>] __kfree_skb+0x14/0x28
[  554.034409] [<ffff0000088ece6c>] tcp_sendmsg_locked+0x4dc/0xcc8
[  554.034541] [<ffff0000088ed68c>] tcp_sendmsg+0x34/0x58
[  554.034659] [<ffff000008919fd4>] inet_sendmsg+0x2c/0xf8
[  554.034783] [<ffff0000088842e8>] sock_sendmsg+0x18/0x30
[  554.034928] [<ffff0000088861fc>] SyS_sendto+0x84/0xf8

The problem is that skb->destructor contains garbage, and this is
because I accidentally removed tcp_skb_tsorted_anchor_cleanup()
from tcp_unlink_write_queue().

This would trigger with a write(fd, <invalid_memory>, len) attempt,
and we will add this capability to packetdrill to avoid future
regressions.

Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
Reported-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 11 Oct 2017 17:15:01 +0000 (10:15 -0700)]
Merge tag 'mac80211-next-for-davem-2017-10-11' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Work continues in various areas:
 * port authorized event for 4-way-HS offload (Avi)
 * enable MFP optional for such devices (Emmanuel)
 * Kees's timer setup patch for mac80211 mesh
   (the part that isn't trivially scripted)
 * improve VLAN vs. TXQ handling (myself)
 * load regulatory database as firmware file (myself)
 * with various other small improvements and cleanups

I merged net-next once in the meantime to allow Kees's
timer setup patch to go in.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 13 Sep 2017 20:21:08 +0000 (22:21 +0200)]
cfg80211: implement regdb signature checking

Currently CRDA implements the signature checking, and the previous
commits added the ability to load the whole regulatory database
into the kernel.

However, we really can't lose the signature checking, so implement
it in the kernel by loading a detached signature (regulatory.db.p7s)
and checking it against built-in keys.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agocfg80211: reg: remove support for built-in regdb
Johannes Berg [Thu, 15 Oct 2015 12:35:41 +0000 (14:35 +0200)]
cfg80211: reg: remove support for built-in regdb

Parsing and building C structures from a regdb is no longer needed
since the "firmware" file (regulatory.db) can be linked into the
kernel image to achieve the same effect.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agocfg80211: support reloading regulatory database
Johannes Berg [Wed, 13 Sep 2017 14:07:22 +0000 (16:07 +0200)]
cfg80211: support reloading regulatory database

If the regulatory database is loaded, and then updated, it may
be necessary to reload it. Add an nl80211 command to do this.

Note that this just reloads the database, it doesn't re-apply
the rules from it immediately.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agocfg80211: support loading regulatory database as firmware file
Johannes Berg [Thu, 15 Oct 2015 09:22:58 +0000 (11:22 +0200)]
cfg80211: support loading regulatory database as firmware file

As the current regulatory database is only about 4k big, and already
difficult to extend, we decided that overall it would be better to
get rid of the complications with CRDA and load the database into the
kernel directly, but in a new format that is extensible.

The new file format can be extended since it carries a length field
on all the structs that need to be extensible.
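To illustrate why a per-struct length field makes a format extensible, here is a minimal parsing sketch. The record layout below (a one-byte total length, a flags byte, then an optional big-endian max_bw field) is purely hypothetical and is not the real regulatory.db layout; only the idea of skipping unknown trailing bytes is taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, NOT the real regulatory.db format:
 *   byte 0     : total record length in bytes (including this byte)
 *   byte 1     : flags (present in every version)
 *   bytes 2..3 : big-endian max_bw, only present in newer records
 *   bytes 4..  : fields added by even newer writers, ignored here
 */
struct rule {
	uint8_t flags;
	uint16_t max_bw;	/* 0 when the record is too short to carry it */
};

/* Parse one record at buf; return bytes consumed, or 0 if malformed. */
static size_t parse_rule(const uint8_t *buf, size_t avail, struct rule *out)
{
	size_t len;

	if (avail < 2)
		return 0;
	len = buf[0];
	if (len < 2 || len > avail)
		return 0;

	out->flags = buf[1];
	out->max_bw = 0;
	if (len >= 4)
		out->max_bw = (uint16_t)((buf[2] << 8) | buf[3]);

	/* Any bytes beyond what this reader knows are skipped via len. */
	return len;
}
```

An old two-byte record and a newer six-byte record are both handled by the same reader; the newer one's unknown tail is simply stepped over.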

In order to be able to request firmware when the module initializes,
move cfg80211 from subsys_initcall() to the later fs_initcall(); the
firmware loader is at the same level but linked earlier, so it can
be called from there. Otherwise, when both the firmware loader and
cfg80211 are built-in, the request will crash the kernel. We also
need to be before device_initcall() so that cfg80211 is available
for devices when they initialize.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agomac80211: only remove AP VLAN frames from TXQ
Johannes Berg [Fri, 6 Oct 2017 09:53:33 +0000 (11:53 +0200)]
mac80211: only remove AP VLAN frames from TXQ

When removing an AP VLAN interface, mac80211 currently purges
the entire TXQ for the AP interface. Fix this by using the FQ
API introduced in the previous patch to filter frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agofq: support filtering a given tin
Johannes Berg [Fri, 6 Oct 2017 09:53:32 +0000 (11:53 +0200)]
fq: support filtering a given tin

Add to the FQ API a way to filter a given tin, in order to
remove frames that fulfil certain criteria according to a
filter function.

This will be used by mac80211 to remove frames belonging to
an AP VLAN interface that's being removed.
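A rough sketch of what filtering a given tin can look like: walk the tin's frame list and unlink every frame that a caller-supplied predicate selects. struct frame, struct tin, tin_filter() and match_vif() are hypothetical simplifications, not the actual FQ API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the FQ structures. */
struct frame {
	int vif_id;		/* interface the frame was queued for */
	struct frame *next;
};

struct tin {
	struct frame *head;
	unsigned int backlog;
};

typedef bool (*frame_filter_t)(const struct frame *f, void *data);

/* Unlink every frame for which filter() returns true, passing each one
 * to free_fn() if given. Non-matching frames keep their order.
 * Returns the number of frames removed. */
static unsigned int tin_filter(struct tin *tin, frame_filter_t filter,
			       void *data, void (*free_fn)(struct frame *))
{
	struct frame **pp = &tin->head;
	unsigned int removed = 0;

	while (*pp) {
		struct frame *f = *pp;

		if (filter(f, data)) {
			*pp = f->next;	/* unlink f */
			tin->backlog--;
			removed++;
			if (free_fn)
				free_fn(f);
		} else {
			pp = &f->next;
		}
	}
	return removed;
}

/* Example predicate: select frames belonging to one (AP VLAN) interface. */
static bool match_vif(const struct frame *f, void *data)
{
	return f->vif_id == *(const int *)data;
}
```

Using a pointer-to-pointer walk keeps the unlink logic identical for the head and for interior frames.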

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agomac80211: aead api to reduce redundancy
Xiang Gao [Wed, 11 Oct 2017 02:31:49 +0000 (22:31 -0400)]
mac80211: aead api to reduce redundancy

Currently, aes_ccm.c and aes_gcm.c are almost line-by-line copies of
each other. This patch reduces code redundancy by moving the code in these
two files to crypto/aead_api.c, making it a higher-level AEAD API. The
files aes_ccm.c and aes_gcm.c are removed, and all the functions there are
now implemented in their headers using the newly added AEAD API.

Signed-off-by: Xiang Gao <qasdfgtyuiop@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agoMAINTAINERS: update Johannes Berg's entries
Johannes Berg [Tue, 10 Oct 2017 07:57:59 +0000 (09:57 +0200)]
MAINTAINERS: update Johannes Berg's entries

Update my MAINTAINERS file entries to list all the right files.
Since I'm also the de-facto wireless extensions maintainer,
there's little point in excluding those.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agoopenvswitch: add ct_clear action
Eric Garver [Tue, 10 Oct 2017 20:54:44 +0000 (16:54 -0400)]
openvswitch: add ct_clear action

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dst: move cpu inside ifdef to avoid compilation warning
Jakub Kicinski [Tue, 10 Oct 2017 22:05:39 +0000 (15:05 -0700)]
net: dst: move cpu inside ifdef to avoid compilation warning

If CONFIG_DST_CACHE is not selected, the cpu variable
will be unused and we will see a compilation warning.
Move it under the ifdef.
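The pattern behind the fix is to declare a variable inside the same #ifdef as its only user. In this hypothetical sketch, CONFIG_DST_CACHE stands in for the Kconfig symbol and fill_per_cpu() is an invented function; when the symbol is not defined, no cpu variable exists, so -Wunused-variable has nothing to report.

```c
/* fill_per_cpu() is hypothetical; CONFIG_DST_CACHE mimics the Kconfig
 * symbol and is simply left undefined in a default build. */
static int fill_per_cpu(int nr_cpus, int *slots)
{
	int filled = 0;
#ifdef CONFIG_DST_CACHE
	int cpu;	/* declared only where it is actually used */

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		slots[cpu] = 1;
		filled++;
	}
#else
	(void)nr_cpus;	/* parameters unused in this configuration */
	(void)slots;
#endif
	return filled;
}
```

Had `int cpu;` been declared above the #ifdef, a build without CONFIG_DST_CACHE would warn about an unused variable.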

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: d66f2b91f95b ("bpf: don't rely on the verifier lock for metadata_dst allocation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Tue, 10 Oct 2017 20:20:16 +0000 (13:20 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-10-10

This series contains updates to e1000e and igb.

Benjamin Poirier provides several fixes for e1000e, starting with a
correction to the return status, which was always reporting success
even when the operation failed.  He also fixed code comments to reflect
the actual code behavior, fixed the conditional test for the correct
return value, and fixed a potential race condition reported by Lennart
Sorensen, where the single flag get_link_status is used to signal two
different states.

Sasha fixes a buffer overrun for i219 devices, where the chipset had
reduced the round-trip latency for the LAN controller DMA accesses
which in some high performance cases caused a buffer overrun while
processing the DMA transactions.

Willem de Bruijn changes the default behavior of e1000e to use the
burst mode settings by default unless the user specifies the
receive interrupt delay (RxIntDelay).

Florian Fainelli updates the driver to differentiate between when
e1000e_put_txbuf() is called from normal reclamation and when it is
called after a DMA mapping failure, making the driver more "drop
monitor friendly".

Christophe JAILLET fixes a potential NULL pointer dereference by
properly returning -ENOMEM on memory allocation failures.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agortnetlink: bridge: use ext_ack instead of printk
Florian Westphal [Tue, 10 Oct 2017 15:10:04 +0000 (17:10 +0200)]
rtnetlink: bridge: use ext_ack instead of printk

We can now piggyback error strings to userspace via extended acks
rather than using printk.

Before:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
RTNETLINK answers: Invalid argument

After:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
Error: invalid vlan id.

v3: drop 'RTM_' prefixes, suggested by David Ahern, they
are not useful, the add/del in bridge command line is enough.

Also reword error in response to malformed/bad vlan id attribute
size.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests: rtnetlink: test RTM_GETNETCONF
Florian Westphal [Tue, 10 Oct 2017 14:18:05 +0000 (16:18 +0200)]
selftests: rtnetlink: test RTM_GETNETCONF

Exercise the RTM_GETNETCONF call path for the unspec, inet and inet6
families; they are DOIT_UNLOCKED candidates.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlx4_en-num-of-rings'
David S. Miller [Tue, 10 Oct 2017 20:11:23 +0000 (13:11 -0700)]
Merge branch 'mlx4_en-num-of-rings'

Tariq Toukan says:

====================
mlx4_en num of rings

This patchset from Inbar changes how the mlx4 Eth driver
controls its rings.

Patches 1 and 2 limit the number of rings to the number of CPUs.
Patch 3 removes a limitation in the logic that sets the default number of RX rings.

Series generated against net-next commit:
812b5ca7d376 Add a driver for Renesas uPD60620 and uPD60620A PHYs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4_en: Increase number of default RX rings
Inbar Karmy [Tue, 10 Oct 2017 09:28:35 +0000 (12:28 +0300)]
net/mlx4_en: Increase number of default RX rings

Remove the netif_get_num_default_rss_queues() limitation
from the logic that sets the default number of RX rings.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4_en: Limit the number of RX rings
Inbar Karmy [Tue, 10 Oct 2017 09:28:34 +0000 (12:28 +0300)]
net/mlx4_en: Limit the number of RX rings

Limit the number of RX rings by the number of cores
in the system.
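A minimal sketch of clamping a ring count to the number of cores. The driver runs in the kernel, where something like num_online_cpus() would supply the count; this userspace version uses sysconf(_SC_NPROCESSORS_ONLN) instead, and limit_rings(), default_rings() and the floor of one ring are assumptions for illustration.

```c
#include <unistd.h>

/* Hypothetical helper: never hand out more rings than online CPUs,
 * and treat an unknown or bogus CPU count as a single CPU. */
static unsigned int limit_rings(unsigned int requested, long ncpus)
{
	if (ncpus < 1)
		ncpus = 1;
	if ((unsigned long)requested > (unsigned long)ncpus)
		return (unsigned int)ncpus;
	return requested;
}

/* sysconf() returns -1 on failure, which limit_rings() maps to 1. */
static unsigned int default_rings(unsigned int requested)
{
	return limit_rings(requested, sysconf(_SC_NPROCESSORS_ONLN));
}
```

The same clamp applies per user priority for the TX side described in the next commit.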

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4_en: Limit the number of TX rings
Inbar Karmy [Tue, 10 Oct 2017 09:28:33 +0000 (12:28 +0300)]
net/mlx4_en: Limit the number of TX rings

Limit the number of TX rings per UP by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'hnx3-rxnfc'
David S. Miller [Tue, 10 Oct 2017 20:09:14 +0000 (13:09 -0700)]
Merge branch 'hnx3-rxnfc'

Lipeng says:

====================
Support set_ringparam and {set|get}_rxnfc ethtool commands

1. Patches [1/5, 2/5] add support for the ethtool op set_ringparam
   (ethtool -G) and fix a related bug.
2. Patches [3/5, 4/5, 5/5] add support for the ethtool ops
   set_rxnfc/get_rxnfc (-n/-N) and fix a related bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: fix the ring count for ETHTOOL_GRXRINGS
Lipeng [Tue, 10 Oct 2017 08:42:07 +0000 (16:42 +0800)]
net: hns3: fix the ring count for ETHTOOL_GRXRINGS

This patch fixes the ring count for ETHTOOL_GRXRINGS. The ring count,
not the TC size, should be returned for the command "ethtool -n ethx".

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: add support for ETHTOOL_GRXFH
Lipeng [Tue, 10 Oct 2017 08:42:06 +0000 (16:42 +0800)]
net: hns3: add support for ETHTOOL_GRXFH

This patch adds support for ethtool's ETHTOOL_GRXFH in hns3_get_rxnfc().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: add support for set_rxnfc
Lipeng [Tue, 10 Oct 2017 08:42:05 +0000 (16:42 +0800)]
net: hns3: add support for set_rxnfc

This patch adds support for ethtool's set_rxnfc().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: add support for set_ringparam
Lipeng [Tue, 10 Oct 2017 08:42:04 +0000 (16:42 +0800)]
net: hns3: add support for set_ringparam

This patch adds support for ethtool's set_ringparam().

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: hns3: fixes the ring index in hns3_fini_ring
Lipeng [Tue, 10 Oct 2017 08:42:03 +0000 (16:42 +0800)]
net: hns3: fixes the ring index in hns3_fini_ring

This patch fixes the ring index in hns3_fini_ring.

Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: add new T5 pci device id's
Ganesh Goudar [Tue, 10 Oct 2017 07:15:02 +0000 (12:45 +0530)]
cxgb4: add new T5 pci device id's

Add 0x50aa and 0x50ab T5 device IDs.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agocxgb4: Add support for new flash parts
Ganesh Goudar [Tue, 10 Oct 2017 07:14:13 +0000 (12:44 +0530)]
cxgb4: Add support for new flash parts

Add support for identifying new flash parts, and
also clean up the flash part identification and
decoding code.

Based on the original work of Casey Leedom <leedom@chelsio.com>

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/core: Fix BUG to BUG_ON conditionals.
Tim Hansen [Mon, 9 Oct 2017 15:37:59 +0000 (11:37 -0400)]
net/core: Fix BUG to BUG_ON conditionals.

Replace conditional BUG() calls with BUG_ON(condition).

This was found by running make coccicheck M=net/core on the
linux-next tag next-2017092

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-get-rid-of-global-verifier-state-and-reuse-instruction-printer'
David S. Miller [Tue, 10 Oct 2017 19:30:17 +0000 (12:30 -0700)]
Merge branch 'bpf-get-rid-of-global-verifier-state-and-reuse-instruction-printer'

Jakub Kicinski says:

====================
bpf: get rid of global verifier state and reuse instruction printer

This set started off as a simple extraction of the eBPF verifier's instruction
printer into a separate file but evolved into removal of global state.
The purpose of moving instruction printing code is to be able to reuse it
from the bpftool.

As far as the global verifier lock goes, this set removes the global
variables relating to the log buffer, makes the one-time init done
by bpf_get_skb_set_tunnel_proto() not depend on any external locking,
and performs verifier log writeback as data is produced removing the need
for allocating a potentially large temporary buffer.

The final step of actually removing the verifier lock is left to someone
more competent and self-confident :)

Note that struct bpf_verifier_env is now just 40B under two pages;
we should probably switch to vzalloc() when it's expanded again...

v2:
 - add a selftest;
 - use env buffer and flush on every print (Alexei);
 - handle kernel log allocation failures (Daniel);
 - put the env log members into a struct (Daniel).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: write back the verifier log buffer as it gets filled
Jakub Kicinski [Mon, 9 Oct 2017 17:30:15 +0000 (10:30 -0700)]
bpf: write back the verifier log buffer as it gets filled

The verifier log buffer can be quite large (up to 16MB currently).
As Eric Dumazet points out, if we allow multiple verification
requests to proceed simultaneously, a malicious user may use the
verifier as a way of allocating large amounts of unswappable
memory to OOM the host.

Switch to a strategy of allocating a smaller buffer (1024B)
and writing it out into the user buffer after every print.
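The writeback strategy can be sketched in userspace: each message is formatted into a small fixed buffer and immediately appended to the caller's buffer, with output truncated once that buffer is full. struct vlog and vlog_write() are hypothetical; memcpy() stands in for the kernel's copy_to_user(), and the real verifier log's exact truncation rules may differ.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUF_SIZE 1024	/* small staging buffer, per the text */

/* Hypothetical log state; len_total must be at least 1. */
struct vlog {
	char kbuf[LOG_BUF_SIZE];
	char *ubuf;		/* caller-supplied destination */
	size_t len_used;	/* bytes written so far, excluding the NUL */
	size_t len_total;	/* size of ubuf */
};

/* Format one message into kbuf and flush it to ubuf right away,
 * truncating once ubuf is full (one byte is kept for the NUL). */
static void vlog_write(struct vlog *vl, const char *fmt, ...)
{
	va_list args;
	size_t len, room;
	int n;

	va_start(args, fmt);
	n = vsnprintf(vl->kbuf, sizeof(vl->kbuf), fmt, args);
	va_end(args);
	if (n < 0)
		return;

	/* vsnprintf() reports the untruncated length; clamp to kbuf. */
	len = (size_t)n < sizeof(vl->kbuf) ? (size_t)n : sizeof(vl->kbuf) - 1;
	room = vl->len_total - vl->len_used - 1;
	if (len > room)
		len = room;

	memcpy(vl->ubuf + vl->len_used, vl->kbuf, len);
	vl->len_used += len;
	vl->ubuf[vl->len_used] = '\0';
}
```

Memory use per request is now bounded by the fixed staging buffer instead of the user-requested log size.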

While at it, remove the old BUG_ON().

This is in preparation for the global verifier lock removal.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: don't rely on the verifier lock for metadata_dst allocation
Jakub Kicinski [Mon, 9 Oct 2017 17:30:14 +0000 (10:30 -0700)]
bpf: don't rely on the verifier lock for metadata_dst allocation

bpf_skb_set_tunnel_*() functions require allocation of per-cpu
metadata_dst.  The allocation happens upon verification of the
first program using those helpers.  In preparation for removing
the verifier lock, use cmpxchg() to make sure we only allocate
the metadata_dsts once.
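The allocate-once idea can be shown with a userspace analogue that uses C11 atomic_compare_exchange_strong() in place of the kernel's cmpxchg(). struct md_dst and get_md_dst() are hypothetical stand-ins for the per-CPU metadata_dst allocation: whichever caller installs its pointer first wins, and a loser frees its own copy.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU metadata_dst. */
struct md_dst { int dummy; };

static _Atomic(struct md_dst *) md_dst_ptr;	/* NULL until installed */

/* Lock-free one-time allocation: no external lock is needed because
 * only one compare-and-exchange from NULL can succeed. */
static struct md_dst *get_md_dst(void)
{
	struct md_dst *cur = atomic_load(&md_dst_ptr);
	struct md_dst *tmp;

	if (cur)
		return cur;

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return NULL;

	cur = NULL;
	if (!atomic_compare_exchange_strong(&md_dst_ptr, &cur, tmp)) {
		free(tmp);	/* lost the race; cur holds the winner */
		return cur;
	}
	return tmp;
}
```

Every caller ends up with the same pointer, regardless of how many raced to allocate it.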

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agotools: bpftool: use the kernel's instruction printer
Jakub Kicinski [Mon, 9 Oct 2017 17:30:13 +0000 (10:30 -0700)]
tools: bpftool: use the kernel's instruction printer

Compile the instruction printer from kernel/bpf and use it
for disassembling "translated" eBPF code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: move instruction printing into a separate file
Jakub Kicinski [Mon, 9 Oct 2017 17:30:12 +0000 (10:30 -0700)]
bpf: move instruction printing into a separate file

Separate the instruction printing into a standalone source file.
This way sneaky code from tools/ can compile it in directly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: move global verifier log into verifier environment
Jakub Kicinski [Mon, 9 Oct 2017 17:30:11 +0000 (10:30 -0700)]
bpf: move global verifier log into verifier environment

The biggest piece of global state protected by the verifier lock
is the verifier_log.  Move that log to struct bpf_verifier_env.
struct bpf_verifier_env now has to be passed to all invocations
of verbose().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: encapsulate verifier log state into a structure
Jakub Kicinski [Mon, 9 Oct 2017 17:30:10 +0000 (10:30 -0700)]
bpf: encapsulate verifier log state into a structure

Put the loose log_* variables into a structure.  This will make
it simpler to remove the global verifier state in following patches.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoselftests/bpf: add a test for verifier logs
Jakub Kicinski [Mon, 9 Oct 2017 17:30:09 +0000 (10:30 -0700)]
selftests/bpf: add a test for verifier logs

Add a test for verifier log handling.  Check bad attr combinations
but focus on cases when log is truncated.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: fix incorrect bitwise operator used on rt6i_flags
Colin Ian King [Tue, 10 Oct 2017 18:10:30 +0000 (19:10 +0100)]
ipv6: fix incorrect bitwise operator used on rt6i_flags

The use of the | operator always evaluates to true, which looks rather
suspect to me. Fix this by using & instead to just check the
RTF_CACHE entry bit.
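Why | always yields true: OR-ing in a non-zero constant produces a non-zero value for any input. The two helpers below are illustrative only and not the actual route code; RTF_CACHE uses the value from the uapi header <linux/ipv6_route.h>.

```c
#include <stdbool.h>

#define RTF_CACHE 0x01000000u	/* as in <linux/ipv6_route.h> */

/* Buggy: x | RTF_CACHE is never zero, so this is always true. */
static bool is_cached_buggy(unsigned int rt6i_flags)
{
	return rt6i_flags | RTF_CACHE;
}

/* Fixed: & isolates the RTF_CACHE bit. */
static bool is_cached(unsigned int rt6i_flags)
{
	return rt6i_flags & RTF_CACHE;
}
```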

Detected by CoverityScan, CID#1457734, #1457747 ("Wrong operator used")

Fixes: 35732d01fe31 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: fix dereference of rt6_ex before null check error
Colin Ian King [Tue, 10 Oct 2017 17:01:16 +0000 (18:01 +0100)]
ipv6: fix dereference of rt6_ex before null check error

Currently rt6_ex is dereferenced before it is NULL-checked, hence
there is a possible NULL pointer dereference. Fix this by only
dereferencing rt6_ex after it has been NULL-checked.
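A generic sketch of the check-then-dereference ordering. The types and remove_exception() are hypothetical simplifications, not the real IPv6 exception-table code; the buggy ordering appears only as a comment because running it with NULL would crash.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins. */
struct rt6_exception { int stamp; };
struct bucket_stats { unsigned int removed; };

/* Buggy shape (do not run with NULL):
 *     int stamp = rt6_ex->stamp;   <- dereference first
 *     if (!rt6_ex) return -1;      <- check comes too late
 */
static int remove_exception(struct bucket_stats *st,
			    struct rt6_exception *rt6_ex)
{
	int stamp;

	if (!rt6_ex)
		return -1;	/* nothing to remove; never touch *rt6_ex */

	stamp = rt6_ex->stamp;	/* safe: rt6_ex is known non-NULL */
	st->removed++;
	return stamp;
}
```

Besides the crash risk, an early dereference lets a compiler assume the pointer is non-NULL and drop the later check entirely.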

Detected by CoverityScan, CID#1457749 ("Dereference before null check")

Fixes: 81eb8447daae ("ipv6: take care of rt6_stats")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoigb: check memory allocation failure
Christophe JAILLET [Sun, 27 Aug 2017 06:39:51 +0000 (08:39 +0200)]
igb: check memory allocation failure

Check memory allocation failures and return -ENOMEM in such cases, as
already done for other memory allocations in this function.

This avoids NULL pointer dereferences.
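A hypothetical sketch of the pattern the fix applies: every allocation is checked, and a failure unwinds earlier allocations and returns -ENOMEM instead of letting a NULL pointer be dereferenced later. struct ring, ring_alloc() and ring_free() are invented for illustration; calloc()/free() stand in for the kernel allocators.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical structure with two separately allocated arrays. */
struct ring {
	int *desc;
	int *buf_info;
};

/* Returns 0 on success or -ENOMEM, leaving nothing allocated on failure. */
static int ring_alloc(struct ring *r, size_t count)
{
	r->desc = calloc(count, sizeof(*r->desc));
	if (!r->desc)
		return -ENOMEM;

	r->buf_info = calloc(count, sizeof(*r->buf_info));
	if (!r->buf_info) {
		free(r->desc);	/* undo the earlier allocation */
		r->desc = NULL;
		return -ENOMEM;
	}
	return 0;
}

static void ring_free(struct ring *r)
{
	free(r->buf_info);
	free(r->desc);
	r->desc = r->buf_info = NULL;
}
```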

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: PJ Waskiewicz <peter.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Be drop monitor friendly
Florian Fainelli [Sat, 26 Aug 2017 01:14:24 +0000 (18:14 -0700)]
e1000e: Be drop monitor friendly

e1000e_put_txbuf() can be called from the normal reclamation path as well
as after a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() process TX timestamped SKBs, and those should not be
accounted as drops either.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: apply burst mode settings only on default
Willem de Bruijn [Fri, 25 Aug 2017 15:06:26 +0000 (11:06 -0400)]
e1000e: apply burst mode settings only on default

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.
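One way to tell "not passed at module load" apart from an explicit value is a sentinel default. In this sketch OPT_UNSET and pick_rdtr() are hypothetical; the RDTR values 0 and 32 come from the text above, and RADV would follow the same pattern with values not given here.

```c
#include <stdbool.h>

#define OPT_UNSET (-1)		/* hypothetical "not passed at load" marker */
#define RDTR_DEFAULT 0		/* from the commit text */
#define RDTR_BURST_DEFAULT 32	/* from the commit text */

/* An explicit user value always wins, including RxIntDelay=0 for low
 * latency; the burst default applies only when nothing was passed. */
static int pick_rdtr(int user_value, bool dma_burst_capable)
{
	if (user_value != OPT_UNSET)
		return user_value;
	return dma_burst_capable ? RDTR_BURST_DEFAULT : RDTR_DEFAULT;
}
```

Because 0 is both the plain default and a meaningful explicit choice, comparing against 0 alone could not make this distinction; a separate sentinel can.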

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: fix buffer overrun while the I219 is processing DMA transactions
Sasha Neftin [Sun, 6 Aug 2017 13:49:18 +0000 (16:49 +0300)]
e1000e: fix buffer overrun while the I219 is processing DMA transactions

Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing, in some
high-performance cases, a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into an unrecoverable Tx hang under very stressful UDP
traffic and multiple reconnections of the Ethernet cable. This Tx hang
of the LAN Controller can only be recovered by rebooting the system.
Slightly slow down DMA access by reducing the number of outstanding
requests. This workaround could have an impact on TCP traffic
performance on the platform. Disabling TSO eliminates the performance
loss for TCP traffic without a noticeable impact on CPU performance.

Please refer to the I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-connection-i218-family-documentation.html

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Avoid receiver overrun interrupt bursts
Benjamin Poirier [Fri, 21 Jul 2017 18:36:27 +0000 (11:36 -0700)]
e1000e: Avoid receiver overrun interrupt bursts

When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.

This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.

Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59bc33
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Separate signaling for link check/link up
Benjamin Poirier [Fri, 21 Jul 2017 18:36:26 +0000 (11:36 -0700)]
e1000e: Separate signaling for link check/link up

Lennart reported the following race condition:

\ e1000_watchdog_task
    \ e1000e_has_link
        \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
            /* link is up */
            mac->get_link_status = false;

                            /* interrupt */
                            \ e1000_msix_other
                                hw->mac.get_link_status = true;

        link_active = !hw->mac.get_link_status
        /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().
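A simplified model of the race and the fix. Everything here is hypothetical: struct mac_info, the irq hook that simulates e1000_msix_other() firing mid-check, and the return convention of check_for_link() (negative on error, 0 for link down, 1 for link up) are assumptions. The old logic infers link state from the shared flag; the new logic trusts the return value.

```c
#include <stdbool.h>

struct mac_info {
	bool get_link_status;		/* "link needs checking", set by IRQ */
	bool phy_link_up;		/* what the PHY reports (model input) */
	void (*irq)(struct mac_info *);	/* models an interrupt mid-check */
};

/* < 0: error, 0: link down, 1: link up (hypothetical convention). */
static int check_for_link(struct mac_info *mac)
{
	if (!mac->phy_link_up)
		return 0;
	mac->get_link_status = false;
	if (mac->irq)
		mac->irq(mac);		/* interrupt lands right here */
	return 1;
}

/* Old logic: link state inferred from the shared flag (racy). */
static bool has_link_old(struct mac_info *mac)
{
	if (mac->get_link_status)
		check_for_link(mac);
	return !mac->get_link_status;
}

/* New logic: link state taken from check_for_link()'s return value. */
static bool has_link_new(struct mac_info *mac)
{
	if (!mac->get_link_status)
		return true;
	return check_for_link(mac) > 0;
}

static void irq_sets_flag(struct mac_info *mac)
{
	mac->get_link_status = true;
}
```

With the interrupt landing mid-check, has_link_old() wrongly reports the link as down, while has_link_new() reports it up and still leaves get_link_status set so the next pass re-checks.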

Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Fix return value test
Benjamin Poirier [Fri, 21 Jul 2017 18:36:25 +0000 (11:36 -0700)]
e1000e: Fix return value test

All the helpers return -E1000_ERR_PHY.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Fix wrong comment related to link detection
Benjamin Poirier [Fri, 21 Jul 2017 18:36:24 +0000 (11:36 -0700)]
e1000e: Fix wrong comment related to link detection

Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays true until then.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoe1000e: Fix error path in link detection
Benjamin Poirier [Fri, 21 Jul 2017 18:36:23 +0000 (11:36 -0700)]
e1000e: Fix error path in link detection

In case of an error from e1e_rphy(), the loop will exit early and
"success" will be erroneously set to true.
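A hypothetical sketch of a polling loop that does not mistake an early exit on a read error for a detected link. read_bmsr_t and the example readers are invented; BMSR_LSTATUS (0x0004) is the standard MII link-status bit, -2 merely stands in for an error code such as -E1000_ERR_PHY, and real PHY code often reads BMSR twice because the link bit latches.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define BMSR_LSTATUS 0x0004	/* MII BMSR link status bit */

/* Hypothetical register reader: 0 on success, < 0 on error. */
typedef int (*read_bmsr_t)(void *ctx, uint16_t *val);

/* Poll up to `iterations` times. Both a read error and a link-up
 * reading end the loop early, so "i < iterations" alone is not proof
 * of link. */
static int phy_has_link(read_bmsr_t read_bmsr, void *ctx,
			unsigned int iterations, bool *success)
{
	unsigned int i;
	uint16_t bmsr = 0;
	int ret = 0;

	for (i = 0; i < iterations; i++) {
		ret = read_bmsr(ctx, &bmsr);
		if (ret)
			break;
		if (bmsr & BMSR_LSTATUS)
			break;
	}

	/* Buggy form: *success = (i < iterations); true on error too. */
	*success = (ret == 0) && (i < iterations);
	return ret;
}

/* Example readers for exercising the three outcomes. */
static int read_err(void *ctx, uint16_t *val) { (void)ctx; (void)val; return -2; }
static int read_up(void *ctx, uint16_t *val) { (void)ctx; *val = BMSR_LSTATUS; return 0; }
static int read_down(void *ctx, uint16_t *val) { (void)ctx; *val = 0; return 0; }
```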

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoAdd a driver for Renesas uPD60620 and uPD60620A PHYs
Bernd Edlinger [Sun, 8 Oct 2017 13:40:08 +0000 (13:40 +0000)]
Add a driver for Renesas uPD60620 and uPD60620A PHYs

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agovhost_net: do not stall on zerocopy depletion
Willem de Bruijn [Fri, 6 Oct 2017 17:22:31 +0000 (13:22 -0400)]
vhost_net: do not stall on zerocopy depletion

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two UDP flows from guest to host, one with a payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with a 1 Mbit/s rate limit
and a deep 1000-entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.
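A hypothetical model of the pending-limit check and the copy fallback. The ring arithmetic over UIO_MAXIOV (1024) entries, the vq_num >> 2 fraction standing in for the "smaller fractions" mentioned above, and the function names are assumptions; VHOST_MAX_PENDING is set to 128 because, per the text, vq->num >> 1 equals it when vq->num == 256.

```c
#include <stdbool.h>

#define VHOST_MAX_PENDING 128	/* equals 256 >> 1, per the text */
#define UIO_MAXIOV 1024		/* assumed size of the pending-index ring */

/* upend_idx/done_idx index a ring of in-flight zerocopy buffers;
 * done_idx is assumed to be < UIO_MAXIOV. */
static bool exceeds_maxpend(unsigned int upend_idx, unsigned int done_idx,
			    unsigned int vq_num)
{
	unsigned int pending = (upend_idx + UIO_MAXIOV - done_idx) % UIO_MAXIOV;
	unsigned int frac = vq_num >> 2;	/* assumed fraction */
	unsigned int limit = frac < VHOST_MAX_PENDING ? frac : VHOST_MAX_PENDING;

	return pending > limit;
}

/* Instead of stalling at the limit, fall back to copying. */
static bool use_zerocopy(unsigned int upend_idx, unsigned int done_idx,
			 unsigned int vq_num, bool large_enough)
{
	return large_enough && !exceeds_maxpend(upend_idx, done_idx, vq_num);
}
```

Taking the min() of both terms keeps the limit meaningful for small rings (for vq_num == 32 it is 8) while still capping it for large ones.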

Changes
  v1 -> v2
    - replaced min with typed min_t
    - avoid unnecessary whitespace change

Link: http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link: http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoopenvswitch: Add erspan tunnel support.
William Tu [Thu, 5 Oct 2017 00:03:12 +0000 (17:03 -0700)]
openvswitch: Add erspan tunnel support.

Add erspan netlink interface for OVS.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>