git.openwrt.org Git - openwrt/staging/blogic.git/log

amd-xgbe: Tx engine must not be active before stopping it

If the Tx engine is told to stop while it is actively processing Tx
descriptors it is possible that the Tx descriptor(s) will not be closed
out properly. When the Tx engine is restarted this could result in the
driver being stuck on the improperly closed descriptor.

Update the driver to wait for the Tx engine to be in a stopped or
suspended state before issuing the stop command.

This has not been an issue to date, but it's a good safe-guard to have.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add a read memory barrier to Tx/Rx path

Add a read memory barrier to the Tx and Rx paths where the ownership
bit is checked to be sure that all descriptor fields are read after
having read the ownership bit for the descriptor.

This has not been an issue to date, but it's a good safe-guard to have.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: USB: Deletion of unnecessary checks before the function call "kfree"

The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Xilinx: Deletion of unnecessary checks before two function calls

The functions kfree() and of_node_put() test whether their argument is NULL
and then return immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IBM-EMAC: Deletion of unnecessary checks before the function call "of_dev_put"

The of_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set

Fix ethtool set settings to not check AUTONEG_ENABLE

mlx4_en_set_settings should not check if cmd->autoneg == AUTONEG_ENABLE,
cmd->autoneg can be enabled by default and this check will fail other settings requests.
mlx4_en driver doesn't support changing autoneg value, but shouldn't fail the request
in case cmd->autoneg was set.

Fixes: d48b3ab ("net/mlx4_en: Use PTYS register to set ethtool settings (Speed)")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: atm: eni: Add pci_dma_mapping_error() call

Added a pci_dma_mapping_error() call to check for mapping errors before
further using the dma handle. In case of error, control goes to a new label
where the incoming skb is freed. Unchecked dma handles were found using
Coccinelle:

@rule1@
expression e1;
identifier x;
@@

*x = pci_map_single(...);
... when != pci_dma_mapping_error(e1,x)

Signed-off-by: Tina Johnson <tinajohnson.1234@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-next'

Richard Alpe says:

====================
tipc: new netlink API

v3
The old API is not removed.

The new API is separated from the old because of a bug in the old
tipc-config utility using it. When adding commands to the existing
genl_ops struct the get-family response message grows to a point where
it overflows the small receive buffer in tipc-config, subsequently
breaking the tool. Hence the two genl_family and genl_ops structs.

The new headers are placed in a new file called tipc_netlink.h rather
than added to tipc_config.h as they where in previous versions of this
patchset.
/v3

v2
Redesigned "socket list command" to address David Millers comments in
net-next v1 of this patchset.

Simply put the problem is that we can have an arbitrary amount of
sockets with an arbitrary amount of associated publications. In the
previous patchset this was solved by nesting as many publications as
possible into a socket. If all didn't fit it sent the same socket again
with the remaining publications. As David Miller pointed out this makes
each message malformed as the receiver cannot by the data itself know if
it has received a complete set or not. This was flagged outside of the
data and the client did the reassembly.

o socket 1
  o publ 1
  o publ 2
o socket 1
  o publ 3
  o publ 4

In this patchset this is divided into socket listing and publication
listing to avoid having nested data of arbitrary size.

TIPC_NL_SOCK_GET now dumps all sockets with any nested connection
information. However, it no longer include publication information,
only a HAS_PUBL flag to indicate whether the socket has publications or
not. To compliment this there is a new command TIPC_NL_PUBL_GET which
takes a socket as argument and dumps all associated publications.

This means that on "top-level" the data is always complete. In the case
of "tipc socket list" (new tipc-config -p) it first queries all sockets
with TIPC_NL_SOCK_GET and if the socket is published it fetches the
publications using TIPC_NL_PUBL_GET. This is slow for large amount of
sockets with a low publication count (worst case). However, the
integrity is preserved and there is no malformed messages.
/v2

This is a new netlink API for TIPC. It's intended to replace the
existing ASCII API. It utilizes many of the standard netlink
functionalities in the kernel, such as attribute nesting and
input polices.

There are a couple of reasons for this rewrite. The main and most
easily justifiable is that the existing API doesn't scale.  Meaning
that a TIPC cluster with a larger amount of nodes, publications or
ports will rapidly exceed what the exiting API can handle. Resulting
in truncated or corrupt responses. In addition to this, the existing
ASCII API rarely uses "standard" kernel functions and has several
tipc specific functions for sanity checking and string formating.

The new API utilizes standard function for pushing data to socket
buffers and netlink attribute nesting to logically group data.
The new API can handle an arbitrary amount of data for things that
are likely to scale up as the TIPC usage and/or cluster size
increases.

A new user-space tool has been developed to work with this new API.
It is called "tipc" and is part of the "tipc-utils" package that
comes with many Linux distributions.  The new "tipc" tool utilizes
standard functions from libnl to format, send, receive and process
messages. The tool has borrowed design philosophies from git and the
ip tool. Making the syntax resemble that of ip whiles its strong
modularity resembles that of git.

The existing tool for managing TIPC, "tipc-config" remains in the
package, but when built for kernels that has this new API it is
replaced by a script-based wrapper that maps the old syntax to the
new tool. This way, backwards compatibility is mostly preserved.

MORE ABOUT THE CODE

The main challenge here is to handle the case where the data is of
arbitrary size. This was largely neglected in the old API design.
For example when there is a lot of sockets that has a large amount of
associated publications. In this specific case we can't assume that
all ports nor for that matter all the publications can fit inside a
single netlink message. Sending everything in one batch isn't an
option as we need to yield for the socket layer to cope.

This is solved by using the standard netlink callback for dumping
data and releasing the locks when the netlink message is full. The
dumping mechanism gets us back and we keep a reference (logical) to
where we where when the message became full. This means that we are
not "atomic", what is retrieved by user-space isn't a snapshot at a
certain time but rather a continuously updated data set. In the case
where we can't find our way back i.e. our logical reference are gone
we set a standard flag (NLM_F_DUMP_INTR) to tell user-space that the
dump was interrupted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add name table dump to new netlink api

Add TIPC_NL_NAME_TABLE_GET command to the new tipc netlink API.

This command supports dumping the name table of all nodes.

Netlink logical layout of name table response message:
-> name table
    -> publication
        -> type
        -> lower
        -> upper
        -> scope
        -> node
        -> ref
        -> key

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add net set to new netlink api

Add TIPC_NL_NET_SET command to the new tipc netlink API.

This command can set the network id and network (tipc) address.

Netlink logical layout of network set message:
-> net
[ -> id ]
[ -> address ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add net dump to new netlink api

Add TIPC_NL_NET_GET command to the new tipc netlink API.

This command dumps the network id of the node.

Netlink logical layout of returned network data:
-> net
-> id

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add node get/dump to new netlink api

Add TIPC_NL_NODE_GET to the new tipc netlink API.

This command can dump the address and node status of all nodes in the
tipc cluster.

Netlink logical layout of returned node/address data:
-> node
-> address
-> up flag

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add media set to new netlink api

Add TIPC_NL_MEDIA_SET command to the new tipc netlink API.

This command can set one or more link properties for a particular
media.

Netlink logical layout of bearer set message:
-> media
    -> name
    -> link properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add media get/dump to new netlink api

Add TIPC_NL_MEDIA_GET command to the new tipc netlink API.

This command supports dumping all information about all defined
media as well as getting all information about a specific media.

The information about a media includes name and link properties.

Netlink logical layout of media get response message:
-> media
    -> name
    -> link properties
        -> tolerance
        -> priority
        -> window

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add link stat reset to new netlink api

Add TIPC_NL_LINK_RESET_STATS command to the new netlink API.

This command resets the link statistics for a particular link.

Netlink logical layout of link reset message:
-> link
-> name

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add link set to new netlink api

Add TIPC_NL_LINK_SET to the new tipc netlink API.

This command can set one or more link properties for a particular
link.

Netlink logical layout of link set message:
-> link
    -> name
    -> properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add link get/dump to new netlink api

Add TIPC_NL_LINK_GET command to the new tipc netlink API.

This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).

The information about a link includes name, transmission info,
properties and link statistics.

As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.

Netlink logical layout of link response message:
    -> port
        -> name
        -> MTU
        -> RX
        -> TX
        -> up flag
        -> active flag
        -> properties
           -> priority
           -> tolerance
           -> window
        -> statistics
            -> rx_info
            -> rx_fragments
            -> rx_fragmented
            -> rx_bundles
            -> rx_bundled
            -> tx_info
            -> tx_fragments
            -> tx_fragmented
            -> tx_bundles
            -> tx_bundled
            -> msg_prof_tot
            -> msg_len_cnt
            -> msg_len_tot
            -> msg_len_p0
            -> msg_len_p1
            -> msg_len_p2
            -> msg_len_p3
            -> msg_len_p4
            -> msg_len_p5
            -> msg_len_p6
            -> rx_states
            -> rx_probes
            -> rx_nacks
            -> rx_deferred
            -> tx_states
            -> tx_probes
            -> tx_nacks
            -> tx_acks
            -> retransmitted
            -> duplicates
            -> link_congs
            -> max_queue
            -> avg_queue

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add publication dump to new netlink api

Add TIPC_NL_PUBL_GET command to the new tipc netlink API.

This command supports dumping of all publications for a specific
socket.

Netlink logical layout of request message:
    -> socket
        -> reference

Netlink logical layout of response message:
    -> publication
        -> type
        -> lower
        -> upper

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add sock dump to new netlink api

Add TIPC_NL_SOCK_GET command to the new tipc netlink API.

This command supports dumping of all available sockets with their
associated connection or publication(s). It could be extended to reply
with a single socket if the NLM_F_DUMP isn't set.

The information about a socket includes reference, address, connection
information / publication information.

Netlink logical layout of response message:
-> socket
    -> reference
    -> address
    [
    -> connection
        -> node
        -> socket
        [
        -> connected flag
        -> type
        -> instance
        ]
    ]
    [
    -> publication flag
    ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add bearer set to new netlink api

Add TIPC_NL_BEARER_SET command to the new tipc netlink API.

This command can set one or more link properties for a particular
bearer.

Netlink logical layout of bearer set message:
-> bearer
    -> name
    -> link properties
        [ -> tolerance ]
        [ -> priority ]
        [ -> window ]

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add bearer get/dump to new netlink api

Add TIPC_NL_BEARER_GET command to the new tipc netlink API.

This command supports dumping all data about all bearers or getting
all information about a specific bearer.

The information about a bearer includes name, link priorities and
domain.

Netlink logical layout of bearer get message:
-> bearer
    -> name

Netlink logical layout of returned bearer information:
-> bearer
    -> name
    -> link properties
        -> priority
        -> tolerance
        -> window
    -> domain

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: add bearer disable/enable to new netlink api

A new netlink API for tipc that can disable or enable a tipc bearer.

The new API is separated from the old API because of a bug in the
user space client (tipc-config). The problem is that older versions
of tipc-config has a very low receive limit and adding commands to
the legacy genl_opts struct causes the ctrl_getfamily() response
message to grow, subsequently breaking the tool.

The new API utilizes netlink policies for input validation. Where the
top-level netlink attributes are tipc-logical entities, like bearer.
The top level entities then contain nested attributes. In this case
a name, nested link properties and a domain.

Netlink commands implemented in this patch:
TIPC_NL_BEARER_ENABLE
TIPC_NL_BEARER_DISABLE

Netlink logical layout of bearer enable message:
-> bearer
    -> name
    [ -> domain ]
    [
    -> properties
        -> priority
    ]

Netlink logical layout of bearer disable message:
-> bearer
    -> name

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: advance iov iterator when needed in macvtap_put_user()

When mergeable buffer is used, vnet_hdr_sz is greater than sizeof struct
virtio_net_hdr. So we need advance the iov iterators in this case.

Fixes 6c36d2e26cda1ad3e2c4b90dd843825fc62fe5b4 ("macvtap: Use iovec iterators")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: don't duplicate kvfree()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: don't duplicate kvfree()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152-next'

Hayes Wang says:

====================
r8152: adjust rx functions

v3:
For patch #1, remove unnecessary initialization for ret and
unnecessary blank line in r8152_submit_rx().

v2:
For patch #1, set actual_length to 0 before adding the rx to the
list, when a error occurs.

For patch #2, change the flow. Stop submitting the rx if a error
occurs, and add the remaining rx to the list for submitting later.

v1:
Adjust some flows and codes which are relative to r8152_submit_rx()
and rtl_start_rx().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: adjust rtl_start_rx

If there is a error for r8152_submit_rx(), add the remaining rx
buffers to the list. Then the remaining rx buffers could be
submitted later.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: adjust r8152_submit_rx

The behavior of handling the returned status from r8152_submit_rx()
is almost same, so let r8152_submit_rx() deal with the error
directly. This could avoid the duplicate code.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: keep owned chunk in destructor_arg instead of skb->cb

It's just silly to hold the skb destructor argument around inside
skb->cb[] as we currently do in SCTP.

Nowadays, we're sort of cheating on data accounting in the sense
that due to commit 4c3a5bdae293 ("sctp: Don't charge for data in
sndbuf again when transmitting packet"), we orphan the skb already
in the SCTP output path, i.e. giving back charged data memory, and
use a different destructor only to make sure the sk doesn't vanish
on skb destruction time. Thus, cb[] is still valid here as we
operate within the SCTP layer. (It's generally actually a big
candidate for future rework, imho.)

However, storing the destructor in the cb[] can easily cause issues
should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
layer, since cb[] may get overwritten by lower layers and thus can
corrupt the chunk pointer. There are no such issues at present,
but lets keep the chunk in destructor_arg, as this is the actual
purpose for it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: log RX buffer allocation and RX/TX dma failures

To help troubleshoot heavy memory pressure conditions, add a bunch of
statistics counter to log RX buffer allocation and RX/TX DMA mapping
failures. These are reported like any other counters through the ethtool
stats interface.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: log RX buffer allocation and RX/TX DMA failures

To help troubleshoot heavy memory pressure conditions, add a bunch of
statistics counter to log RX buffer allocation and RX/TX DMA mapping
failures. These are reported like any other counters through the ethtool
stats interface.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: make packet_snd fail on len smaller than l2 header

When sending packets out with PF_PACKET, SOCK_RAW, ensure that the
packet is at least as long as the device's expected link layer header.
This check already exists in tpacket_snd, but not in packet_snd.
Also rate limit the warning in tpacket_snd.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vlan_action'

Jiri Pirko says:

====================
sched: introduce vlan action

Please see the individual patches for info
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sched: introduce vlan action

This tc action allows to work with vlan tagged skbs. Two supported
sub-actions are header pop and header push.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move vlan pop/push functions into common code

So it can be used from out of openvswitch code.
Did couple of cosmetic changes on the way, namely variable naming and
adding support for 8021AD proto.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move make_writable helper into common code

note that skb_make_writable already exists in net/netfilter/core.c
but does something slightly different.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: introduce __vlan_insert_tag helper which does not free skb

There's a need for helper which inserts vlan tag but does not free the
skb in case of an error.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: introduce *vlan_hwaccel_push_inside helpers

Use them to push skb->vlan_tci into the payload and avoid code
duplication.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: rename __vlan_put_tag to vlan_insert_tag_set_proto

Name fits better. Plus there's going to be introduced
__vlan_insert_tag later on.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: kill vlan_put_tag helper

Since both tx and rx paths work with skb->vlan_tci, there's no need for
this function anymore. Switch users directly to __vlan_hwaccel_put_tag.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: make __vlan_hwaccel_put_tag return void

Always returns the same skb it gets, so change to void.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: actions: use skb_postpull_rcsum when possible

Replace duplicated code by calling skb_postpull_rcsum

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp_eth: allow to set a specific mac address

Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy_device_type'

Johan Hovold says:

====================
net: phy: add device-type abstraction

This series adds device and device-type abstractions to the micrel
driver, and enables support for RMII-reference clock selection for
KSZ8081 and KSZ8091 devices.

While adding support for more features for the Micrel PHYs mentioned
above, it became apparent that the configuration space is much too large
and that adding type-specific callbacks will simply not scale. Instead I
added a driver_data field to struct phy_device, which can be used to
store static device type data that can be parsed and acted on in
generic driver callbacks. This allows a lot of duplicated code to be
removed, and should make it much easier to add new features or deal with
device-type quirks in the future.

The series has been tested on a dual KSZ8081 setup. Further testing on
other Micrel PHYs would be much appreciated.

The recent commit a95a18afe4c8 ("phy/micrel: KSZ8031RNL RMII clock
reconfiguration bug") currently prevents KSZ8031 PHYs from using the
generic config-init. Bruno, who is the author of that patch, has agreed
to test this series and some follow-up diagnostic patches to determine
how best to incorporate these devices as well. I intend to send a
follow-up patch that removes the custom 8031 config-init and documents
this quirk, but the current series can be applied meanwhile.

These patches are against net-next which contains some already merged
prerequisite patches to the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add copyright entry

Add myself to the list of copyright holders.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: refactor interrupt config

Add generic interrupt-config callback and store interrupt-level bitmask
in type data for PHY types not using bit 9.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt/bindings: add clock-select function property to micrel phy binding

Add "micrel,rmii-reference-clock-select-25-mhz" to Micrel ethernet PHY
binding documentation.

This property is needed to properly describe some revisions of Micrel
PHYs which has the function of this configuration bit inverted so that
setting it enables 25 MHz rather than 50 MHz clock mode.

Note that a clock reference ("rmii-ref") is still needed to actually
select either mode.

Cc: devicetree@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt/bindings: reformat micrel eth-phy documentation

Reduce indentation of Micrel PHY binding documentations somewhat.

Also fix "reference input clock" typo while at it.

Cc: devicetree@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add support for clock-mode select to KSZ8081/KSZ8091

Micrel KSZ8081 and KSZ8091 PHYs have the RMII Reference Clock Select
bit, which is used to select 25 or 50 MHz clock mode.

Note that on some revisions of the PHY (e.g. KSZ8081RND) the function of
this bit is inverted so that setting it enables 25 rather than 50 MHz
mode. Add a new device-tree property
"micrel,rmii-reference-clock-select-25-mhz" to describe this.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add generic clock-mode-select support

Add generic RMII-Reference-Clock-Select support.

Several Micrel PHY have an RMII-Reference-Clock-Select bit to select
25 MHz or 50 MHz clock mode. Recently, support for configuring this
through device tree for KSZ8021 and KSZ8031 was added.

Generalise this support so that it can be configured for other PHY types
as well.

Note that some PHY revisions (of the same type) has this bit inverted.
This should be either configurable through a new device-tree property,
or preferably, determined based on PHY ID if possible.

Also note that this removes support for setting 25 MHz mode from board
files which was also added by the above mentioned commit 45f56cb82e45
("net/phy: micrel: Add clock support for KSZ8021/KSZ8031").

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add has-broadcast-disable flag to type data

Add has_broadcast_disable flag to type-data and generic config_init.

This allows us to remove the ksz8081 config_init callback.

Note that ksz8021_config_init is kept for now due to a95a18afe4c8
("phy/micrel: KSZ8031RNL RMII clock reconfiguration bug").

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: parse of nodes at probe

Parse the "micrel,led-mode" property at probe, rather than at config_init
time in the led-setup helper itself.

Note that the bogus parent->of_node bit is removed.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: add device-type abstraction

Add structured device-type information and support for generic led-mode
setup to the generic config_init callback.

This is a first step in ultimately getting rid of device-type specific
callbacks.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: add static data field to struct phy_driver

Add static driver-data field to struct phy_driver, which can be used to
store structured device-type information.

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sky2: use new netdev_rss_key_fill() helper

Switch to a random RSS key rather than a fixed one.
Using netdev_rss_key_fill helper also ensures that all ports share
a common key.

See also commit 960fb622f85180f36d3aff82af53e2be3db2f888.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: support skb->xmit_more

Check and update posted_index only when skb->xmit_more is 0 or tx queue is full.

v2:
use txq_map instead of skb_get_queue_mapping(skb)

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Deletion of unnecessary checks before the function call "vfree"

The vfree() function performs also input parameter validation. Thus the test
around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git./linux/kernel/git/viro/vfs

Merge branch 'bonding_4ad'

Xie Jianhua says:

====================
bonding: Introduce 4 AD link speed

The speed field of AD Port Key was based on bitmask, it supported 5
kinds of link speed at most, as there were only 5 bits in the speed
field of the AD Port Key. This patches series change the speed type
(AD_LINK_SPEED_BITMASK) from bitmask to enum type in order to enhance
speed type from 5 to 32, and then introduce 4 AD link speed to fix
agg_bandwidth.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Introduce 4 AD link speed to fix agg_bandwidth

This patch adds [2.5|20|40|56] Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: change AD_LINK_SPEED_BITMASK to enum to suport more speed

Port Key was determined as 16 bits according to the link speed,
duplex and user key (which is yet not supported).  In the old
speed field, 5 bits are for speed [1|10|100|1000|10000]Mbps as
below:
--------------------------------------------------------------
Port key :| User key        | Speed         |       Duplex|
--------------------------------------------------------------
    16                  6               1               0
This patch keeps the old layout, but changes AD_LINK_SPEED_BITMASK
from bit type to an enum type.  In this way, the speed field can
expand speed type from 5 to 32.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>
Signed-off-by: Jianhua Xie <jianhua.xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bury skb_copy_to_page()

no callers since 3.0

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold verify_iovec() into copy_msghdr_from_user()

... and do the same on the compat side of things.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

{compat_,}verify_iovec(): switch to generic copying of iovecs

use {compat_,}rw_copy_check_uvector(). As the result, we are
guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
and ->recvmsg() will pass access_ok().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

separate kernel- and userland-side msghdr

Kernel-side struct msghdr is (currently) using the same layout as
userland one, but it's not a one-to-one copy - even without considering
32bit compat issues, we have msg_iov, msg_name and msg_control copied
to kernel[1].  It's fairly localized, so we get away with a few functions
where that knowledge is needed (and we could shrink that set even
more).  Pretty much everything deals with the kernel-side variant and
the few places that want userland one just use a bunch of force-casts
to paper over the differences.

The thing is, kernel-side definition of struct msghdr is *not* exposed
in include/uapi - libc doesn't see it, etc.  So we can add struct user_msghdr,
with proper annotations and let the few places that ever deal with those
beasts use it for userland pointers.  Saner typechecking aside, that will
allow to change the layout of kernel-side msghdr - e.g. replace
msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
to modify the iovec as we copy data to/from it, etc.

We could introduce kernel_msghdr instead, but that would create much more
noise - the absolute majority of the instances would need to have the
type switched to kernel_msghdr and definition of struct msghdr in
include/linux/socket.h is not going to be seen by userland anyway.

This commit just introduces user_msghdr and switches the few places that
are dealing with userland-side msghdr to it.

[1] actually, it's even trickier than that - we copy msg_control for
sendmsg, but keep the userland address on recvmsg.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bpf: fix arraymap NULL deref and missing overflow and zero size checks

- fix NULL pointer dereference:
kernel/bpf/arraymap.c:41 array_map_alloc() error: potential null dereference 'array'. (kzalloc returns null)
kernel/bpf/arraymap.c:41 array_map_alloc() error: we previously assumed 'array' could be null (see line 40)

- integer overflow check was missing in arraymap
(hashmap checks for overflow via kmalloc_array())

- arraymap can round_up(value_size, 8) to zero. check was missing.

- hashmap was missing zero size check as well, since roundup_pow_of_two() can
truncate into zero

- found a typo in the arraymap comment and unnecessary empty line

Fix all of these issues and make both overflow checks explicit U32 in size.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Deletion of an unnecessary check before the function call "__module_get"

The __module_get() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pktgen: Deletion of an unnecessary check before the function call "proc_remove"

The proc_remove() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: rtl8150: remove unused variable

remove unused variable

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-next'

Giuseppe Cavallaro says:

====================
stmmac: update driver documentation

Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.
Also this reviews and fixes what is reported when run kernel-doc script.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: review driver when run kernel-doc

When run ./scripts/kernel-doc several warnings are reported
so this patch fix them.
Also it reviews many comments and adds new ones.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: document common header file

This patch adds some useful comments inside the common header
file to provide information about the APIs exposed by the driver.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: update driver documentation

Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make connect() mem charging friendly

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: return NET_XMIT_DROP for dropped packets

After commit 5d097109257c03a71845729f8db6b5770c4bbedc
("tun: only queue packets on device"), NETDEV_TX_OK was returned for
dropped packets. This will confuse pktgen since dropped packets were
counted as sent ones.

Fixing this by returning NET_XMIT_DROP to let pktgen count it as error
packet.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

icmp: Remove some spurious dropped packet profile hits from the ICMP path

If icmp_rcv() has successfully processed the incoming ICMP datagram, we
should use consume_skb() rather than kfree_skb() because a hit on the likes
of perf -e skb:kfree_skb is not called-for.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dev_ioctl: use sizeof(x) instead of sizeof x

Also remove spaces after cast.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: include linux/types.h instead of asm/types.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix spelling for synchronized

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: spelling s/reseting/resetting

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: replace min/casting by min_t

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: remove blank lines between function/EXPORT_SYMBOL

See Documentation/CodingStyle chapter 6.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: kerneldoc warning fixes

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-3.19-20141117' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
this is a pull request of 9 patches for net-next/master.

All 9 patches are by Roger Quadros and update the c_can platform
driver. First by improving the initialization sequence of the message
RAM, making use of syscon/regmap. In the later patches support for
various TI SoCs is added.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fec-next'

Lothar Waßmann says:

====================
net: fec: assorted cleanup patches

This patch series is a followup to:
<1415350967-2238-1-git-send-email-LW@KARO-electronics.de>
[PATCHv4 1/1] net: fec: fix regression on i.MX28 introduced by rx_copybreak support
to apply the cleanup patches that were originally sent along with the
bugfix patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove unused return value from swap_buffer()

The return value of swap_buffer() is not used by any caller, thus
remove it.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: simplify loop counter handling in swap_buffer()

Eliminate the DIV_ROUND_UP() and change the loop counter increment to
4 instead. This results in saving 6 instructions in the functions
assembly code.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: use swab32s() instead of cpu_to_be32()

when swap_buffer() is being called, we know for sure, that we need to
byte swap the data. Furthermore, this function is called for swapping
data in both directions. Thus cpu_to_be32() is semantically not
correct for all use cases. Use swab32s() to reflect this.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: improve access to quirk flags by copying them into fec_enet_private struct

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: change type of 'bufdesc_ex' to bool

fep->bufdesc_ex is treated as a boolean value, thus declare it as
such.

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: properly parenthesize macro args

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: consistently use lower case chars as hex digits

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: indentation cleanup

consistently use TABs for indentation

Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ebpf_maps'

Alexei Starovoitov says:

====================
implementation of eBPF maps

v1->v2:
renamed flags for MAP_UPDATE_ELEM command to be more concise,
clarified commit logs and improved comments in patches 1,3,7
per discussions with Daniel

Old v1 cover:

this set of patches adds implementation of HASH and ARRAY types of eBPF maps
which were described in manpage in commit b4fc1a460f30("Merge branch 'bpf-next'")

The difference vs previous version of these patches from August:
- added 'flags' attribute to BPF_MAP_UPDATE_ELEM
- in HASH type implementation removed per-map kmem_cache.
  I was doing kmem_cache_create() for every map to enable selective slub
  debugging to check for overflows and leaks. Now it's not needed, so just
  use normal kmalloc() for map elements.
- added ARRAY type which was mentioned in manpage, but wasn't public yet
- added map testsuite and removed temporary bits from test_stubs

Note, eBPF programs cannot be attached to events yet.
It will come in the next set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: remove test map scaffolding and user proper types

proper types and function helpers are ready. Use them in verifier testsuite.
Remove temporary stubs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: allow eBPF programs to use maps

expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem()
map accessors to eBPF programs

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add a testsuite for eBPF maps

. check error conditions and sanity of hash and array map APIs
. check large maps (that kernel gracefully switches to vmalloc from kmalloc)
. check multi-process parallel access and stress test

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix BPF_MAP_LOOKUP_ELEM command return code

fix errno of BPF_MAP_LOOKUP_ELEM command as bpf manpage
described it in commit b4fc1a460f30("Merge branch 'bpf-next'"):
-----
BPF_MAP_LOOKUP_ELEM
    int bpf_lookup_elem(int fd, void *key, void *value)
    {
        union bpf_attr attr = {
            .map_fd = fd,
            .key = ptr_to_u64(key),
            .value = ptr_to_u64(value),
        };

        return bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
    }
    bpf() syscall looks up an element with given key in  a  map  fd.
    If  element  is found it returns zero and stores element's value
    into value.  If element is not found  it  returns  -1  and  sets
    errno to ENOENT.

and further down in manpage:

   ENOENT For BPF_MAP_LOOKUP_ELEM or BPF_MAP_DELETE_ELEM,  indicates  that
          element with given key was not found.
-----

In general all BPF commands return ENOENT when map element is not found
(including BPF_MAP_GET_NEXT_KEY and BPF_MAP_UPDATE_ELEM with
flags == BPF_MAP_UPDATE_ONLY)

Subsequent patch adds a testsuite to check return values for all of
these combinations.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add array type of eBPF maps

add new map type BPF_MAP_TYPE_ARRAY and its implementation

- optimized for fastest possible lookup()
  . in the future verifier/JIT may recognize lookup() with constant key
    and optimize it into constant pointer. Can optimize non-constant
    key into direct pointer arithmetic as well, since pointers and
    value_size are constant for the life of the eBPF program.
    In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT
    while preserving concurrent access to this map from user space

- two main use cases for array type:
  . 'global' eBPF variables: array of 1 element with key=0 and value is a
    collection of 'global' variables which programs can use to keep the state
    between events
  . aggregation of tracing events into fixed set of buckets

- all array elements pre-allocated and zero initialized at init time

- key as an index in array and can only be 4 byte

- map_delete_elem() returns EINVAL, since elements cannot be deleted

- map_update_elem() replaces elements in an non-atomic way
  (for atomic updates hashtable type should be used instead)

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add hashtable type of eBPF maps

add new map type BPF_MAP_TYPE_HASH and its implementation

- maps are created/destroyed by userspace. Both userspace and eBPF programs
  can lookup/update/delete elements from the map

- eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism
  for concurrent updates

- key/value are opaque range of bytes (aligned to 8 bytes)

- user space provides 3 configuration attributes via BPF syscall:
  key_size, value_size, max_entries

- map takes care of allocating/freeing key/value pairs

- map_update_elem() must fail to insert new element when max_entries
  limit is reached to make sure that eBPF programs cannot exhaust memory

- map_update_elem() replaces elements in an atomic way

- optimized for speed of lookup() which can be called multiple times from
  eBPF program which itself is triggered by high volume of events
  . in the future JIT compiler may recognize lookup() call and optimize it
    further, since key_size is constant for life of eBPF program

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>